AISP.EN · 3-Day Classroom Training

EXIN AI Security Professional

Based on the OWASP AI Exchange

Three days from threat map to exam-ready.

Day 1 of 3

Day 1 — Foundations

Intro & exam format · AI security in the organization · Input threats

Day 2: development-time & runtime threats + controls · Day 3: testing, privacy & compliance, review.

Run introductions and an expectations round before this slide (~15 min); ask who already runs AI in production.

Exam At A Glance

40
Multiple-choice questions
90 min
2 minutes 15 seconds per question
65%
Pass mark — 26 of 40 correct
Bloom 2–3
Understand & apply — no pure recall
Closed book
No materials in the exam room
A–E pairs
Two-part pairing questions — all-or-nothing

Where the Marks Are

1. AI Security in the Organization15%
2. AI Security Threats37.5%
3. AI Security Controls27.5%
4. AI Security Testing7.5%
5. Privacy & Compliance12.5%
Threats + Controls = 65% of the marks

Anatomy of an AISP Question

1 · Scenario classification
Short business scenario, then: “What type of attack is this?” The wrong answers are always adjacent concepts.
2 · Two-part pairing (A–E)
Two incidents; each option is a pair of labels. All-or-nothing: both halves must match.
3 · “What is the next step?”
You are placed mid-framework and asked what comes next. Free marks — if you know the order.
Pairing strategy

Judge each half independently; eliminate every option where either half is wrong. Usually one survives.

2:15 Per Question

90 minutes for 40 questions — time is not the enemy.

Read each scenario once, carefully, instead of three times in a hurry.

Your Study Toolkit

Textbook
Full prose for every learning objective, with scenarios
4 Practice Exams
Timed 40-question sets in the official register
Flashcards
Frameworks, lists and locked threat names
Study Guide
Every list, compressed for revision
Cheat Sheet
One page for the night before
All here
Everything lives on this prep site — bookmark it tonight
🗺

Learn the Map

65% of the exam is threats + controls — learn the map, the rest follows.

Every threat pairs with its controls; master that pairing and Topics 1, 4 and 5 fall into place.

Topic 1 · Subtopic 1.1 · 10% of exam · ~4 questions

Organizing AI Security

G.U.A.R.D. · responsible vs trustworthy · assets & threats

The board says “get AI security organized.” Where do you start — and in what order?

G.U.A.R.D. — Five Steps to Organize AI Security

1 · Govern
AI inventory, policies, responsibilities, compliance, education
2 · Understand
Which threats apply; educate engineers and security staff
3 · Adapt
Extend threat modeling, testing and supply-chain management to AI
4 · Reduce
Minimize sensitive data; limit model behavior and impact
5 · Demonstrate
Evidence for management, regulators and clients

What Each Step Involves

Govern
Inventory where AI is used, set policies, name owners — the Chief AI Officer enters here; ISO/IEC 42001 AIMS
Understand
Which threats apply per system; draw the line between your controls and your supplier's
Adapt
Reshape existing processes: AI threat modeling, AI security testing, supply chain for data, models, hosting
Reduce
Assume the model can fail: shrink sensitive data, cap privileges, add oversight
Demonstrate
Prove safeguards work: transparency, test results, documentation
🧠

Memorize This

The five steps

Govern Understand Adapt Reduce Demonstrate

“G.U.A.R.D. your AI — in exactly this order.”

The anchor transition

  • Understand identifies and teaches
  • Adapt changes your processes
  • Reduce limits the impact

After Govern and Understand comes Adapt — the exam's favorite “next step.”

BUILD IT TOGETHER

Name the 5 G.U.A.R.D. Steps

Call them out in order — each click reveals the next.

1Govern — inventory, policies, responsibilities
2Understand — which threats apply + educate
3Adapt — AI threat modeling, testing, supply chain
4Reduce — minimize data, limit behavior
5Demonstrate — evidence that it all works
Let the room call each step before clicking; ask “what actually happens in this step?” before revealing the next.

Responsible AI vs Trustworthy AI

Responsible AI

  • Ethics, society, governance
  • Fairness, societal impact, accountability
  • Owned by boards & ethics committees
vs

Trustworthy AI

  • Technical & operational qualities
  • Robustness, reliability, transparency, explainability
  • Owned by engineering & operations
How to tell them apart

Ask what the concern attaches to: people and governance structures → responsible; measurable system qualities → trustworthy.

AI Security vs Conventional Security

Still true
AI systems are IT systems: every conventional threat and control still applies.
The equation
AI security = threats to AI-specific assets + threats to all other assets.
Novelty 1 · New assets
Training data, augmentation data, model, input, output.
Novelty 2 · New attack surface
Legitimate model use — simply querying can be attacking.
Novelty 3 · New suppliers
Data, ready-made models, model hosting.
Conventional security is necessary but not sufficient
🛡

Why Your WAF Waves It Through

AI attacks manipulate meaning and statistics — through fully legitimate channels.

An adversarial example is a well-formed request; a poisoned training record carries no malware signature.

Exam Question

Kestrel Mobility has an AI inventory with named owners, its engineers are trained on the threats relevant to each system, and it has just finished extending its ISMS, threat modeling, security testing and supplier management to cover AI. According to G.U.A.R.D., what comes next?

A) Understand — determine which threats apply to each system
B) Adapt — introduce AI-specific threat modeling and testing
C) Reduce — minimize sensitive data and limit model behavior
D) Demonstrate — document evidence for the regulator
Answer: C
Govern, Understand and Adapt are complete, so Reduce comes next: shrink the sensitive data footprint and cap what the model can do. D is the trap — you cannot demonstrate safeguards you have not yet deployed; A and B are already done.

Five Assets — and How They Break

Training data
Leaks: development-time data leak; via use: sensitive data disclosure through use, model inversion, membership inference. Manipulated: data poisoning.
Augmentation data (incl. system prompts)
Direct augmentation data leak · augmentation data manipulation.
Model
Leaks: direct development-time model leak, direct runtime model leak, model exfiltration. Poisoned: direct development-time / supply-chain / direct runtime model poisoning.
Input
Input data leak — prompts can carry secrets.
Output
Output containing conventional injection — attacks downstream systems.
The rhythm: data and model leak and get poisoned — development-time and runtime; input leaks; output injects
🕵

Shadow AI: The Pasted Source Code

2023: engineers at a major electronics maker pasted confidential source code into a public chatbot while debugging.

  • Sensitive input flowed straight to an external provider
  • A written ban existed — bans alone don't work
  • Usage just moves out of sight, where no control applies
The fix that works

Provide a sanctioned, secure, good-quality alternative — and make the risks of unsanctioned tools explicitly clear.

SORT EXERCISE

Whose Step Is It Anyway?

Sort each activity into its G.U.A.R.D. step.

Add AI-specific threat modeling to the SDLC
Publish test evidence for the regulator
Build the AI inventory
Strip customer identifiers from training data
Train engineers on the threats that apply to each system
Extend supplier contracts to cover model hosting
Appoint a Chief AI Officer
Require human sign-off on model-triggered actions
Decide which controls are the supplier's job
Govern
  • Build the AI inventory
  • Appoint a Chief AI Officer
Understand
  • Train engineers on applicable threats
  • Decide which controls are the supplier's job
Adapt
  • AI-specific threat modeling in the SDLC
  • Supplier contracts cover model hosting
Reduce
  • Strip customer identifiers from training data
  • Human sign-off on model-triggered actions
Demonstrate
  • Publish test evidence for the regulator
Pairs, 3 minutes, then reveal. The Adapt-vs-Reduce boundary sparks the best discussion — process change vs impact limitation.
Topic 1 · Subtopic 1.2 · 5% of exam · ~2 questions

Threat Modeling and Agentic AI Risks

Four risk steps · the bridge to priorities · agents on a leash

A threat catalogue is not a risk list — and an agent is not just a chatbot.

Risk Management in Four Steps

1 · Identify
Threat modeling turns the catalogue into concrete risks
2 · Evaluate
Likelihood × severity; prioritize on a heatmap
3 · Risk treatment
Mitigate, transfer, avoid, or accept each risk
4 · Communication & monitoring
Risk register; inform stakeholders; verify treatments
↻ Repeat
Regularly — and whenever changes warrant it

Four Ways to Treat a Risk

Mitigate
Implement controls — the most common route
Transfer
Shift it to a third party: outsourcing, insurance
Avoid
Change plans so the risk disappears — maybe no AI here at all
Accept
Knowingly bear it when treatment costs more than it's worth
🌉

Threat Modeling Is the Bridge

From threat catalogue to concrete, prioritized risks — for your system.

Three questions per threat: does it apply here? How could it realistically happen? What would the impact be?

CLASS POLL

Vote Now!

Halvora Insurance keeps its claims chatbot running but buys a cyber-insurance policy covering losses from model manipulation. Which risk treatment option is this?

A) Mitigation
B) Transfer
C) Avoidance
D) Acceptance
Answer: B) Transfer
The risk is shifted to a third party through insurance. It is not acceptance — someone else bears the loss — and not mitigation, because nothing about the system itself changed.
Hands up per option before revealing; ask a “D” voter to defend, then reveal — the accept/transfer line is the point.
💥

Agents Amplify

Agentic AI doesn't create new threats so much as amplify existing ones.

Agents act, run autonomously, behave unpredictably and span systems — the injection that once embarrassed you now moves money.

🤖

The Compromised-Agent Chain

Corvid Retail's invoice-handling agent runs under one shared service account reaching ticketing, HR records and the payment gateway. A supplier email arrives with instructions hidden in white text.

  • Indirect prompt injection: untrusted data read as instructions
  • One shared account → actions chain across every connected system
  • Excessive agency: far more capability than the task needs
Blast radius

Data theft needs three ingredients: attacker-controlled data in, access to sensitive data, a way out. Remove any one and the attack collapses.

Six Controls for Agentic AI

Traceability
Log what the agent did — and why
Memory-integrity protection
Guard stored state and plans from tampering
Prompt-injection defenses
Separate untrusted content from instructions
Rule-based guardrails
Deterministic checks the model can't talk past
Least model privilege
Scope each agent's permissions to its task
Human oversight
A person approves consequential actions
"Never build access control on GenAI."
— OWASP AI Exchange
Exam Tip

Instructions are not enforcement — the right input overrides them, so authorization lives in the architecture, outside the model. And the maxim behind shared accounts: convenience is the enemy of security.

Exam Question

Marrowgate Legal investigates two incidents on its contract-review assistant. Threat 1: an attacker with write access to the document store edited the precedent texts the assistant retrieves, changing its advice. Threat 2: a generated summary contained hidden JavaScript that executed in the client portal's browser. Which pair is correct?

A) Data poisoning + direct prompt injection
B) Augmentation data manipulation + output containing conventional injection
C) Data poisoning + output containing conventional injection
D) Augmentation data manipulation + indirect prompt injection
E) Direct runtime model poisoning + direct prompt injection
Answer: B
Threat 1 rides in with each retrieval while the model stays untouched — augmentation data manipulation, not data poisoning (which targets training data). Threat 2 is model output carrying a conventional attack downstream. Judge halves independently: C fails on half 1, D fails on half 2.
Topic 2 · Subtopic 2.1 · 17.5% of exam

Input Threats

Evasion · prompt injection · disclosure · exfiltration · resource exhaustion

The single heaviest section of your exam — ~7 of 40 questions walk through this one door.

Frame the stakes: any exposed model faces these on day one. Budget the most classroom time here.

One Door, Five Threat Families

Evasion
Crafted input fools the model into doing its task wrong
Prompt injection
Instructions smuggled into the prompt — direct or indirect
Sensitive data disclosure through use
The model gives away its training data
Model exfiltration
The model itself is copied — question by question
AI resource exhaustion
Input burns availability — or your budget
Same door
All arrive through the input channel — so they share generic controls

Six Generic Controls — Shared by All Input Threats

Monitor use
Log inputs, outputs, patterns — detect and reconstruct
Rate limit
Slow the experimentation attacks depend on
Model access control
Only authenticated, authorized actors may query
Anomalous input handling
Flag the single strange input
Unwanted input series handling
Flag the suspicious sequence of inputs
Obscure confidence
Starve attacks of the feedback signal they feed on
🎯

Evasion

Crafted input — an adversarial example — misleads the model into performing its task incorrectly.

Evasion manipulates the data the model works on; prompt injection manipulates the instructions.

🔍

Zero-Knowledge Evasion: The Query-Probing Story

The model is a closed box — the attacker knows nothing and asks it everything.

  • No code, no weights, no architecture
  • Thousands of designed inputs hit the live model
  • Each response redraws the map of the decision boundary
  • Returned confidence scores make the search far faster
Tell-tale sign

Your logs fill with probing traffic — this is where rate limiting and series detection bite.

Partial-Knowledge and Perfect-Knowledge

Partial-knowledge evasion (gray-box)
Some internals known — architecture family, kind of training data. Sharpens the search or the surrogate. The most realistic real-world case.
Perfect-knowledge evasion (white-box)
Full architecture, parameters, weights in hand. Compute the minimal perturbation directly from gradients — no probing needed.
More knowledge = fewer queries — the search moves off your servers
🔄

Transfer Attack: The Surrogate

The attacker never queries your model during the search.

  • Builds or obtains a surrogate model — a copy or approximation of the target
  • Crafts adversarial examples on the surrogate, at leisure
  • Similar task → similar decision boundaries → attacks carry over
  • Zero queries against the target while crafting
Why your logs stay clean

Rate limiting, series detection and obscured confidence never see the attack being developed.

🚪

Evasion After Poisoning: The Planted Key

The odd one out — the weakness was manufactured, not found.

  • Training data was poisoned earlier — a development-time attack
  • The poison plants a backdoor: trigger input → attacker-chosen output
  • At runtime the attacker simply presents the trigger
  • No search needed — they planted the key themselves
Two-phase attack

Planted at development-time, cashed in at runtime. Full poisoning story: subtopic 2.2.

Five Evasion Types — a Ladder of Attacker Insight

Zero-knowledge
No internals — probe the live model, read its responses
Partial-knowledge
Some internals — a sharper, cheaper search
Perfect-knowledge
Full weights — compute the attack from gradients
Transfer attack
Surrogate model — the search happens elsewhere
Evasion after poisoning
Planted backdoor — no search at all

Zero-Knowledge vs Transfer Attack

Zero-knowledge

  • Search runs on the live target
  • Thousands of probing queries
  • Logs fill with traffic
  • Rate limits & series detection bite
vs

Transfer attack

  • Search runs on a surrogate
  • Zero target queries while crafting
  • Logs stay clean
  • Only per-input defenses catch it
How to tell them apart

Ask where the experimentation happens: probing the live target → zero-knowledge; crafting on a stand-in → transfer attack.

🛡

Controls Bite the Search — Not the Example

Rate limiting, series detection and obscured confidence frustrate probing.

Against transfer attacks and evasion after poisoning, only per-input defenses still work: evasion input handling, input distortion, adversarial training.

SORT EXERCISE

Which Evasion Type Is It?

Sort each mini-scenario into one of the five evasion types.

Computes the exact perturbation from the model's gradients — the full weight file is in hand
Sends thousands of mutated images to the live API, steering by the labels returned
Presents the trigger pattern an accomplice planted in last year's training data
Never touches the target — crafts stickers on a similar model they trained themselves
A leaked vendor paper reveals the architecture, making the query search far more efficient
Zero-knowledge
  • Thousands of mutated images to the live API — search on the target
Partial-knowledge
  • Leaked architecture paper sharpens the search — some internals known
Perfect-knowledge
  • Gradients from the full weight file — compute, don't probe
Transfer attack
  • Stickers crafted on their own similar model — a surrogate
Evasion after poisoning
  • Planted trigger presented at runtime — no search needed
Give pairs 3 minutes; collect answers before revealing. The classifier is always the attacker's knowledge, never the data type.

Exam Question

Drava Telecom's spam filter blocks a marketing firm's mailings. The firm never queries the filter. Instead, it crafts rewordings on an open-source spam model it runs locally — and the rewritten mailings then slip past Drava's filter. Which evasion type is this?

A) Zero-knowledge evasion
B) Partial-knowledge evasion
C) Transfer attack
D) Evasion after poisoning
Answer: C
The search ran on a surrogate — a similar model the attacker controls — and zero queries hit the target. Zero-knowledge is the tempting distractor, but it requires probing the live filter; Drava's logs stay clean.

Direct vs Indirect Prompt Injection

Direct prompt injection

  • The user typing is the attacker
  • Jailbreaks, role-play, "ignore previous instructions"
  • Result flows back to the attacker
vs

Indirect prompt injection

  • A third party attacks; the user is a victim
  • Instructions hide in content the application inserts — webpage, CV, image
  • Dedicated control: input segregation
How to tell them apart

Trace the channel: typed by the user → direct. Riding inside inserted third-party content → indirect.

🔓

Jailbreak

A direct prompt injection aimed at defeating the supplier's alignment and safety training.

Two routes in: abuse competing objectives — helpfulness overrides safety — or use inputs the safety training doesn't recognize, like unusual encodings.

Recognize the Forms, Recognize the Carriers

Direct — attack forms
Role-play · override instructions · encodings & mixed languages · split-up requests · multi-turn steering · system prompt leakage
Indirect — the carriers
Compromised webpage fetched as context · white-on-white text in a CV · pixels in an image a multimodal model reads
Agentic AI raises the stakes
If the model can act, a poisoned page acts too — untrusted data treated as executable instructions.
Shared control: prompt injection I/O handling · indirect-specific: input segregation
BUILD IT TOGETHER

Name the Seven Layers of Prompt Injection Protection

Call them out, in order! Each click reveals the next.

1Model Alignment
2Prompt Injection Defense
3Human Oversight
4Automated Oversight
5User-Based Privilege
6Intent-Based Privilege
7Just-In-Time Authorization
After the reveal, ask: which two layers prevent and detect? (1–2.) What do the other five do? (Limit the blast radius.)

Every Layer Has a Flaw

1 Model Alignment
Models stay easy to mislead — trained or not.
2 Prompt Injection Defense
An arms race; false negatives guaranteed.
3 Human Oversight
Costly, slow — and approval fatigue clicks "yes" by reflex.
4 Automated Oversight
Reactive: acts only once trouble has started.
5 User-Based Privilege
Users may do far more than the task needs.
6 Intent-Based Privilege
Intent isn't always known in advance.
7 Just-In-Time Authorization
Finest grain — but the architecture must support it.
Weak alone, strong together — assume injection succeeds; layers 3–7 shrink the blast radius
CLASS POLL

Vote Now!

Talvik Energy's outage-report agent receives, in advance, read-only access to the sensor archive — because writing reports requires reading, never sending or deleting. Which protection layer is this?

A) Layer 3 — Human Oversight
B) Layer 5 — User-Based Privilege
C) Layer 6 — Intent-Based Privilege
D) Layer 7 — Just-In-Time Authorization
Answer: C) Layer 6 — Intent-Based Privilege
Rights scoped to the task, assigned in advance. Layer 5 scopes by user identity; layer 7 grants rights at the moment, per subtask.

Model Inversion vs Membership Inference

Model inversion

  • Attacker starts with nothing
  • Optimizes inputs to chase confidence signals
  • Reconstructs approximations of training data
  • Gain: data they never had — a recognizable face
vs

Membership inference

  • Attacker already holds the record
  • Tell-tale extra confidence betrays membership
  • Gain: one bit — in or out
  • One bit can reveal a diagnosis
How to tell them apart

Inversion = unknown data out of the model. Membership inference = known data held up against the model. Both feed on confidence indications.

The Disclosure Trio

Disclosure of sensitive data in model output
The model simply emits memorized training or input data — no attack needed. Last line of defense: sensitive output handling.
Model inversion
Reconstruct what you never had — via intensive, confidence-guided querying.
Membership inference
Confirm what you brought along — was this record in the training set?
Group name: sensitive data disclosure through use — a confidentiality breach of the training set
🔬

Overfitting Is the Root Cause

A model with too much capacity memorizes individual records — which can then be reconstructed or recognized.

Paired controls: small model at development-time; obscure confidence at runtime — starve both attacks.

📡

Model Exfiltration: From Q&A to Replica

Pellucid Insurance notices one account sweeping its pricing API — 900,000 methodical quote requests.

  • Harvested input–output pairs become a manufactured training set
  • A new model trained on them replicates the original
  • The replica = a perfect-knowledge surrogate for attacking you
  • Weeks later: adversarial tricks work with zero visible probing
Also called

Model stealing · model extraction · model theft through use. Countered by the generic input-threat controls plus one dedicated control →

💧

Model Watermarking = Post-Theft Proof

A hidden marker proves a surfaced copy derives from your model — supporting ownership claims and legal action.

It does not prevent the theft. Prevention comes from access control, rate limiting, monitoring and series detection.

SPOT THE MISTAKES

Find the 3 Errors

Work in pairs — a colleague wrote this in the risk register.

"Model exfiltration means the attacker breaks into production storage and copies the parameter file. Fortunately, model watermarking prevents this theft. A related threat, membership inference, lets an attacker reconstruct training records they never possessed."

breaks into production storage harvests input–output pairs through normal use — breaking in and copying the file is a direct runtime model leak, not exfiltration.

prevents this theft proves ownership after the theft — watermarking enables post-theft verification; it stops nothing.

reconstruct records they never possessed confirm whether a record they already hold was in the training set — reconstruction is model inversion.

2 minutes in pairs. All three errors are classic exam distractors — say so explicitly.

AI Resource Exhaustion

Denial of service (DoS)
Availability: the system turns slow or unresponsive
Denial-of-wallet (DoW)
Funds: metered compute and API fees burn your budget
Sponge attack
Input crafted to maximize computation (energy-latency) — can cause both at once
DoS input validation
Reject or correct the oversized, pathological, deliberately complex
Limit resources
Cap what any single input may consume

Content Is the AI Twist

Exhaustion can come from frequency, volume — or the content of a single input.

Conventional DoS thinking counts requests. One cleverly built sponge input costs as much as a flood — so cap resources per input, not just per actor.

🧠

Memorize This

Five evasion types

Zero-knowledge Partial-knowledge Perfect-knowledge Transfer attack Evasion after poisoning

"none → some → all → surrogate → planted"

Seven protection layers

Model Alignment Prompt Injection Defense Human Oversight Automated Oversight User-Based Privilege Intent-Based Privilege Just-In-Time Authorization

"Most Prompts Hide An Unwelcome Instruction, Justifiably"

Exam Trigger Phrases — Input Threats

"probes the live API with mutated inputs"
Zero-knowledge evasion
"builds a surrogate model"
Transfer attack
"hidden text in a retrieved page or CV"
Indirect prompt injection
"confidence reveals the record was in training"
Membership inference
"harvests input–output pairs to train a copy"
Model exfiltration
"the cloud bill exploded"
Denial-of-wallet (DoW)

Exam Question

Quellhaus Bank runs a face-recognition entry system and a credit-scoring API. Incident 1: a journalist submits a specific customer's photo and concludes, from the unusually high confidence returned, that the photo was in the training set. Incident 2: a competitor scripts two million varied requests to the scoring API and trains a working copy from the answers. Which pair is correct?

A) Model inversion + direct runtime model leak
B) Membership inference + model exfiltration
C) Membership inference + direct runtime model leak
D) Model inversion + model exfiltration
Answer: B
Incident 1: the journalist brought the record; confidence only confirmed membership — inversion would mean reconstructing data she never had. Incident 2: the model was stolen through use, by harvesting I/O pairs — a leak would require breaking into the systems storing it.
Day 2 of 3

Day 2

Development-time & runtime threats, then the controls that answer them

Yesterday: the organization and the input door. Today: attacks on the build pipeline and the live system — then Topic 3's control catalog.

5-minute Day 1 recap: ask the class to name the five evasion types and the seven layers from memory.
Topic 2 · Subtopic 2.2 · 10% of exam

Development-Time Threats

Poisoning attacks integrity — leaks attack confidentiality

The attacker strikes while you build: your data, your pipeline, your supply chain. ~4 questions.

Data Poisoning

Manipulating the data a model learns from, to change the model's behavior.

Whoever controls the training data controls the behavior — no need to touch the model or the code at all.

Five Entry Points — Same Threat

Supplier
Dataset poisoned before you obtain it
Transit
Data altered on the way to storage
Storage
Training database edited in your environment
Preparation
Manipulated during cleaning and labeling
Operation
Live input collected as tomorrow's training data
The trap
Attacker used the live system — still development-time: the harm happens when the data is learned from
🚪

The Trigger Sticker

Cintra Logistics' parcel scanner waves through any box bearing a small violet sticker. Targeted poisoning planted a backdoor.

  • A few poisoned samples: subtle trigger pattern + attacker-chosen label
  • Perfect behavior on everything else — including your whole test set
  • Later, the adversary simply shows the trigger
  • No code to review; parameters mean nothing to the eye
The runtime cash-in

Exploiting the planted trigger is evasion after poisoning (2.1): planted development-time, triggered at runtime.

Sabotage vs Targeted (Backdoor) Poisoning

Sabotage

  • Degrades the model for regular inputs
  • Fraud detection simply stops working
  • Normal traffic misbehaves → surfaces quickly
vs

Targeted / backdoor (Trojan)

  • Hidden trigger + attacker-chosen label
  • Normal behavior otherwise — passes every test
  • Far more dangerous: designed for your blind spot
How to tell them apart

Detectability is the divider: sabotage announces itself; a backdoor hides until its trigger appears.

Direct Development-Time Model Poisoning

What the hands touch
Stored weights edited · model file swapped · serialized model that runs code when loaded (deserialization) · pipeline code & configuration · a compromised library in the training run
Boundary 1
Manipulated at a supplier, then shipped to you → supply-chain model poisoning.
Boundary 2
Learning data manipulated → data poisoning.
"Direct" = hands on the model itself, or on the machinery that builds it
📦

The Backdoor That Fine-Tuning Missed

Ondine Diagnostics downloads an open-source clinical model from a public hub and fine-tunes it on its own clean records.

  • Manipulated before integration — at the supplier or in transit; invisible from your side
  • Fine-tuning on clean data does not reliably erase a backdoor
  • Supplied model used for further training = a transfer learning attack
  • Your data, pipeline, people: all clean
Your remaining controls

Provenance, checksums & signatures, scan artifacts before loading, poison robust model, model ensemble, continuous validation.

Data Poisoning vs Model Poisoning

Data poisoning

  • Training data manipulated
  • The machinery works as designed — it faithfully learns corrupted material
  • Trigger words: records, labels, dataset
vs

Model poisoning

  • The model or its engineering elements manipulated
  • Weights, pipeline code, configuration, libraries
  • Direct (your environment) or supply-chain (supplier's model)
How to tell them apart

Ask what the attacker's hands touched: learning data → data poisoning; the model or its machinery → model poisoning.

CLASS POLL

Vote Now!

An audit at Sorrel Analytics finds a compromised Python package in the training pipeline: on every run it silently nudges certain model weights. The training data was never touched. What is the threat?

A) Data poisoning
B) Supply-chain model poisoning
C) Direct development-time model poisoning
D) Direct runtime model poisoning
Answer: C) Direct development-time model poisoning
Libraries are engineering elements that build the model — hands on the machinery, not the data. B tempts because the package arrived via the supply chain, but no supplied trained model was manipulated.
Expect a split B/C vote — let both camps argue before revealing. The threat is named after what is manipulated.

Three Development-Time Leaks — the Asset Decides the Name

Development-time data leak
Asset: training/test data — real data, personal data, company secrets
Direct development-time model leak
Asset: model attributes — parameters, weights, architecture
Source code/configuration leak
Asset: the recipe — pipeline code and training configuration
Leak ≠ poisoning
Copied = leak (confidentiality) · changed = poisoning (integrity)
🔑

A Model Leak Upgrades the Attacker

With a private copy, a zero-knowledge attacker becomes a perfect-knowledge one.

Evasion and inference attacks get rehearsed offline — no rate limits, no monitoring, no detection in the way.

Where Poisoning Enters the Lifecycle

Supplier
Poisoned dataset · manipulated pre-trained model (supply-chain model poisoning)
Preparation
Data tampered while being cleaned and labeled
Training
Training database hacked · parameters, code, config, libraries (direct development-time model poisoning)
Runtime
Operation-collected data poisoned · planted trigger cashed in (evasion after poisoning)

Exam Question

Vireo Health downloads a pre-trained triage model from a public hub and fine-tunes it on its own carefully validated records. Months later, red teamers find one odd token sequence that reliably produces dangerous advice. Vireo's data, pipeline, and staff all check out clean. What is the threat?

A) Data poisoning
B) Direct development-time model poisoning
C) Supply-chain model poisoning
D) Direct prompt injection
Answer: C
The model arrived manipulated, and fine-tuning on clean data does not reliably erase a planted backdoor — a transfer learning attack. A and B would locate the manipulation inside Vireo's own environment, which checked out clean; D is a runtime input threat, not a planted behavior.
Topic 2 · Subtopic 2.3 · 10% of exam

Runtime Conventional Security Threats

Old attacks, new consequences

A live AI system is still an IT system — every conventional attack works here too. ~4 questions.

The Technique Is Old — the Consequences Are AI-Specific

SQL injection, stolen credentials, ransomware: none of them care that a neural network is inside.

Steal the model → run inference attacks offline. Tamper with parameters → invisible to code review. Hack augmentation data → change behavior without touching the model.

Direct Runtime Model Poisoning vs Direct Runtime Model Leak

Direct runtime model poisoning

  • Live parameters altered — or the model's I/O logic compromised
  • Integrity breach: runs, but off-spec
  • Controls: runtime model integrity · I/O integrity
vs

Direct runtime model leak

  • Live parameters copied — storage, memory, even side channels
  • Confidentiality breach: IP theft + offline rehearsal copy
  • Controls: runtime model confidentiality · model obfuscation
How to tell them apart

Altered = poisoning (integrity) · copied = leak (confidentiality). Replicated purely by querying the API? Neither — that is model exfiltration.

💬

The Script in the Transcript

Juniper Airlines' support assistant writes answers straight into the console. A prankster makes it output hidden script — which runs in the next agent's browser.

  • Model output carried a conventional attack (cross-site scripting)
  • Victim: the downstream component that trusts the output
  • Variant: data packed into a markdown image URL — exfiltrated on render
  • Payload arrives via prompt injection, or emerges on its own
Output containing conventional injection

Decades-old lesson: treat model output as untrusted input. Control: encode model output.

Input Data Leak

Input data leak
The user's input is exposed at rest or in transit by a conventional attack. The model behaves normally — the plumbing around it bleeds.
Control
Model input confidentiality: encryption, access control, minimal retention. Plus data minimization — what you never store cannot leak.
Why the stakes are high
Prompts carry strategy papers, source code, health questions — and metadata ties them to identified users.
Where it leaks
Debug logs · provider-side prompt logging (read the fine print) · intercepted traffic · RAG context rides inside the prompt, so it leaks too.

Input Data Leak vs Sensitive Data Disclosure Through Use

Input data leak

  • Breach in storage or on the wire
  • Log file, misconfigured bucket, intercepted connection
  • The model is never touched
vs

Sensitive data disclosure through use

  • The model's own answer reveals the data
  • Memorized training data or confidential context
  • Breach happens through the input–output channel
How to tell them apart

Locate the breach: "log file", "at rest", "in transit" → input data leak. "The model revealed…" → disclosure through use.

Direct Augmentation Data Leak vs Augmentation Data Manipulation

Direct augmentation data leak

  • Attacker reads: vector database dumped, retrieval traffic sniffed
  • Confidentiality breach
  • Behavior does not change
vs

Augmentation data manipulation

  • Attacker writes: planted chunks steer every future prompt
  • Integrity breach
  • Data poisoning's logic, transplanted to runtime data
Vector databases are an attack surface

A copy of sensitive content outside its regular protection — and embeddings can be mined back into text. Read = leak · write = manipulation.

SPOT THE MISTAKES

Find the 3 Errors

Work in pairs — an intern drafted this incident summary.

"A chatbot was manipulated into emitting hidden JavaScript that ran in the next viewer's browser — a classic case of indirect prompt injection. The fix is simple: once the model is well aligned, its output can be trusted downstream. Separately, an attacker who rewrites chunks in the RAG vector database commits data poisoning."

indirect prompt injection output containing conventional injection — the model is the delivery vehicle; the victim is the downstream component that processes the output.

its output can be trusted downstream treat model output as untrusted input — encode model output — alignment never guarantees clean output.

data poisoning augmentation data manipulation — the vector database is a runtime asset feeding prompts, not training data.

SORT EXERCISE

Development-Time or Runtime?

Sort each incident by lifecycle stage — then name the threat.

An insider relabels fraud records in the training database
A misconfigured bucket exposes months of prompt logs
Weights copied from a data scientist's laptop
The deployed model's parameter file is edited on the production server
Preprocessing scripts and training config stolen from the Git repository
A planted chunk in the vector database steers the assistant's answers
Development-time
  • Relabeled fraud records — data poisoning
  • Weights from the laptop — direct development-time model leak
  • Stolen scripts & config — source code/configuration leak
Runtime
  • Exposed prompt logs — input data leak
  • Edited production parameters — direct runtime model poisoning
  • Planted vector chunk — augmentation data manipulation
Two-step call-outs: stage first, threat name second. 4 minutes, then reveal bucket by bucket.

Topic 2 Recap — Every Threat, Mapped

Input (through use)
Evasion (5 types) · direct & indirect prompt injection · sensitive data disclosure through use · model inversion · membership inference · model exfiltration · AI resource exhaustion
Development-time
Data poisoning · direct development-time model poisoning · supply-chain model poisoning · development-time data leak · direct development-time model leak · source code/configuration leak
Runtime conventional
Direct runtime model poisoning · direct runtime model leak · output containing conventional injection · input data leak · direct augmentation data leak · augmentation data manipulation
🧭

Name Any Threat in Two Questions

1. Which lifecycle stage — development-time or runtime? 2. Which asset — data, model, input, output, or augmentation data?

Answer both and the threat name follows. Tomorrow's controls walk the same map from the defender's side.

Exam Question

Ostmark Credit suffers two incidents in one week. Incident 1: an intruder on a production host copies the scoring model's parameters from memory. Incident 2: a misconfigured debug proxy exposes months of customers' prompts to the internet. Which pair is correct?

A) Model exfiltration + development-time data leak
B) Direct runtime model leak + input data leak
C) Direct runtime model leak + sensitive data disclosure through use
D) Direct development-time model leak + input data leak
Answer: B
Parameters copied by breaking into the live system = direct runtime model leak — exfiltration would harvest I/O pairs through queries, and D picks the wrong environment. Exposed prompt logs = input data leak; the model revealed nothing, which rules out disclosure through use.
Topic 3 · Subtopic 3.1 · 12.5% of exam

Governance

Six controls, eight rollout steps, provider vs deployer

AI security starts at the top — not in the code.

Topic 3: Three Families of General Controls

Governance
3.1 — six controls that manage AI through policies, roles, and risk
Limit sensitive data
3.2 — five controls that shrink the data attack surface
Limit unwanted behavior
3.3 — seven controls that cap what misbehavior can reach
Specialized controls
Input filtering, robust training, output encoding — they live with their threats in Topic 2
🏛

What "Good AI Security Governance" Means

Clear policies, defined roles, and risk management — spanning secure development, deployment, and monitoring.

Never a single tool, a one-off audit, or one lifecycle stage.

📋

The Bare Minimum

1. Make an inventory of current AI use — including ideas in the pipeline. 2. Perform a risk analysis on it.

You cannot protect what you do not know you have.

The Six General Governance Controls

AI Program · AI PROGRAM
Govern AI as an organization: inventory, impact analysis, responsibilities, AI literacy.
Security Program · SEC PROGRAM
The ISMS covers the whole AI lifecycle and its AI-specific assets and threats.
Secure Development Program · SEC DEV PROGRAM
Build security into the AI system while it is being made.
Development Program · DEV PROGRAM
Engineering best practice for AI — maintainable, reliable, future-proof. Broader than security.
Check Compliance · CHECK COMPLIANCE
Privacy and AI laws in compliance management — regulation as a driver.
Security Education · SEC EDUCATE
Teach AI security threats and controls to engineers, dev teams, security pros.
None of these invents a parallel "AI department" — they extend structures you already have.

The Near-Twins: Development vs Secure Development

Development Program

  • Lifecycle program for AI work
  • General engineering best practice: versioning, testing, documentation
  • Objective: maintainable, portable, reliable, future-proof systems
  • Security is one benefit among several
vs

Secure Development Program

  • Development processes that build security in
  • Addresses risks while the system is constructed, not after
  • Objective: reduce security risks during development
  • Security is the whole point
How to tell them apart

Read the objective. Engineering quality with security as a side benefit → Development Program. Security built into construction → Secure Development Program.

Coverage: One Word — Overarching

The general governance controls apply to all AI threats and all lifecycle stages.

Any answer that fences them into one threat, one tool, or one phase is a trap.

BUILD IT TOGETHER

The 8 Organizational Implementation Steps

Call them out! Each click reveals the next.

1Organize control of AI — ownership, inventory, risk
2Teach data obfuscation & minimization
3Extend supply-chain management to data, models, cloud
4Add AI assets & risks to the ISMS repository
5Teach DevSecOps
6Teach AI security controls for model engineering & runtime
7Extend monitoring to AI-attack behavior
8Implement model guardrails, oversight & least privilege
Go around the room — one step per participant. Note the arc: inventory first, runtime controls last. ~5 min.
📦

Ready-Made Models Change the Question

Not just "which controls do we need?" — "who implements which controls?"

A ready-made model is trained — possibly hosted — by a third party. Provider: model-level, development-time. You: application-level.

Self-Hosted vs Hosted Ready-Made Model

Self-hosted

  • Supplier: development-time, model-level controls — training-data hygiene, base alignment
  • You: everything at runtime — infrastructure, monitoring, rate limiting, access control, output validation, privileges, oversight
  • Data stays inside your environment
vs

Hosted (API)

  • Supplier also runs the platform: hosting security, its monitoring and rate limiting
  • You keep application-level controls: what data you send, output validation, injection handling, privileges, oversight
  • Your input leaves your environment — in clear text
How to tell who owns what

Ownership follows who operates the layer — not who authored the model.

🔓

The Jailbroken Tutor

Quillow, a language-learning app on a hosted LLM API, is jailbroken into producing offensive replies. Who owns the fix?

  • Base alignment belongs to the provider — report the jailbreak
  • Quillow cannot retrain or fine-tune someone else's model
  • "Stronger system prompt" = more of what just failed
The deployer's own fix

Add an output validation layer: check model responses against rules — and a filtering model where needed — before they ever reach a learner.

Hosted Model Due Diligence

Clear text
A hosted model must read your input unencrypted — outside your infrastructure
Where does it run?
Vendor's cluster, or your virtual private cloud?
What is retained?
Retention rules — and a court order can override them
What is logged?
And who — or what — reads those logs?
Used for training?
Is your input training someone else's model? Check the opt-out
The trade-off
Model quality vs data control — some data can't accept the residual risk
🛡

Your Duties Never Transfer

No provider can decide what data you send, whether to trust the output, or which privileges your users and model get.

Hosting shifts infrastructure work — application-level controls stay with the deployer.

SORT EXERCISE

Provider or Deployer?

A hosted API model. Whose job is each control?

Training-data hygiene
Decide what data is sent to the model
Base model alignment
Output validation & encoding
Hosting platform security
User & model privileges
Development environment security
Oversight of behavior in your context
Provider (supplier & host)
  • Training-data hygiene
  • Base model alignment
  • Hosting platform security
  • Development environment security
Deployer (you)
  • Decide what data is sent to the model
  • Output validation & encoding
  • User & model privileges
  • Oversight of behavior in your context
Topic 3 · Subtopic 3.2 · 7.5% of exam

Limiting Sensitive Data

Five controls that shrink the data attack surface

The cheapest data to defend is the data you never keep.

📏

Shrink the Data Attack Surface

Three dimensions: amount · variety · duration.

Fewer records, fewer kinds, kept for less time — development-time and runtime, from training data to inputs, outputs, and logs.

The Five Data-Limitation Controls

Data minimization
DATA MINIMIZE — remove fields and records the application doesn't need
Allowed data
ALLOWED DATA — remove data prohibited for this purpose ("may we use it at all?")
Short retention
SHORT RETAIN — remove or anonymize once no longer needed; minimization along the time axis
Obfuscate training data
OBFUSCATE TRAINING DATA — mask, tokenize, pseudonymize, add differential-privacy noise to what must stay
Discretion
DISCRETE — minimize access to technical details attackers could use

Delete vs Disguise

Data minimization

  • Deletes data you never needed
  • AI models tolerate reduced features better than intuition suggests
  • Nothing left = nothing to steal
vs

Obfuscate training data

  • Transforms data you must keep
  • Masking, tokenization, pseudonymization, calibrated noise
  • Reduces re-identification risk — never eliminates it
How to tell them apart

Delete first; obfuscate only what you cannot delete. And note: pseudonymization is reversible (a mapping table exists) — weaker than anonymization.

"What is not there cannot be leaked — or manipulated."
— the data-limitation mantra
Exam Tip

The benefit lands on confidentiality and integrity: nothing to disclose, nothing to corrupt. "Encryption makes retained data safe anyway" is the distractor — retained data is still a target.

Exam Question

Vantora Retail trains a churn model on customer records that still contain full bank account numbers — which have no predictive value. Which control should be applied, and how?

A) Obfuscate training data — tokenize the account numbers
B) Data minimization — remove the account numbers entirely
C) Short retention — delete the training extract after five years
D) Discretion — restrict access to the pipeline documentation
Answer: B
Data with no predictive value should be deleted, not disguised. A is the classic trap: obfuscation is for sensitive data that must stay — and tokenization leaves a mapping table that itself becomes an asset to steal.
Topic 3 · Subtopic 3.3 · 7.5% of exam

Limiting Unwanted Behavior

Seven controls that cap what misbehavior can reach

You can't prevent every cause — so limit the effects.

The Seven Behavior-Limitation Controls

Oversight · OVERSIGHT
Watch behavior — human or automated — and respond. The final checkpoint. Beware approval fatigue.
Least model privilege · LEAST MODEL PRIVILEGE
Minimize what the model can do and access. The heart of agentic AI safety.
Model alignment · MODEL ALIGNMENT
Constrain behavior inside the model — probabilistic, manipulable, never a guarantee.
AI transparency · AI TRANSPARENCY
Tell users the system's properties so they can calibrate reliance.
Continuous validation · CONTINUOUS VALIDATION
Frequently test against a test set — catches poisoning, drift, staleness. Not backdoors.
Explainability · EXPLAINABILITY
Explain individual decisions — counters overreliance, helps assessors.
Unwanted bias testing · UNWANTED BIAS TESTING
Bias tests double as a security sensor: a sudden shift can reveal manipulation.
Whatever the cause — attack or accident — limit what misbehavior can reach.
🚫

A System Prompt Is Not Access Control

"Never refund above €500" in a prompt is a suggestion to a text predictor — prompt injection walks straight through it.

Authorization lives outside the model: task-scoped least model privilege, plus human approval for high-risk actions.

AI Transparency vs Explainability

AI transparency

  • About the system: how it roughly works, its data, expected accuracy, residual risks
  • Lets users calibrate reliance and decide what to share
  • Simplest form: saying an AI is involved at all
vs

Explainability

  • About one decision: why the model produced this output
  • Builds justified trust, counters overreliance
  • Helps security assessors judge the model's risks
How to tell them apart

System properties → transparency. One specific decision → explainability.

SPOT THE MISTAKES

Find the 3 Errors

Work in pairs — each sentence hides one.

"Aldertree Bank deploys an agentic assistant that can move funds between a customer's own accounts. The security plan: a system prompt forbidding transfers above €500 serves as the assistant's access control; weekly continuous validation will also catch any backdoor poisoning; and AI transparency will explain to each customer why an individual transfer was flagged."

system prompt as access control authorization must live outside the model — a prompt is probabilistic model alignment; use least model privilege plus human approval.

validation catches backdoor poisoning backdoors are built to pass test sets — they trigger only on inputs that never appear in validation; you need data quality control and poisoning defenses.

transparency explains individual decisions that is explainability — transparency covers system properties, not single outputs.

Pairs, 3 minutes, then reveal one error at a time.
🚗

The One-Dollar Chevrolet

Late 2023: users manipulate a US car dealership's chatbot into "agreeing" to sell a new Chevrolet for $1 — "no takesies backsies."

  • The prompt injection was trivial
  • But the bot could only generate text — not execute sales
  • Damage: reputational, not financial
Blast radius in action

Rerun it with an agent that holds ordering authority and no privilege limits — the same trivial trick becomes a direct financial loss.

Ask the room: what changes if this bot can issue quotes binding to your CRM? ~2 min discussion.
💥

Blast Radius: Two Levers

1. Minimize & obfuscate data — less to lose. 2. Limit model behavior — less it can do.

Whatever goes wrong — poisoning, injection, or honest error — these levers cap what it costs.

Unwanted Behavior Needs No Attacker

Bad training data, drift and staleness, engineering mistakes, feedback loops — model collapse.

Attacks are one cause among several — so these controls are shared work, and they pay off even with zero adversaries.

Benefits Beyond Security

Fewer hallucinations
Higher task success and accuracy
Better calibration
Consistent, on-scope outputs
Less waste
No compute burned on off-task output
Fewer incidents
Less operational firefighting
Smaller attack surface
A constrained model offers less to exploit
Lower exposure
Reduced legal, security, and reputational risk
🧠

Memorize This: 6 · 5 · 7

Count the families

6governance controls 5data-limitation controls 7behavior-limitation controls

"Six govern, five slim, seven restrain."

The canonical answers

  • Bare minimum = inventory + risk analysis
  • Governance coverage = overarching
  • Jailbroken API model → output validation layer
  • Agent that acts → task-scoped least privilege + human approval
SORT EXERCISE

Threat → Primary Control

Match each Topic 2 threat to the control family that counters it first.

membership inference
development-time data leak
model exfiltration
denial-of-wallet (DoW)
data poisoning
supply-chain model poisoning
output containing conventional injection
direct prompt injection
Data limitation (minimize · obfuscate · short retain)
  • membership inference
  • development-time data leak
Monitoring & rate limiting
  • model exfiltration
  • denial-of-wallet (DoW)
Data quality control & supply chain management
  • data poisoning
  • supply-chain model poisoning
Encode model output
  • output containing conventional injection
Injection I/O handling + oversight & least model privilege
  • direct prompt injection
Pairs, 5 minutes — this is the Chapter 3 master table in miniature. Remind them the six governance controls sit over every row.

Exam Question

Solvex Energy gives an agentic AI assistant access to its billing system: it can read contracts, adjust tariffs, and issue credits. Which combination best limits the effects of manipulation?

A) A system prompt forbidding credits above €200, plus AI transparency
B) Continuous validation plus explainability
C) Task-scoped least model privilege plus human approval for high-risk actions
D) Data minimization plus short retention
Answer: C
"Agent" + "can act" always pulls toward privilege plus approval: bound what actions are possible, insert a human before the irreversible ones. A relies on probabilistic alignment — not access control. B detects and explains after the fact; D limits data, not actions.
Day 3 of 3

Day 3 — Testing, Privacy & Compliance, Exam Review

Topic 4 · Topic 5 · cross-topic recap and exam strategy

Attack your own system before someone else does.

Topic 4 · Subtopic 4.1 · 5% of exam

Threats Scope

Three testing strategies, two threat trios

A control is a hypothesis until someone tries to break it.

CLASS POLL

Vote Now!

Fenwick Insurance's annual penetration test formally included its claims-fraud model and its customer chatbot — and found no serious issues. Is the AI now security-tested?

A) Yes — both AI systems were in scope
B) No — pentesting covers only one of three testing strategies
C) Yes — provided the models also pass their accuracy checks
D) No — AI systems cannot be pentested at all
Answer: B) No — pentesting covers only one of three testing strategies
The pentest exercised the conventional stack — servers, APIs, access control. Nobody validated model behavior against acceptance criteria, and nobody simulated attacks on the models themselves. One leg of a three-legged stool.
Hands up per option before revealing — expect a split between A and B. ~3 min.

Why AI Security Testing Exists

Assess the resilience of an AI system by reproducing realistic attacks against it in a controlled environment.

Not accuracy, not compliance, not feature checks — if there is no adversary in the answer, it is not AI security testing.

Three Testing Strategies

Conventional security testing
Pentest the stack around the AI: infrastructure, APIs, access control, supply chain
Model performance validation
Benign test set vs acceptance criteria — detects permanently altered behavior and drift
AI security testing
The security part of AI red teaming: simulate attacks, probe safeguards, play the adversary

AI Security Testing vs Model Performance Validation

AI security testing

  • Adversarial by design
  • Hostile prompts, crafted inputs, bypass attempts
  • Success = finding the weakness before a real attacker does
vs

Model performance validation

  • Benign by design
  • Representative test set, accuracy vs acceptance criteria
  • Security use: spotting behavior permanently altered by poisoning
How to tell them apart

Hostile inputs → security testing. Accuracy on normal data → performance validation. "Simulates attacks" vs "acceptance criteria" — the stem words give it away.

What to Test For: Two Trios

Predictive AI

  • Evasion — crafted inputs mislead the task
  • Model exfiltration — the stolen replica becomes an attack oracle
  • Model poisoning — development-time manipulation of data, pipeline, or supply chain
vs

Generative AI

  • Prompt injection — manipulative instructions, direct or indirect
  • Sensitive data disclosure in output — the model is coaxed into revealing secrets
  • Insecure output handling — output carrying conventional injection hits downstream systems
Three vs three, no overlap

Anchor on the paradigm first: does the system predict or generate? Then check the threat against the right trio.

Exam Question

Atlas Freight runs a predictive delivery-delay model and a generative customer-support chatbot. Beyond conventional security testing, which threat belongs on each test plan?

A) Delay model: prompt injection — Chatbot: evasion
B) Delay model: evasion — Chatbot: sensitive data disclosure in output
C) Delay model: insecure output handling — Chatbot: model exfiltration
D) Delay model: sensitive data disclosure in output — Chatbot: model poisoning
Answer: B
Evasion sits in the predictive trio (evasion, model exfiltration, model poisoning); sensitive data disclosure in output sits in the generative trio (prompt injection, sensitive data disclosure in output, insecure output handling). Option A cross-wires the paradigms exactly backwards — the classic trap.
Topic 4 · Subtopic 4.2 · 2.5% of exam

AI Security Testing Strategies

The eight-step approach — iterative by design

One clean round proves only that the easy attacks failed.

BUILD IT TOGETHER

Name the 8 Testing Steps

Call them out! Each click reveals the next — the exam asks what comes next.

1Define objectives & scope
2Understand the AI system
3Identify potential threats
4Develop attack scenarios
5Test execution
6Risk assessment
7Prioritization & risk mitigation
8Validation of fixes — then iterate
Point out the shape: scoping before attacking, validation loops back into testing. ~4 min.

Test Execution Done Right

Production parity
Same model version, prompts, tools, permissions, configuration as production
Run it multiple times
GenAI output is non-deterministic — one clean run may be luck
Attack the real route
Through the system API with all filters — including untrusted-data paths, to simulate indirect prompt injection
Positive testing
Benign inputs must still work — don't drown legitimate users in false positives
🔄

Blocked Is Not Resilient

After blocked inputs: add variation — synonyms, encodings, formatting changes — and rerun.

If a paraphrase sails through, your safeguard matched surface features, not intent. In essence: an evasion attack on your own detection.

Exam Question

Juniper Telecom's red team finds four vulnerabilities in its support chatbot. Engineering implements all mitigations, and the project manager closes the engagement. According to the general AI security testing approach, what is wrong?

A) Nothing — risk mitigation is the final step
B) Risk assessment should have been repeated before mitigation
C) Validation of fixes is missing — the system must be retested post-remediation
D) The engagement should have closed with a compliance report
Answer: C
Step 8 is validation of fixes: implementing a mitigation is not evidence it works until the previously successful attacks are rerun. That loop back into testing is why the approach is iterative. A confuses "action taken" with "risk reduced"; D swaps in a compliance goal that isn't the purpose of security testing.
Topic 5 · Subtopic 5.1 · 5% of exam · ~2 questions

Privacy and AI Security

The privacy definition, AI-specific concerns, the nine privacy principles

You can encrypt everything perfectly and still violate privacy.

"Privacy is personal data protection plus respect for further individual rights."
— The two-part definition the exam wants, word for word
Exam Tip

Distractors shrink privacy down to confidentiality. "We encrypted it" never answers a question about consent, purpose, or erasure.

AI Privacy Has Two Parts

1 · The security part
Confidentiality & integrity of personal data in training data, model input, and output — plus integrity of model behavior where wrong behavior can hurt individuals.
2 · The rights part — not security
Further individual rights under privacy regulations: use limitation, consent, fairness, transparency — the rights to know, correct, object, erase.
Perfect encryption cannot fix part 2.

Why AI Makes Privacy Harder

📦 Data intensity
Data-hungry systems: extra risk at collection and retention, many sources, many legal constraints.
⌛ Long retention
Retraining keeps training data around for years — a direct tension with storage limitation.
🔧 Engineering exposure
AI teams routinely handle production personal data during development — conventional dev teams rarely do.
🕵 Model attacks
Model inversion, membership inference, sensitive data disclosure through use — the model itself becomes a leak channel.
⚖ Discriminating decisions
Decisions about people can discriminate — and outputs can trigger privacy-invading actions.
🌐 Federated learning
The AI-native mitigation: train in iterations across separate sites, so raw data never leaves its source.

Assess Before, Respond After

Privacy impact assessment (PIA / DPIA)
Structured, up-front review of privacy risks. GDPR's DPIA is mandatory when processing is likely high-risk for individuals — training AI on personal data is a textbook trigger.
Privacy incident
Personal data leaks, is accessed without authorization, or is used beyond its purpose → incident response plus GDPR breach-notification duties.
Run the DPIA before the first record enters the data science environment.

The Nine Privacy Principles — Scenario Cues

Accuracy
A wrong data point drives a harmful automated decision.
Consent
Permission never asked, bundled, or impossible to withdraw.
Data minimization & storage limitation
More data, finer grain, or longer retention than the purpose needs.
Fairness & lawfulness
Unexpected handling, no legal basis, discriminatory effects.
Privacy rights
No way to access, correct, erase, or object.
Privacy by design
Privacy bolted on after launch. Companion: privacy by default.
Security & safeguards
Personal data unprotected in training data, environment, or I/O.
Transparency & explainability
Affected people cannot learn how the decision was made.
Use limitation & purpose specification
Data collected for one purpose, reused for another — THE top pattern.
🔄

The Top Exam Pattern

Data reused beyond the purpose it was collected for → use limitation & purpose specification.

"We already have the data" is never a justification — the principle constrains use, not just collection.

📱

"We Already Have the Data"

Large platforms collected phone numbers for multi-factor authentication — a security purpose.

  • The numbers were quietly reused for advertising and targeting
  • No new data was collected; storage stayed secure
  • Regulators sanctioned it as a serious violation anyway
The lesson

The purpose users agreed to was security, not marketing. Reuse beyond purpose is the violation — not collection.

🧠

Memorize This — Nine Principles, Alphabetical

A to P

  • Accuracy
  • Consent
  • Data minimization & storage limitation
  • Fairness & lawfulness
  • Privacy rights

P to U

  • Privacy by design
  • Security & safeguards
  • Transparency & explainability
  • Use limitation & purpose specification

"Counting note: EXIN's official list shows eight bullets — it merges privacy rights + privacy by design. Know the content either way."

SORT EXERCISE

Which Principle Is Violated?

Match each scenario to the privacy principle it violates.

Loyalty-card purchases, collected for rewards, now train an ad-targeting model
The privacy team first reviews the model two months after go-live
Consent hidden inside a 40-page terms of service
An erasure request arrives — no process exists to honor it
Data sharing is ON by default; users must hunt for the opt-out
An employer "asks" employees to volunteer health data
Income data collected for KYC checks feeds a loan-pricing model
Customers have no channel to see or correct their own record
Use limitation & purpose specification
  • Loyalty-card purchases → ad-targeting model
  • KYC income data → loan-pricing model
Privacy by design
  • Privacy review only after go-live
  • Protective settings not the default (privacy by default)
Consent
  • Bundled into the terms of service
  • Employer "asking" — genuine consent impossible
Privacy rights
  • No process for erasure requests
  • No way to access or correct your record
Pairs, 3 minutes. Ask for the decisive cue before revealing each bucket — "purpose switch" should come up unprompted.

Exam Question

Fjordline Ferries trains a no-show prediction model on booking data. All personal data is encrypted, access requires MFA, and every query is logged. A passenger asks which of her data the model used and requests erasure — Fjordline has no process to respond. What is the privacy situation?

A) Privacy is covered — strong security controls protect the personal data
B) Privacy is partly covered — data protection is handled, but further individual rights are not respected
C) Privacy is not affected — prediction models make no automated decisions
D) Privacy is covered, provided a DPIA was completed before training
Answer: B
Privacy is personal data protection plus respect for further individual rights. Encryption, MFA, and logging cover the first half only; transparency and erasure belong to the second half. A shrinks privacy to confidentiality — the classic distractor. D fails because a DPIA identifies risks; it does not discharge individual rights.
Topic 5 · Subtopic 5.2 · 7.5% of exam · ~3 questions

Compliance and Regulation

Four ISO/IEC standards, the EU AI Act, the GDPR, copyright

Regulations demand outcomes; standards give you the machinery — and the evidence.

The ISO Quad — Match Standard to Job

ISO/IEC 23894
AI risk management across the lifecycle. Hook: "ISO 31000, translated for AI."
ISO/IEC 27005
Information security risk management. Hook: "the risk engine behind 27001 — not AI-specific."
ISO/IEC 42001
AI management system (AIMS): governance, policies, roles, continual improvement. Hook: "42001 is to AI what 27001 is to infosec."
ISO/IEC 5338
AI lifecycle / MLOps processes. Hook: "engineering, not governance."
Matching trap

AI risk management → 23894. Information security risk management → 27005. "Management system" language always points to 42001, never to 5338.

EU AI Act — Four Risk Tiers, Top Down

1 · Unacceptable risk
Prohibited outright — social scoring, manipulation, real-time remote biometric identification in public spaces.
2 · High risk
Permitted with compliance obligations + ex-ante conformity assessment — résumé screening, medical devices, critical infrastructure.
3 · Limited risk
Permitted with transparency obligations — a chatbot must disclose that it is a bot.
4 · Minimal / no risk
Permitted without restrictions.
Two footnotes that score marks

Tiers are not mutually exclusive — one system can owe obligations from more than one tier. And the Act protects people, not company secrets: compliance ≠ secure.

CLASS POLL

Vote Now!

Kestrel Talent launches an AI system that screens résumés and shortlists candidates for interviews. Under the EU AI Act, this system is…

A) Prohibited — it makes decisions about people
B) High risk — permitted with compliance obligations and an ex-ante conformity assessment
C) Limited risk — it only needs to disclose that it is AI
D) Minimal risk — recruitment is not named in the Act
Answer: B) High risk — conformity assessment before market
Recruitment decides access to employment — the canonical high-risk example. "Prohibited" is the planted trap: prohibition is reserved for unacceptable-tier practices like social scoring.
Hands up per option before revealing — expect a split between A and B; use it to anchor "high risk, not prohibited".

GDPR × AI — Ten Friction Points (Know the Names)

1
Lawful basis
2
Purpose limitation
3
Data minimization vs model performance
4
Transparency and explainability
5
Automated decision-making and profiling
6
Operationalizing data-subject rights
7
Accuracy and fairness
8
Security and leakage
9
International transfers
10
Accountability and roles
These are compliance difficulties you manage — not the nine principles you violate.
BUILD IT TOGETHER

Name the 10 Copyright Mitigations

Call them out! Each click reveals the next.

1Mitigate disclosure of training data in output
2Comprehensive IP audits
3Clear legal framework & policies
4Ethical data sourcing
5Define ownership of AI-generated content
6Confidentiality & trade-secret protocols
7Employee training
8Compliance monitoring systems
9Response planning for IP infringement
10Licenses and/or warranties from AI suppliers
🏠

The Data-Sourcing Hierarchy

Safest and most ethical: create training data in-house. Licensed or permissioned data comes second.

"Publicly available" is not a license — that is exactly what the AI copyright lawsuits are about.

Exam Question

Bergamot Bank runs an ISO/IEC 27001 ISMS. The board now wants an equivalent organization-wide framework for AI: governance, policies, roles, controls, and continual improvement. Which standard should the bank adopt?

A) ISO/IEC 23894
B) ISO/IEC 27005
C) ISO/IEC 42001
D) ISO/IEC 5338
Answer: C
"Management system" language — governance, policies, roles, continual improvement — always points to ISO/IEC 42001, the AI management system (AIMS) and the AI analogue of 27001. A is the tempting distractor: 23894 is AI risk management, not a management system. 5338 is lifecycle engineering; 27005 is information security risk management.
Review · All topics

Review — Putting It All Together

The threat map, the frameworks in order, the traps, the strategy

40 questions · 90 minutes · 26 correct = pass. Let's make sure the traps don't work on you.

The Threat Map — One Last Look

Input threats (runtime use)
  • evasion (5 types)
  • direct & indirect prompt injection
  • sensitive data disclosure through use
  • model inversion · membership inference
  • model exfiltration
  • AI resource exhaustion — denial of service (DoS), denial-of-wallet (DoW), sponge attack
Development-time threats
  • data poisoning
  • direct development-time model poisoning
  • supply-chain model poisoning
  • development-time data leak
  • direct development-time model leak
  • source code/configuration leak
Runtime conventional threats
  • direct runtime model poisoning
  • direct runtime model leak
  • output containing conventional injection
  • input data leak
  • direct augmentation data leak
  • augmentation data manipulation
Classify in two steps: lifecycle stage first, asset second.
🧠

Speed Round 1 — The Ordered Frameworks

G.U.A.R.D. — fixed order

Govern Understand Adapt Reduce Demonstrate

"After Govern + Understand, the next step is Adapt."

Risk management — 4 steps, repeated

  • 1 · Identify (threat modeling)
  • 2 · Evaluate (likelihood × severity)
  • 3 · Risk treatment (mitigate · transfer · avoid · accept)
  • 4 · Risk communication & monitoring

"Controls only appear at risk treatment."

BUILD IT TOGETHER

Speed Round 2 — The Seven Layers of Prompt Injection Protection

Call them out in order! Each click reveals the next.

1Model Alignment
2Prompt Injection Defense
3Human Oversight
4Automated Oversight
5User-Based Privilege
6Intent-Based Privilege
7Just-In-Time Authorization
After the reveal, ask which layers prevent (1–2) and which limit blast radius (3–7). Weak alone, strong together.
🔁

Speed Round 3 — The Two 8-Step Sequences

AI security testing (order!)

  • 1 · Define objectives & scope
  • 2 · Understand the AI system
  • 3 · Identify potential threats
  • 4 · Develop attack scenarios
  • 5 · Test execution
  • 6 · Risk assessment
  • 7 · Prioritization & risk mitigation
  • 8 · Validation of fixes

Organizational implementation

  • 1 · Organize control of AI
  • 2 · Teach data obfuscation & minimization
  • 3 · Extend supply-chain management
  • 4 · Add AI assets & risks to the ISMS
  • 5 · Teach DevSecOps
  • 6 · Teach AI security controls
  • 7 · Extend monitoring to AI-attack behavior
  • 8 · Guardrails, oversight, least privilege

"Next-step questions pay for knowing the order, not just the members."

Trap Pair 1 — Model Inversion vs Membership Inference

Model inversion

  • Reconstructs training data the attacker never had
  • Inputs optimized against confidence scores
  • Result: approximate data
vs

Membership inference

  • Confirms whether a record the attacker already holds was in the training set
  • Excess confidence betrays membership
  • Result: one bit — yes/no
How to tell them apart

Reconstruct vs confirm — both feed on confidence scores. Bonus pair: model exfiltration harvests input–output pairs to build a replica (front door); a model leak steals the real parameter file (break-in).

Trap Pair 2 — Same Words, Different Lifecycle Stage

Development-time

  • Attacker's hands reach the engineering environment
  • direct development-time model poisoning
  • direct development-time model leak
vs

Runtime

  • Attacker's hands reach the live production system
  • direct runtime model poisoning
  • direct runtime model leak
Two more quick tells

DoW vs DoS: follow the harm — money drained → denial-of-wallet (DoW); service down → denial of service (DoS). Privacy principles vs GDPR challenges: rules you violate vs difficulties you manage.

SPOT THE MISTAKES

Find the 3 Errors

Work in pairs — this incident report mislabels three threats.

"An attacker harvested thousands of input–output pairs from our public API and trained a working replica of the model — a textbook direct runtime model leak. They then flooded the endpoint with compute-heavy sponge inputs, tripling our monthly cloud bill — a classic denial of service (DoS). Finally, instructions hidden in a supplier webpage our assistant retrieves made it exfiltrate data — direct prompt injection."

direct runtime model leak model exfiltration — a replica built from harvested input–output pairs uses the front door; a leak steals the actual parameter file.

denial of service (DoS) denial-of-wallet (DoW) — the named harm is cost, not availability; one sponge attack can cause both, so follow the harm.

direct prompt injection indirect prompt injection — the instructions ride in third-party content the application inserts, not in the user's own prompt.

Quick-Fire Next-Step Anchors

Govern + Understand done?
Next G.U.A.R.D. step: Adapt — AI threat modeling and AI testing live there.
Inventory finished?
Next: a risk analysis on it — the bare-minimum pair for AI security oversight.
Threat list finished?
Next: threat modeling → prioritized risks. Controls come only at risk treatment.
First test round blocked everything?
Add input variation — synonyms, encodings, formatting. Never declare victory.
Jailbroken API model?
The deployer's fix is an output validation layer — never "retrain the provider's model".
Data reused for a new purpose?
Use limitation & purpose specification — not consent, not minimization.

Exam Strategy — Four Rules

⚖ Pairing items
Judge each half independently — one right half never rescues a wrong one.
🔁 Next-step items
Know the order of every framework, not just its members.
⏰ 2:15 per question
40 questions in 90 minutes. Flag, move on, return — no question is worth five.
👥 Sibling options
If two options sound like siblings, one is the trap — recheck lifecycle stage and asset.

Gauntlet 1 — Threats (Pairing Item)

Saffron Health finds two incidents. Incident 1: crafted queries against the diagnosis API let a researcher reconstruct recognizable patient records from the training data. Incident 2: a contractor copied the model's parameter files from a production server. Which pair names the threats?

A) membership inference + model exfiltration
B) model inversion + direct runtime model leak
C) model inversion + model exfiltration
D) sensitive data disclosure through use + direct development-time model leak
Answer: B
Reconstructing training data from outputs is model inversion — membership inference would only confirm a known record's presence. Copying parameter files from production is a direct runtime model leak; model exfiltration (C's second half) builds a replica through queries, and D's second half puts the theft in the wrong lifecycle stage. Judge each half independently.

Gauntlet 2 — Organization (Next-Step Item)

Cobalt Mobility has inventoried all AI use, published AI policies with assigned responsibilities, and educated its engineers and security staff on which AI threats apply. Following G.U.A.R.D., what is the next step?

A) Demonstrate — collect evidence for management and regulators
B) Reduce — minimize sensitive data and limit model behavior
C) Adapt — extend the ISMS, threat modeling, and testing to AI
D) Understand — map the AI threat landscape
Answer: C
The inventory and policies complete Govern; threat education completes Understand. Next in the fixed sequence is Adapt — where AI threat modeling and AI security testing live. D is the trap for anyone who files threat education under a still-open Understand; A and B skip ahead in the order.

Gauntlet 3 — Controls (Ready-Made Models)

Orbita Travel builds its booking assistant on a hosted third-party language model accessed through an API. Users discover a jailbreak that makes the assistant produce offensive text. What is Orbita's most effective measure?

A) Require the provider to retrain the model with stronger alignment
B) Add an output validation layer in its own application
C) Move to self-hosting so the runtime controls return in-house
D) Rely on the hosting platform's monitoring to catch abuse
Answer: B
Application-level duties never transfer to the provider — output validation stays with the deployer, and it works immediately regardless of the model's alignment. A is the tempting distractor: only the provider can retrain, on their timeline, and alignment is probabilistic. C changes who hosts, not the application-level gap; D outsources a duty that is yours.
EXIN AI Security Professional

You're Ready — Good Luck!

40 questions · 90 minutes · 65% to pass

Next step: take the practice exams in the portal under exam conditions — 90 minutes, no notes — then review every explanation, right or wrong.

Point students to the portal: practice exams, flashcards, cheat sheet, study guide. Collect feedback before closing.