AISP.EN · 3-Day Classroom Training

EXIN AI Security Professional

Based on the OWASP AI Exchange

Three days from threat map to exam-ready.

Day 1 of 3

Day 1 — Foundations

Intro & exam format · AI security in the organization · Input threats

Day 2: development-time & runtime threats + controls · Day 3: testing, privacy & compliance, review.

Run introductions and an expectations round before this slide (~15 min); ask who already runs AI in production.

Exam At A Glance

40

Multiple-choice questions

90 min

2 minutes 15 seconds per question

65%

Pass mark — 26 of 40 correct

Bloom 2–3

Understand & apply — no pure recall

Closed book

No materials in the exam room

A–E pairs

Two-part pairing questions — all-or-nothing

Where the Marks Are

1. AI Security in the Organization15%

2. AI Security Threats37.5%

3. AI Security Controls27.5%

4. AI Security Testing7.5%

5. Privacy & Compliance12.5%

Threats + Controls = 65% of the marks

Anatomy of an AISP Question

1 · Scenario classification

Short business scenario, then: “What type of attack is this?” The wrong answers are always adjacent concepts.

2 · Two-part pairing (A–E)

Two incidents; each option is a pair of labels. All-or-nothing: both halves must match.

3 · “What is the next step?”

You are placed mid-framework and asked what comes next. Free marks — if you know the order.

Pairing strategy

Judge each half independently; eliminate every option where either half is wrong. Usually one survives.

⏰

2:15 Per Question

90 minutes for 40 questions — time is not the enemy.

Read each scenario once, carefully, instead of three times in a hurry.

Your Study Toolkit

Textbook

Full prose for every learning objective, with scenarios

4 Practice Exams

Timed 40-question sets in the official register

Flashcards

Frameworks, lists and locked threat names

Study Guide

Every list, compressed for revision

Cheat Sheet

One page for the night before

All here

Everything lives on this prep site — bookmark it tonight

🗺

Learn the Map

65% of the exam is threats + controls — learn the map, the rest follows.

Every threat pairs with its controls; master that pairing and Topics 1, 4 and 5 fall into place.

Topic 1 · Subtopic 1.1 · 10% of exam · ~4 questions

Organizing AI Security

G.U.A.R.D. · responsible vs trustworthy · assets & threats

The board says “get AI security organized.” Where do you start — and in what order?

G.U.A.R.D. — Five Steps to Organize AI Security

1 · Govern

AI inventory, policies, responsibilities, compliance, education

2 · Understand

Which threats apply; educate engineers and security staff

3 · Adapt

Extend threat modeling, testing and supply-chain management to AI

4 · Reduce

Minimize sensitive data; limit model behavior and impact

5 · Demonstrate

Evidence for management, regulators and clients

What Each Step Involves

Govern

Inventory where AI is used, set policies, name owners — the Chief AI Officer enters here; ISO/IEC 42001 AIMS

Understand

Which threats apply per system; draw the line between your controls and your supplier's

Adapt

Reshape existing processes: AI threat modeling, AI security testing, supply chain for data, models, hosting

Reduce

Assume the model can fail: shrink sensitive data, cap privileges, add oversight

Demonstrate

Prove safeguards work: transparency, test results, documentation

🧠

Memorize This

The five steps

Govern Understand Adapt Reduce Demonstrate

“G.U.A.R.D. your AI — in exactly this order.”

The anchor transition

Understand identifies and teaches
Adapt changes your processes
Reduce limits the impact

After Govern and Understand comes Adapt — the exam's favorite “next step.”

BUILD IT TOGETHER

Name the 5 G.U.A.R.D. Steps

Call them out in order — each click reveals the next.

1Govern — inventory, policies, responsibilities

2Understand — which threats apply + educate

3Adapt — AI threat modeling, testing, supply chain

4Reduce — minimize data, limit behavior

5Demonstrate — evidence that it all works

Let the room call each step before clicking; ask “what actually happens in this step?” before revealing the next.

Responsible AI vs Trustworthy AI

Responsible AI

Ethics, society, governance
Fairness, societal impact, accountability
Owned by boards & ethics committees

vs

Trustworthy AI

Technical & operational qualities
Robustness, reliability, transparency, explainability
Owned by engineering & operations

How to tell them apart

Ask what the concern attaches to: people and governance structures → responsible; measurable system qualities → trustworthy.

AI Security vs Conventional Security

Still true

AI systems are IT systems: every conventional threat and control still applies.

The equation

AI security = threats to AI-specific assets + threats to all other assets.

Novelty 1 · New assets
Training data, augmentation data, model, input, output.

Novelty 2 · New attack surface
Legitimate model use — simply querying can be attacking.

Novelty 3 · New suppliers
Data, ready-made models, model hosting.

Conventional security is necessary but not sufficient

🛡

Why Your WAF Waves It Through

AI attacks manipulate meaning and statistics — through fully legitimate channels.

An adversarial example is a well-formed request; a poisoned training record carries no malware signature.

Exam Question

Kestrel Mobility has an AI inventory with named owners, its engineers are trained on the threats relevant to each system, and it has just finished extending its ISMS, threat modeling, security testing and supplier management to cover AI. According to G.U.A.R.D., what comes next?

A) Understand — determine which threats apply to each system

B) Adapt — introduce AI-specific threat modeling and testing

C) Reduce — minimize sensitive data and limit model behavior

D) Demonstrate — document evidence for the regulator

Answer: C

Govern, Understand and Adapt are complete, so Reduce comes next: shrink the sensitive data footprint and cap what the model can do. D is the trap — you cannot demonstrate safeguards you have not yet deployed; A and B are already done.

Five Assets — and How They Break

Training data

Leaks: development-time data leak; via use: sensitive data disclosure through use, model inversion, membership inference. Manipulated: data poisoning.

Augmentation data (incl. system prompts)

Direct augmentation data leak · augmentation data manipulation.

Model
Leaks: direct development-time model leak, direct runtime model leak, model exfiltration. Poisoned: direct development-time / supply-chain / direct runtime model poisoning.

Input

Input data leak — prompts can carry secrets.

Output

Output containing conventional injection — attacks downstream systems.

The rhythm: data and model leak and get poisoned — development-time and runtime; input leaks; output injects

🕵

Shadow AI: The Pasted Source Code

2023: engineers at a major electronics maker pasted confidential source code into a public chatbot while debugging.

Sensitive input flowed straight to an external provider
A written ban existed — bans alone don't work
Usage just moves out of sight, where no control applies

The fix that works

Provide a sanctioned, secure, good-quality alternative — and make the risks of unsanctioned tools explicitly clear.

SORT EXERCISE

Whose Step Is It Anyway?

Sort each activity into its G.U.A.R.D. step.

Add AI-specific threat modeling to the SDLC

Publish test evidence for the regulator

Build the AI inventory

Strip customer identifiers from training data

Train engineers on the threats that apply to each system

Extend supplier contracts to cover model hosting

Appoint a Chief AI Officer

Require human sign-off on model-triggered actions

Decide which controls are the supplier's job

Govern

Build the AI inventory
Appoint a Chief AI Officer

Understand

Train engineers on applicable threats
Decide which controls are the supplier's job

Adapt

AI-specific threat modeling in the SDLC
Supplier contracts cover model hosting

Reduce

Strip customer identifiers from training data
Human sign-off on model-triggered actions

Demonstrate

Publish test evidence for the regulator

Pairs, 3 minutes, then reveal. The Adapt-vs-Reduce boundary sparks the best discussion — process change vs impact limitation.

Topic 1 · Subtopic 1.2 · 5% of exam · ~2 questions

Threat Modeling and Agentic AI Risks

Four risk steps · the bridge to priorities · agents on a leash

A threat catalogue is not a risk list — and an agent is not just a chatbot.

Risk Management in Four Steps

1 · Identify

Threat modeling turns the catalogue into concrete risks

2 · Evaluate

Likelihood × severity; prioritize on a heatmap

3 · Risk treatment

Mitigate, transfer, avoid, or accept each risk

4 · Communication & monitoring

Risk register; inform stakeholders; verify treatments

↻ Repeat

Regularly — and whenever changes warrant it

Four Ways to Treat a Risk

Mitigate

Implement controls — the most common route

Transfer

Shift it to a third party: outsourcing, insurance

Avoid

Change plans so the risk disappears — maybe no AI here at all

Accept

Knowingly bear it when treatment costs more than it's worth

🌉

Threat Modeling Is the Bridge

From threat catalogue to concrete, prioritized risks — for your system.

Three questions per threat: does it apply here? How could it realistically happen? What would the impact be?

CLASS POLL

Vote Now!

Halvora Insurance keeps its claims chatbot running but buys a cyber-insurance policy covering losses from model manipulation. Which risk treatment option is this?

A) Mitigation

B) Transfer

C) Avoidance

D) Acceptance

Answer: B) Transfer

The risk is shifted to a third party through insurance. It is not acceptance — someone else bears the loss — and not mitigation, because nothing about the system itself changed.

Hands up per option before revealing; ask a “D” voter to defend, then reveal — the accept/transfer line is the point.

💥

Agents Amplify

Agentic AI doesn't create new threats so much as amplify existing ones.

Agents act, run autonomously, behave unpredictably and span systems — the injection that once embarrassed you now moves money.

🤖

The Compromised-Agent Chain

Corvid Retail's invoice-handling agent runs under one shared service account reaching ticketing, HR records and the payment gateway. A supplier email arrives with instructions hidden in white text.

Indirect prompt injection: untrusted data read as instructions
One shared account → actions chain across every connected system
Excessive agency: far more capability than the task needs

Blast radius

Data theft needs three ingredients: attacker-controlled data in, access to sensitive data, a way out. Remove any one and the attack collapses.

Six Controls for Agentic AI

Traceability

Log what the agent did — and why

Memory-integrity protection

Guard stored state and plans from tampering

Prompt-injection defenses

Separate untrusted content from instructions

Rule-based guardrails

Deterministic checks the model can't talk past

Least model privilege

Scope each agent's permissions to its task

Human oversight

A person approves consequential actions

"Never build access control on GenAI."

— OWASP AI Exchange

Exam Tip

Instructions are not enforcement — the right input overrides them, so authorization lives in the architecture, outside the model. And the maxim behind shared accounts: convenience is the enemy of security.

Exam Question

Marrowgate Legal investigates two incidents on its contract-review assistant. Threat 1: an attacker with write access to the document store edited the precedent texts the assistant retrieves, changing its advice. Threat 2: a generated summary contained hidden JavaScript that executed in the client portal's browser. Which pair is correct?

A) Data poisoning + direct prompt injection

B) Augmentation data manipulation + output containing conventional injection

C) Data poisoning + output containing conventional injection

D) Augmentation data manipulation + indirect prompt injection

E) Direct runtime model poisoning + direct prompt injection

Answer: B

Threat 1 rides in with each retrieval while the model stays untouched — augmentation data manipulation, not data poisoning (which targets training data). Threat 2 is model output carrying a conventional attack downstream. Judge halves independently: C fails on half 1, D fails on half 2.

Topic 2 · Subtopic 2.1 · 17.5% of exam

Input Threats

Evasion · prompt injection · disclosure · exfiltration · resource exhaustion

The single heaviest section of your exam — ~7 of 40 questions walk through this one door.

Frame the stakes: any exposed model faces these on day one. Budget the most classroom time here.

One Door, Five Threat Families

Evasion

Crafted input fools the model into doing its task wrong

Prompt injection

Instructions smuggled into the prompt — direct or indirect

Sensitive data disclosure through use

The model gives away its training data

Model exfiltration

The model itself is copied — question by question

AI resource exhaustion

Input burns availability — or your budget

Same door

All arrive through the input channel — so they share generic controls

Six Generic Controls — Shared by All Input Threats

Monitor use

Log inputs, outputs, patterns — detect and reconstruct

Rate limit

Slow the experimentation attacks depend on

Model access control

Only authenticated, authorized actors may query

Anomalous input handling

Flag the single strange input

Unwanted input series handling

Flag the suspicious sequence of inputs

Obscure confidence

Starve attacks of the feedback signal they feed on

🎯

Evasion

Crafted input — an adversarial example — misleads the model into performing its task incorrectly.

Evasion manipulates the data the model works on; prompt injection manipulates the instructions.

🔍

Zero-Knowledge Evasion: The Query-Probing Story

The model is a closed box — the attacker knows nothing and asks it everything.

No code, no weights, no architecture
Thousands of designed inputs hit the live model
Each response redraws the map of the decision boundary
Returned confidence scores make the search far faster

Tell-tale sign

Your logs fill with probing traffic — this is where rate limiting and series detection bite.

Partial-Knowledge and Perfect-Knowledge

Partial-knowledge evasion (gray-box)

Some internals known — architecture family, kind of training data. Sharpens the search or the surrogate. The most realistic real-world case.

Perfect-knowledge evasion (white-box)
Full architecture, parameters, weights in hand. Compute the minimal perturbation directly from gradients — no probing needed.

More knowledge = fewer queries — the search moves off your servers

🔄

Transfer Attack: The Surrogate

The attacker never queries your model during the search.

Builds or obtains a surrogate model — a copy or approximation of the target
Crafts adversarial examples on the surrogate, at leisure
Similar task → similar decision boundaries → attacks carry over
Zero queries against the target while crafting

Why your logs stay clean

Rate limiting, series detection and obscured confidence never see the attack being developed.

🚪

Evasion After Poisoning: The Planted Key

The odd one out — the weakness was manufactured, not found.

Training data was poisoned earlier — a development-time attack
The poison plants a backdoor: trigger input → attacker-chosen output
At runtime the attacker simply presents the trigger
No search needed — they planted the key themselves

Two-phase attack

Planted at development-time, cashed in at runtime. Full poisoning story: subtopic 2.2.

Five Evasion Types — a Ladder of Attacker Insight

Zero-knowledge

No internals — probe the live model, read its responses

Partial-knowledge

Some internals — a sharper, cheaper search

Perfect-knowledge

Full weights — compute the attack from gradients

Transfer attack

Surrogate model — the search happens elsewhere

Evasion after poisoning

Planted backdoor — no search at all

Zero-Knowledge vs Transfer Attack

Zero-knowledge

Search runs on the live target
Thousands of probing queries
Logs fill with traffic
Rate limits & series detection bite

vs

Transfer attack

Search runs on a surrogate
Zero target queries while crafting
Logs stay clean
Only per-input defenses catch it

How to tell them apart

Ask where the experimentation happens: probing the live target → zero-knowledge; crafting on a stand-in → transfer attack.

🛡

Controls Bite the Search — Not the Example

Rate limiting, series detection and obscured confidence frustrate probing.

Against transfer attacks and evasion after poisoning, only per-input defenses still work: evasion input handling, input distortion, adversarial training.

SORT EXERCISE

Which Evasion Type Is It?

Sort each mini-scenario into one of the five evasion types.

Computes the exact perturbation from the model's gradients — the full weight file is in hand

Sends thousands of mutated images to the live API, steering by the labels returned

Presents the trigger pattern an accomplice planted in last year's training data

Never touches the target — crafts stickers on a similar model they trained themselves

A leaked vendor paper reveals the architecture, making the query search far more efficient

Zero-knowledge

Thousands of mutated images to the live API — search on the target

Partial-knowledge

Leaked architecture paper sharpens the search — some internals known

Perfect-knowledge

Gradients from the full weight file — compute, don't probe

Transfer attack

Stickers crafted on their own similar model — a surrogate

Evasion after poisoning

Planted trigger presented at runtime — no search needed

Give pairs 3 minutes; collect answers before revealing. The classifier is always the attacker's knowledge, never the data type.

Exam Question

Drava Telecom's spam filter blocks a marketing firm's mailings. The firm never queries the filter. Instead, it crafts rewordings on an open-source spam model it runs locally — and the rewritten mailings then slip past Drava's filter. Which evasion type is this?

A) Zero-knowledge evasion

B) Partial-knowledge evasion

C) Transfer attack

D) Evasion after poisoning

Answer: C

The search ran on a surrogate — a similar model the attacker controls — and zero queries hit the target. Zero-knowledge is the tempting distractor, but it requires probing the live filter; Drava's logs stay clean.

Direct vs Indirect Prompt Injection

Direct prompt injection

The user typing is the attacker
Jailbreaks, role-play, "ignore previous instructions"
Result flows back to the attacker

vs

Indirect prompt injection

A third party attacks; the user is a victim
Instructions hide in content the application inserts — webpage, CV, image
Dedicated control: input segregation

How to tell them apart

Trace the channel: typed by the user → direct. Riding inside inserted third-party content → indirect.

🔓

Jailbreak

A direct prompt injection aimed at defeating the supplier's alignment and safety training.

Two routes in: abuse competing objectives — helpfulness overrides safety — or use inputs the safety training doesn't recognize, like unusual encodings.

Recognize the Forms, Recognize the Carriers

Direct — attack forms

Role-play · override instructions · encodings & mixed languages · split-up requests · multi-turn steering · system prompt leakage

Indirect — the carriers
Compromised webpage fetched as context · white-on-white text in a CV · pixels in an image a multimodal model reads

Agentic AI raises the stakes

If the model can act, a poisoned page acts too — untrusted data treated as executable instructions.

Shared control: prompt injection I/O handling · indirect-specific: input segregation

BUILD IT TOGETHER

Name the Seven Layers of Prompt Injection Protection

Call them out, in order! Each click reveals the next.

1Model Alignment

2Prompt Injection Defense

3Human Oversight

4Automated Oversight

5User-Based Privilege

6Intent-Based Privilege

7Just-In-Time Authorization

After the reveal, ask: which two layers prevent and detect? (1–2.) What do the other five do? (Limit the blast radius.)

Every Layer Has a Flaw

1 Model Alignment

Models stay easy to mislead — trained or not.

2 Prompt Injection Defense

An arms race; false negatives guaranteed.

3 Human Oversight

Costly, slow — and approval fatigue clicks "yes" by reflex.

4 Automated Oversight

Reactive: acts only once trouble has started.

5 User-Based Privilege

Users may do far more than the task needs.

6 Intent-Based Privilege

Intent isn't always known in advance.

7 Just-In-Time Authorization

Finest grain — but the architecture must support it.

Weak alone, strong together — assume injection succeeds; layers 3–7 shrink the blast radius

CLASS POLL

Vote Now!

Talvik Energy's outage-report agent receives, in advance, read-only access to the sensor archive — because writing reports requires reading, never sending or deleting. Which protection layer is this?

A) Layer 3 — Human Oversight

B) Layer 5 — User-Based Privilege

C) Layer 6 — Intent-Based Privilege

D) Layer 7 — Just-In-Time Authorization

Answer: C) Layer 6 — Intent-Based Privilege

Rights scoped to the task, assigned in advance. Layer 5 scopes by user identity; layer 7 grants rights at the moment, per subtask.

Model Inversion vs Membership Inference

Model inversion

Attacker starts with nothing
Optimizes inputs to chase confidence signals
Reconstructs approximations of training data
Gain: data they never had — a recognizable face

vs

Membership inference

Attacker already holds the record
Tell-tale extra confidence betrays membership
Gain: one bit — in or out
One bit can reveal a diagnosis

How to tell them apart

Inversion = unknown data out of the model. Membership inference = known data held up against the model. Both feed on confidence indications.

The Disclosure Trio

Disclosure of sensitive data in model output
The model simply emits memorized training or input data — no attack needed. Last line of defense: sensitive output handling.

Model inversion

Reconstruct what you never had — via intensive, confidence-guided querying.

Membership inference

Confirm what you brought along — was this record in the training set?

Group name: sensitive data disclosure through use — a confidentiality breach of the training set

🔬

Overfitting Is the Root Cause

A model with too much capacity memorizes individual records — which can then be reconstructed or recognized.

Paired controls: small model at development-time; obscure confidence at runtime — starve both attacks.

📡

Model Exfiltration: From Q&A to Replica

Pellucid Insurance notices one account sweeping its pricing API — 900,000 methodical quote requests.

Harvested input–output pairs become a manufactured training set
A new model trained on them replicates the original
The replica = a perfect-knowledge surrogate for attacking you
Weeks later: adversarial tricks work with zero visible probing

Also called

Model stealing · model extraction · model theft through use. Countered by the generic input-threat controls plus one dedicated control →

💧

Model Watermarking = Post-Theft Proof

A hidden marker proves a surfaced copy derives from your model — supporting ownership claims and legal action.

It does not prevent the theft. Prevention comes from access control, rate limiting, monitoring and series detection.

SPOT THE MISTAKES

Find the 3 Errors

Work in pairs — a colleague wrote this in the risk register.

"Model exfiltration means the attacker breaks into production storage and copies the parameter file. Fortunately, model watermarking prevents this theft. A related threat, membership inference, lets an attacker reconstruct training records they never possessed."

breaks into production storage harvests input–output pairs through normal use — breaking in and copying the file is a direct runtime model leak, not exfiltration.

prevents this theft proves ownership after the theft — watermarking enables post-theft verification; it stops nothing.

reconstruct records they never possessed confirm whether a record they already hold was in the training set — reconstruction is model inversion.

2 minutes in pairs. All three errors are classic exam distractors — say so explicitly.

AI Resource Exhaustion

Denial of service (DoS)

Availability: the system turns slow or unresponsive

Denial-of-wallet (DoW)

Funds: metered compute and API fees burn your budget

Sponge attack

Input crafted to maximize computation (energy-latency) — can cause both at once

DoS input validation

Reject or correct the oversized, pathological, deliberately complex

Limit resources

Cap what any single input may consume

⚡

Content Is the AI Twist

Exhaustion can come from frequency, volume — or the content of a single input.

Conventional DoS thinking counts requests. One cleverly built sponge input costs as much as a flood — so cap resources per input, not just per actor.

🧠

Memorize This

Five evasion types

Zero-knowledge Partial-knowledge Perfect-knowledge Transfer attack Evasion after poisoning

"none → some → all → surrogate → planted"

Seven protection layers

Model Alignment Prompt Injection Defense Human Oversight Automated Oversight User-Based Privilege Intent-Based Privilege Just-In-Time Authorization

"Most Prompts Hide An Unwelcome Instruction, Justifiably"

Exam Trigger Phrases — Input Threats

"probes the live API with mutated inputs"

Zero-knowledge evasion

"builds a surrogate model"

Transfer attack

"hidden text in a retrieved page or CV"

Indirect prompt injection

"confidence reveals the record was in training"

Membership inference

"harvests input–output pairs to train a copy"

Model exfiltration

"the cloud bill exploded"

Denial-of-wallet (DoW)

Exam Question

Quellhaus Bank runs a face-recognition entry system and a credit-scoring API. Incident 1: a journalist submits a specific customer's photo and concludes, from the unusually high confidence returned, that the photo was in the training set. Incident 2: a competitor scripts two million varied requests to the scoring API and trains a working copy from the answers. Which pair is correct?

A) Model inversion + direct runtime model leak

B) Membership inference + model exfiltration

C) Membership inference + direct runtime model leak

D) Model inversion + model exfiltration

Answer: B

Incident 1: the journalist brought the record; confidence only confirmed membership — inversion would mean reconstructing data she never had. Incident 2: the model was stolen through use, by harvesting I/O pairs — a leak would require breaking into the systems storing it.

Day 2 of 3

Day 2

Development-time & runtime threats, then the controls that answer them

Yesterday: the organization and the input door. Today: attacks on the build pipeline and the live system — then Topic 3's control catalog.

5-minute Day 1 recap: ask the class to name the five evasion types and the seven layers from memory.

Topic 2 · Subtopic 2.2 · 10% of exam

Development-Time Threats

Poisoning attacks integrity — leaks attack confidentiality

The attacker strikes while you build: your data, your pipeline, your supply chain. ~4 questions.

☣

Data Poisoning

Manipulating the data a model learns from, to change the model's behavior.

Whoever controls the training data controls the behavior — no need to touch the model or the code at all.

Five Entry Points — Same Threat

Supplier

Dataset poisoned before you obtain it

Transit

Data altered on the way to storage

Storage

Training database edited in your environment

Preparation

Manipulated during cleaning and labeling

Operation

Live input collected as tomorrow's training data

The trap

Attacker used the live system — still development-time: the harm happens when the data is learned from

🚪

The Trigger Sticker

Cintra Logistics' parcel scanner waves through any box bearing a small violet sticker. Targeted poisoning planted a backdoor.

A few poisoned samples: subtle trigger pattern + attacker-chosen label
Perfect behavior on everything else — including your whole test set
Later, the adversary simply shows the trigger
No code to review; parameters mean nothing to the eye

The runtime cash-in

Exploiting the planted trigger is evasion after poisoning (2.1): planted development-time, triggered at runtime.

Sabotage vs Targeted (Backdoor) Poisoning

Sabotage

Degrades the model for regular inputs
Fraud detection simply stops working
Normal traffic misbehaves → surfaces quickly

vs

Targeted / backdoor (Trojan)

Hidden trigger + attacker-chosen label
Normal behavior otherwise — passes every test
Far more dangerous: designed for your blind spot

How to tell them apart

Detectability is the divider: sabotage announces itself; a backdoor hides until its trigger appears.

Direct Development-Time Model Poisoning

What the hands touch
Stored weights edited · model file swapped · serialized model that runs code when loaded (deserialization) · pipeline code & configuration · a compromised library in the training run

Boundary 1

Manipulated at a supplier, then shipped to you → supply-chain model poisoning.

Boundary 2

Learning data manipulated → data poisoning.

"Direct" = hands on the model itself, or on the machinery that builds it

📦

The Backdoor That Fine-Tuning Missed

Ondine Diagnostics downloads an open-source clinical model from a public hub and fine-tunes it on its own clean records.

Manipulated before integration — at the supplier or in transit; invisible from your side
Fine-tuning on clean data does not reliably erase a backdoor
Supplied model used for further training = a transfer learning attack
Your data, pipeline, people: all clean

Your remaining controls

Provenance, checksums & signatures, scan artifacts before loading, poison robust model, model ensemble, continuous validation.

Data Poisoning vs Model Poisoning

Data poisoning

Training data manipulated
The machinery works as designed — it faithfully learns corrupted material
Trigger words: records, labels, dataset

vs

Model poisoning

The model or its engineering elements manipulated
Weights, pipeline code, configuration, libraries
Direct (your environment) or supply-chain (supplier's model)

How to tell them apart

Ask what the attacker's hands touched: learning data → data poisoning; the model or its machinery → model poisoning.

CLASS POLL

Vote Now!

An audit at Sorrel Analytics finds a compromised Python package in the training pipeline: on every run it silently nudges certain model weights. The training data was never touched. What is the threat?

A) Data poisoning

B) Supply-chain model poisoning

C) Direct development-time model poisoning

D) Direct runtime model poisoning

Answer: C) Direct development-time model poisoning

Libraries are engineering elements that build the model — hands on the machinery, not the data. B tempts because the package arrived via the supply chain, but no supplied trained model was manipulated.

Expect a split B/C vote — let both camps argue before revealing. The threat is named after what is manipulated.

Three Development-Time Leaks — the Asset Decides the Name

Development-time data leak

Asset: training/test data — real data, personal data, company secrets

Direct development-time model leak

Asset: model attributes — parameters, weights, architecture

Source code/configuration leak

Asset: the recipe — pipeline code and training configuration

Leak ≠ poisoning

Copied = leak (confidentiality) · changed = poisoning (integrity)

🔑

A Model Leak Upgrades the Attacker

With a private copy, a zero-knowledge attacker becomes a perfect-knowledge one.

Evasion and inference attacks get rehearsed offline — no rate limits, no monitoring, no detection in the way.

Where Poisoning Enters the Lifecycle

Supplier

Poisoned dataset · manipulated pre-trained model (supply-chain model poisoning)

Preparation

Data tampered while being cleaned and labeled

Training

Training database hacked · parameters, code, config, libraries (direct development-time model poisoning)

Runtime

Operation-collected data poisoned · planted trigger cashed in (evasion after poisoning)

Exam Question

Vireo Health downloads a pre-trained triage model from a public hub and fine-tunes it on its own carefully validated records. Months later, red teamers find one odd token sequence that reliably produces dangerous advice. Vireo's data, pipeline, and staff all check out clean. What is the threat?

A) Data poisoning

B) Direct development-time model poisoning

C) Supply-chain model poisoning

D) Direct prompt injection

Answer: C

The model arrived manipulated, and fine-tuning on clean data does not reliably erase a planted backdoor — a transfer learning attack. A and B would locate the manipulation inside Vireo's own environment, which checked out clean; D is a runtime input threat, not a planted behavior.

Topic 2 · Subtopic 2.3 · 10% of exam

Runtime Conventional Security Threats

Old attacks, new consequences

A live AI system is still an IT system — every conventional attack works here too. ~4 questions.

⚠

The Technique Is Old — the Consequences Are AI-Specific

SQL injection, stolen credentials, ransomware: none of them care that a neural network is inside.

Steal the model → run inference attacks offline. Tamper with parameters → invisible to code review. Hack augmentation data → change behavior without touching the model.

Direct Runtime Model Poisoning vs Direct Runtime Model Leak

Direct runtime model poisoning

Live parameters altered — or the model's I/O logic compromised
Integrity breach: runs, but off-spec
Controls: runtime model integrity · I/O integrity

vs

Direct runtime model leak

Live parameters copied — storage, memory, even side channels
Confidentiality breach: IP theft + offline rehearsal copy
Controls: runtime model confidentiality · model obfuscation

How to tell them apart

Altered = poisoning (integrity) · copied = leak (confidentiality). Replicated purely by querying the API? Neither — that is model exfiltration.

💬

The Script in the Transcript

Juniper Airlines' support assistant writes answers straight into the console. A prankster makes it output hidden script — which runs in the next agent's browser.

Model output carried a conventional attack (cross-site scripting)
Victim: the downstream component that trusts the output
Variant: data packed into a markdown image URL — exfiltrated on render
Payload arrives via prompt injection, or emerges on its own

Output containing conventional injection

Decades-old lesson: treat model output as untrusted input. Control: encode model output.

Input Data Leak

Input data leak
The user's input is exposed at rest or in transit by a conventional attack. The model behaves normally — the plumbing around it bleeds.

Control

Model input confidentiality: encryption, access control, minimal retention. Plus data minimization — what you never store cannot leak.

Why the stakes are high

Prompts carry strategy papers, source code, health questions — and metadata ties them to identified users.

Where it leaks

Debug logs · provider-side prompt logging (read the fine print) · intercepted traffic · RAG context rides inside the prompt, so it leaks too.

Input Data Leak vs Sensitive Data Disclosure Through Use

Input data leak

Breach in storage or on the wire
Log file, misconfigured bucket, intercepted connection
The model is never touched

vs

Sensitive data disclosure through use

The model's own answer reveals the data
Memorized training data or confidential context
Breach happens through the input–output channel

How to tell them apart

Locate the breach: "log file", "at rest", "in transit" → input data leak. "The model revealed…" → disclosure through use.

Direct Augmentation Data Leak vs Augmentation Data Manipulation

Direct augmentation data leak

Attacker reads: vector database dumped, retrieval traffic sniffed
Confidentiality breach
Behavior does not change

vs

Augmentation data manipulation

Attacker writes: planted chunks steer every future prompt
Integrity breach
Data poisoning's logic, transplanted to runtime data

Vector databases are an attack surface

A copy of sensitive content outside its regular protection — and embeddings can be mined back into text. Read = leak · write = manipulation.

SPOT THE MISTAKES

Find the 3 Errors

Work in pairs — an intern drafted this incident summary.

"A chatbot was manipulated into emitting hidden JavaScript that ran in the next viewer's browser — a classic case of indirect prompt injection. The fix is simple: once the model is well aligned, its output can be trusted downstream. Separately, an attacker who rewrites chunks in the RAG vector database commits data poisoning."

indirect prompt injection output containing conventional injection — the model is the delivery vehicle; the victim is the downstream component that processes the output.

its output can be trusted downstream treat model output as untrusted input — encode model output — alignment never guarantees clean output.

data poisoning augmentation data manipulation — the vector database is a runtime asset feeding prompts, not training data.

SORT EXERCISE

Development-Time or Runtime?

Sort each incident by lifecycle stage — then name the threat.

An insider relabels fraud records in the training database

A misconfigured bucket exposes months of prompt logs

Weights copied from a data scientist's laptop

The deployed model's parameter file is edited on the production server

Preprocessing scripts and training config stolen from the Git repository

A planted chunk in the vector database steers the assistant's answers

Development-time

Relabeled fraud records — data poisoning
Weights from the laptop — direct development-time model leak
Stolen scripts & config — source code/configuration leak

Runtime

Exposed prompt logs — input data leak
Edited production parameters — direct runtime model poisoning
Planted vector chunk — augmentation data manipulation

Two-step call-outs: stage first, threat name second. 4 minutes, then reveal bucket by bucket.

Topic 2 Recap — Every Threat, Mapped

Input (through use)

Evasion (5 types) · direct & indirect prompt injection · sensitive data disclosure through use · model inversion · membership inference · model exfiltration · AI resource exhaustion

Development-time

Data poisoning · direct development-time model poisoning · supply-chain model poisoning · development-time data leak · direct development-time model leak · source code/configuration leak

Runtime conventional

Direct runtime model poisoning · direct runtime model leak · output containing conventional injection · input data leak · direct augmentation data leak · augmentation data manipulation

🧭

Name Any Threat in Two Questions

1. Which lifecycle stage — development-time or runtime? 2. Which asset — data, model, input, output, or augmentation data?

Answer both and the threat name follows. Tomorrow's controls walk the same map from the defender's side.

Exam Question

Ostmark Credit suffers two incidents in one week. Incident 1: an intruder on a production host copies the scoring model's parameters from memory. Incident 2: a misconfigured debug proxy exposes months of customers' prompts to the internet. Which pair is correct?

A) Model exfiltration + development-time data leak

B) Direct runtime model leak + input data leak

C) Direct runtime model leak + sensitive data disclosure through use

D) Direct development-time model leak + input data leak

Answer: B

Parameters copied by breaking into the live system = direct runtime model leak — exfiltration would harvest I/O pairs through queries, and D picks the wrong environment. Exposed prompt logs = input data leak; the model revealed nothing, which rules out disclosure through use.

Topic 3 · Subtopic 3.1 · 12.5% of exam

Governance

Six controls, eight rollout steps, provider vs deployer

AI security starts at the top — not in the code.

Topic 3: Three Families of General Controls

Governance

3.1 — six controls that manage AI through policies, roles, and risk

Limit sensitive data

3.2 — five controls that shrink the data attack surface

Limit unwanted behavior

3.3 — seven controls that cap what misbehavior can reach

Specialized controls

Input filtering, robust training, output encoding — they live with their threats in Topic 2

🏛

What "Good AI Security Governance" Means

Clear policies, defined roles, and risk management — spanning secure development, deployment, and monitoring.

Never a single tool, a one-off audit, or one lifecycle stage.

📋

The Bare Minimum

1. Make an inventory of current AI use — including ideas in the pipeline. 2. Perform a risk analysis on it.

You cannot protect what you do not know you have.

The Six General Governance Controls

AI Program · AI PROGRAM

Govern AI as an organization: inventory, impact analysis, responsibilities, AI literacy.

Security Program · SEC PROGRAM

The ISMS covers the whole AI lifecycle and its AI-specific assets and threats.

Secure Development Program · SEC DEV PROGRAM

Build security into the AI system while it is being made.

Development Program · DEV PROGRAM
Engineering best practice for AI — maintainable, reliable, future-proof. Broader than security.

Check Compliance · CHECK COMPLIANCE

Privacy and AI laws in compliance management — regulation as a driver.

Security Education · SEC EDUCATE

Teach AI security threats and controls to engineers, dev teams, security pros.

None of these invents a parallel "AI department" — they extend structures you already have.

The Near-Twins: Development vs Secure Development

Development Program

Lifecycle program for AI work
General engineering best practice: versioning, testing, documentation
Objective: maintainable, portable, reliable, future-proof systems
Security is one benefit among several

vs

Secure Development Program

Development processes that build security in
Addresses risks while the system is constructed, not after
Objective: reduce security risks during development
Security is the whole point

How to tell them apart

Read the objective. Engineering quality with security as a side benefit → Development Program. Security built into construction → Secure Development Program.

☔

Coverage: One Word — Overarching

The general governance controls apply to all AI threats and all lifecycle stages.

Any answer that fences them into one threat, one tool, or one phase is a trap.

BUILD IT TOGETHER

The 8 Organizational Implementation Steps

Call them out! Each click reveals the next.

1Organize control of AI — ownership, inventory, risk

2Teach data obfuscation & minimization

3Extend supply-chain management to data, models, cloud

4Add AI assets & risks to the ISMS repository

5Teach DevSecOps

6Teach AI security controls for model engineering & runtime

7Extend monitoring to AI-attack behavior

8Implement model guardrails, oversight & least privilege

Go around the room — one step per participant. Note the arc: inventory first, runtime controls last. ~5 min.

📦

Ready-Made Models Change the Question

Not just "which controls do we need?" — "who implements which controls?"

A ready-made model is trained — possibly hosted — by a third party. Provider: model-level, development-time. You: application-level.

Self-Hosted vs Hosted Ready-Made Model

Self-hosted

Supplier: development-time, model-level controls — training-data hygiene, base alignment
You: everything at runtime — infrastructure, monitoring, rate limiting, access control, output validation, privileges, oversight
Data stays inside your environment

vs

Hosted (API)

Supplier also runs the platform: hosting security, its monitoring and rate limiting
You keep application-level controls: what data you send, output validation, injection handling, privileges, oversight
Your input leaves your environment — in clear text

How to tell who owns what

Ownership follows who operates the layer — not who authored the model.

🔓

The Jailbroken Tutor

Quillow, a language-learning app on a hosted LLM API, is jailbroken into producing offensive replies. Who owns the fix?

Base alignment belongs to the provider — report the jailbreak
Quillow cannot retrain or fine-tune someone else's model
"Stronger system prompt" = more of what just failed

The deployer's own fix

Add an output validation layer: check model responses against rules — and a filtering model where needed — before they ever reach a learner.

Hosted Model Due Diligence

Clear text

A hosted model must read your input unencrypted — outside your infrastructure

Where does it run?

Vendor's cluster, or your virtual private cloud?

What is retained?

Retention rules — and a court order can override them

What is logged?

And who — or what — reads those logs?

Used for training?

Is your input training someone else's model? Check the opt-out

The trade-off

Model quality vs data control — some data can't accept the residual risk

🛡

Your Duties Never Transfer

No provider can decide what data you send, whether to trust the output, or which privileges your users and model get.

Hosting shifts infrastructure work — application-level controls stay with the deployer.

SORT EXERCISE

Provider or Deployer?

A hosted API model. Whose job is each control?

Training-data hygiene

Decide what data is sent to the model

Base model alignment

Output validation & encoding

Hosting platform security

User & model privileges

Development environment security

Oversight of behavior in your context

Provider (supplier & host)

Training-data hygiene
Base model alignment
Hosting platform security
Development environment security

Deployer (you)

Decide what data is sent to the model
Output validation & encoding
User & model privileges
Oversight of behavior in your context

Topic 3 · Subtopic 3.2 · 7.5% of exam

Limiting Sensitive Data

Five controls that shrink the data attack surface

The cheapest data to defend is the data you never keep.

📏

Shrink the Data Attack Surface

Three dimensions: amount · variety · duration.

Fewer records, fewer kinds, kept for less time — development-time and runtime, from training data to inputs, outputs, and logs.

The Five Data-Limitation Controls

Data minimization

DATA MINIMIZE — remove fields and records the application doesn't need

Allowed data

ALLOWED DATA — remove data prohibited for this purpose ("may we use it at all?")

Short retention

SHORT RETAIN — remove or anonymize once no longer needed; minimization along the time axis

Obfuscate training data

OBFUSCATE TRAINING DATA — mask, tokenize, pseudonymize, add differential-privacy noise to what must stay

Discretion

DISCRETE — minimize access to technical details attackers could use

Delete vs Disguise

Data minimization

Deletes data you never needed
AI models tolerate reduced features better than intuition suggests
Nothing left = nothing to steal

vs

Obfuscate training data

Transforms data you must keep
Masking, tokenization, pseudonymization, calibrated noise
Reduces re-identification risk — never eliminates it

How to tell them apart

Delete first; obfuscate only what you cannot delete. And note: pseudonymization is reversible (a mapping table exists) — weaker than anonymization.

"What is not there cannot be leaked — or manipulated."

— the data-limitation mantra

Exam Tip

The benefit lands on confidentiality and integrity: nothing to disclose, nothing to corrupt. "Encryption makes retained data safe anyway" is the distractor — retained data is still a target.

Exam Question

Vantora Retail trains a churn model on customer records that still contain full bank account numbers — which have no predictive value. Which control should be applied, and how?

A) Obfuscate training data — tokenize the account numbers

B) Data minimization — remove the account numbers entirely

C) Short retention — delete the training extract after five years

D) Discretion — restrict access to the pipeline documentation

Answer: B

Data with no predictive value should be deleted, not disguised. A is the classic trap: obfuscation is for sensitive data that must stay — and tokenization leaves a mapping table that itself becomes an asset to steal.

Topic 3 · Subtopic 3.3 · 7.5% of exam

Limiting Unwanted Behavior

Seven controls that cap what misbehavior can reach

You can't prevent every cause — so limit the effects.

The Seven Behavior-Limitation Controls

Oversight · OVERSIGHT

Watch behavior — human or automated — and respond. The final checkpoint. Beware approval fatigue.

Least model privilege · LEAST MODEL PRIVILEGE
Minimize what the model can do and access. The heart of agentic AI safety.

Model alignment · MODEL ALIGNMENT

Constrain behavior inside the model — probabilistic, manipulable, never a guarantee.

AI transparency · AI TRANSPARENCY

Tell users the system's properties so they can calibrate reliance.

Continuous validation · CONTINUOUS VALIDATION

Frequently test against a test set — catches poisoning, drift, staleness. Not backdoors.

Explainability · EXPLAINABILITY

Explain individual decisions — counters overreliance, helps assessors.

Unwanted bias testing · UNWANTED BIAS TESTING

Bias tests double as a security sensor: a sudden shift can reveal manipulation.

Whatever the cause — attack or accident — limit what misbehavior can reach.

🚫

A System Prompt Is Not Access Control

"Never refund above €500" in a prompt is a suggestion to a text predictor — prompt injection walks straight through it.

Authorization lives outside the model: task-scoped least model privilege, plus human approval for high-risk actions.

AI Transparency vs Explainability

AI transparency

About the system: how it roughly works, its data, expected accuracy, residual risks
Lets users calibrate reliance and decide what to share
Simplest form: saying an AI is involved at all

vs

Explainability

About one decision: why the model produced this output
Builds justified trust, counters overreliance
Helps security assessors judge the model's risks

How to tell them apart

System properties → transparency. One specific decision → explainability.

SPOT THE MISTAKES

Find the 3 Errors

Work in pairs — each sentence hides one.

"Aldertree Bank deploys an agentic assistant that can move funds between a customer's own accounts. The security plan: a system prompt forbidding transfers above €500 serves as the assistant's access control; weekly continuous validation will also catch any backdoor poisoning; and AI transparency will explain to each customer why an individual transfer was flagged."

system prompt as access control authorization must live outside the model — a prompt is probabilistic model alignment; use least model privilege plus human approval.

validation catches backdoor poisoning backdoors are built to pass test sets — they trigger only on inputs that never appear in validation; you need data quality control and poisoning defenses.

transparency explains individual decisions that is explainability — transparency covers system properties, not single outputs.

Pairs, 3 minutes, then reveal one error at a time.

🚗

The One-Dollar Chevrolet

Late 2023: users manipulate a US car dealership's chatbot into "agreeing" to sell a new Chevrolet for $1 — "no takesies backsies."

The prompt injection was trivial
But the bot could only generate text — not execute sales
Damage: reputational, not financial

Blast radius in action

Rerun it with an agent that holds ordering authority and no privilege limits — the same trivial trick becomes a direct financial loss.

Ask the room: what changes if this bot can issue quotes binding to your CRM? ~2 min discussion.

💥

Blast Radius: Two Levers

1. Minimize & obfuscate data — less to lose. 2. Limit model behavior — less it can do.

Whatever goes wrong — poisoning, injection, or honest error — these levers cap what it costs.

⚠

Unwanted Behavior Needs No Attacker

Bad training data, drift and staleness, engineering mistakes, feedback loops — model collapse.

Attacks are one cause among several — so these controls are shared work, and they pay off even with zero adversaries.

Benefits Beyond Security

Fewer hallucinations

Higher task success and accuracy

Better calibration

Consistent, on-scope outputs

Less waste

No compute burned on off-task output

Fewer incidents

Less operational firefighting

Smaller attack surface

A constrained model offers less to exploit

Lower exposure

Reduced legal, security, and reputational risk

🧠

Memorize This: 6 · 5 · 7

Count the families

6governance controls 5data-limitation controls 7behavior-limitation controls

"Six govern, five slim, seven restrain."

The canonical answers

Bare minimum = inventory + risk analysis
Governance coverage = overarching
Jailbroken API model → output validation layer
Agent that acts → task-scoped least privilege + human approval

SORT EXERCISE

Threat → Primary Control

Match each Topic 2 threat to the control family that counters it first.

membership inference

development-time data leak

model exfiltration

denial-of-wallet (DoW)

data poisoning

supply-chain model poisoning

output containing conventional injection

direct prompt injection

Data limitation (minimize · obfuscate · short retain)

membership inference
development-time data leak

Monitoring & rate limiting

model exfiltration
denial-of-wallet (DoW)

Data quality control & supply chain management

data poisoning
supply-chain model poisoning

Encode model output

output containing conventional injection

Injection I/O handling + oversight & least model privilege

direct prompt injection

Pairs, 5 minutes — this is the Chapter 3 master table in miniature. Remind them the six governance controls sit over every row.

Exam Question

Solvex Energy gives an agentic AI assistant access to its billing system: it can read contracts, adjust tariffs, and issue credits. Which combination best limits the effects of manipulation?

A) A system prompt forbidding credits above €200, plus AI transparency

B) Continuous validation plus explainability

C) Task-scoped least model privilege plus human approval for high-risk actions

D) Data minimization plus short retention

Answer: C

"Agent" + "can act" always pulls toward privilege plus approval: bound what actions are possible, insert a human before the irreversible ones. A relies on probabilistic alignment — not access control. B detects and explains after the fact; D limits data, not actions.

Day 3 of 3

Day 3 — Testing, Privacy & Compliance, Exam Review

Topic 4 · Topic 5 · cross-topic recap and exam strategy

Attack your own system before someone else does.

Topic 4 · Subtopic 4.1 · 5% of exam

Threats Scope

Three testing strategies, two threat trios

A control is a hypothesis until someone tries to break it.

CLASS POLL

Vote Now!

Fenwick Insurance's annual penetration test formally included its claims-fraud model and its customer chatbot — and found no serious issues. Is the AI now security-tested?

A) Yes — both AI systems were in scope

B) No — pentesting covers only one of three testing strategies

C) Yes — provided the models also pass their accuracy checks

D) No — AI systems cannot be pentested at all

Answer: B) No — pentesting covers only one of three testing strategies

The pentest exercised the conventional stack — servers, APIs, access control. Nobody validated model behavior against acceptance criteria, and nobody simulated attacks on the models themselves. One leg of a three-legged stool.

Hands up per option before revealing — expect a split between A and B. ~3 min.

⚔

Why AI Security Testing Exists

Assess the resilience of an AI system by reproducing realistic attacks against it in a controlled environment.

Not accuracy, not compliance, not feature checks — if there is no adversary in the answer, it is not AI security testing.

Three Testing Strategies

Conventional security testing

Pentest the stack around the AI: infrastructure, APIs, access control, supply chain

Model performance validation

Benign test set vs acceptance criteria — detects permanently altered behavior and drift

AI security testing

The security part of AI red teaming: simulate attacks, probe safeguards, play the adversary

AI Security Testing vs Model Performance Validation

AI security testing

Adversarial by design
Hostile prompts, crafted inputs, bypass attempts
Success = finding the weakness before a real attacker does

vs

Model performance validation

Benign by design
Representative test set, accuracy vs acceptance criteria
Security use: spotting behavior permanently altered by poisoning

How to tell them apart

Hostile inputs → security testing. Accuracy on normal data → performance validation. "Simulates attacks" vs "acceptance criteria" — the stem words give it away.

What to Test For: Two Trios

Predictive AI

Evasion — crafted inputs mislead the task
Model exfiltration — the stolen replica becomes an attack oracle
Model poisoning — development-time manipulation of data, pipeline, or supply chain

vs

Generative AI

Prompt injection — manipulative instructions, direct or indirect
Sensitive data disclosure in output — the model is coaxed into revealing secrets
Insecure output handling — output carrying conventional injection hits downstream systems

Three vs three, no overlap

Anchor on the paradigm first: does the system predict or generate? Then check the threat against the right trio.

Exam Question

Atlas Freight runs a predictive delivery-delay model and a generative customer-support chatbot. Beyond conventional security testing, which threat belongs on each test plan?

A) Delay model: prompt injection — Chatbot: evasion

B) Delay model: evasion — Chatbot: sensitive data disclosure in output

C) Delay model: insecure output handling — Chatbot: model exfiltration

D) Delay model: sensitive data disclosure in output — Chatbot: model poisoning

Answer: B

Evasion sits in the predictive trio (evasion, model exfiltration, model poisoning); sensitive data disclosure in output sits in the generative trio (prompt injection, sensitive data disclosure in output, insecure output handling). Option A cross-wires the paradigms exactly backwards — the classic trap.

Topic 4 · Subtopic 4.2 · 2.5% of exam

AI Security Testing Strategies

The eight-step approach — iterative by design

One clean round proves only that the easy attacks failed.

BUILD IT TOGETHER

Name the 8 Testing Steps

Call them out! Each click reveals the next — the exam asks what comes next.

1Define objectives & scope

2Understand the AI system

3Identify potential threats

4Develop attack scenarios

5Test execution

6Risk assessment

7Prioritization & risk mitigation

8Validation of fixes — then iterate

Point out the shape: scoping before attacking, validation loops back into testing. ~4 min.

Test Execution Done Right

Production parity

Same model version, prompts, tools, permissions, configuration as production

Run it multiple times

GenAI output is non-deterministic — one clean run may be luck

Attack the real route

Through the system API with all filters — including untrusted-data paths, to simulate indirect prompt injection

Positive testing

Benign inputs must still work — don't drown legitimate users in false positives

🔄

Blocked Is Not Resilient

After blocked inputs: add variation — synonyms, encodings, formatting changes — and rerun.

If a paraphrase sails through, your safeguard matched surface features, not intent. In essence: an evasion attack on your own detection.

Exam Question

Juniper Telecom's red team finds four vulnerabilities in its support chatbot. Engineering implements all mitigations, and the project manager closes the engagement. According to the general AI security testing approach, what is wrong?

A) Nothing — risk mitigation is the final step

B) Risk assessment should have been repeated before mitigation

C) Validation of fixes is missing — the system must be retested post-remediation

D) The engagement should have closed with a compliance report

Answer: C

Step 8 is validation of fixes: implementing a mitigation is not evidence it works until the previously successful attacks are rerun. That loop back into testing is why the approach is iterative. A confuses "action taken" with "risk reduced"; D swaps in a compliance goal that isn't the purpose of security testing.

Topic 5 · Subtopic 5.1 · 5% of exam · ~2 questions

Privacy and AI Security

The privacy definition, AI-specific concerns, the nine privacy principles

You can encrypt everything perfectly and still violate privacy.

"Privacy is personal data protection plus respect for further individual rights."

— The two-part definition the exam wants, word for word

Exam Tip

Distractors shrink privacy down to confidentiality. "We encrypted it" never answers a question about consent, purpose, or erasure.

AI Privacy Has Two Parts

1 · The security part

Confidentiality & integrity of personal data in training data, model input, and output — plus integrity of model behavior where wrong behavior can hurt individuals.

2 · The rights part — not security
Further individual rights under privacy regulations: use limitation, consent, fairness, transparency — the rights to know, correct, object, erase.

Perfect encryption cannot fix part 2.

Why AI Makes Privacy Harder

📦 Data intensity

Data-hungry systems: extra risk at collection and retention, many sources, many legal constraints.

⌛ Long retention

Retraining keeps training data around for years — a direct tension with storage limitation.

🔧 Engineering exposure

AI teams routinely handle production personal data during development — conventional dev teams rarely do.

🕵 Model attacks

Model inversion, membership inference, sensitive data disclosure through use — the model itself becomes a leak channel.

⚖ Discriminating decisions

Decisions about people can discriminate — and outputs can trigger privacy-invading actions.

🌐 Federated learning
The AI-native mitigation: train in iterations across separate sites, so raw data never leaves its source.

Assess Before, Respond After

Privacy impact assessment (PIA / DPIA)

Structured, up-front review of privacy risks. GDPR's DPIA is mandatory when processing is likely high-risk for individuals — training AI on personal data is a textbook trigger.

Privacy incident

Personal data leaks, is accessed without authorization, or is used beyond its purpose → incident response plus GDPR breach-notification duties.

Run the DPIA before the first record enters the data science environment.

The Nine Privacy Principles — Scenario Cues

Accuracy

A wrong data point drives a harmful automated decision.

Consent

Permission never asked, bundled, or impossible to withdraw.

Data minimization & storage limitation

More data, finer grain, or longer retention than the purpose needs.

Fairness & lawfulness

Unexpected handling, no legal basis, discriminatory effects.

Privacy rights

No way to access, correct, erase, or object.

Privacy by design

Privacy bolted on after launch. Companion: privacy by default.

Security & safeguards

Personal data unprotected in training data, environment, or I/O.

Transparency & explainability

Affected people cannot learn how the decision was made.

Use limitation & purpose specification
Data collected for one purpose, reused for another — THE top pattern.

🔄

The Top Exam Pattern

Data reused beyond the purpose it was collected for → use limitation & purpose specification.

"We already have the data" is never a justification — the principle constrains use, not just collection.

📱

"We Already Have the Data"

Large platforms collected phone numbers for multi-factor authentication — a security purpose.

The numbers were quietly reused for advertising and targeting
No new data was collected; storage stayed secure
Regulators sanctioned it as a serious violation anyway

The lesson

The purpose users agreed to was security, not marketing. Reuse beyond purpose is the violation — not collection.

🧠

Memorize This — Nine Principles, Alphabetical

A to P

Accuracy
Consent
Data minimization & storage limitation
Fairness & lawfulness
Privacy rights

P to U

Privacy by design
Security & safeguards
Transparency & explainability
Use limitation & purpose specification

"Counting note: EXIN's official list shows eight bullets — it merges privacy rights + privacy by design. Know the content either way."

SORT EXERCISE

Which Principle Is Violated?

Match each scenario to the privacy principle it violates.

Loyalty-card purchases, collected for rewards, now train an ad-targeting model

The privacy team first reviews the model two months after go-live

Consent hidden inside a 40-page terms of service

An erasure request arrives — no process exists to honor it

Data sharing is ON by default; users must hunt for the opt-out

An employer "asks" employees to volunteer health data

Income data collected for KYC checks feeds a loan-pricing model

Customers have no channel to see or correct their own record

Use limitation & purpose specification

Loyalty-card purchases → ad-targeting model
KYC income data → loan-pricing model

Privacy by design

Privacy review only after go-live
Protective settings not the default (privacy by default)

Consent

Bundled into the terms of service
Employer "asking" — genuine consent impossible

Privacy rights

No process for erasure requests
No way to access or correct your record

Pairs, 3 minutes. Ask for the decisive cue before revealing each bucket — "purpose switch" should come up unprompted.

Exam Question

Fjordline Ferries trains a no-show prediction model on booking data. All personal data is encrypted, access requires MFA, and every query is logged. A passenger asks which of her data the model used and requests erasure — Fjordline has no process to respond. What is the privacy situation?

A) Privacy is covered — strong security controls protect the personal data

B) Privacy is partly covered — data protection is handled, but further individual rights are not respected

C) Privacy is not affected — prediction models make no automated decisions

D) Privacy is covered, provided a DPIA was completed before training

Answer: B

Privacy is personal data protection plus respect for further individual rights. Encryption, MFA, and logging cover the first half only; transparency and erasure belong to the second half. A shrinks privacy to confidentiality — the classic distractor. D fails because a DPIA identifies risks; it does not discharge individual rights.

Topic 5 · Subtopic 5.2 · 7.5% of exam · ~3 questions

Compliance and Regulation

Four ISO/IEC standards, the EU AI Act, the GDPR, copyright

Regulations demand outcomes; standards give you the machinery — and the evidence.

The ISO Quad — Match Standard to Job

ISO/IEC 23894

AI risk management across the lifecycle. Hook: "ISO 31000, translated for AI."

ISO/IEC 27005

Information security risk management. Hook: "the risk engine behind 27001 — not AI-specific."

ISO/IEC 42001
AI management system (AIMS): governance, policies, roles, continual improvement. Hook: "42001 is to AI what 27001 is to infosec."

ISO/IEC 5338

AI lifecycle / MLOps processes. Hook: "engineering, not governance."

Matching trap

AI risk management → 23894. Information security risk management → 27005. "Management system" language always points to 42001, never to 5338.

EU AI Act — Four Risk Tiers, Top Down

1 · Unacceptable risk

Prohibited outright — social scoring, manipulation, real-time remote biometric identification in public spaces.

2 · High risk

Permitted with compliance obligations + ex-ante conformity assessment — résumé screening, medical devices, critical infrastructure.

3 · Limited risk

Permitted with transparency obligations — a chatbot must disclose that it is a bot.

4 · Minimal / no risk

Permitted without restrictions.

Two footnotes that score marks

Tiers are not mutually exclusive — one system can owe obligations from more than one tier. And the Act protects people, not company secrets: compliance ≠ secure.

CLASS POLL

Vote Now!

Kestrel Talent launches an AI system that screens résumés and shortlists candidates for interviews. Under the EU AI Act, this system is…

A) Prohibited — it makes decisions about people

B) High risk — permitted with compliance obligations and an ex-ante conformity assessment

C) Limited risk — it only needs to disclose that it is AI

D) Minimal risk — recruitment is not named in the Act

Answer: B) High risk — conformity assessment before market

Recruitment decides access to employment — the canonical high-risk example. "Prohibited" is the planted trap: prohibition is reserved for unacceptable-tier practices like social scoring.

Hands up per option before revealing — expect a split between A and B; use it to anchor "high risk, not prohibited".

GDPR × AI — Ten Friction Points (Know the Names)

1

Lawful basis

2

Purpose limitation

3

Data minimization vs model performance

4

Transparency and explainability

5

Automated decision-making and profiling

6

Operationalizing data-subject rights

7

Accuracy and fairness

8

Security and leakage

9

International transfers

10

Accountability and roles

These are compliance difficulties you manage — not the nine principles you violate.

BUILD IT TOGETHER

Name the 10 Copyright Mitigations

Call them out! Each click reveals the next.

1Mitigate disclosure of training data in output

2Comprehensive IP audits

3Clear legal framework & policies

4Ethical data sourcing

5Define ownership of AI-generated content

6Confidentiality & trade-secret protocols

7Employee training

8Compliance monitoring systems

9Response planning for IP infringement

10Licenses and/or warranties from AI suppliers

🏠

The Data-Sourcing Hierarchy

Safest and most ethical: create training data in-house. Licensed or permissioned data comes second.

"Publicly available" is not a license — that is exactly what the AI copyright lawsuits are about.

Exam Question

Bergamot Bank runs an ISO/IEC 27001 ISMS. The board now wants an equivalent organization-wide framework for AI: governance, policies, roles, controls, and continual improvement. Which standard should the bank adopt?

A) ISO/IEC 23894

B) ISO/IEC 27005

C) ISO/IEC 42001

D) ISO/IEC 5338

Answer: C

"Management system" language — governance, policies, roles, continual improvement — always points to ISO/IEC 42001, the AI management system (AIMS) and the AI analogue of 27001. A is the tempting distractor: 23894 is AI risk management, not a management system. 5338 is lifecycle engineering; 27005 is information security risk management.

Review · All topics

Review — Putting It All Together

The threat map, the frameworks in order, the traps, the strategy

40 questions · 90 minutes · 26 correct = pass. Let's make sure the traps don't work on you.

The Threat Map — One Last Look

Input threats (runtime use)

evasion (5 types)
direct & indirect prompt injection
sensitive data disclosure through use
model inversion · membership inference
model exfiltration
AI resource exhaustion — denial of service (DoS), denial-of-wallet (DoW), sponge attack

Development-time threats

data poisoning
direct development-time model poisoning
supply-chain model poisoning
development-time data leak
direct development-time model leak
source code/configuration leak

Runtime conventional threats

direct runtime model poisoning
direct runtime model leak
output containing conventional injection
input data leak
direct augmentation data leak
augmentation data manipulation

Classify in two steps: lifecycle stage first, asset second.

🧠

Speed Round 1 — The Ordered Frameworks

G.U.A.R.D. — fixed order

Govern Understand Adapt Reduce Demonstrate

"After Govern + Understand, the next step is Adapt."

Risk management — 4 steps, repeated

1 · Identify (threat modeling)
2 · Evaluate (likelihood × severity)
3 · Risk treatment (mitigate · transfer · avoid · accept)
4 · Risk communication & monitoring

"Controls only appear at risk treatment."

BUILD IT TOGETHER

Speed Round 2 — The Seven Layers of Prompt Injection Protection

Call them out in order! Each click reveals the next.

1Model Alignment

2Prompt Injection Defense

3Human Oversight

4Automated Oversight

5User-Based Privilege

6Intent-Based Privilege

7Just-In-Time Authorization

After the reveal, ask which layers prevent (1–2) and which limit blast radius (3–7). Weak alone, strong together.

🔁

Speed Round 3 — The Two 8-Step Sequences

AI security testing (order!)

1 · Define objectives & scope
2 · Understand the AI system
3 · Identify potential threats
4 · Develop attack scenarios
5 · Test execution
6 · Risk assessment
7 · Prioritization & risk mitigation
8 · Validation of fixes

Organizational implementation

1 · Organize control of AI
2 · Teach data obfuscation & minimization
3 · Extend supply-chain management
4 · Add AI assets & risks to the ISMS
5 · Teach DevSecOps
6 · Teach AI security controls
7 · Extend monitoring to AI-attack behavior
8 · Guardrails, oversight, least privilege

"Next-step questions pay for knowing the order, not just the members."

Trap Pair 1 — Model Inversion vs Membership Inference

Model inversion

Reconstructs training data the attacker never had
Inputs optimized against confidence scores
Result: approximate data

vs

Membership inference

Confirms whether a record the attacker already holds was in the training set
Excess confidence betrays membership
Result: one bit — yes/no

How to tell them apart

Reconstruct vs confirm — both feed on confidence scores. Bonus pair: model exfiltration harvests input–output pairs to build a replica (front door); a model leak steals the real parameter file (break-in).

Trap Pair 2 — Same Words, Different Lifecycle Stage

Development-time

Attacker's hands reach the engineering environment
direct development-time model poisoning
direct development-time model leak

vs

Runtime

Attacker's hands reach the live production system
direct runtime model poisoning
direct runtime model leak

Two more quick tells

DoW vs DoS: follow the harm — money drained → denial-of-wallet (DoW); service down → denial of service (DoS). Privacy principles vs GDPR challenges: rules you violate vs difficulties you manage.

SPOT THE MISTAKES

Find the 3 Errors

Work in pairs — this incident report mislabels three threats.

"An attacker harvested thousands of input–output pairs from our public API and trained a working replica of the model — a textbook direct runtime model leak. They then flooded the endpoint with compute-heavy sponge inputs, tripling our monthly cloud bill — a classic denial of service (DoS). Finally, instructions hidden in a supplier webpage our assistant retrieves made it exfiltrate data — direct prompt injection."

direct runtime model leak model exfiltration — a replica built from harvested input–output pairs uses the front door; a leak steals the actual parameter file.

denial of service (DoS) denial-of-wallet (DoW) — the named harm is cost, not availability; one sponge attack can cause both, so follow the harm.

direct prompt injection indirect prompt injection — the instructions ride in third-party content the application inserts, not in the user's own prompt.

Quick-Fire Next-Step Anchors

Govern + Understand done?

Next G.U.A.R.D. step: Adapt — AI threat modeling and AI testing live there.

Inventory finished?

Next: a risk analysis on it — the bare-minimum pair for AI security oversight.

Threat list finished?

Next: threat modeling → prioritized risks. Controls come only at risk treatment.

First test round blocked everything?

Add input variation — synonyms, encodings, formatting. Never declare victory.

Jailbroken API model?

The deployer's fix is an output validation layer — never "retrain the provider's model".

Data reused for a new purpose?

Use limitation & purpose specification — not consent, not minimization.

Exam Strategy — Four Rules

⚖ Pairing items

Judge each half independently — one right half never rescues a wrong one.

🔁 Next-step items

Know the order of every framework, not just its members.

⏰ 2:15 per question

40 questions in 90 minutes. Flag, move on, return — no question is worth five.

👥 Sibling options

If two options sound like siblings, one is the trap — recheck lifecycle stage and asset.

Gauntlet 1 — Threats (Pairing Item)

Saffron Health finds two incidents. Incident 1: crafted queries against the diagnosis API let a researcher reconstruct recognizable patient records from the training data. Incident 2: a contractor copied the model's parameter files from a production server. Which pair names the threats?

A) membership inference + model exfiltration

B) model inversion + direct runtime model leak

C) model inversion + model exfiltration

D) sensitive data disclosure through use + direct development-time model leak

Answer: B

Reconstructing training data from outputs is model inversion — membership inference would only confirm a known record's presence. Copying parameter files from production is a direct runtime model leak; model exfiltration (C's second half) builds a replica through queries, and D's second half puts the theft in the wrong lifecycle stage. Judge each half independently.

Gauntlet 2 — Organization (Next-Step Item)

Cobalt Mobility has inventoried all AI use, published AI policies with assigned responsibilities, and educated its engineers and security staff on which AI threats apply. Following G.U.A.R.D., what is the next step?

A) Demonstrate — collect evidence for management and regulators

B) Reduce — minimize sensitive data and limit model behavior

C) Adapt — extend the ISMS, threat modeling, and testing to AI

D) Understand — map the AI threat landscape

Answer: C

The inventory and policies complete Govern; threat education completes Understand. Next in the fixed sequence is Adapt — where AI threat modeling and AI security testing live. D is the trap for anyone who files threat education under a still-open Understand; A and B skip ahead in the order.

Gauntlet 3 — Controls (Ready-Made Models)

Orbita Travel builds its booking assistant on a hosted third-party language model accessed through an API. Users discover a jailbreak that makes the assistant produce offensive text. What is Orbita's most effective measure?

A) Require the provider to retrain the model with stronger alignment

B) Add an output validation layer in its own application

C) Move to self-hosting so the runtime controls return in-house

D) Rely on the hosting platform's monitoring to catch abuse

Answer: B

Application-level duties never transfer to the provider — output validation stays with the deployer, and it works immediately regardless of the model's alignment. A is the tempting distractor: only the provider can retrain, on their timeline, and alignment is probabilistic. C changes who hosts, not the application-level gap; D outsources a duty that is yours.

EXIN AI Security Professional

You're Ready — Good Luck!

40 questions · 90 minutes · 65% to pass

Next step: take the practice exams in the portal under exam conditions — 90 minutes, no notes — then review every explanation, right or wrong.

Point students to the portal: practice exams, flashcards, cheat sheet, study guide. Collect feedback before closing.