EXIN AI Security Professional
Based on the OWASP AI Exchange
Three days from threat map to exam-ready.
Day 1 — Foundations
Intro & exam format · AI security in the organization · Input threats
Day 2: development-time & runtime threats + controls · Day 3: testing, privacy & compliance, review.
Exam At A Glance
Where the Marks Are
Anatomy of an AISP Question
Judge each half independently; eliminate every option where either half is wrong. Usually one survives.
2:15 Per Question
90 minutes for 40 questions — time is not the enemy.
Read each scenario once, carefully, instead of three times in a hurry.
Your Study Toolkit
Learn the Map
65% of the exam is threats + controls — learn the map, the rest follows.
Every threat pairs with its controls; master that pairing and Topics 1, 4 and 5 fall into place.
Organizing AI Security
G.U.A.R.D. · responsible vs trustworthy · assets & threats
The board says “get AI security organized.” Where do you start — and in what order?
G.U.A.R.D. — Five Steps to Organize AI Security
What Each Step Involves
Memorize This
The five steps
“G.U.A.R.D. your AI — in exactly this order.”
The anchor transition
- Understand identifies and teaches
- Adapt changes your processes
- Reduce limits the impact
After Govern and Understand comes Adapt — the exam's favorite “next step.”
Name the 5 G.U.A.R.D. Steps
Call them out in order — each click reveals the next.
Responsible AI vs Trustworthy AI
Responsible AI
- Ethics, society, governance
- Fairness, societal impact, accountability
- Owned by boards & ethics committees
Trustworthy AI
- Technical & operational qualities
- Robustness, reliability, transparency, explainability
- Owned by engineering & operations
Ask what the concern attaches to: people and governance structures → responsible; measurable system qualities → trustworthy.
AI Security vs Conventional Security
Why Your WAF Waves It Through
AI attacks manipulate meaning and statistics — through fully legitimate channels.
An adversarial example is a well-formed request; a poisoned training record carries no malware signature.
Exam Question
Kestrel Mobility has an AI inventory with named owners, its engineers are trained on the threats relevant to each system, and it has just finished extending its ISMS, threat modeling, security testing and supplier management to cover AI. According to G.U.A.R.D., what comes next?
Five Assets — and How They Break
Shadow AI: The Pasted Source Code
2023: engineers at a major electronics maker pasted confidential source code into a public chatbot while debugging.
- Sensitive input flowed straight to an external provider
- A written ban existed — bans alone don't work
- Usage just moves out of sight, where no control applies
Provide a sanctioned, secure, good-quality alternative — and make the risks of unsanctioned tools explicitly clear.
Whose Step Is It Anyway?
Sort each activity into its G.U.A.R.D. step.
- Build the AI inventory
- Appoint a Chief AI Officer
- Train engineers on applicable threats
- Decide which controls are the supplier's job
- AI-specific threat modeling in the SDLC
- Supplier contracts cover model hosting
- Strip customer identifiers from training data
- Human sign-off on model-triggered actions
- Publish test evidence for the regulator
Threat Modeling and Agentic AI Risks
Four risk steps · the bridge to priorities · agents on a leash
A threat catalogue is not a risk list — and an agent is not just a chatbot.
Risk Management in Four Steps
Four Ways to Treat a Risk
Threat Modeling Is the Bridge
From threat catalogue to concrete, prioritized risks — for your system.
Three questions per threat: does it apply here? How could it realistically happen? What would the impact be?
Vote Now!
Halvora Insurance keeps its claims chatbot running but buys a cyber-insurance policy covering losses from model manipulation. Which risk treatment option is this?
Agents Amplify
Agentic AI doesn't create new threats so much as amplify existing ones.
Agents act, run autonomously, behave unpredictably and span systems — the injection that once embarrassed you now moves money.
The Compromised-Agent Chain
Corvid Retail's invoice-handling agent runs under one shared service account reaching ticketing, HR records and the payment gateway. A supplier email arrives with instructions hidden in white text.
- Indirect prompt injection: untrusted data read as instructions
- One shared account → actions chain across every connected system
- Excessive agency: far more capability than the task needs
Data theft needs three ingredients: attacker-controlled data in, access to sensitive data, a way out. Remove any one and the attack collapses.
Six Controls for Agentic AI
Instructions are not enforcement — the right input overrides them, so authorization lives in the architecture, outside the model. And the maxim behind shared accounts: convenience is the enemy of security.
Exam Question
Marrowgate Legal investigates two incidents on its contract-review assistant. Threat 1: an attacker with write access to the document store edited the precedent texts the assistant retrieves, changing its advice. Threat 2: a generated summary contained hidden JavaScript that executed in the client portal's browser. Which pair is correct?
Input Threats
Evasion · prompt injection · disclosure · exfiltration · resource exhaustion
The single heaviest section of your exam — ~7 of 40 questions walk through this one door.
One Door, Five Threat Families
Six Generic Controls — Shared by All Input Threats
Evasion
Crafted input — an adversarial example — misleads the model into performing its task incorrectly.
Evasion manipulates the data the model works on; prompt injection manipulates the instructions.
Zero-Knowledge Evasion: The Query-Probing Story
The model is a closed box — the attacker knows nothing and asks it everything.
- No code, no weights, no architecture
- Thousands of designed inputs hit the live model
- Each response redraws the map of the decision boundary
- Returned confidence scores make the search far faster
Your logs fill with probing traffic — this is where rate limiting and series detection bite.
Partial-Knowledge and Perfect-Knowledge
Transfer Attack: The Surrogate
The attacker never queries your model during the search.
- Builds or obtains a surrogate model — a copy or approximation of the target
- Crafts adversarial examples on the surrogate, at leisure
- Similar task → similar decision boundaries → attacks carry over
- Zero queries against the target while crafting
Rate limiting, series detection and obscured confidence never see the attack being developed.
Evasion After Poisoning: The Planted Key
The odd one out — the weakness was manufactured, not found.
- Training data was poisoned earlier — a development-time attack
- The poison plants a backdoor: trigger input → attacker-chosen output
- At runtime the attacker simply presents the trigger
- No search needed — they planted the key themselves
Planted at development-time, cashed in at runtime. Full poisoning story: subtopic 2.2.
Five Evasion Types — a Ladder of Attacker Insight
Zero-Knowledge vs Transfer Attack
Zero-knowledge
- Search runs on the live target
- Thousands of probing queries
- Logs fill with traffic
- Rate limits & series detection bite
Transfer attack
- Search runs on a surrogate
- Zero target queries while crafting
- Logs stay clean
- Only per-input defenses catch it
Ask where the experimentation happens: probing the live target → zero-knowledge; crafting on a stand-in → transfer attack.
Controls Bite the Search — Not the Example
Rate limiting, series detection and obscured confidence frustrate probing.
Against transfer attacks and evasion after poisoning, only per-input defenses still work: evasion input handling, input distortion, adversarial training.
Which Evasion Type Is It?
Sort each mini-scenario into one of the five evasion types.
- Thousands of mutated images to the live API — search on the target
- Leaked architecture paper sharpens the search — some internals known
- Gradients from the full weight file — compute, don't probe
- Stickers crafted on their own similar model — a surrogate
- Planted trigger presented at runtime — no search needed
Exam Question
Drava Telecom's spam filter blocks a marketing firm's mailings. The firm never queries the filter. Instead, it crafts rewordings on an open-source spam model it runs locally — and the rewritten mailings then slip past Drava's filter. Which evasion type is this?
Direct vs Indirect Prompt Injection
Direct prompt injection
- The user typing is the attacker
- Jailbreaks, role-play, "ignore previous instructions"
- Result flows back to the attacker
Indirect prompt injection
- A third party attacks; the user is a victim
- Instructions hide in content the application inserts — webpage, CV, image
- Dedicated control: input segregation
Trace the channel: typed by the user → direct. Riding inside inserted third-party content → indirect.
Jailbreak
A direct prompt injection aimed at defeating the supplier's alignment and safety training.
Two routes in: abuse competing objectives — helpfulness overrides safety — or use inputs the safety training doesn't recognize, like unusual encodings.
Recognize the Forms, Recognize the Carriers
Name the Seven Layers of Prompt Injection Protection
Call them out, in order! Each click reveals the next.
Every Layer Has a Flaw
Vote Now!
Talvik Energy's outage-report agent receives, in advance, read-only access to the sensor archive — because writing reports requires reading, never sending or deleting. Which protection layer is this?
Model Inversion vs Membership Inference
Model inversion
- Attacker starts with nothing
- Optimizes inputs to chase confidence signals
- Reconstructs approximations of training data
- Gain: data they never had — a recognizable face
Membership inference
- Attacker already holds the record
- Tell-tale extra confidence betrays membership
- Gain: one bit — in or out
- One bit can reveal a diagnosis
Inversion = unknown data out of the model. Membership inference = known data held up against the model. Both feed on confidence indications.
The Disclosure Trio
Overfitting Is the Root Cause
A model with too much capacity memorizes individual records — which can then be reconstructed or recognized.
Paired controls: small model at development-time; obscure confidence at runtime — starve both attacks.
Model Exfiltration: From Q&A to Replica
Pellucid Insurance notices one account sweeping its pricing API — 900,000 methodical quote requests.
- Harvested input–output pairs become a manufactured training set
- A new model trained on them replicates the original
- The replica = a perfect-knowledge surrogate for attacking you
- Weeks later: adversarial tricks work with zero visible probing
Model stealing · model extraction · model theft through use. Countered by the generic input-threat controls plus one dedicated control →
Model Watermarking = Post-Theft Proof
A hidden marker proves a surfaced copy derives from your model — supporting ownership claims and legal action.
It does not prevent the theft. Prevention comes from access control, rate limiting, monitoring and series detection.
Find the 3 Errors
Work in pairs — a colleague wrote this in the risk register.
breaks into production storage harvests input–output pairs through normal use — breaking in and copying the file is a direct runtime model leak, not exfiltration.
prevents this theft proves ownership after the theft — watermarking enables post-theft verification; it stops nothing.
reconstruct records they never possessed confirm whether a record they already hold was in the training set — reconstruction is model inversion.
AI Resource Exhaustion
Content Is the AI Twist
Exhaustion can come from frequency, volume — or the content of a single input.
Conventional DoS thinking counts requests. One cleverly built sponge input costs as much as a flood — so cap resources per input, not just per actor.
Memorize This
Five evasion types
"none → some → all → surrogate → planted"
Seven protection layers
"Most Prompts Hide An Unwelcome Instruction, Justifiably"
Exam Trigger Phrases — Input Threats
Exam Question
Quellhaus Bank runs a face-recognition entry system and a credit-scoring API. Incident 1: a journalist submits a specific customer's photo and concludes, from the unusually high confidence returned, that the photo was in the training set. Incident 2: a competitor scripts two million varied requests to the scoring API and trains a working copy from the answers. Which pair is correct?
Day 2
Development-time & runtime threats, then the controls that answer them
Yesterday: the organization and the input door. Today: attacks on the build pipeline and the live system — then Topic 3's control catalog.
Development-Time Threats
Poisoning attacks integrity — leaks attack confidentiality
The attacker strikes while you build: your data, your pipeline, your supply chain. ~4 questions.
Data Poisoning
Manipulating the data a model learns from, to change the model's behavior.
Whoever controls the training data controls the behavior — no need to touch the model or the code at all.
Five Entry Points — Same Threat
The Trigger Sticker
Cintra Logistics' parcel scanner waves through any box bearing a small violet sticker. Targeted poisoning planted a backdoor.
- A few poisoned samples: subtle trigger pattern + attacker-chosen label
- Perfect behavior on everything else — including your whole test set
- Later, the adversary simply shows the trigger
- No code to review; parameters mean nothing to the eye
Exploiting the planted trigger is evasion after poisoning (2.1): planted development-time, triggered at runtime.
Sabotage vs Targeted (Backdoor) Poisoning
Sabotage
- Degrades the model for regular inputs
- Fraud detection simply stops working
- Normal traffic misbehaves → surfaces quickly
Targeted / backdoor (Trojan)
- Hidden trigger + attacker-chosen label
- Normal behavior otherwise — passes every test
- Far more dangerous: designed for your blind spot
Detectability is the divider: sabotage announces itself; a backdoor hides until its trigger appears.
Direct Development-Time Model Poisoning
The Backdoor That Fine-Tuning Missed
Ondine Diagnostics downloads an open-source clinical model from a public hub and fine-tunes it on its own clean records.
- Manipulated before integration — at the supplier or in transit; invisible from your side
- Fine-tuning on clean data does not reliably erase a backdoor
- Supplied model used for further training = a transfer learning attack
- Your data, pipeline, people: all clean
Provenance, checksums & signatures, scan artifacts before loading, poison robust model, model ensemble, continuous validation.
Data Poisoning vs Model Poisoning
Data poisoning
- Training data manipulated
- The machinery works as designed — it faithfully learns corrupted material
- Trigger words: records, labels, dataset
Model poisoning
- The model or its engineering elements manipulated
- Weights, pipeline code, configuration, libraries
- Direct (your environment) or supply-chain (supplier's model)
Ask what the attacker's hands touched: learning data → data poisoning; the model or its machinery → model poisoning.
Vote Now!
An audit at Sorrel Analytics finds a compromised Python package in the training pipeline: on every run it silently nudges certain model weights. The training data was never touched. What is the threat?
Three Development-Time Leaks — the Asset Decides the Name
A Model Leak Upgrades the Attacker
With a private copy, a zero-knowledge attacker becomes a perfect-knowledge one.
Evasion and inference attacks get rehearsed offline — no rate limits, no monitoring, no detection in the way.
Where Poisoning Enters the Lifecycle
Exam Question
Vireo Health downloads a pre-trained triage model from a public hub and fine-tunes it on its own carefully validated records. Months later, red teamers find one odd token sequence that reliably produces dangerous advice. Vireo's data, pipeline, and staff all check out clean. What is the threat?
Runtime Conventional Security Threats
Old attacks, new consequences
A live AI system is still an IT system — every conventional attack works here too. ~4 questions.
The Technique Is Old — the Consequences Are AI-Specific
SQL injection, stolen credentials, ransomware: none of them care that a neural network is inside.
Steal the model → run inference attacks offline. Tamper with parameters → invisible to code review. Hack augmentation data → change behavior without touching the model.
Direct Runtime Model Poisoning vs Direct Runtime Model Leak
Direct runtime model poisoning
- Live parameters altered — or the model's I/O logic compromised
- Integrity breach: runs, but off-spec
- Controls: runtime model integrity · I/O integrity
Direct runtime model leak
- Live parameters copied — storage, memory, even side channels
- Confidentiality breach: IP theft + offline rehearsal copy
- Controls: runtime model confidentiality · model obfuscation
Altered = poisoning (integrity) · copied = leak (confidentiality). Replicated purely by querying the API? Neither — that is model exfiltration.
The Script in the Transcript
Juniper Airlines' support assistant writes answers straight into the console. A prankster makes it output hidden script — which runs in the next agent's browser.
- Model output carried a conventional attack (cross-site scripting)
- Victim: the downstream component that trusts the output
- Variant: data packed into a markdown image URL — exfiltrated on render
- Payload arrives via prompt injection, or emerges on its own
Decades-old lesson: treat model output as untrusted input. Control: encode model output.
Input Data Leak
Input Data Leak vs Sensitive Data Disclosure Through Use
Input data leak
- Breach in storage or on the wire
- Log file, misconfigured bucket, intercepted connection
- The model is never touched
Sensitive data disclosure through use
- The model's own answer reveals the data
- Memorized training data or confidential context
- Breach happens through the input–output channel
Locate the breach: "log file", "at rest", "in transit" → input data leak. "The model revealed…" → disclosure through use.
Direct Augmentation Data Leak vs Augmentation Data Manipulation
Direct augmentation data leak
- Attacker reads: vector database dumped, retrieval traffic sniffed
- Confidentiality breach
- Behavior does not change
Augmentation data manipulation
- Attacker writes: planted chunks steer every future prompt
- Integrity breach
- Data poisoning's logic, transplanted to runtime data
A copy of sensitive content outside its regular protection — and embeddings can be mined back into text. Read = leak · write = manipulation.
Find the 3 Errors
Work in pairs — an intern drafted this incident summary.
indirect prompt injection output containing conventional injection — the model is the delivery vehicle; the victim is the downstream component that processes the output.
its output can be trusted downstream treat model output as untrusted input — encode model output — alignment never guarantees clean output.
data poisoning augmentation data manipulation — the vector database is a runtime asset feeding prompts, not training data.
Development-Time or Runtime?
Sort each incident by lifecycle stage — then name the threat.
- Relabeled fraud records — data poisoning
- Weights from the laptop — direct development-time model leak
- Stolen scripts & config — source code/configuration leak
- Exposed prompt logs — input data leak
- Edited production parameters — direct runtime model poisoning
- Planted vector chunk — augmentation data manipulation
Topic 2 Recap — Every Threat, Mapped
Name Any Threat in Two Questions
1. Which lifecycle stage — development-time or runtime? 2. Which asset — data, model, input, output, or augmentation data?
Answer both and the threat name follows. Tomorrow's controls walk the same map from the defender's side.
Exam Question
Ostmark Credit suffers two incidents in one week. Incident 1: an intruder on a production host copies the scoring model's parameters from memory. Incident 2: a misconfigured debug proxy exposes months of customers' prompts to the internet. Which pair is correct?
Governance
Six controls, eight rollout steps, provider vs deployer
AI security starts at the top — not in the code.
Topic 3: Three Families of General Controls
What "Good AI Security Governance" Means
Clear policies, defined roles, and risk management — spanning secure development, deployment, and monitoring.
Never a single tool, a one-off audit, or one lifecycle stage.
The Bare Minimum
1. Make an inventory of current AI use — including ideas in the pipeline. 2. Perform a risk analysis on it.
You cannot protect what you do not know you have.
The Six General Governance Controls
AI PROGRAMSEC PROGRAMSEC DEV PROGRAMDEV PROGRAMCHECK COMPLIANCESEC EDUCATEThe Near-Twins: Development vs Secure Development
Development Program
- Lifecycle program for AI work
- General engineering best practice: versioning, testing, documentation
- Objective: maintainable, portable, reliable, future-proof systems
- Security is one benefit among several
Secure Development Program
- Development processes that build security in
- Addresses risks while the system is constructed, not after
- Objective: reduce security risks during development
- Security is the whole point
Read the objective. Engineering quality with security as a side benefit → Development Program. Security built into construction → Secure Development Program.
Coverage: One Word — Overarching
The general governance controls apply to all AI threats and all lifecycle stages.
Any answer that fences them into one threat, one tool, or one phase is a trap.
The 8 Organizational Implementation Steps
Call them out! Each click reveals the next.
Ready-Made Models Change the Question
Not just "which controls do we need?" — "who implements which controls?"
A ready-made model is trained — possibly hosted — by a third party. Provider: model-level, development-time. You: application-level.
Self-Hosted vs Hosted Ready-Made Model
Self-hosted
- Supplier: development-time, model-level controls — training-data hygiene, base alignment
- You: everything at runtime — infrastructure, monitoring, rate limiting, access control, output validation, privileges, oversight
- Data stays inside your environment
Hosted (API)
- Supplier also runs the platform: hosting security, its monitoring and rate limiting
- You keep application-level controls: what data you send, output validation, injection handling, privileges, oversight
- Your input leaves your environment — in clear text
Ownership follows who operates the layer — not who authored the model.
The Jailbroken Tutor
Quillow, a language-learning app on a hosted LLM API, is jailbroken into producing offensive replies. Who owns the fix?
- Base alignment belongs to the provider — report the jailbreak
- Quillow cannot retrain or fine-tune someone else's model
- "Stronger system prompt" = more of what just failed
Add an output validation layer: check model responses against rules — and a filtering model where needed — before they ever reach a learner.
Hosted Model Due Diligence
Your Duties Never Transfer
No provider can decide what data you send, whether to trust the output, or which privileges your users and model get.
Hosting shifts infrastructure work — application-level controls stay with the deployer.
Provider or Deployer?
A hosted API model. Whose job is each control?
- Training-data hygiene
- Base model alignment
- Hosting platform security
- Development environment security
- Decide what data is sent to the model
- Output validation & encoding
- User & model privileges
- Oversight of behavior in your context
Limiting Sensitive Data
Five controls that shrink the data attack surface
The cheapest data to defend is the data you never keep.
Shrink the Data Attack Surface
Three dimensions: amount · variety · duration.
Fewer records, fewer kinds, kept for less time — development-time and runtime, from training data to inputs, outputs, and logs.
The Five Data-Limitation Controls
DATA MINIMIZE — remove fields and records the application doesn't needALLOWED DATA — remove data prohibited for this purpose ("may we use it at all?")SHORT RETAIN — remove or anonymize once no longer needed; minimization along the time axisOBFUSCATE TRAINING DATA — mask, tokenize, pseudonymize, add differential-privacy noise to what must stayDISCRETE — minimize access to technical details attackers could useDelete vs Disguise
Data minimization
- Deletes data you never needed
- AI models tolerate reduced features better than intuition suggests
- Nothing left = nothing to steal
Obfuscate training data
- Transforms data you must keep
- Masking, tokenization, pseudonymization, calibrated noise
- Reduces re-identification risk — never eliminates it
Delete first; obfuscate only what you cannot delete. And note: pseudonymization is reversible (a mapping table exists) — weaker than anonymization.
The benefit lands on confidentiality and integrity: nothing to disclose, nothing to corrupt. "Encryption makes retained data safe anyway" is the distractor — retained data is still a target.
Exam Question
Vantora Retail trains a churn model on customer records that still contain full bank account numbers — which have no predictive value. Which control should be applied, and how?
Limiting Unwanted Behavior
Seven controls that cap what misbehavior can reach
You can't prevent every cause — so limit the effects.
The Seven Behavior-Limitation Controls
OVERSIGHTLEAST MODEL PRIVILEGEMODEL ALIGNMENTAI TRANSPARENCYCONTINUOUS VALIDATIONEXPLAINABILITYUNWANTED BIAS TESTINGA System Prompt Is Not Access Control
"Never refund above €500" in a prompt is a suggestion to a text predictor — prompt injection walks straight through it.
Authorization lives outside the model: task-scoped least model privilege, plus human approval for high-risk actions.
AI Transparency vs Explainability
AI transparency
- About the system: how it roughly works, its data, expected accuracy, residual risks
- Lets users calibrate reliance and decide what to share
- Simplest form: saying an AI is involved at all
Explainability
- About one decision: why the model produced this output
- Builds justified trust, counters overreliance
- Helps security assessors judge the model's risks
System properties → transparency. One specific decision → explainability.
Find the 3 Errors
Work in pairs — each sentence hides one.
system prompt as access control authorization must live outside the model — a prompt is probabilistic model alignment; use least model privilege plus human approval.
validation catches backdoor poisoning backdoors are built to pass test sets — they trigger only on inputs that never appear in validation; you need data quality control and poisoning defenses.
transparency explains individual decisions that is explainability — transparency covers system properties, not single outputs.
The One-Dollar Chevrolet
Late 2023: users manipulate a US car dealership's chatbot into "agreeing" to sell a new Chevrolet for $1 — "no takesies backsies."
- The prompt injection was trivial
- But the bot could only generate text — not execute sales
- Damage: reputational, not financial
Rerun it with an agent that holds ordering authority and no privilege limits — the same trivial trick becomes a direct financial loss.
Blast Radius: Two Levers
1. Minimize & obfuscate data — less to lose. 2. Limit model behavior — less it can do.
Whatever goes wrong — poisoning, injection, or honest error — these levers cap what it costs.
Unwanted Behavior Needs No Attacker
Bad training data, drift and staleness, engineering mistakes, feedback loops — model collapse.
Attacks are one cause among several — so these controls are shared work, and they pay off even with zero adversaries.
Benefits Beyond Security
Memorize This: 6 · 5 · 7
Count the families
"Six govern, five slim, seven restrain."
The canonical answers
- Bare minimum = inventory + risk analysis
- Governance coverage = overarching
- Jailbroken API model → output validation layer
- Agent that acts → task-scoped least privilege + human approval
Threat → Primary Control
Match each Topic 2 threat to the control family that counters it first.
- membership inference
- development-time data leak
- model exfiltration
- denial-of-wallet (DoW)
- data poisoning
- supply-chain model poisoning
- output containing conventional injection
- direct prompt injection
Exam Question
Solvex Energy gives an agentic AI assistant access to its billing system: it can read contracts, adjust tariffs, and issue credits. Which combination best limits the effects of manipulation?
Day 3 — Testing, Privacy & Compliance, Exam Review
Topic 4 · Topic 5 · cross-topic recap and exam strategy
Attack your own system before someone else does.
Threats Scope
Three testing strategies, two threat trios
A control is a hypothesis until someone tries to break it.
Vote Now!
Fenwick Insurance's annual penetration test formally included its claims-fraud model and its customer chatbot — and found no serious issues. Is the AI now security-tested?
Why AI Security Testing Exists
Assess the resilience of an AI system by reproducing realistic attacks against it in a controlled environment.
Not accuracy, not compliance, not feature checks — if there is no adversary in the answer, it is not AI security testing.
Three Testing Strategies
AI Security Testing vs Model Performance Validation
AI security testing
- Adversarial by design
- Hostile prompts, crafted inputs, bypass attempts
- Success = finding the weakness before a real attacker does
Model performance validation
- Benign by design
- Representative test set, accuracy vs acceptance criteria
- Security use: spotting behavior permanently altered by poisoning
Hostile inputs → security testing. Accuracy on normal data → performance validation. "Simulates attacks" vs "acceptance criteria" — the stem words give it away.
What to Test For: Two Trios
Predictive AI
- Evasion — crafted inputs mislead the task
- Model exfiltration — the stolen replica becomes an attack oracle
- Model poisoning — development-time manipulation of data, pipeline, or supply chain
Generative AI
- Prompt injection — manipulative instructions, direct or indirect
- Sensitive data disclosure in output — the model is coaxed into revealing secrets
- Insecure output handling — output carrying conventional injection hits downstream systems
Anchor on the paradigm first: does the system predict or generate? Then check the threat against the right trio.
Exam Question
Atlas Freight runs a predictive delivery-delay model and a generative customer-support chatbot. Beyond conventional security testing, which threat belongs on each test plan?
AI Security Testing Strategies
The eight-step approach — iterative by design
One clean round proves only that the easy attacks failed.
Name the 8 Testing Steps
Call them out! Each click reveals the next — the exam asks what comes next.
Test Execution Done Right
Blocked Is Not Resilient
After blocked inputs: add variation — synonyms, encodings, formatting changes — and rerun.
If a paraphrase sails through, your safeguard matched surface features, not intent. In essence: an evasion attack on your own detection.
Exam Question
Juniper Telecom's red team finds four vulnerabilities in its support chatbot. Engineering implements all mitigations, and the project manager closes the engagement. According to the general AI security testing approach, what is wrong?
Privacy and AI Security
The privacy definition, AI-specific concerns, the nine privacy principles
You can encrypt everything perfectly and still violate privacy.
Distractors shrink privacy down to confidentiality. "We encrypted it" never answers a question about consent, purpose, or erasure.
AI Privacy Has Two Parts
Why AI Makes Privacy Harder
Assess Before, Respond After
The Nine Privacy Principles — Scenario Cues
The Top Exam Pattern
Data reused beyond the purpose it was collected for → use limitation & purpose specification.
"We already have the data" is never a justification — the principle constrains use, not just collection.
"We Already Have the Data"
Large platforms collected phone numbers for multi-factor authentication — a security purpose.
- The numbers were quietly reused for advertising and targeting
- No new data was collected; storage stayed secure
- Regulators sanctioned it as a serious violation anyway
The purpose users agreed to was security, not marketing. Reuse beyond purpose is the violation — not collection.
Memorize This — Nine Principles, Alphabetical
A to P
- Accuracy
- Consent
- Data minimization & storage limitation
- Fairness & lawfulness
- Privacy rights
P to U
- Privacy by design
- Security & safeguards
- Transparency & explainability
- Use limitation & purpose specification
"Counting note: EXIN's official list shows eight bullets — it merges privacy rights + privacy by design. Know the content either way."
Which Principle Is Violated?
Match each scenario to the privacy principle it violates.
- Loyalty-card purchases → ad-targeting model
- KYC income data → loan-pricing model
- Privacy review only after go-live
- Protective settings not the default (privacy by default)
- Bundled into the terms of service
- Employer "asking" — genuine consent impossible
- No process for erasure requests
- No way to access or correct your record
Exam Question
Fjordline Ferries trains a no-show prediction model on booking data. All personal data is encrypted, access requires MFA, and every query is logged. A passenger asks which of her data the model used and requests erasure — Fjordline has no process to respond. What is the privacy situation?
Compliance and Regulation
Four ISO/IEC standards, the EU AI Act, the GDPR, copyright
Regulations demand outcomes; standards give you the machinery — and the evidence.
The ISO Quad — Match Standard to Job
AI risk management → 23894. Information security risk management → 27005. "Management system" language always points to 42001, never to 5338.
EU AI Act — Four Risk Tiers, Top Down
Tiers are not mutually exclusive — one system can owe obligations from more than one tier. And the Act protects people, not company secrets: compliance ≠ secure.
Vote Now!
Kestrel Talent launches an AI system that screens résumés and shortlists candidates for interviews. Under the EU AI Act, this system is…
GDPR × AI — Ten Friction Points (Know the Names)
Name the 10 Copyright Mitigations
Call them out! Each click reveals the next.
The Data-Sourcing Hierarchy
Safest and most ethical: create training data in-house. Licensed or permissioned data comes second.
"Publicly available" is not a license — that is exactly what the AI copyright lawsuits are about.
Exam Question
Bergamot Bank runs an ISO/IEC 27001 ISMS. The board now wants an equivalent organization-wide framework for AI: governance, policies, roles, controls, and continual improvement. Which standard should the bank adopt?
Review — Putting It All Together
The threat map, the frameworks in order, the traps, the strategy
40 questions · 90 minutes · 26 correct = pass. Let's make sure the traps don't work on you.
The Threat Map — One Last Look
- evasion (5 types)
- direct & indirect prompt injection
- sensitive data disclosure through use
- model inversion · membership inference
- model exfiltration
- AI resource exhaustion — denial of service (DoS), denial-of-wallet (DoW), sponge attack
- data poisoning
- direct development-time model poisoning
- supply-chain model poisoning
- development-time data leak
- direct development-time model leak
- source code/configuration leak
- direct runtime model poisoning
- direct runtime model leak
- output containing conventional injection
- input data leak
- direct augmentation data leak
- augmentation data manipulation
Speed Round 1 — The Ordered Frameworks
G.U.A.R.D. — fixed order
"After Govern + Understand, the next step is Adapt."
Risk management — 4 steps, repeated
- 1 · Identify (threat modeling)
- 2 · Evaluate (likelihood × severity)
- 3 · Risk treatment (mitigate · transfer · avoid · accept)
- 4 · Risk communication & monitoring
"Controls only appear at risk treatment."
Speed Round 2 — The Seven Layers of Prompt Injection Protection
Call them out in order! Each click reveals the next.
Speed Round 3 — The Two 8-Step Sequences
AI security testing (order!)
- 1 · Define objectives & scope
- 2 · Understand the AI system
- 3 · Identify potential threats
- 4 · Develop attack scenarios
- 5 · Test execution
- 6 · Risk assessment
- 7 · Prioritization & risk mitigation
- 8 · Validation of fixes
Organizational implementation
- 1 · Organize control of AI
- 2 · Teach data obfuscation & minimization
- 3 · Extend supply-chain management
- 4 · Add AI assets & risks to the ISMS
- 5 · Teach DevSecOps
- 6 · Teach AI security controls
- 7 · Extend monitoring to AI-attack behavior
- 8 · Guardrails, oversight, least privilege
"Next-step questions pay for knowing the order, not just the members."
Trap Pair 1 — Model Inversion vs Membership Inference
Model inversion
- Reconstructs training data the attacker never had
- Inputs optimized against confidence scores
- Result: approximate data
Membership inference
- Confirms whether a record the attacker already holds was in the training set
- Excess confidence betrays membership
- Result: one bit — yes/no
Reconstruct vs confirm — both feed on confidence scores. Bonus pair: model exfiltration harvests input–output pairs to build a replica (front door); a model leak steals the real parameter file (break-in).
Trap Pair 2 — Same Words, Different Lifecycle Stage
Development-time
- Attacker's hands reach the engineering environment
- direct development-time model poisoning
- direct development-time model leak
Runtime
- Attacker's hands reach the live production system
- direct runtime model poisoning
- direct runtime model leak
DoW vs DoS: follow the harm — money drained → denial-of-wallet (DoW); service down → denial of service (DoS). Privacy principles vs GDPR challenges: rules you violate vs difficulties you manage.
Find the 3 Errors
Work in pairs — this incident report mislabels three threats.
direct runtime model leak model exfiltration — a replica built from harvested input–output pairs uses the front door; a leak steals the actual parameter file.
denial of service (DoS) denial-of-wallet (DoW) — the named harm is cost, not availability; one sponge attack can cause both, so follow the harm.
direct prompt injection indirect prompt injection — the instructions ride in third-party content the application inserts, not in the user's own prompt.
Quick-Fire Next-Step Anchors
Exam Strategy — Four Rules
Gauntlet 1 — Threats (Pairing Item)
Saffron Health finds two incidents. Incident 1: crafted queries against the diagnosis API let a researcher reconstruct recognizable patient records from the training data. Incident 2: a contractor copied the model's parameter files from a production server. Which pair names the threats?
Gauntlet 2 — Organization (Next-Step Item)
Cobalt Mobility has inventoried all AI use, published AI policies with assigned responsibilities, and educated its engineers and security staff on which AI threats apply. Following G.U.A.R.D., what is the next step?
Gauntlet 3 — Controls (Ready-Made Models)
Orbita Travel builds its booking assistant on a hosted third-party language model accessed through an API. Users discover a jailbreak that makes the assistant produce offensive text. What is Orbita's most effective measure?
You're Ready — Good Luck!
40 questions · 90 minutes · 65% to pass
Next step: take the practice exams in the portal under exam conditions — 90 minutes, no notes — then review every explanation, right or wrong.