EXIN AI Security Professional
Condensed Study Guide — based on the OWASP AI Exchange (AISP.EN)
Exam Overview & Study Strategy
This is the condensed revision companion to the full textbook: one section per exam subtopic, with the facts, every framework list in full and in order, and the traps. Go back to the textbook whenever a bullet feels shaky.
The AISP exam tests at Bloom levels 2 and 3 — no pure recall, but you cannot apply a list you cannot recall. Three question shapes recur: scenario classification (wrong options are always adjacent concepts — classify by lifecycle stage and asset first), two-part pairing items (judge each half independently; eliminate any option where either half fails), and “what is the next step?” (free marks if you know each framework's order, not just its members).
Topic 1: AI Security in the Organization
15% of Exam1.1 Organizing AI Security
10% of exam · ~4 questionsThe five G.U.A.R.D. steps (Bloom 3 — know the order and what belongs where):
- Govern — general AI governance: AI inventory, policies, assigned responsibilities (increasingly a Chief AI Officer alongside the CISO), impact assessments, compliance checking, education. Home of the AI management system (AIMS) per ISO/IEC 42001.
- Understand — decide which threats actually apply per system; educate engineers and security staff; draw the responsibility line with suppliers.
- Adapt — reshape existing processes: extend the ISMS with AI assets and threats, add AI-specific threat modeling, AI security testing, and supply-chain management for data, models and hosting.
- Reduce — limit the impact of things going wrong (the model can always be wrong or manipulated): minimize and obfuscate sensitive data; cap model privileges, add guardrails and oversight.
- Demonstrate — evidence that safeguards work: transparency, test results, documentation — for management, regulators and clients.
- Responsible AI = the ethics/society/governance lens — fairness to people, accountability, oversight structures (boards, ethics committees). Trustworthy AI = the technical/operational lens — robustness, reliability, transparency, explainability (engineering). Tell: “ethics, societal impact, accountability” → responsible; “robustness, reliability, explainability” → trustworthy — even if the story contains the word “trust.”
- Shadow AI: staff use unsanctioned public tools regardless of bans. The effective fix is a secure, high-quality sanctioned alternative plus explicit risk communication — never a ban alone.
- Equation: AI security = threats to AI-specific assets + threats to all other assets. It extends the existing program; conventional tooling is necessary but not sufficient.
- Three novelties: (1) new assets — training data, augmentation data, the model, its input, its output; (2) new attack surface through legitimate use — just querying the model enables evasion, prompt injection, extraction, resource exhaustion; (3) new suppliers — data, ready-made models, model hosting.
- Why conventional tools miss AI attacks: they match structure and known-bad byte patterns, while adversarial inputs and poisoned records travel through fully legitimate channels — the harm is statistical, not signature-shaped. Zero model trust: assume the model can be misled, wrong, and leaky.
- Threat rhythm across the five assets: both data and the model face leaks (confidentiality) and poisoning (integrity) — the model in both lifecycle phases, development-time and runtime; input only leaks; output's danger is carrying injection onward. System prompts count as augmentation data.
G.U.A.R.D. = Govern → Understand → Adapt → Reduce → Demonstrate. Five AI-specific assets: training data, augmentation data, model, input, output.
“Next step” items love the Govern/Understand → Adapt transition: AI threat modeling and AI testing are Adapt (process change), not Understand (identify + teach) and not Reduce (impact-limiting measures). And a chatbot reciting training records is sensitive data disclosure through use, not a development-time data leak — nothing was stolen from the engineering environment.
Go deeper: full textbook section.
1.2 Threat Modeling & Agentic AI
5% of exam · ~2 questionsFour risk management steps (in order, repeated regularly and on change):
- Identify — threat modeling turns the threat catalogue into concrete, prioritized risks for your system (does it apply? how could it happen here? what impact?).
- Evaluate — estimate likelihood × severity; plot on a heatmap to prioritize.
- Risk treatment — per risk: mitigate (implement controls — most common), transfer (outsource/insure), avoid (change plans, maybe drop AI there), accept (knowingly bear it).
- Risk communication & monitoring — risk register with owner, severity, status; inform stakeholders; verify treatments work.
- Threat modeling is the bridge between the generic threat catalogue and concrete prioritized risks. Controls only enter at step 3 — picking controls from a raw threat list skips two steps.
- Agentic AI takes action: it invokes tools, triggers other agents, operates across systems. Four amplifiers: agents act, are autonomous (agent triggers agent; working memory becomes an attack vector), behave complexly, and are multi-system.
- Excessive agency = more capability than the task needs; blast radius = how much one compromise can damage — the central design question.
- Data-theft triangle: agent processes attacker-controlled data + can access sensitive data + can send data out. Remove any one leg and that attack collapses.
- Agentic AI amplifies existing threats rather than creating new ones: an injection that once produced an embarrassing answer now moves money.
- Six agentic AI controls: traceability · memory-integrity protection · prompt-injection defenses · rule-based guardrails · least model privilege · human oversight.
Identify → Evaluate → Risk treatment → Risk communication & monitoring; treatments mitigate / transfer / avoid / accept; the six agentic controls; and the rule never build access control on GenAI — authorization must be deterministic, outside the model. Maxim: convenience is the enemy of security.
“We listed all applicable threats — what next?” → threat modeling to derive prioritized risks, not buying controls. Cancelling the AI feature entirely = avoidance (not acceptance — you did not proceed; not transfer — nothing shifted to a third party).
Go deeper: full textbook section.
Topic 2: AI Security Threats
37.5% of Exam2.1 Input Threats
17.5% of exam · ~7 questionsThe heaviest subtopic on the exam. Input threats (threats through use) need nothing but access to the input channel of a deployed model.
- Six generic runtime controls (know which control starves which attack): monitor use · rate limiting · model access control · anomalous input handling · unwanted input series handling · obscure confidence (rich confidence scores are the feedback many attacks feed on).
- Evasion — crafting input (adversarial examples) that misleads the model into doing its task incorrectly; integrity of model behavior. Evasion manipulates the data the model works on; prompt injection manipulates instructions. Variants: targeted/untargeted, digital/physical, diffuse perturbation vs localized patch.
Five evasion types, ordered by attacker knowledge:
- Zero-knowledge evasion (black/closed-box) — no internals at all; query the live target and read responses (decision-based; score-based if confidence is returned).
- Partial-knowledge evasion (gray-box) — some internals (architecture, data type) sharpen the search; the most realistic case.
- Perfect-knowledge evasion (white/open-box) — full architecture + parameters + weights; compute the minimal perturbation directly via gradients.
- Transfer attack — craft adversarial examples on a surrogate model (similar model, own-trained copy, stolen or exfiltrated replica), then apply to the target. Zero queries against the target during the search — rate limits and detection never see it coming.
- Evasion after poisoning — the trigger of a backdoor planted earlier via poisoning; no search needed, the weakness was manufactured, not natural.
- Search-frustrating controls (rate limiting, series detection, obscure confidence) are useless against transfer attacks and evasion after poisoning; per-input controls (evasion input handling, input distortion, adversarial training) still work.
- Direct prompt injection — the user is the attacker, socially engineering the model (role-play, “ignore previous instructions,” encodings, multi-turn steering, coaxing out the system prompt). A jailbreak specifically defeats the supplier's alignment. Harm usually flows back to the attacker only — unless there is shared context or the model can act.
- Indirect prompt injection — a third party hides instructions in content the application inserts into the prompt (compromised webpage, white-on-white CV text, pixels in an image); the typing user is an innocent victim. OWASP's analogy: remote code execution. Most dangerous in agentic systems.
- Controls: prompt injection I/O handling (both kinds); input segregation is dedicated to the indirect kind (delimit untrusted data + instruct the model to ignore instructions inside it — a partial mitigation only).
The seven layers of prompt injection protection (in order — weak alone, strong together):
- Model Alignment — train and instruct the model to behave (flaw: models stay easy to mislead).
- Prompt Injection Defense — sanitize, filter, detect (flaw: arms race, false negatives → assume some injections succeed).
- Human Oversight — a human approves selected critical actions (flaw: cost, delay, approval fatigue).
- Automated Oversight — logic detects suspicious activity in context and intervenes on its own (flaw: reactive).
- User-Based Privilege — the agent gets exactly the served user's rights, in advance (flaw: users hold more than the task needs).
- Intent-Based Privilege — task-scoped rights, in advance (a summarizer reads mail, cannot send).
- Just-In-Time Authorization — rights granted at the moment, per subtask; the sub-agent touching untrusted content holds nothing to abuse.
- Layers 1–2 prevent and detect; layers 3–7 control blast radius. Cues: self-operating detector = 4; a person approving = 3; pre-assigned by user identity = 5, by task = 6; granted at the moment = 7.
- Sensitive data disclosure through use — three mechanisms: disclosure in output (the model emits memorized training/input data — once inside a model, source access rights can no longer be enforced; last-line control: sensitive output handling); model inversion (attacker reconstructs training data they never had by optimizing inputs to maximize confidence signals); membership inference (attacker already holds a record and reads tell-tale extra confidence to learn one bit: was it in the training set?).
- Root enabler: overfitting — dedicated development-time control is a small model (plus regularization); runtime: obscure confidence, rate limiting.
- Model exfiltration — harvest input–output pairs through the legitimate interface (or logs/traffic) and train a functional replica. Synonyms: model stealing, model extraction, model theft through use. Three consequences: IP theft at API prices; the replica is a perfect-knowledge surrogate for developing evasion attacks offline; safety protections can be stripped from a copy. Countermeasures: the generic input-threat controls plus model watermarking — which proves ownership after theft, never prevents it.
- AI resource exhaustion — input causes depletion of funds or availability via frequency, volume, or content. The sponge attack (energy-latency attack) crafts single inputs that maximize computation: a denial-of-wallet (DoW) attack on the budget that can simultaneously become denial of service (DoS). Two dedicated controls: DoS input validation · limit resources (per-input cap).
Five evasion types in order: zero-knowledge · partial-knowledge · perfect-knowledge · transfer attack · evasion after poisoning. Two injection kinds: direct · indirect. Seven layers in order: Model Alignment · Prompt Injection Defense · Human Oversight · Automated Oversight · User-Based Privilege · Intent-Based Privilege · Just-In-Time Authorization. Disclosure trio: disclosure in output · model inversion · membership inference. Exfiltration = input-threat controls + model watermarking. Sponge → DoW ± DoS.
Sibling pairs decide this subtopic. Where did the search happen — live target (zero-knowledge) vs surrogate (transfer attack)? Who typed the instruction — user (direct) vs third-party content (indirect)? What does the attacker end with — data they lacked (inversion) vs a verdict on data they brought (membership inference)? How did the model leave — harvested I/O (exfiltration) vs break-in (direct model leak)? Which harm — money (DoW) vs availability (DoS)?
Go deeper: full textbook section.
2.2 Development-Time Threats
10% of exam · ~4 questionsAttacks strike while the system is built: the engineering environment and its supply chain. Poisoning breaks integrity of behavior; leaks break confidentiality. Always ask first: which lifecycle stage?
- Data poisoning — manipulating data the model learns from, to change its behavior. Five entry points: supplier · transit · storage · preparation · operation (live-collected training data — fake reviews fed to the next retraining: the attacker touches the running system, but the threat is development-time because the harm happens when the data is learned from).
- Two flavors: sabotage (regular inputs misbehave — surfaces quickly) vs targeted/backdoor (a hidden trigger + attacker-chosen label; normal behavior otherwise, so it sails through every test set). Runtime exploitation of the planted trigger = evasion after poisoning. Backdoors are hard to find: no code to review, parameters unreadable, testing uses normal cases.
- Anti-poisoning controls: development security, data segregation, supply-chain management; more train data, data quality control, train data distortion, poison robust model, adversarial training, model ensemble (a deviating member flags poisoning).
- Direct development-time model poisoning — the attacker's hands on the model or its build machinery inside the development environment: edited weights, swapped model files, serialized files executing code on load (deserialization), altered pipeline code or configuration, compromised libraries.
- Supply-chain model poisoning — a supplied trained model manipulated before you integrated it (poisoned supplier data or tampered parameters — invisible from your side). Covers ready-made models used as-is and models you fine-tune: fine-tuning on clean data does not reliably erase a backdoor (a transfer learning attack). Your controls: supply-chain management (provenance, checksums/signatures, supplier vetting, artifact scanning), plus poison robust model, model ensemble, continuous validation.
- Broad model poisoning has exactly three types: data poisoning · direct development-time model poisoning · supply-chain model poisoning. A supplied poisoned dataset = data poisoning; a supplied poisoned model = supply-chain model poisoning.
- Three development-time leaks — named by the stolen asset: development-time data leak (training/test data — the development environment holds real data, unlike conventional dev); direct development-time model leak (parameters, weights, architecture — upgrades a zero-knowledge attacker to perfect-knowledge); source code/configuration leak (the pipeline recipe — IP even with no data or weights exposed).
- Leak controls: development security, data segregation, confidential compute, data limitation. Federated learning nuance: decreases the risk of all data leaking, increases the risk of some data leaking (more environments).
Five poisoning entry points: supplier · transit · storage · preparation · operation. Sabotage vs backdoor. Three model-poisoning types. Three leaks by asset: data / model / source code+configuration.
“Which is NOT data poisoning?” — the odd one out is a crafted input fooling a deployed model (evasion, runtime). Data vs model poisoning: what did the attacker touch — learning data (“records, labels, dataset”) vs the model or machinery (“weights, parameters, code, configuration, library”)? Copied = leak; changed = poisoning. Weight files copied from storage = a leak, never exfiltration.
Go deeper: full textbook section.
2.3 Runtime Conventional Security Threats
10% of exam · ~4 questionsA live AI system is an ordinary IT system: every conventional attack applies (SQL injection, stolen credentials, man-in-the-middle, ransomware) against the confidentiality, integrity, and availability of all assets. Defenses are the existing Secure Development Program and Security Program. The exam probes the AI-specific consequences:
- Breach of production storage → steal the model, then run inversion and membership inference offline against the copy.
- Tamper with the model undetected — parameters are opaque binary; no code review will show it.
- Change behavior without touching the model — hack the augmentation-data store.
- Direct runtime model poisoning — altering the parameters of the live model (or compromising its input/output logic, e.g. man-in-the-middle rewriting responses). Integrity breach: still runs, no longer behaves as validated. Controls: runtime model integrity (access control, checksums, encryption, Trusted Execution Environment) · runtime model input/output integrity.
- Direct runtime model leak — stealing parameters from production (executables, memory, storage, transfer — including side channels: timing, power, electromagnetic emissions). Double harm: IP theft + a private rehearsal copy. Controls: runtime model confidentiality · model obfuscation.
- Lifecycle discipline: same attack, different crime scene — “training environment / before deployment” → development-time; “production server / live system” → runtime. Reconstructed purely via API queries → neither: model exfiltration.
- Output containing conventional injection — model output carries a classic payload (XSS, commands) that a downstream component trusts and executes; the exfiltration variant packs data into a URL or markdown image the client fetches. Rule: treat model output as untrusted input. Control: encode model output.
- Input data leak — the user's input (often deeply sensitive: strategy, code, health) exposed at rest or in transit by conventional attack — debug logs, breached buckets. Aggravators: metadata identifies users; cloud inference processes input in clear text; provider logging and subpoenas; RAG context rides inside prompts. Control: model input confidentiality (encryption, access control, minimal retention) + send less in the first place.
- Augmentation data (RAG fragments, system prompts) — double identity: like training data for integrity, like any sensitive store for confidentiality. Usually lives in a vector database — a copy of sensitive content outside its regular protection, and embeddings themselves can be mined to recover text. Retrieval must respect the asking user's access rights; anything retrievable can end up in output.
- Direct augmentation data leak — attacker reads the store (confidentiality; behavior unchanged). Augmentation data manipulation — attacker writes to the store (integrity; behavior steered without touching model or user input). Controls: augmentation data confidentiality · augmentation data integrity.
The runtime six: direct runtime model poisoning · direct runtime model leak · output containing conventional injection · input data leak · direct augmentation data leak · augmentation data manipulation — plus conventional attacks on any asset.
Break-in to the vector store that changes answers = augmentation data manipulation; malicious instructions arriving through normally ingested content = indirect prompt injection. Prompts stolen from a log (“at rest, in transit”) = input data leak; “the model revealed it in its answer” = sensitive data disclosure through use.
Go deeper: full textbook section.
Topic 3: AI Security Controls
27.5% of Exam3.1 Governance
12.5% of exam · ~5 questions- Good AI security governance = clear policies, defined roles, and risk management spanning secure development, deployment, and monitoring. Never a single tool, a one-off audit, or one lifecycle stage.
- Bare minimum for AI security oversight: (1) inventory current AI use and AI ideas; (2) risk analysis on that inventory (genuine threat modeling → applicable threats, needed controls, owners). Inventory first, then analysis.
Six general governance controls (learn as a set):
- AI Program — govern AI as an organization: inventory of initiatives, impact analysis, responsibilities, AI literacy.
- Security Program — the ISMS covers the whole AI lifecycle and AI-specific assets and threats.
- Secure Development Program — build security in during construction.
- Development Program — engineering best practice (versioning, testing, documentation) applied to AI; its objective is broader than security: maintainable, portable, reliable, future-ready systems. The odd one out.
- Check Compliance — AI-relevant and privacy laws inside compliance management; compliance as a driver — but laws protect people, not your company secrets, so compliance is a floor, never a risk analysis.
- Security Education — teach engineers, dev teams, and security staff the threats and controls — including which controls are yours vs the supplier's.
- Coverage: governance controls are overarching — all AI threats, all lifecycle stages (they set the conditions under which every other control is selected and run).
- Eight organizational implementation steps (in order): organize control of AI → teach data obfuscation & minimization → extend supply-chain management (data, models, cloud) → add AI assets & risks to the ISMS repository → teach DevSecOps → teach AI security controls for model engineering & runtime → extend monitoring to AI-attack behavior → implement model guardrails, oversight and least privilege.
- Ready-made models: the provider owns model-level, development-time controls (training-data hygiene, poisoning defenses, its own environment and supply chain, base alignment). Your assurance over that side is supply-chain management (vetting, authenticity, published testing).
- The deployer (you) owns application-level controls always: what data you send, output validation and encoding, prompt injection handling, user and model privileges, oversight in your context.
- Self-hosted ready-made model: you also own everything at runtime (infrastructure, monitoring, rate limiting, access control). Hosted (API): the provider takes the hosting platform's runtime controls — but application-level duties never transfer.
- Hosted extra: your input is processed in clear text inside the provider's environment. Due-diligence questions: the model's actual run location, retention rules, logging practices, and whether your input feeds training. Contracts: the supplier is responsible for training-data content, not automatically accountable — check licenses and warranties.
Six governance controls: AI Program · Security Program · Secure Development Program · Development Program · Check Compliance · Security Education. Bare minimum = inventory + risk analysis. Provider = model-level, development-time; deployer = application-level (+ all runtime when self-hosted).
A jailbroken third-party API model — the deployer's own remedy is an output validation layer, never “retrain the model” or “strengthen the system prompt.” Governance coverage questions: any answer fencing governance into one threat, tool, or phase is wrong — the word is overarching.
Go deeper: full textbook section.
3.2 Limiting Sensitive Data
7.5% of exam · ~3 questionsReduce the data attack surface in three dimensions: amount, variety, duration — development-time and runtime, across training data, augmentation data, inputs, outputs, and logs.
- Data minimization — remove fields and records the application does not need (verify by experiment; models tolerate reduced features surprisingly well; propagate upstream deletions; keep identifiers only to honor deletion, outside training).
- Allowed data — remove data prohibited for the purpose (e.g., personal data without consent for reuse). Minimization is a necessity test; allowed data is a permission test.
- Short retention — delete or anonymize once no longer needed or when law requires; minimization along the time axis (every extra month is extra exposure).
- Obfuscate training data — when data must stay, make it less recognizable: masking/tokenization, pseudonymization (reversible — the mapping table needs separate protection; weaker than anonymization), differential-privacy noise, distributed learning (PATE), encryption where possible. Caveats: costs model performance; reduces, never eliminates, re-identification risk.
- Discretion — minimize access to technical details (papers, blogs, verbose output, error messages) that would help attackers pick and tune attacks; balance against AI transparency: open about properties, quiet about internals.
- Why it works: “what is not there cannot be leaked — or manipulated.” Absent data cannot be disclosed, reconstructed (inversion), inferred (membership inference), or corrupted (poisoning) — the benefit lands on confidentiality and integrity, and a stolen minimized dataset simply contains less to lose. One of the two blast-radius levers.
Five data-limitation controls (OWASP tags): data minimization (DATA MINIMIZE) · allowed data (ALLOWED DATA) · short retention (SHORT RETAIN) · obfuscate training data (OBFUSCATE TRAINING DATA) · discretion (DISCRETE) — against amount, variety, duration.
Unneeded sensitive fields (card numbers with no predictive value) → delete (minimization), don't tokenize — obfuscation is only for data you must keep, and the mapping table becomes its own target. “Encryption makes retention safe” is wrong: retained encrypted data is still a target, and retention time still sets the exposure window.
Go deeper: full textbook section.
3.3 Limiting Unwanted Behavior
7.5% of exam · ~3 questionsUnwanted behavior needs no attacker: bad or insufficient training data, drift and staleness, engineering mistakes, and feedback loops (model output contaminating future training — model collapse). This family limits the effects, whatever the cause.
- Oversight — watch behavior (human or automated) and respond: output detection rules, grounding checks by a second model, rollback, escalation. Human weaknesses: cost, slowness, missing context, approval fatigue.
- Least model privilege — minimize what the model can do and reach: act with the served user's rights, scope permissions to the task, and never place authorization inside GenAI instructions. The heart of agentic safety.
- Model alignment — constrain behavior inside the model (training choices, fine-tuning, RLHF, system prompts). Probabilistic and manipulable — one layer only.
- AI transparency — inform users about system properties (how it roughly works, training data, expected accuracy, residual risks) so they calibrate reliance; simplest form: disclosing that an AI is involved (the EU AI Act requires it for chatbots).
- Continuous validation — frequently test behavior against a test set to catch permanent change (poisoning) or drift; respond by investigating, rolling back, restricting use, adding oversight, disabling. Limit: backdoors are designed to pass it.
- Explainability — explain individual decisions; counters overreliance and helps assessors.
- Unwanted bias testing — measure unwanted bias; doubles as a security sensor — a sudden bias shift can be the first symptom of poisoning.
- Canonical answers: best agent safeguard = task-scoped least privilege + human approval for high-risk actions. Transparency = system properties; explainability = a specific decision.
- Blast radius has two levers — minimize/obfuscate data (3.2) and limit behavior (3.3). Risk payoff: a reduced attack surface, a bounded blast radius, lower legal/reputational exposure. Performance payoff: outputs stay on-scope, calibration and consistency improve, hallucinations drop, compute waste and incident counts fall — reliability and resource efficiency.
- Two mitigation failure modes: overreliance (users trust too much → transparency/explainability) and excessive agency (engineers grant too much → least model privilege).
Seven behavior-limitation controls: oversight · least model privilege · model alignment · AI transparency · continuous validation · explainability · unwanted bias testing.
“The system prompt forbids it” is never sufficient — that is alignment, not access control. And continuous validation is not the defense against backdoor poisoning (triggers never appear in test sets); that needs data quality control and poisoning-specific defenses.
Go deeper: full textbook section.
Topic 4: AI Security Testing
7.5% of Exam4.1 Threats Scope
5% of exam · ~2 questions- Primary purpose of AI security testing: assess the resilience of an AI system by reproducing realistic attacks in a controlled environment. Not accuracy measurement, not compliance demonstration, not functional verification — an answer option without an adversary in it cannot be AI security testing.
Three testing strategies:
- Conventional security testing — pentesting the surrounding stack: servers, APIs, authentication, network, supply chain. Mandatory, but blind to model-specific attacks.
- Model performance validation — a benign test set against acceptance criteria; security use: detecting permanently altered behavior (data or model poisoning); also correctness and drift. Maps to continuous validation.
- AI security testing — the security part of AI red teaming: simulate attacks, stress safeguards, attempt bypasses.
- Threats to test for beyond conventional testing (3 vs 3, no overlap) — predictive AI: evasion · model exfiltration (the replica becomes an attack oracle) · model poisoning (data, pipeline, model, or training supply chain). Generative AI: prompt injection (direct + indirect) · sensitive data disclosure in output · insecure output handling (output carries a conventional payload processed downstream).
- These trios are the key threats, not the whole landscape — full scope comes from risk analysis during scoping.
Three strategies: conventional security testing · model performance validation · AI security testing. Predictive trio: evasion · model exfiltration · model poisoning. Generative trio: prompt injection · sensitive data disclosure in output · insecure output handling.
Hostile vs benign inputs is the discriminator: attack simulation → security testing; accuracy on normal data → performance validation (monthly precision/recall reports are not security testing). Watch cross-wiring: “prompt injection against a fraud classifier” or “evasion against a chatbot” pairs the threat with the wrong paradigm.
Go deeper: full textbook section.
4.2 AI Security Testing Strategies
2.5% of exam · ~1 questionThe eight-step general AI security testing approach (in order):
- Define objectives & scope — goals aligned with organizational, compliance, and risk requirements.
- Understand the AI system — model, use cases, deployment scenarios.
- Identify potential threats — threat modeling, attack surface, threat actors (start from the 4.1 trios).
- Develop attack scenarios — concrete scenarios and edge cases for this system.
- Test execution — run the attacks, manual or automated.
- Risk assessment — document vulnerabilities; weigh severity against likelihood.
- Prioritization & risk mitigation — remediation plan, implement mitigations, compute residual risk.
- Validation of fixes — retest post-remediation; a fix that fails sends you back through the cycle.
- The process is iterative twice over: findings feed back within a cycle, and the whole cycle reruns regularly (at minimum before each deployment).
- Execution discipline: build a base attack set, tailor it to the system's own sensitive data and unacceptable outputs, pair each attack with an automated detection, test through the production route with production-equivalent configuration, run multiple times (non-deterministic output), and keep positive testing (benign inputs still work).
- After a clean first round, add variation algorithms — synonyms, encodings, formatting changes. A blocked attack proves the phrasing was recognized, not that the threat is handled; varying inputs is, in essence, an evasion attack on your own detection mechanisms.
Eight steps, in order: Define objectives & scope · Understand the AI system · Identify potential threats · Develop attack scenarios · Test execution · Risk assessment · Prioritization & risk mitigation · Validation of fixes. “What comes next” is the question format here.
“First round blocked all simple malicious inputs — what next?” → add input variation and retest. Wrong answers: declare the system resilient, jump to validation of fixes (nothing failed yet), or switch to performance validation.
Go deeper: full textbook section.
Topic 5: Privacy and Compliance in AI Security
12.5% of Exam5.1 Privacy and AI Security
5% of exam · ~2 questions- Privacy = personal data protection + respect for further individual rights (know, correct, erase, object). Perfect encryption can still violate privacy — the second half is about purpose and rights, not secrecy.
- AI privacy splits in two: (1) security threats and controls protecting personal data everywhere it lives (training/test data, input, output) plus integrity of model behavior where wrong outputs hurt people; (2) non-security obligations — use limitation, consent, fairness, transparency, accuracy, and the rights to correct, object, erase.
- Why AI makes privacy harder: data intensity · long retention of training data (retraining) · exposure in the engineering environment (real data, unlike conventional dev) · model attacks that extract training data (inversion, membership inference, disclosure through use) · discriminating decisions · privacy-invading actions triggered by output. AI-native mitigation: federated learning — train in iterations across sites so raw data never leaves its source.
- PIA/DPIA: structured up-front assessment; a DPIA is mandatory under GDPR for likely-high-risk processing — training on personal data is a textbook trigger. A safeguard failure is a privacy incident with breach-notification duties.
Nine privacy principles (alphabetical, with scenario cues):
- Accuracy — an incorrect record leads to a damaging automated decision about a person.
- Consent — no valid permission: never asked, bundled, unwithdrawable, or coerced (employer–employee).
- Data minimization & storage limitation — more data, finer grain, or longer retention than the purpose needs.
- Fairness & lawfulness — handling people's data in unexpected ways, without legal basis, or with unjustified adverse effects.
- Privacy rights — no way to access, correct, erase, or object.
- Privacy by design — privacy bolted on after launch instead of engineered in (companion: privacy by default — most protective settings out of the box).
- Security & safeguards — personal data unprotected in training sets, the engineering environment, or I/O.
- Transparency & explainability — people affected by an algorithmic decision have no way to see its logic or inputs.
- Use limitation & purpose specification — data collected for one purpose reused for another. The top exam pattern: whenever data crosses from its collection purpose to a new one (KYC data → marketing model, MFA phone numbers → advertising), this is the answer.
The nine principles above. Counting note: EXIN's official concept list bundles privacy rights and privacy by design together, showing eight bullets — the content is identical either way.
For data-reuse scenarios, consent and data minimization are the planted distractors — check the purpose first; “we already have the data” is never a justification. And any option reducing privacy to “keeping data secret” is wrong by definition.
Go deeper: full textbook section.
5.2 Compliance and Regulation
7.5% of exam · ~3 questionsThe four ISO/IEC standards (a matching exercise):
- ISO/IEC 23894 — AI risk management guidance across the AI lifecycle, aligned with ISO 31000.
- ISO/IEC 27005 — information security risk management supporting ISO/IEC 27001 — not AI-specific; extend it to AI attack surfaces.
- ISO/IEC 42001 — the AI management system (AIMS): governance, policies, roles, controls, continual improvement. 42001 is to AI what 27001 is to information security.
- ISO/IEC 5338 — AI system lifecycle processes: AI engineering / MLOps for data and model development, deployment, operation.
- Matching cues: “AI risk management” → 23894; “information security risk management” → 27005; “management system / governance / continual improvement” → 42001; “lifecycle / engineering / MLOps” → 5338.
EU AI Act — four risk tiers (perspective: harm to people — safety, health, fundamental rights):
- Unacceptable risk — prohibited outright: social scoring, manipulation, real-time remote biometric identification in public spaces. No compliance path exists.
- High risk — permitted with compliance obligations + ex-ante conformity assessment (before market): products under safety legislation (medical devices), recruitment/employment, critical infrastructure, law enforcement.
- Limited risk (specific transparency risk) — permitted with transparency obligations: a chatbot must disclose it is a bot.
- Minimal / no risk — permitted without restrictions.
- Tiers are not mutually exclusive — a high-risk conversational system also owes limited-tier disclosures. Blind spot: the Act protects people, not company secrets — compliant is not the same as secure. GenAI addition: providers must disclose copyrighted training material.
Ten GDPR friction points (challenges AI creates under the GDPR):
- Lawful basis — genuinely difficult to identify one that squarely covers model training on personal data.
- Purpose limitation — service data keeps getting pulled toward training, a new purpose.
- Data minimization vs model performance — the law wants less, accuracy wants more.
- Transparency and explainability — “meaningful information about the logic” vs black-box models.
- Automated decision-making and profiling — rights to human intervention, contest, information.
- Operationalizing data-subject rights — erasure baked into a model may mean retraining without the person's data.
- Accuracy and fairness — incorrect or biased data → unjustified adverse decisions.
- Security and leakage — inversion, membership inference, and disclosure make the model itself a breach channel.
- International transfers — data, GPUs, and providers cross borders casually.
- Accountability and roles — controller vs processor across supplier/deployer chains is rarely obvious.
Ten copyright risk-mitigation strategies:
- Mitigate disclosure of sensitive training data in output — stop the model emitting absorbed protected material.
- Comprehensive IP audits — inventory all IP touching the system: datasets, code, systems, interfaces.
- Clear legal framework and policies — written, enforced AI-use policies aligned with IP law.
- Ethical data sourcing — in-house, permissioned, or properly licensed public-domain data. Ranking: in-house creation is the most ethical and safest (full provenance); licensed second; “publicly available” is not a license.
- Define ownership of AI-generated content — decide up front who owns output and how it may be used.
- Confidentiality and trade-secret protocols — handling rules that preserve trade-secret status.
- Employee training — staff know the AI IP policies and the cost of infringement.
- Compliance monitoring systems — ongoing checks for potential infringement by the system.
- Response planning for IP infringement — a prepared plan for claims.
- Licenses and/or warranties from AI suppliers — contractual cover for intended (and future) use, with supplier obligations for infringement claims.
The ISO quad: 23894 AI risk · 27005 infosec risk · 42001 AIMS · 5338 lifecycle. The four tiers top-down: unacceptable · high · limited · minimal. All ten GDPR friction points and all ten copyright strategies, with the in-house-first sourcing rule.
The two risk standards get swapped — AI risk → 23894, information security risk → 27005; “management system” language always points to 42001, never 5338. Recruitment/résumé AI is high risk (conformity assessment), not prohibited. And keep the nine privacy principles (rules you violate) apart from the ten GDPR challenges (difficulties you manage): “which principle is violated?” is answered from the nine-principle list.
Go deeper: full textbook section.
Final Exam-Week Checklist
The lists to run through once daily in the final week
- G.U.A.R.D. in order — Govern, Understand, Adapt, Reduce, Demonstrate — and one anchored transition: after Govern + Understand comes Adapt (AI threat modeling, AI testing, supply chain).
- Risk management — Identify → Evaluate → Risk treatment → Risk communication & monitoring; treatments mitigate/transfer/avoid/accept; threat modeling bridges catalogue → prioritized risks.
- Five evasion types by attacker knowledge — zero-knowledge, partial-knowledge, perfect-knowledge, transfer attack, evasion after poisoning. Classify by where the search happened, never by what was attacked.
- Prompt injection — direct (user types it) vs indirect (rides in third-party content; extra control = input segregation); the seven layers in order: Model Alignment, Prompt Injection Defense, Human Oversight, Automated Oversight, User-Based Privilege, Intent-Based Privilege, Just-In-Time Authorization.
- Disclosure trio + exfiltration + exhaustion — disclosure in output vs model inversion (reconstruct unknown) vs membership inference (confirm known); model exfiltration = I/O harvesting (+ model watermarking = post-theft proof); sponge attack → denial-of-wallet (money) and/or denial of service (availability).
- Development-time vs runtime naming — development-time: data poisoning (5 entry points), direct development-time model poisoning, supply-chain model poisoning, three leaks (data / model / source code+configuration). Runtime: direct runtime model poisoning, direct runtime model leak, output containing conventional injection, input data leak, direct augmentation data leak, augmentation data manipulation.
- Six governance controls — AI Program, Security Program, Secure Development Program, Development Program, Check Compliance, Security Education — overarching (all threats, all stages); bare minimum = inventory + risk analysis; eight rollout steps in order.
- Provider vs deployer — provider: model-level, development-time controls; deployer: application-level always (output validation, injection handling, privileges, what data you send) + the full runtime stack when self-hosted.
- Five data-limitation controls — data minimization, allowed data, short retention, obfuscate training data, discretion. Mantra: what isn't there can't be leaked or manipulated. Delete first; obfuscate only what must stay.
- Seven unwanted-behavior controls — oversight, least model privilege, model alignment, AI transparency, continuous validation, explainability, unwanted bias testing. Agent answer: task-scoped least privilege + human approval. A system prompt is never access control.
- Testing — three strategies (conventional, performance validation, AI security testing / red teaming); predictive trio vs generative trio; eight steps in order; blocked attacks → add input variation.
- Nine privacy principles — top pattern: data reused beyond its collection purpose → use limitation & purpose specification. Privacy = data protection + further individual rights.
- AI Act pyramid + ISO quad — unacceptable (prohibited) / high (ex-ante conformity assessment — recruitment, medical) / limited (disclose the bot) / minimal (free); tiers stack. 23894 AI risk · 27005 infosec risk · 42001 AIMS · 5338 lifecycle.
- Ten GDPR friction points and ten copyright strategies — recite both lists in full; sourcing rule: in-house data creation is the safest and most ethical.
- Quick-fire confusables — inversion ≠ membership inference; direct ≠ indirect injection; exfiltration ≠ model leak; DoW ≠ DoS; data ≠ model poisoning; augmentation leak ≠ manipulation; responsible ≠ trustworthy AI; security testing ≠ performance validation; transparency ≠ explainability; principles ≠ GDPR challenges. When two options sound like siblings, return to the scenario's lifecycle stage and asset.