| AI-specific asset | Can leak (confidentiality) | Can be manipulated (integrity) |
|---|---|---|
| Training data | development-time data leak; via use at runtime: sensitive data disclosure through use, model inversion, membership inference | data poisoning — the trained model learns wrong behavior |
| Augmentation data (incl. system prompts) | direct augmentation data leak | augmentation data manipulation |
| Model | direct development-time model leak · direct runtime model leak · model exfiltration (copy via queries) | direct development-time model poisoning · supply-chain model poisoning · direct runtime model poisoning |
| Input | input data leak | — |
| Output | — | output containing conventional injection — attacks downstream systems |

| Step | One-phrase gloss |
|---|---|
| 1 Govern | AI inventory, policies, responsibilities, compliance, education |
| 2 Understand | Which threats apply; educate engineers & security staff |
| 3 Adapt | Extend ISMS, threat modeling, testing, supply chain to AI |
| 4 Reduce | Minimize sensitive data; limit model behavior & impact |
| 5 Demonstrate | Evidence for management, regulators, clients |

| Step | Content |
|---|---|
| 1 Identify | Threat modeling turns the catalogue into concrete risks |
| 2 Evaluate | Likelihood × severity; prioritize on a heatmap |
| 3 Risk treatment | Mitigate · transfer · avoid · accept |
| 4 Risk communication & monitoring | Risk register; inform stakeholders; verify treatments |
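The "Likelihood × severity" evaluation in the table above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the 1 to 5 scales, the band thresholds in `heat_band`, and the example ratings are all hypothetical values chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) .. 5 (almost certain)
    severity: int    # assumed scale: 1 (negligible) .. 5 (critical)

    @property
    def score(self) -> int:
        # Step 2 "Evaluate": likelihood x severity
        return self.likelihood * self.severity


def heat_band(score: int) -> str:
    """Map a score to a heatmap band (thresholds are hypothetical)."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def prioritize(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Illustrative ratings only; real values come out of threat modeling.
register = [
    Risk("indirect prompt injection", likelihood=4, severity=4),
    Risk("model exfiltration", likelihood=2, severity=3),
    Risk("data poisoning", likelihood=2, severity=5),
]

for risk in prioritize(register):
    print(f"{risk.name}: score={risk.score}, band={heat_band(risk.score)}")
```

The sorted list is a natural starting point for the risk register in step 4; the treatment decision (mitigate, transfer, avoid, accept) still has to be made per risk.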

| Strategy | What it does |
|---|---|
| Conventional security testing | Pentest the stack around the AI: infrastructure, APIs, access |
| Model performance validation | Benign test set vs acceptance criteria — catches permanently altered behavior & drift |
| AI security testing | Security part of AI red teaming: simulate realistic attacks to assess resilience |

| Test targets | Key threat trio |
|---|---|
| Predictive AI | evasion · model exfiltration · model poisoning |
| Generative AI | prompt injection · sensitive data disclosure in output · insecure output handling |

| Threat (locked name) | Mechanism | Impact | Primary control(s) |
|---|---|---|---|
| evasion | Crafted adversarial input misleads the model’s task | B | evasion-robust model · adversarial training · input distortion · evasion input handling |
| direct prompt injection | User’s own prompt bends the model (jailbreak, role-play, “ignore instructions”) | B | prompt injection I/O handling · the 7 layers |
| indirect prompt injection | Instructions hidden in third-party content the app inserts (webpage, CV, image) | B | input segregation · prompt injection I/O handling |
| sensitive data disclosure through use | Model emits memorized training/input data in its answers | L | sensitive output handling · data limitation |
| model inversion | Inputs optimized against confidence scores reconstruct training data | L | obscure confidence · small model · rate limiting |
| membership inference | Excess confidence betrays whether a known record was in training | L | obscure confidence · small model |
| model exfiltration | Harvest input–output pairs, train a functional replica | L | rate limiting · access control · series handling · model watermarking (proves, never prevents) |
| AI resource exhaustion | Frequency, volume or content (sponge attack) drains compute/funds → denial of service (DoS) / denial-of-wallet (DoW) | A $ | DoS input validation · limit resources · rate limiting |
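Rate limiting appears as a control for three of the threats above (model inversion, model exfiltration, AI resource exhaustion). The sketch below shows one possible shape, a per-client sliding window. The class name, the `client_id` key, and the limits are assumptions made for illustration; a production system would more likely rely on a gateway or shared store.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most max_calls per client within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._calls = defaultdict(deque)  # client_id -> timestamps of accepted calls

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        calls = self._calls[client_id]
        # Forget accepted calls that have left the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False  # reject: the caller should not invoke the model
        calls.append(now)
        return True


limiter = SlidingWindowRateLimiter(max_calls=100, window_seconds=60)
if limiter.allow("client-42"):  # hypothetical client identifier
    pass  # forward the request to the model here
```

A frequency limit like this slows query harvesting and caps cost, but it does not address a single expensive input (sponge attack); that is what DoS input validation and resource limits are for.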

| Threat (locked name) | Mechanism | Impact | Primary control(s) |
|---|---|---|---|
| data poisoning | Learning data manipulated (supplier · transit · storage · preparation · operation-collected) — sabotage or backdoor trigger | B | data quality control · more train data · train data distortion · poison robust model · model ensemble |
| direct development-time model poisoning | Parameters, pipeline code, config or libraries tampered in the dev environment | B | development security · data segregation · continuous validation |
| supply-chain model poisoning | Supplied trained model manipulated before integration — fine-tuning won’t erase a backdoor | B | supply chain management · model ensemble · continuous validation |
| development-time data leak | Train/test data stolen from the engineering environment | L | development security · data limitation · federated learning |
| direct development-time model leak | Parameters/weights/architecture stolen from dev — upgrades the attacker to perfect-knowledge | L | development security · data segregation · confidential compute |
| source code/configuration leak | The pipeline recipe stolen; no data or weights needed | L | development security |

| Threat (locked name) | Mechanism | Impact | Primary control(s) |
|---|---|---|---|
| conventional runtime attacks | Any classic attack (SQLi, stolen credentials) hits AI assets too | B L A | Security Program · Secure Development Program |
| direct runtime model poisoning | Live parameters — or the model’s I/O logic — altered in production | B | runtime model integrity · runtime model I/O integrity |
| direct runtime model leak | Parameters stolen from production (files, memory, side channels) | L | runtime model confidentiality · model obfuscation |
| output containing conventional injection | Output carries XSS/SQL that a downstream component renders or executes | B | encode model output |
| input data leak | User prompts exposed at rest or in transit (debug logs, proxies) | L | model input confidentiality · data minimization |
| direct augmentation data leak | Attacker reads the vector store / system prompt — embeddings are recoverable | L | augmentation data confidentiality |
| augmentation data manipulation | Attacker writes to the vector store / system prompt / agent memory — steers behavior | B | augmentation data integrity |
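For "output containing conventional injection", the control "encode model output" means treating model output as untrusted data at every sink. The sketch below assumes two sinks, an HTML page and a SQL table; the function names and the `answers` table are hypothetical.

```python
import html
import sqlite3


def render_answer(model_output: str) -> str:
    """Escape model output before placing it in HTML (blocks XSS)."""
    return f'<div class="answer">{html.escape(model_output)}</div>'


def store_answer(conn: sqlite3.Connection, model_output: str) -> None:
    """Pass model output as a bound parameter, never via string concatenation."""
    conn.execute("INSERT INTO answers (text) VALUES (?)", (model_output,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (text TEXT)")

hostile = "<script>alert(1)</script>"
print(render_answer(hostile))
store_answer(conn, "x'); DROP TABLE answers;--")
```

The encoding has to match the sink: HTML escaping for markup, bound parameters for SQL, and so on for shells, URLs, or templates.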

| Type | Where the search happens |
|---|---|
| zero-knowledge | No internals; query the live target and read responses |
| partial-knowledge | Some internals (e.g. architecture); sharper search |
| perfect-knowledge | Full weights & parameters; compute perturbations via gradients |
| transfer attack | Craft on a surrogate model; zero queries on the target |
| evasion after poisoning | Planted backdoor trigger; no search needed at all |

| Mechanism | What the attacker gets |
|---|---|
| disclosure in output | Model volunteers the data — no attack needed; last line: sensitive output handling |
| model inversion | Approximate reconstruction of data they never had |
| membership inference | One bit: was this known record in the training set? |
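Model inversion and membership inference both feed on confidence scores, which is why "obscure confidence" is listed as a control for them. One possible reading of that control, sketched under assumptions: return only the top label with a coarsened score instead of the full probability vector. The function name and the rounding granularity are illustrative choices.

```python
def obscure_confidence(scores: dict[str, float], decimals: int = 1) -> dict:
    """Return only the top label and a coarsened confidence value."""
    label = max(scores, key=scores.get)
    return {"label": label, "confidence": round(scores[label], decimals)}


raw = {"cat": 0.9731, "dog": 0.0269}  # hypothetical model output
print(obscure_confidence(raw))
```

Coarsening removes much of the fine-grained signal an attacker optimizes against, but it reduces the risk rather than eliminating it, and it also gives legitimate users less information.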

| Control | Objective |
|---|---|
| AI Program (AI PROGRAM) | Organization takes responsibility for AI: inventory, impact analysis, responsibilities, AI literacy |
| Security Program (SEC PROGRAM) | ISMS owns AI assets, threats & risks across the whole lifecycle |
| Secure Development Program (SEC DEV PROGRAM) | Build security in while the AI system is developed, not after |
| Development Program (DEV PROGRAM) | Engineering best practice → maintainable, portable, reliable, future-ready AI; the odd one out: broader than security |
| Check Compliance (CHECK COMPLIANCE) | AI & privacy law in compliance management: a driver, but a floor, not a risk analysis |
| Security Education (SEC EDUCATE) | Teach engineers, dev teams & security staff the threats and controls, incl. which are the supplier’s |

| Control | One-liner |
|---|---|
| data minimization (DATA MINIMIZE) | Remove fields/records the application does not need; delete first |
| allowed data (ALLOWED DATA) | Remove data prohibited for the intended purpose (“may we use it at all?”) |
| short retention (SHORT RETAIN) | Delete or anonymize once no longer needed: minimization along the time axis |
| obfuscate training data (OBFUSCATE TRAINING DATA) | Mask, tokenize, pseudonymize, add noise (differential privacy) to data you must keep; reduces, never eliminates, risk |
| discretion (DISCRETE) | Keep technical details (architecture, config, verbose errors) away from attackers |
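Two of the data controls above combine naturally: data minimization (drop what the application does not need) followed by pseudonymization of an identifier that must be kept. The sketch below uses a keyed hash (HMAC-SHA256) as one possible pseudonymization technique; the field names, the `ALLOWED_FIELDS` set, and the key handling are assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical allow-list: only fields the application actually needs.
ALLOWED_FIELDS = {"customer_id", "age_band", "country"}


def minimize(record: dict, allowed=ALLOWED_FIELDS) -> dict:
    """Data minimization: drop every field not on the allow-list."""
    return {k: v for k, v in record.items() if k in allowed}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash; the key must be kept secret."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare(record: dict, key: bytes) -> dict:
    slim = minimize(record)
    slim["customer_id"] = pseudonymize(slim["customer_id"], key)
    return slim


key = b"demo-key-do-not-use"  # placeholder; load a real secret from a vault
raw = {"customer_id": "C-1001", "email": "a@example.com",
       "age_band": "30-39", "country": "NL"}
print(prepare(raw, key))
```

The same input always maps to the same pseudonym, so records can still be joined. That also means re-identification remains possible for anyone who obtains the key, which matches the table's caveat that obfuscation reduces risk but never eliminates it.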

| Control | One-liner |
|---|---|
| oversight (OVERSIGHT) | Watch behavior (human or automated) and respond; beware approval fatigue |
| least model privilege (LEAST MODEL PRIVILEGE) | Minimize what the model can do and reach; authorization never lives in prompts |
| model alignment (MODEL ALIGNMENT) | Constrain behavior in the model (training, RLHF, system prompts); probabilistic, no guarantee |
| AI transparency (AI TRANSPARENCY) | Tell users the system’s properties so they calibrate reliance (system-level) |
| continuous validation (CONTINUOUS VALIDATION) | Frequent test-set runs catch permanent change & drift, not backdoors |
| explainability (EXPLAINABILITY) | Explain individual decisions; counters overreliance (decision-level) |
| unwanted bias testing (UNWANTED BIAS TESTING) | A sudden bias shift can be the first symptom of an attack: a security sensor |
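Continuous validation, as described above, amounts to re-running a benign test set on a schedule and comparing the result with an acceptance criterion. A minimal sketch follows; the `predict` callable, the accuracy metric, and the threshold are assumptions, and a real setup would track several metrics over time.

```python
from typing import Callable, Sequence


def validate(predict: Callable, inputs: Sequence, expected: Sequence,
             min_accuracy: float) -> tuple[float, bool]:
    """Return (accuracy, passed) for one validation run."""
    if not inputs or len(inputs) != len(expected):
        raise ValueError("inputs and expected must be non-empty and equal length")
    correct = sum(1 for x, y in zip(inputs, expected) if predict(x) == y)
    accuracy = correct / len(inputs)
    return accuracy, accuracy >= min_accuracy


# Toy stand-in for a deployed model: classifies integers as odd (1) or even (0).
def toy_model(x: int) -> int:
    return x % 2


accuracy, passed = validate(toy_model, [1, 2, 3, 4], [1, 0, 1, 1], min_accuracy=0.9)
if not passed:
    print(f"ALERT: accuracy {accuracy:.2f} below acceptance criterion")
```

A failing run signals permanently altered behavior or drift. A backdoored model can still pass, because the benign test set never contains the trigger.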

| Deployment | Model supplier owns | You (deployer) own |
|---|---|---|
| Self-hosted | Development-time, model-level controls: training-data hygiene & limitation, poisoning defenses, its dev environment security, its supply chain, base alignment | Everything at runtime: infrastructure, runtime model integrity & confidentiality, monitoring, rate limiting, access control, output validation, privileges, oversight |
| Hosted (API) | All model-level controls + runtime controls of the hosting platform: its security, monitoring, rate limiting, protection of the running model | Application-level usage controls: what data you send, output validation & encoding, prompt-injection handling around your app, user & model privileges, oversight in your context |

| Principle | Cue in the scenario |
|---|---|
| accuracy | A wrong data point drives a harmful automated decision |
| consent | Permission never asked, bundled, or cannot be withdrawn |
| data minimization & storage limitation | More data / finer grain / longer retention than the purpose needs |
| fairness & lawfulness | Unexpected handling, no legal basis, discriminatory effects |
| privacy rights | No way to access, correct, erase or object |
| privacy by design | Privacy bolted on after launch (companion: privacy by default) |
| security & safeguards | Personal data unprotected in training data, environment or I/O |
| transparency & explainability | Affected people cannot learn how the decision was made |
| use limitation & purpose specification | Data collected for one purpose reused for another (THE top pattern) |

| Tier | Status · examples |
|---|---|
| unacceptable | Prohibited outright: social scoring, manipulation, real-time remote biometric ID in public spaces (narrow law-enforcement exceptions) |
| high | Permitted with compliance + ex-ante conformity assessment — résumé screening/recruitment, medical devices, critical infrastructure |
| limited | Transparency obligations — a chatbot must disclose it is a bot |
| minimal | Permitted without restrictions |

| Standard | Role · memory hook |
|---|---|
| ISO/IEC 23894 | AI risk management — “ISO 31000, translated for AI” |
| ISO/IEC 27005 | Information security risk management — “the risk engine behind 27001, not AI-specific” |
| ISO/IEC 42001 | AI management system (AIMS) — “42001 is to AI what 27001 is to infosec” |
| ISO/IEC 5338 | AI lifecycle / MLOps processes: “engineering, not governance” |

| Concept A | Concept B | How to tell them apart |
|---|---|---|
| model inversion | membership inference | Inversion reconstructs data the attacker never had; membership inference confirms (yes/no) a record they already hold. Both feed on confidence scores. |
| direct prompt injection | indirect prompt injection | Trace the channel: typed by the user → direct; hidden in third-party content the app inserts (webpage, CV, image) → indirect. |
| model exfiltration | direct model leak | Front door vs break-in: harvested input–output pairs build an approximation; a leak steals the real parameter file from storage. |
| denial-of-wallet (DoW) | denial of service (DoS) | Follow the harm: money drained (bills, tokens, GPU) → DoW; service slow or unavailable → DoS. One sponge attack can cause both. |
| data poisoning | model poisoning | What did the attacker touch? Learning data (records, labels, dataset) → data; parameters, pipeline code, config, libraries → model. |
| direct development-time model poisoning | direct runtime model poisoning | Where do the attacker’s hands reach the parameters: engineering environment before release vs the live production system. |
| direct development-time model leak | direct runtime model leak | Same theft, different crime scene: dev environment (training server, laptop) vs production (files, memory, side channels). |
| direct augmentation data leak | augmentation data manipulation | Read vs write: exposure of the vector store / system prompt → leak; corrupted content steering answers → manipulation. |
| input data leak | sensitive data disclosure through use | Locate the breach: log file / at rest / in transit → input data leak; “the model revealed it in an answer” → disclosure through use. |
| zero-knowledge evasion | transfer attack | Where the search happens: probing the live target’s API vs crafting on a surrogate model — the target never sees the search. |
| evasion after poisoning | data poisoning | Same backdoor, two moments: planting the trigger in training data = data poisoning (development-time); presenting the trigger later = evasion after poisoning (runtime). |
| responsible AI | trustworthy AI | What the concern attaches to: people, society, governance, accountability → responsible; robustness, reliability, transparency, explainability → trustworthy. |
| provider (supplier) | deployer (you) | Who can change the layer: only the provider can alter training data or base alignment; only you can change what your app sends, validates and permits. |
| AI security testing | model performance validation | Hostile vs benign inputs: simulated attacks and bypass attempts vs accuracy on a test set against acceptance criteria. |
| privacy principles (9) | GDPR challenges (10) | Principles are rules you apply and can violate; challenges are compliance difficulties you manage. “Which principle is violated?” → answer from the nine. |
| Development Program | Secure Development Program | Objective wording: maintainable, portable, reliable, future-ready (quality beyond security) vs building security in during development. |