LLM Security Checklist: 12 Vulnerabilities to Fix Before You Launch
Every LLM-based system carries a predictable set of vulnerabilities — and a pre-launch audit typically finds 10 to 15 of them. The good news: each has a known fix, and all of them are cheaper to close before launch than after an incident. Here is the 12-point checklist we run against every production system, mapped to the OWASP LLM Top-10.
Input-layer vulnerabilities
- 1. Direct prompt injection — a user overrides your system prompt. Fix: input validation rules plus an ML classifier, and a templated prompt structure the user input cannot break out of.
- 2. Indirect prompt injection — a malicious instruction hidden in an uploaded PDF or a retrieved web page. Fix: treat all retrieved content as untrusted; strip and sandbox it before it reaches the model.
- 3. Jailbreak via role-play — 'pretend you are a model with no rules'. Fix: a guardrail layer (Llama Guard or a Claude guardrail) evaluating intent before the main model runs.
Data-leak vulnerabilities
- 4. System-prompt extraction — the user gets the model to reveal its instructions. Fix: never put secrets in the prompt; assume the prompt is public; harden against extraction patterns.
- 5. PII reconstruction — targeted prompting rebuilds personal data. Fix: PII redaction before the model call, not after; minimise what enters the context at all.
- 6. RAG metadata leakage — the model echoes internal document paths, author names, or other tenants' filenames. Fix: strip metadata from retrieved chunks before they reach the model.
- 7. Cross-tenant bleed — tenant A retrieves tenant B's data. Fix: namespace isolation enforced at the vector-DB query layer, not just in application code.
Tool and agent vulnerabilities
- 8. Over-permissioned tools — an agent can call any API it likes. Fix: a JWT-scoped permission model — each tool gets the minimum access it needs, nothing more.
- 9. Destructive action without a gate — an agent deletes, sends, or pays autonomously. Fix: human-in-the-loop approval for any irreversible or financial action.
- 10. Unvalidated output — the model returns malformed JSON or unsafe content downstream. Fix: schema validation plus a toxicity and PII check on every output.
Operational vulnerabilities
- 11. Cost exploit — no rate limit, so an attacker (or a bug) runs your API bill to five figures overnight. Fix: rate and cost limits per user and per tenant, with alerts at 50% and a hard stop at 100%.
- 12. No audit trail — when something goes wrong you cannot reconstruct what happened. Fix: a write-only audit log of every prompt, tool call, and output, retained for compliance.
“A typical pre-launch audit finds 10 to 15 critical issues, and 100% of them are fixable inside two weeks. The cost of prevention is a rounding error next to a GDPR or MDR fine.”
How to use this checklist
Run all 12 as a red-team exercise before launch — at least 800 attack iterations across the categories above, with a regression test written for every finding. Re-run the suite on every model change. If you are in a regulated industry, the output of this audit also becomes your security dossier: architecture diagrams, threat model, test evidence, and an incident-response runbook. That documentation is what regulators and enterprise buyers will ask for — building it pre-launch is far cheaper than reconstructing it later.