Prompt Injection
Overview
Prompt injection is the most critical vulnerability class in LLM applications. It occurs when an attacker embeds instructions inside input that the model processes, causing the model to follow the attacker's instructions instead of (or in addition to) the developer's.
It's the SQL injection of AI. The root cause is identical: no separation between code (instructions) and data (user input).
Why It's Unsolved
There is currently no reliable, general-purpose defense against prompt injection. Unlike SQL injection, which was solved with parameterized queries, LLMs process everything as natural language tokens — instructions and data share the same channel. Heuristic filters help but can always be bypassed with creative encoding.
Two Types
Direct Prompt Injection
The attacker is the user. They type adversarial instructions directly into the chat or API.
Goal: Make the model do something the developer prohibited — bypass content policies, extract the system prompt, generate restricted content.
Indirect Prompt Injection
The attacker plants instructions in content the model will read — a webpage, email, document, database record, or tool output. The victim is a different user whose AI assistant processes the poisoned content.
Goal: Execute actions on behalf of the victim — exfiltrate data, trigger tool calls, manipulate outputs, spread to other conversations.
Impact
| Scenario | Impact |
|---|---|
| Chatbot with tool use | Attacker triggers unauthorized API calls |
| RAG system | Poisoned document hijacks all responses |
| Email assistant | Malicious email exfiltrates inbox contents |
| Code assistant | Injected comment inserts vulnerable code |
| Customer support bot | Attacker extracts other customers' data |