Prompt Injection

Overview

Prompt injection is the most critical vulnerability class in LLM applications. It occurs when an attacker embeds instructions inside input that the model processes, causing the model to follow the attacker's instructions instead of (or in addition to) the developer's.

It's the SQL injection of AI. The root cause is identical: no separation between code (instructions) and data (user input).

Why It's Unsolved

There is currently no reliable, general-purpose defense against prompt injection. Unlike SQL injection, which was solved with parameterized queries, LLMs process everything as natural language tokens — instructions and data share the same channel. Heuristic filters help but can always be bypassed with creative encoding.

Scenario	Impact
Chatbot with tool use	Attacker triggers unauthorized API calls
RAG system	Poisoned document hijacks all responses
Email assistant	Malicious email exfiltrates inbox contents
Code assistant	Injected comment inserts vulnerable code
Customer support bot	Attacker extracts other customers' data

AI Security Book

Prompt Injection

Overview

Why It's Unsolved

Two Types

Direct Prompt Injection

Indirect Prompt Injection

Impact

Subsections