How LLMs Work

The Big Picture

Large Language Models are transformers trained on internet-scale text data to predict the next token. That's the entire concept. Everything else is implementation detail — but those details matter for security.

The pipeline:

Raw text → Tokenization → Embeddings → Positional Encoding 
→ Transformer Layers (×80-120) → Output Probabilities → Sample Next Token

Each step in this pipeline introduces attack surface. This section breaks down each stage.

What Makes LLMs Different

LLMs aren't just "big neural networks." The transformer architecture has specific properties that create unique security concerns:

Context windows — the model can only "see" a fixed number of tokens at once (4K-200K+). This constrains and enables attacks.
Autoregressive generation — output is produced one token at a time, each conditioned on everything before it. This means early tokens influence everything downstream.
In-context learning — the model can learn new tasks from examples in the prompt without weight changes. This is also what makes prompt injection possible.
Instruction following — fine-tuned models follow natural language instructions, which means an attacker's instructions look identical to legitimate ones.

The Fundamental Security Problem

The model has no architectural separation between instructions and data. Everything is tokens. The system prompt, the user's message, retrieved documents, tool outputs — they all enter the same context window as a flat sequence of tokens. The model was trained to treat some tokens as instructions, but that distinction is learned behavior, not a hard boundary.

This is equivalent to a system where SQL queries and user input share the same channel with no parameterization. That's why prompt injection is the defining vulnerability of LLM applications.

AI Security Book

How LLMs Work

The Big Picture

What Makes LLMs Different

The Fundamental Security Problem

Subsections