Embeddings & Positional Encoding
Embeddings
After tokenization, each token ID is converted into a dense vector — a list of numbers (typically 4,096 to 12,288 dimensions for large models). This is done via a lookup in the embedding matrix, a massive table learned during training.
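The lookup itself is just row indexing into a matrix. A minimal NumPy sketch (the matrix here is random and the sizes are toy values, not real model parameters):

```python
import numpy as np

# Toy sizes for illustration; real models use vocabularies of tens of
# thousands of tokens and thousands of embedding dimensions.
vocab_size, d_model = 1000, 8

# The embedding matrix is a learned lookup table: one row per token ID.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, d_model))

# Converting token IDs to dense vectors is plain row indexing.
token_ids = [42, 7, 42]
vectors = embedding_matrix[token_ids]  # shape: (3, 8)

# The same token ID always maps to the same vector.
assert np.array_equal(vectors[0], vectors[2])
```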
Why Vectors?
A token ID like 4523 is arbitrary — it tells the model nothing about meaning. The embedding vector encodes semantic relationships:
- Similar meanings → similar vectors. "Hacker" and "attacker" are close in embedding space.
- Different meanings → distant vectors. "Hacker" and "banana" are far apart.
- Relationships are directional. The vector from "king" to "queen" is roughly the same as "man" to "woman."
Embedding Arithmetic
This isn't a party trick — it's literal vector math:
embedding("king") - embedding("man") + embedding("woman") ≈ embedding("queen")
embedding("Paris") - embedding("France") + embedding("Germany") ≈ embedding("Berlin")
The model learns these relationships automatically from the statistical patterns in training data.
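The arithmetic can be sketched directly. The three-dimensional vectors below are hand-built to make the analogy visible; learned embeddings have thousands of dimensions and are nowhere near this clean:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: standard measure of closeness in embedding space."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors chosen so that royalty and gender occupy separate dimensions.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.1, 0.8, 0.1]),
    "woman": np.array([0.1, 0.8, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
}

# king - man + woman lands nearest to queen.
result = emb["king"] - emb["man"] + emb["woman"]
nearest = max(emb, key=lambda w: cosine(result, emb[w]))
print(nearest)  # queen
```

With real embeddings the result rarely equals the target vector exactly; the analogy holds in the sense that "queen" is the nearest neighbor of the computed point.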
Dimensions
| Model | Embedding Dimensions |
|---|---|
| GPT-2 | 768 |
| GPT-3 | 12,288 |
| Llama 2 7B | 4,096 |
| Llama 2 70B | 8,192 |
| Claude (estimated) | 8,192+ |
More dimensions give the model more capacity to represent nuances of meaning, at the cost of more memory and compute.
Positional Encoding
Embeddings alone have no concept of word order. "Dog bites man" and "man bites dog" produce the same set of embedding vectors — just in a different order. The model needs to know where each token sits in the sequence.
How It Works
Each position in the sequence (0, 1, 2, ...) gets its own vector, which is added to the token embedding. The combined vector now encodes both what the token is and where it is.
Methods
Sinusoidal (original transformer): Uses sine and cosine functions at different frequencies. Position 0 gets one pattern, position 1 gets another, etc. Fixed — not learned.
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
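The two formulas above translate directly into code. A minimal NumPy sketch that fills even dimensions with sines and odd dimensions with cosines:

```python
import numpy as np

def sinusoidal_pe(max_len, d_model):
    """Fixed (non-learned) sinusoidal positional encodings."""
    pos = np.arange(max_len)[:, None]        # positions 0..max_len-1, column
    i = np.arange(d_model // 2)[None, :]     # dimension-pair index, row
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dims: PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)             # odd dims:  PE(pos, 2i+1)
    return pe

pe = sinusoidal_pe(max_len=50, d_model=16)
# Sanity check: at position 0, sin(0) = 0 and cos(0) = 1.
assert np.allclose(pe[0, 0::2], 0.0) and np.allclose(pe[0, 1::2], 1.0)
```

Each position gets a unique pattern across frequencies, and because the functions are fixed, encodings for positions beyond the training length can still be computed.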
Learned positional embeddings: A trainable embedding matrix for positions, just like the token embeddings. Used by GPT-2 and GPT-3, among others.
RoPE (Rotary Position Embedding): Used by Llama, Mistral, and many recent models. Encodes position as a rotation in embedding space. Enables better generalization to longer sequences than seen during training.
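RoPE's "rotation in embedding space" can be sketched in a few lines. This is a simplified illustration, not a production implementation (real models apply it to query and key vectors inside attention): consecutive dimension pairs are rotated by position-dependent angles, which makes dot products depend only on the *relative* distance between two positions.

```python
import numpy as np

def rope(x, pos, base=10000):
    """Rotate consecutive dimension pairs of x by angles proportional to pos."""
    d = x.shape[-1]
    theta = base ** (-2 * np.arange(d // 2) / d)  # one frequency per pair
    angle = pos * theta
    x1, x2 = x[0::2], x[1::2]                     # split into (even, odd) pairs
    out = np.empty_like(x)
    out[0::2] = x1 * np.cos(angle) - x2 * np.sin(angle)
    out[1::2] = x1 * np.sin(angle) + x2 * np.cos(angle)
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Key property: shifting both positions by the same offset leaves the
# dot product unchanged, because rotations compose by angle difference.
s1 = rope(q, 3) @ rope(k, 7)
s2 = rope(q, 13) @ rope(k, 17)
assert np.isclose(s1, s2)
```

That relative-position property is one reason RoPE generalizes better to sequence lengths beyond those seen during training.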
Security Relevance
Embedding similarity enables transfer attacks. If two inputs have similar embeddings, they may trigger similar model behavior — even if the surface text looks different.
Positional attacks. Instructions placed at the beginning or end of the context window tend to carry more weight than instructions buried in the middle (the "lost in the middle" phenomenon). Attackers exploit this by front-loading injected instructions.
Embedding inversion. Given a model's embeddings (e.g., from a vector database), it's possible to approximately reconstruct the original text — a privacy risk for RAG systems storing sensitive documents.