Model Architectures
Overview
Not all AI models share the same architecture. Understanding the differences matters for red teaming because different architectures have different vulnerability profiles.
Decoder-Only (Autoregressive)
What it is: Generates text left to right, one token at a time. Each token can only attend to previous tokens (causal masking).
Models: GPT-4, Claude, Llama, Mistral, Gemini
Used for: Chatbots, text generation, code generation, reasoning
Security profile: Susceptible to prompt injection, jailbreaking, and next-token manipulation. The autoregressive nature means early tokens disproportionately influence later generation.
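Causal masking can be illustrated with a toy attention mask. This is a minimal numpy sketch, not any specific model's implementation:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular boolean mask: position i may attend
    only to positions 0..i, never to future tokens."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = causal_mask(4)
# Row 0 (the first token) sees only itself; row 3 sees all four positions.
# This asymmetry is why early tokens shape everything generated after them.
```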
Encoder-Only
What it is: Processes the entire input bidirectionally (every token attends to every other token). Produces a representation of the input, not generated text.
Models: BERT, RoBERTa, DeBERTa
Used for: Classification, sentiment analysis, named entity recognition, embedding generation
Security profile: Susceptible to adversarial examples for classification evasion. Less relevant for prompt injection since they don't generate text.
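The contrast with the decoder case is the mask shape: every token sees every other token, and the output is a fixed-size representation rather than generated text. A toy sketch (mean pooling is one of several pooling strategies real models use):

```python
import numpy as np

def bidirectional_mask(seq_len: int) -> np.ndarray:
    """Encoder-only attention: every token attends to every other token."""
    return np.ones((seq_len, seq_len), dtype=bool)

def pooled_representation(token_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool token vectors into one fixed-size vector, the kind of
    representation a classification head would consume."""
    return token_embeddings.mean(axis=0)

emb = np.ones((5, 8))             # 5 tokens, 8-dim embeddings (toy values)
vec = pooled_representation(emb)  # shape (8,) -- a representation, not text
```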
Encoder-Decoder
What it is: Encoder processes the input bidirectionally, decoder generates output autoregressively while attending to the encoder's representation.
Models: T5, BART, Flan-T5
Used for: Translation, summarization, question answering
Security profile: Hybrid vulnerabilities — the encoder side is susceptible to adversarial input perturbation, the decoder side to generation-based attacks.
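The coupling between the two halves is cross-attention: decoder queries attend over all encoder positions. A minimal single-head sketch, assuming toy dimensions:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(dec_states: np.ndarray, enc_states: np.ndarray) -> np.ndarray:
    """Each decoder position attends over ALL encoder positions
    (queries from the decoder, keys/values from the encoder)."""
    d = dec_states.shape[-1]
    scores = dec_states @ enc_states.T / np.sqrt(d)   # (T_dec, T_enc)
    weights = softmax(scores)                         # rows sum to 1
    return weights @ enc_states                       # encoder info mixed into each decoder step

rng = np.random.default_rng(0)
out = cross_attention(rng.normal(size=(3, 4)), rng.normal(size=(6, 4)))
```

A perturbed encoder input therefore shifts every decoder step at once, which is why the two attack surfaces in the security profile above are linked.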
Mixture of Experts (MoE)
What it is: Instead of one massive feed-forward network, MoE uses multiple smaller "expert" networks. A routing mechanism selects which experts process each token. Only a fraction of parameters are active per forward pass.
Models: Mixtral, GPT-4 (rumored), Switch Transformer
Used for: Reducing inference cost while maintaining capacity
Security profile: Expert routing can be manipulated — adversarial inputs might trigger specific experts or avoid the expert that handles safety-relevant processing.
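Top-k routing for a single token can be sketched in a few lines. The expert functions and logits here are toy stand-ins; real routers are learned linear layers:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x: np.ndarray, router_logits: np.ndarray, experts, k: int = 2):
    """Route one token to its top-k experts and mix their outputs
    by the renormalized router probabilities."""
    top = np.argsort(router_logits)[::-1][:k]   # indices of the k best-scoring experts
    gate = softmax(router_logits[top])          # weights over the chosen experts only
    return sum(g * experts[i](x) for g, i in zip(gate, top))

# Toy experts: each just scales its input differently.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
y = moe_forward(np.array([1.0]), np.array([0.1, 0.2, 5.0, 0.3]), experts, k=2)
# Only 2 of the 4 experts ran -- the sparsity that keeps inference cheap,
# and the routing decision an adversary might try to steer.
```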
Diffusion Models
What it is: Generates output by iteratively denoising random noise. Used primarily for images, audio, and video.
Models: Stable Diffusion, DALL-E 2/3, Midjourney
Used for: Image generation, audio synthesis, video generation
Security profile: Susceptible to adversarial perturbation in the latent space, prompt injection via text encoder, and training data memorization (generating recognizable copyrighted images).
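The iterative denoising loop can be sketched as follows. Real models learn the noise predictor; here a stand-in denoiser pulls toward a fixed target so the loop is self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -1.0])     # stands in for the clean image/audio sample

x = rng.normal(size=2)             # step 0: pure Gaussian noise
for t in range(50):
    predicted_noise = x - target   # placeholder for the learned noise estimate
    x = x - 0.1 * predicted_noise  # one denoising step removes a fraction of the noise
# After 50 iterations x has converged close to the target.
```

Because output emerges gradually from many small steps, small perturbations early in the trajectory (or in the latent space) can compound into large changes in the final sample.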
Multimodal Models
What it is: Combines multiple input types (text, images, audio, video) into a single model. Typically uses a vision encoder connected to an LLM backbone.
Models: GPT-4V/o, Claude 3 (vision), Gemini, LLaVA
Used for: Image understanding, document analysis, video analysis
Security profile: Cross-modal injection — hiding text instructions in images that the vision encoder reads but humans don't notice. This is a growing attack vector.
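The typical fusion pattern can be sketched as projecting vision-encoder features into the LLM's token-embedding space. The dimensions and random projection here are illustrative assumptions; real systems learn `W_proj` during training:

```python
import numpy as np

rng = np.random.default_rng(0)
d_vision, d_llm = 6, 8
W_proj = rng.normal(size=(d_vision, d_llm))   # stand-in for a learned projection

def fuse(image_feats: np.ndarray, text_embeds: np.ndarray) -> np.ndarray:
    """Project vision features into the LLM's embedding space and prepend
    them to the text tokens. Past this point the LLM treats image 'tokens'
    and text tokens identically -- which is why instructions hidden in an
    image can behave like injected prompt text."""
    image_tokens = image_feats @ W_proj
    return np.concatenate([image_tokens, text_embeds], axis=0)

seq = fuse(rng.normal(size=(4, d_vision)), rng.normal(size=(5, d_llm)))
# seq has 9 "tokens": 4 derived from the image, 5 from the text.
```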
Model Size Reference
| Model | Parameters | Architecture |
|---|---|---|
| GPT-2 | 1.5B | Decoder-only |
| Llama 2 | 7B / 13B / 70B | Decoder-only |
| Llama 3 | 8B / 70B / 405B | Decoder-only |
| Mixtral 8x7B | 46.7B (12.9B active) | MoE Decoder-only |
| GPT-4 | ~1.8T (rumored) | MoE Decoder-only |
| BERT-large | 340M | Encoder-only |
| T5-XXL | 11B | Encoder-Decoder |