Training Pipeline
Overview
The training pipeline is the full process of turning raw data into a deployable model. Every stage is a potential attack surface.
Data Collection → Data Cleaning → Tokenization → Pre-Training
→ Fine-Tuning (SFT) → Alignment (RLHF/DPO) → Evaluation → Deployment
Pipeline Stages & Attack Surface
| Stage | What Happens | Attack Vector |
|---|---|---|
| Data Collection | Scrape web, license datasets | Data poisoning via web content |
| Data Cleaning | Dedup, filter, quality check | Poison samples that survive filtering |
| Tokenization | Build vocabulary from corpus | Vocabulary manipulation |
| Pre-Training | Next-token prediction on trillions of tokens | Backdoor injection at scale |
| Fine-Tuning (SFT) | Train on curated instruction-response pairs | Poisoned fine-tuning data |
| RLHF/DPO | Align to human preferences | Reward model manipulation |
| Evaluation | Benchmark performance | Benchmark gaming |
| Deployment | Serve via API | API-level attacks (injection, extraction) |
Cost & Scale
Modern frontier models:
- Training data: 1-15 trillion tokens
- Parameters: 70B - 1.8T
- Compute: thousands of GPUs for months
- Cost: $50M - $500M+ per training run
- Energy: equivalent to hundreds of homes per year
This scale makes re-training expensive, which means data poisoning effects persist — you can't just "patch" a poisoned model easily.