Training Pipeline

Overview

The training pipeline is the full process of turning raw data into a deployable model. Every stage is a potential attack surface.

Data Collection → Data Cleaning → Tokenization → Pre-Training
→ Fine-Tuning (SFT) → Alignment (RLHF/DPO) → Evaluation → Deployment

Pipeline Stages & Attack Surface

StageWhat HappensAttack Vector
Data CollectionScrape web, license datasetsData poisoning via web content
Data CleaningDedup, filter, quality checkPoison samples that survive filtering
TokenizationBuild vocabulary from corpusVocabulary manipulation
Pre-TrainingNext-token prediction on trillions of tokensBackdoor injection at scale
Fine-Tuning (SFT)Train on curated instruction-response pairsPoisoned fine-tuning data
RLHF/DPOAlign to human preferencesReward model manipulation
EvaluationBenchmark performanceBenchmark gaming
DeploymentServe via APIAPI-level attacks (injection, extraction)

Cost & Scale

Modern frontier models:

  • Training data: 1-15 trillion tokens
  • Parameters: 70B - 1.8T
  • Compute: thousands of GPUs for months
  • Cost: $50M - $500M+ per training run
  • Energy: equivalent to hundreds of homes per year

This scale makes re-training expensive, which means data poisoning effects persist — you can't just "patch" a poisoned model easily.

Subsections