# Differential Privacy
## What It Is
Differential privacy (DP) is a mathematical framework that provides provable guarantees against training data extraction. It adds carefully calibrated noise during training so that the model's behavior is nearly indistinguishable whether or not any single example was in the training set — no individual training example can be identified from the model's outputs.
## How It Works
During training, each example's gradient is clipped to a fixed norm and noise is added to the aggregate before updating model weights (the DP-SGD algorithm). The amount of noise is controlled by the privacy budget (epsilon, ε):
- Low ε (strong privacy): More noise, less memorization, lower model quality
- High ε (weak privacy): Less noise, more memorization, higher model quality
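The ε–noise relationship above can be made concrete with the classic Gaussian-mechanism calibration: for a query with L2 sensitivity Δ, adding Gaussian noise with σ = Δ·√(2 ln(1.25/δ))/ε gives (ε, δ)-DP (valid for ε < 1). A minimal sketch (function name is illustrative, not from any particular library):

```python
import math

def gaussian_noise_scale(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    """Std. dev. of Gaussian noise giving (epsilon, delta)-DP for a query
    with the given L2 sensitivity (classic analytic Gaussian bound)."""
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

# Stronger privacy (smaller epsilon) demands more noise:
strong = gaussian_noise_scale(epsilon=0.1, delta=1e-5)   # large sigma
weak = gaussian_noise_scale(epsilon=8.0, delta=1e-5)     # small sigma
```

Halving ε doubles the noise scale, which is why the quality cost of strong privacy rises so quickly.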
The trade-off is fundamental — stronger privacy guarantees mean worse model performance.
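The clip-then-noise step at the heart of DP-SGD can be sketched in a few lines. This is a minimal NumPy illustration of the core idea (clip each per-example gradient, average, add Gaussian noise), not the full algorithm with privacy accounting; the function name and parameters are hypothetical:

```python
import numpy as np

def dp_sgd_step(per_example_grads: np.ndarray,
                clip_norm: float,
                noise_multiplier: float,
                rng: np.random.Generator) -> np.ndarray:
    """One DP-SGD gradient estimate: clip each example's gradient to
    clip_norm, sum, add Gaussian noise with std noise_multiplier * clip_norm,
    then divide by the batch size."""
    # L2 norm of each example's gradient, shape (batch,)
    norms = np.linalg.norm(per_example_grads, axis=1)
    # Scale factor <= 1 per example so its norm is at most clip_norm
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale[:, None]
    batch = per_example_grads.shape[0]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / batch
```

Clipping bounds any single example's influence on the update; the noise then masks whatever influence remains, which is what makes the per-example guarantee provable.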
## Current State
| Aspect | Status |
|---|---|
| Theoretical foundation | Strong — well-established mathematics |
| Implementation for small models | Mature — libraries like Opacus (PyTorch) |
| Implementation for LLMs | Challenging — significant quality degradation |
| Adoption in production LLMs | Very low — most providers don't use it |
| Regulatory recognition | Growing — mentioned in GDPR guidance and AI regulations |
## Why Most LLMs Don't Use It
Applying differential privacy to large language models degrades output quality significantly. Current frontier models prioritize capability over privacy guarantees, relying instead on data deduplication, output filtering, and post-hoc mitigations.
## When to Consider Differential Privacy
- Training models on highly sensitive data (medical records, financial data)
- Regulatory requirements mandate provable privacy guarantees
- Model will be publicly accessible (high extraction risk)
- Training data includes data from subjects who haven't consented to AI training
## Alternatives and Complements
| Approach | What It Does | Privacy Guarantee |
|---|---|---|
| Differential privacy | Mathematical noise during training | Provable |
| Data deduplication | Remove repeated data to reduce memorization | Heuristic |
| Data sanitization | Remove PII before training | Depends on detection quality |
| Output filtering | Block PII in model responses | Post-hoc, not preventive |
| Federated learning | Train on distributed data without centralizing it | Partial — gradients can still leak |