# Differential Privacy
## What It Is
Differential privacy (DP) is a mathematical framework that provides provable guarantees against training data extraction. It adds carefully calibrated noise during training so that the model's behavior is nearly indistinguishable whether or not any single example was in the training set — no individual training example can be identified from the model's outputs.
## How It Works
During training, each example's gradient is clipped to a fixed norm and noise is added to the aggregate before updating model weights (the DP-SGD algorithm). The amount of noise is controlled by the privacy budget (epsilon, ε):
- Low ε (strong privacy): More noise, less memorization, lower model quality
- High ε (weak privacy): Less noise, more memorization, higher model quality
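The ε–noise relationship above can be made concrete with the classic Gaussian-mechanism calibration: for a query with L2 sensitivity Δ, adding Gaussian noise with σ = Δ·√(2 ln(1.25/δ))/ε gives (ε, δ)-DP (valid for ε < 1). A minimal sketch (function name is illustrative, not from any particular library):

```python
import math

def gaussian_noise_scale(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    """Std. dev. of Gaussian noise giving (epsilon, delta)-DP for a query
    with the given L2 sensitivity (classic analytic Gaussian bound)."""
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

# Stronger privacy (smaller epsilon) demands more noise:
strong = gaussian_noise_scale(epsilon=0.1, delta=1e-5)   # large sigma
weak = gaussian_noise_scale(epsilon=8.0, delta=1e-5)     # small sigma
```

Halving ε doubles the noise scale, which is why the quality cost of strong privacy rises so quickly.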
The trade-off is fundamental — stronger privacy guarantees mean worse model performance.
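The clip-then-noise step at the heart of DP-SGD can be sketched in a few lines. This is a minimal NumPy illustration of the core idea (clip each per-example gradient, average, add Gaussian noise), not the full algorithm with privacy accounting; the function name and parameters are hypothetical:

```python
import numpy as np

def dp_sgd_step(per_example_grads: np.ndarray,
                clip_norm: float,
                noise_multiplier: float,
                rng: np.random.Generator) -> np.ndarray:
    """One DP-SGD gradient estimate: clip each example's gradient to
    clip_norm, sum, add Gaussian noise with std noise_multiplier * clip_norm,
    then divide by the batch size."""
    # L2 norm of each example's gradient, shape (batch,)
    norms = np.linalg.norm(per_example_grads, axis=1)
    # Scale factor <= 1 per example so its norm is at most clip_norm
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale[:, None]
    batch = per_example_grads.shape[0]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / batch
```

Clipping bounds any single example's influence on the update; the noise then masks whatever influence remains, which is what makes the per-example guarantee provable.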
## Current State
| Aspect | Status |
|---|---|
| Theoretical foundation | Strong — well-established mathematics |
| Implementation for small models | Mature — libraries like Opacus (PyTorch) |
| Implementation for LLMs | Challenging — significant quality degradation |
| Adoption in production LLMs | Very low — most providers don't use it |
| Regulatory recognition | Growing — mentioned in GDPR guidance and AI regulations |
## Why Most LLMs Don't Use It
Applying differential privacy to large language models degrades output quality significantly. Current frontier models prioritize capability over privacy guarantees, relying instead on data deduplication, output filtering, and post-hoc mitigations.
## When to Consider Differential Privacy
- Training models on highly sensitive data (medical records, financial data)
- Regulatory requirements mandate provable privacy guarantees
- Model will be publicly accessible (high extraction risk)
- Training data includes data from subjects who haven't consented to AI training
## Alternatives and Complements
| Approach | What It Does | Privacy Guarantee |
|---|---|---|
| Differential privacy | Mathematical noise during training | Provable |
| Data deduplication | Remove repeated data to reduce memorization | Heuristic |
| Data sanitization | Remove PII before training | Depends on detection quality |
| Output filtering | Block PII in model responses | Post-hoc, not preventive |
| Federated learning | Train on distributed data without centralizing it | Partial — gradients can still leak |