Differential Privacy

What It Is

Differential privacy is a mathematical framework that provides provable, quantifiable privacy guarantees. It adds carefully calibrated noise during training so that the model's behavior is statistically almost indistinguishable whether or not any single training example was included, which bounds what an attacker can learn about any individual example from the model's outputs.
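The standard formal statement of this guarantee is as follows. A randomized training mechanism M is ε-differentially private if, for any two datasets D and D' that differ in a single record, and any set of possible outputs S:

  Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]

In words: adding or removing one person's data changes the probability of any outcome by at most a factor of e^ε.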

How It Works

During training, noise is added to the gradients before updating model weights. The amount of noise is controlled by the privacy budget (epsilon, ε):

  • Low ε (strong privacy): More noise, less memorization, lower model quality
  • High ε (weak privacy): Less noise, more memorization, higher model quality

The trade-off is fundamental — stronger privacy guarantees mean worse model performance.
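The mechanism described above (usually called DP-SGD) can be sketched in a few lines. This is a minimal illustration, not a production implementation: it clips each per-example gradient to a maximum norm, sums the clipped gradients, and adds Gaussian noise scaled by a noise multiplier before averaging. The function name and parameter defaults are illustrative.

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """One illustrative DP-SGD gradient computation.

    per_example_grads: list of gradient vectors (lists of floats), one per example.
    Returns the noisy averaged gradient that would be used for the weight update.
    """
    # 1. Clip each example's gradient so no single example dominates the update.
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])

    # 2. Sum the clipped gradients and add Gaussian noise calibrated to the
    #    clipping norm; a larger noise_multiplier means stronger privacy (lower ε).
    n = len(per_example_grads)
    dim = len(clipped[0])
    sigma = noise_multiplier * clip_norm
    summed = [sum(g[i] for g in clipped) for i in range(dim)]

    # 3. Average the noisy sum.
    return [(s + random.gauss(0.0, sigma)) / n for s in summed]
```

The clipping step is what makes the noise calibration possible: because every example contributes at most `clip_norm` to the sum, a fixed amount of noise masks any individual's contribution.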

Current State

  Aspect                          Status
  ------                          ------
  Theoretical foundation          Strong — well-established mathematics
  Implementation for small models Mature — libraries like Opacus (PyTorch)
  Implementation for LLMs         Challenging — significant quality degradation
  Adoption in production LLMs     Very low — most providers don't use it
  Regulatory recognition          Growing — mentioned in GDPR guidance and AI regulations

Why Most LLMs Don't Use It

Applying differential privacy to large language models degrades output quality significantly. Current frontier models prioritize capability over privacy guarantees, relying instead on data deduplication, output filtering, and post-hoc mitigations.

When to Consider Differential Privacy

  • Training models on highly sensitive data (medical records, financial data)
  • Regulatory requirements mandate provable privacy guarantees
  • Model will be publicly accessible (high extraction risk)
  • Training data includes individuals who have not consented to its use for AI training

Alternatives and Complements

  Approach              What It Does                                        Privacy Guarantee
  --------              ------------                                        -----------------
  Differential privacy  Mathematical noise during training                  Provable
  Data deduplication    Remove repeated data to reduce memorization         Heuristic
  Data sanitization     Remove PII before training                          Depends on detection quality
  Output filtering      Block PII in model responses                        Post-hoc, not preventive
  Federated learning    Train on distributed data without centralizing it   Partial — gradients can still leak
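Of the heuristic complements above, data deduplication is the simplest to illustrate. The sketch below drops exact duplicates after light normalization (lowercasing and whitespace collapsing); real pipelines typically also apply near-duplicate detection such as MinHash, which is not shown here. The function name and normalization choices are illustrative assumptions.

```python
import hashlib

def deduplicate(examples):
    """Remove exact duplicates from a list of training texts, keeping first occurrences.

    Normalization (lowercase, collapsed whitespace) catches trivially repeated
    data; this reduces memorization risk but offers no formal privacy guarantee.
    """
    seen = set()
    unique = []
    for text in examples:
        normalized = " ".join(text.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Because deduplication only removes repeats it finds, its guarantee is heuristic: a unique sensitive record survives untouched, which is exactly the gap differential privacy is designed to close.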