# Model Monitoring & Drift Detection

## What to Monitor
| Category | Metrics | Why |
|---|---|---|
| Performance | Accuracy, latency, error rate, throughput | Detect degradation before users notice |
| Data drift | Input feature distributions, token distributions | World changes → model gets stale |
| Output drift | Response length distribution, sentiment, refusal rate | Model behavior shifting over time |
| Safety | Toxicity rate, PII in outputs, jailbreak success rate | Safety guardrails weakening |
| Cost | Tokens per request, GPU utilization, API spend | Budget anomalies often signal abuse or extraction |
| Operational | Uptime, queue depth, timeout rate | Infrastructure health |
## Drift Detection Methods
**Statistical tests:** Compare current input/output distributions against a reference baseline using the Kolmogorov–Smirnov (KS) test, the Population Stability Index (PSI), or Jensen–Shannon divergence.
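As one concrete illustration, PSI can be computed from binned frequencies of the reference and current samples. The binning scheme and the drift thresholds in the comment are conventional choices, not prescribed by this document:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a
    current sample. Common rule of thumb: < 0.1 stable, 0.1-0.25
    moderate drift, > 0.25 significant drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # floor at a small epsilon so empty bins don't blow up the log
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical distributions yield a PSI of zero; a shifted current sample drives it well above the 0.25 "significant drift" level.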
**Performance benchmarks:** Run a fixed evaluation set on a schedule. If accuracy drops below a threshold, trigger an alert.
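A minimal sketch of a scheduled benchmark run, assuming a `predict` callable, an eval set of `(input, expected)` pairs, and an illustrative accuracy threshold:

```python
def benchmark_accuracy(predict, eval_set):
    """Fraction of fixed eval-set examples the model answers exactly."""
    correct = sum(1 for x, y in eval_set if predict(x) == y)
    return correct / len(eval_set)

def should_alert(accuracy, threshold=0.85):
    # threshold is an illustrative choice, tune per model/baseline
    return accuracy < threshold
```

In production this would be driven by a scheduler (cron, Airflow, etc.) and the result pushed to your alerting system.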
**Canary queries:** Periodically send known-answer queries and verify the responses. This functions as a health check for model quality.
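A canary check can be as simple as a pass rate over known-answer prompts. Here `query_model` is a placeholder for whatever client calls the deployed model, and the canary pairs are illustrative:

```python
# Hypothetical known-answer canary set; substitute domain-relevant queries.
CANARIES = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]

def run_canaries(query_model, canaries=CANARIES):
    """Return the fraction of canary queries whose expected answer
    appears in the model's response (case-insensitive substring)."""
    passed = sum(
        1 for prompt, expected in canaries
        if expected.lower() in query_model(prompt).lower()
    )
    return passed / len(canaries)
```

A pass rate below 1.0 on queries the model previously answered correctly is a strong signal of regression.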
**Human evaluation sampling:** Randomly sample a percentage of production outputs for human review, and track quality scores over time.
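The sampling step itself is straightforward; a sketch with a seedable RNG so the sample is reproducible (function name and default rate are illustrative):

```python
import random

def sample_for_review(records, rate=0.01, seed=None):
    """Randomly select roughly `rate` (e.g. 0.01 = 1%) of production
    records for human review; pass a seed for reproducibility."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rate]
```

The sampled records would then be routed to a labeling queue, with reviewer scores aggregated into a per-week quality trend.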
## Alerting Thresholds
| Condition | Action |
|---|---|
| Accuracy drops >5% from baseline | Alert — investigate |
| Latency p99 exceeds 2x normal | Alert — check GPU health |
| PII detection rate spikes | Critical alert — potential data leakage |
| Refusal rate drops significantly | Alert — safety guardrails may be degraded |
| API cost exceeds daily budget by 2x | Alert — possible extraction or abuse |
| Error rate exceeds 5% | Alert — infrastructure issue |
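Several of the threshold rules above can be sketched as a single check over a metrics snapshot. The field names and baseline structure are assumptions for illustration, not a fixed schema:

```python
def check_alerts(m, baseline):
    """Evaluate a metrics snapshot `m` against baseline values and
    return the list of triggered alert messages."""
    alerts = []
    if m["accuracy"] < baseline["accuracy"] * 0.95:  # >5% drop
        alerts.append("accuracy dropped >5% from baseline")
    if m["latency_p99"] > baseline["latency_p99"] * 2:
        alerts.append("p99 latency exceeds 2x normal")
    if m["daily_cost"] > baseline["daily_budget"] * 2:
        alerts.append("API cost exceeds daily budget by 2x")
    if m["error_rate"] > 0.05:
        alerts.append("error rate exceeds 5%")
    return alerts
```

Rate-based conditions like the PII spike or refusal-rate drop need a rolling baseline rather than a fixed one, so they are omitted from this sketch.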
## Tools
| Tool | Purpose |
|---|---|
| Evidently AI | Open-source ML monitoring, drift detection |
| Arize | ML observability platform |
| WhyLabs | Data and model monitoring |
| Fiddler AI | Model performance management |
| Custom Prometheus/Grafana | Build your own with standard observability stack |