Failover & Fallback Strategies
Why AI Systems Need Fallbacks
AI systems can fail in ways traditional software does not: they can hallucinate confidently, degrade gradually, or become adversarially compromised, all without raising obvious errors. Fallbacks keep the service running when that happens.
Fallback Architecture
Tier 1: Model Fallback
Primary model fails → route to a secondary model.
| Primary | Fallback | Trade-off |
|---|---|---|
| GPT-4o | Claude 3.5 Sonnet | Different vendor, similar capability |
| Claude 3.5 Sonnet | Llama 3 70B (self-hosted) | No vendor dependency, lower quality |
| Custom fine-tune | Base model without fine-tuning | Loses specialization, maintains function |
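A minimal sketch of the primary-to-secondary routing described above. It assumes each model is wrapped in a plain Python callable; `call_with_fallback`, `primary`, and `secondary` are hypothetical names for illustration and do not correspond to any vendor SDK.

```python
from typing import Callable, Sequence

# Assumption: each "model" is a callable that takes a prompt and returns text.
ModelFn = Callable[[str], str]


def call_with_fallback(
    prompt: str, models: Sequence[tuple[str, ModelFn]]
) -> tuple[str, str]:
    """Try each (name, model) pair in order.

    Returns (name, response) from the first model that succeeds and
    raises RuntimeError if every model fails.
    """
    errors = []
    for name, model in models:
        try:
            return name, model(prompt)
        except Exception as exc:  # real code would catch the client's specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in models for illustration only.
def primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def secondary(prompt: str) -> str:
    return f"secondary answer to: {prompt}"


name, answer = call_with_fallback(
    "hello", [("primary", primary), ("secondary", secondary)]
)
print(name, answer)
```

Returning the model name alongside the response lets callers log which tier actually served the request, which helps when judging the quality trade-offs listed in the table.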
Tier 2: Degraded Service
All models unavailable → serve reduced functionality.
- Return cached responses for common queries
- Route to rule-based system (decision tree, keyword matching)
- Display "AI unavailable" with human escalation option
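The three degraded-service options above can be chained in order of preference. The sketch below is one possible arrangement under stated assumptions: the cache contents, keyword rules, and the `degraded_reply` function are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class DegradedReply:
    text: str
    source: str        # "cache", "rules", or "unavailable"
    offer_human: bool  # whether to show a human escalation option


# Hypothetical data; a real system would load these from a store.
CACHED = {"what are your hours?": "We are open 9am to 5pm, Monday to Friday."}
KEYWORD_RULES = [
    ("refund", "Refunds can be requested from the Orders page."),
    ("password", "Use the 'Forgot password' link on the sign-in page."),
]


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries share a cache key."""
    return " ".join(query.lower().split())


def degraded_reply(query: str) -> DegradedReply:
    key = normalize(query)
    # 1. Cached response for a common query.
    if key in CACHED:
        return DegradedReply(CACHED[key], "cache", False)
    # 2. Rule-based system (simple keyword matching here).
    for keyword, reply in KEYWORD_RULES:
        if keyword in key:
            return DegradedReply(reply, "rules", False)
    # 3. Nothing matched: say so and offer escalation to a human.
    return DegradedReply("AI unavailable.", "unavailable", True)
```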
Tier 3: Human Fallback
AI system compromised or unreliable → route to humans.
- Live chat agents handle queries directly
- Queue system with SLA for response time
- Automated triage routes to appropriate human team
Implementation Patterns
Circuit Breaker
Monitor error rate → if rate > threshold for N seconds:
→ Open circuit (stop sending to primary)
→ Route all traffic to fallback
→ After cooldown period, test primary with canary request
→ If canary succeeds, close circuit (resume primary)
→ If canary fails, keep circuit open and restart the cooldown
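The steps above can be sketched as a small class. This is an illustrative reading, not a definitive implementation: it interprets "rate > threshold for N seconds" as the error rate over a sliding window of the last N seconds, and it assumes a failed canary restarts the cooldown. The class name, parameter names, and default values are all assumptions.

```python
import time
from collections import deque
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(
        self,
        error_threshold: float = 0.5,    # assumed default
        window_seconds: float = 30.0,    # the "N seconds" window
        cooldown_seconds: float = 60.0,  # assumed default
        min_requests: int = 5,           # avoid tripping on tiny samples
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.error_threshold = error_threshold
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self.min_requests = min_requests
        self._clock = clock
        self._events: deque = deque()  # (timestamp, succeeded) pairs
        self._opened_at: Optional[float] = None

    def record(self, ok: bool) -> None:
        """Record one primary call and open the circuit if the error rate is too high."""
        now = self._clock()
        self._events.append((now, ok))
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()
        if self._opened_at is None and len(self._events) >= self.min_requests:
            failures = sum(1 for _, success in self._events if not success)
            if failures / len(self._events) > self.error_threshold:
                self._opened_at = now  # open circuit

    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            return "half_open"  # cooldown elapsed, time to try a canary
        return "open"

    def call(self, primary: Callable, fallback: Callable, canary: Callable):
        state = self.state()
        if state == "open":
            return fallback()
        if state == "half_open":
            try:
                canary()
            except Exception:
                self._opened_at = self._clock()  # canary failed: restart cooldown
                return fallback()
            self._opened_at = None  # canary succeeded: close circuit
            self._events.clear()
        try:
            result = primary()
        except Exception:
            self.record(False)
            return fallback()
        self.record(True)
        return result
```

The clock is injectable so the breaker can be tested without sleeping; production code would leave the default `time.monotonic`.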
Confidence Gating
Model produces response with confidence score
→ If confidence ≥ threshold: return response
→ If critical threshold ≤ confidence < threshold: flag for human review
→ If confidence < critical threshold: route to fallback
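The gating rules can be written as a single function that checks the most severe condition first. This is a sketch under assumptions: the threshold values are placeholders, a confidence exactly equal to the threshold is treated as passing, and `Action` and `gate` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    RETURN = "return_response"
    HUMAN_REVIEW = "flag_for_human_review"
    FALLBACK = "route_to_fallback"


def gate(
    confidence: float,
    threshold: float = 0.8,           # assumed value
    critical_threshold: float = 0.5,  # assumed value
) -> Action:
    """Map a confidence score to an action; the critical check runs first."""
    if not 0.0 <= critical_threshold <= threshold <= 1.0:
        raise ValueError("expected 0 <= critical_threshold <= threshold <= 1")
    if confidence < critical_threshold:
        return Action.FALLBACK
    if confidence < threshold:
        return Action.HUMAN_REVIEW
    return Action.RETURN
```

Checking the critical threshold first matters: a score below the critical threshold is also below the ordinary threshold, so testing in the other order would never reach the fallback branch.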
Cost-Based Circuit Breaker
Track API spend per hour against a baseline ("normal") hourly spend
→ If spend > 2x normal: alert
→ If spend > 5x normal: switch to cheaper fallback model
→ If spend > 10x normal: suspend AI service, route to humans
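The escalation ladder above reduces to a ratio check against the baseline. The sketch below assumes the baseline hourly spend is known in advance; the function name and the returned action labels are hypothetical.

```python
def spend_action(spend_this_hour: float, normal_hourly_spend: float) -> str:
    """Return the action for the current hour's spend, most severe tier first."""
    if normal_hourly_spend <= 0:
        raise ValueError("normal_hourly_spend must be positive")
    ratio = spend_this_hour / normal_hourly_spend
    if ratio > 10:
        return "suspend_ai_route_to_humans"
    if ratio > 5:
        return "switch_to_cheaper_model"
    if ratio > 2:
        return "alert"
    return "normal"
```

For example, with a baseline of 10 per hour, a spend of 60 gives a ratio of 6, which falls in the "switch to cheaper fallback model" tier.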