Failover & Fallback Strategies

Why AI Systems Need Fallbacks

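AI systems can fail in ways traditional software does not: hallucinating confidently, degrading gradually, or being adversarially compromised without raising obvious errors. Because such failures may never surface as an exception or error code, fallbacks give the service a defined path to keep operating when the primary model is unavailable or cannot be trusted.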

Fallback Architecture

Tier 1: Model Fallback

Primary model fails → route to a secondary model.

Primary            | Fallback                       | Trade-off
GPT-4o             | Claude 3.5 Sonnet              | Different vendor, similar capability
Claude 3.5 Sonnet  | Llama 3 70B (self-hosted)      | No vendor dependency, lower quality
Custom fine-tune   | Base model without fine-tuning | Loses specialization, maintains function
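
A primary-to-secondary chain like the pairs above could be sketched as follows. This is a minimal illustration, not a vendor integration: `call_with_fallback`, `AllModelsFailed`, `primary_model`, and `secondary_model` are hypothetical names, and the two model functions are stand-ins for real client calls.

```python
from typing import Callable, Sequence

# A provider is any callable that takes a prompt and returns text.
# Real vendor SDK calls would be wrapped in functions of this shape.
Provider = Callable[[str], str]


class AllModelsFailed(Exception):
    """Raised when every model in the chain has failed."""


def call_with_fallback(
    prompt: str, chain: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each (name, provider) in order; return (name, response) from the first success."""
    errors = []
    for name, provider in chain:
        try:
            return name, provider(prompt)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            errors.append(f"{name}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Hypothetical stand-ins for real model clients (assumptions for illustration).
def primary_model(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def secondary_model(prompt: str) -> str:
    return f"secondary answer to: {prompt}"
```

Returning the name of the model that answered lets callers log how often the fallback is used. When `AllModelsFailed` is raised, the caller would move on to Tier 2.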

Tier 2: Degraded Service

All models unavailable → serve reduced functionality.

  • Return cached responses for common queries
  • Route to rule-based system (decision tree, keyword matching)
  • Display "AI unavailable" with human escalation option
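
The three options above can be tried in order of usefulness. The sketch below assumes exact-match caching on a normalized query and simple keyword rules. The cache contents, rules, and message text are invented placeholders, and `degraded_response` is a hypothetical name.

```python
# Placeholder data (assumptions for illustration only).
CACHED_RESPONSES = {
    "what are your opening hours?": "We are open 9am to 5pm, Monday to Friday.",
}
KEYWORD_RULES = [
    ("refund", "Refund requests can be submitted from the Orders page."),
    ("password", "Use the 'Forgot password' link on the sign-in page."),
]
UNAVAILABLE_MESSAGE = "AI unavailable. Reply HUMAN to reach a support agent."


def degraded_response(query: str) -> tuple[str, str]:
    """Return (source, text) without calling any model."""
    key = query.strip().lower()
    # 1. Cached response for a common query.
    if key in CACHED_RESPONSES:
        return "cache", CACHED_RESPONSES[key]
    # 2. Rule-based keyword matching.
    for keyword, answer in KEYWORD_RULES:
        if keyword in key:
            return "rules", answer
    # 3. Nothing matched: say so and offer human escalation.
    return "unavailable", UNAVAILABLE_MESSAGE
```

For example, `degraded_response("How do I get a refund?")` is answered by the keyword rule, while an unrecognized query falls through to the "AI unavailable" message.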

Tier 3: Human Fallback

AI system compromised or unreliable → route to humans.

  • Live chat agents handle queries directly
  • Queue system with SLA for response time
  • Automated triage routes to appropriate human team
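
One way to combine triage with an SLA-ordered queue is sketched below. The team names, keywords, and SLA durations are assumptions chosen for illustration, and `triage`, `Ticket`, and `HumanQueue` are hypothetical names. The queue hands out the ticket whose SLA deadline is closest.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional

# Illustrative routing keywords and SLA targets (assumptions, not recommendations).
TEAM_KEYWORDS = {
    "billing": ("invoice", "refund", "charge"),
    "technical": ("error", "crash", "login"),
}
SLA_SECONDS = {"billing": 3600, "technical": 1800, "general": 7200}


def triage(query: str) -> str:
    """Pick a human team by keyword; default to 'general'."""
    text = query.lower()
    for team, keywords in TEAM_KEYWORDS.items():
        if any(word in text for word in keywords):
            return team
    return "general"


@dataclass(order=True)
class Ticket:
    deadline: float                    # only field used for ordering
    query: str = field(compare=False)
    team: str = field(compare=False)


class HumanQueue:
    """Priority queue ordered by SLA deadline (earliest first)."""

    def __init__(self) -> None:
        self._heap: list[Ticket] = []

    def submit(self, query: str, now: float) -> Ticket:
        team = triage(query)
        ticket = Ticket(deadline=now + SLA_SECONDS[team], query=query, team=team)
        heapq.heappush(self._heap, ticket)
        return ticket

    def next_ticket(self) -> Optional[Ticket]:
        return heapq.heappop(self._heap) if self._heap else None
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would likely read the time from a clock and keep one queue per team.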

Implementation Patterns

Circuit Breaker

Monitor error rate → if rate > threshold for N seconds:
  → Open circuit (stop sending to primary)
  → Route all traffic to fallback
  → After cooldown period, test primary with canary request
  → If canary succeeds, close circuit (resume primary)
  → If canary fails, keep circuit open and restart the cooldown
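
The flow above could look roughly like this in code. The sketch makes several simplifying assumptions: the error rate is computed over a sliding window of the last `window_s` seconds rather than requiring the rate to stay high for N seconds, the first request after the cooldown doubles as the canary, and a failed canary restarts the cooldown. All names and default values are hypothetical.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, threshold=0.5, window_s=30.0, cooldown_s=60.0,
                 min_calls=5, clock=time.monotonic):
        self.threshold = threshold    # error rate that opens the circuit
        self.window_s = window_s      # sliding window for the error rate
        self.cooldown_s = cooldown_s  # wait before trying a canary
        self.min_calls = min_calls    # avoid tripping on too few samples
        self.clock = clock            # injectable for testing
        self.opened_at = None         # None means the circuit is closed
        self._results = deque()       # (timestamp, succeeded) pairs

    def _error_rate(self, now):
        # Drop results that have aged out of the window.
        while self._results and now - self._results[0][0] > self.window_s:
            self._results.popleft()
        if len(self._results) < self.min_calls:
            return 0.0
        failures = sum(1 for _, ok in self._results if not ok)
        return failures / len(self._results)

    def call(self, primary, fallback, request):
        now = self.clock()
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown_s:
                return fallback(request)  # circuit open: skip primary
            # Cooldown elapsed: use this request as the canary.
            try:
                result = primary(request)
            except Exception:
                self.opened_at = now      # canary failed: restart cooldown
                return fallback(request)
            self.opened_at = None         # canary succeeded: close circuit
            self._results.clear()
            return result
        try:
            result = primary(request)
        except Exception:
            self._results.append((now, False))
            if self._error_rate(now) > self.threshold:
                self.opened_at = now      # open the circuit
            return fallback(request)
        self._results.append((now, True))
        return result
```

Using a live request as the canary keeps the sketch short. A synthetic canary request would avoid exposing a real user to a possibly failing primary.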

Confidence Gating

Model produces response with confidence score
  → If confidence ≥ threshold: return response
  → If critical threshold ≤ confidence < threshold: flag for human review
  → If confidence < critical threshold: route to fallback
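
As a small sketch, the gate reduces to two comparisons. The default threshold values here are placeholders, a score exactly at a threshold is treated as passing it (an assumption), and how the confidence score is produced is left open, as in the text. `Action` and `gate` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    RETURN = "return_response"
    HUMAN_REVIEW = "flag_for_human_review"
    FALLBACK = "route_to_fallback"


def gate(confidence: float, threshold: float = 0.8,
         critical_threshold: float = 0.5) -> Action:
    """Map a confidence score in [0, 1] to one of three actions."""
    if not 0.0 <= critical_threshold <= threshold <= 1.0:
        raise ValueError("expected 0 <= critical_threshold <= threshold <= 1")
    if confidence >= threshold:
        return Action.RETURN
    if confidence >= critical_threshold:
        return Action.HUMAN_REVIEW  # between the two thresholds
    return Action.FALLBACK          # below the critical threshold
```

Checking the higher threshold first keeps the three bands mutually exclusive, so every score maps to exactly one action.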

Cost-Based Circuit Breaker

Track API spend per hour against a baseline ("normal") hourly spend
  → If spend > 2x normal: alert
  → If spend > 5x normal: switch to cheaper fallback model
  → If spend > 10x normal: suspend AI service, route to humans
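
The spend tiers above can be expressed as a single function of the ratio between current and normal hourly spend. The 2x, 5x, and 10x multipliers come from the list above; how "normal" is measured is assumed to be supplied by the caller, and the function name and returned labels are hypothetical.

```python
def cost_action(spend_this_hour: float, normal_hourly_spend: float) -> str:
    """Return the action for the current hour's API spend."""
    if normal_hourly_spend <= 0:
        raise ValueError("normal_hourly_spend must be positive")
    ratio = spend_this_hour / normal_hourly_spend
    # Check the most severe tier first so only the strongest action is returned.
    if ratio > 10:
        return "suspend_ai_route_to_humans"
    if ratio > 5:
        return "switch_to_cheaper_model"
    if ratio > 2:
        return "alert"
    return "normal"
```

For instance, with a normal spend of 10 per hour, a current spend of 60 returns `"switch_to_cheaper_model"`. Whether the lower-tier alert should also fire at higher tiers is a policy choice this sketch does not make.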