# API Security for AI Endpoints

## AI-Specific API Risks
AI APIs differ from traditional APIs in three ways: every request is computationally expensive (GPU inference), every response may contain generated content that is hard to predict or filter, and the API surface is natural language, so traditional input validation does not apply in the same way.
## Essential Controls

### Authentication & Authorization
- API key or OAuth 2.0 for all endpoints
- Per-user and per-key rate limits (tokens/minute, requests/hour)
- Scope-limited API keys — separate keys for read-only vs. tool-use access
- IP allowlisting for production integrations
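Scope-limited keys can be enforced with a deny-by-default check. A minimal sketch, assuming a hypothetical scope model in which each key carries an explicit set of scopes (e.g. `read` vs. `tool_use`) and each endpoint declares the scope it requires:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApiKey:
    key_id: str
    scopes: frozenset  # e.g. frozenset({"read"}) or frozenset({"read", "tool_use"})

def authorize(key: ApiKey, required_scope: str) -> bool:
    """Deny by default: the key must explicitly hold the required scope."""
    return required_scope in key.scopes

# Example keys (names are illustrative, not a real product's key format)
readonly_key = ApiKey("key-ro-1", frozenset({"read"}))
agent_key = ApiKey("key-agent-1", frozenset({"read", "tool_use"}))
```

The important design choice is that tool-use access is a separate, opt-in scope rather than something every key inherits.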
### Rate Limiting
AI-specific rate limiting should track both request count and token consumption:
| Metric | Why | Threshold Example |
|---|---|---|
| Requests per minute | Prevent basic flooding | 60 RPM per key |
| Input tokens per minute | Prevent context stuffing | 100K tokens/min |
| Output tokens per minute | Prevent expensive generation | 50K tokens/min |
| Cost per hour | Prevent budget exhaustion | $50/hour per key |
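The first two rows of the table can be combined in one sliding-window limiter that budgets requests and input tokens together, rejecting a request before inference if either budget would be exceeded. This is an illustrative sketch, not a production limiter; the thresholds mirror the example column above:

```python
import time
from collections import defaultdict, deque

LIMITS = {"requests": 60, "input_tokens": 100_000}  # per 60-second window
WINDOW_SECONDS = 60

class SlidingWindowLimiter:
    def __init__(self, limits=LIMITS, window=WINDOW_SECONDS):
        self.limits = limits
        self.window = window
        # per-key, per-metric deque of (timestamp, amount) events
        self.events = defaultdict(lambda: defaultdict(deque))

    def _used(self, key, metric, now):
        q = self.events[key][metric]
        while q and q[0][0] <= now - self.window:  # expire old events
            q.popleft()
        return sum(amount for _, amount in q)

    def allow(self, key, input_tokens, now=None):
        """Check both budgets before inference; record usage only if allowed."""
        now = time.monotonic() if now is None else now
        if self._used(key, "requests", now) + 1 > self.limits["requests"]:
            return False
        if self._used(key, "input_tokens", now) + input_tokens > self.limits["input_tokens"]:
            return False
        self.events[key]["requests"].append((now, 1))
        self.events[key]["input_tokens"].append((now, input_tokens))
        return True
```

Output tokens and cost cannot be checked pre-request (generation length is unknown), so those budgets are typically debited after the response completes and enforced on the *next* request.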
### Input Validation
- Maximum input length (token count)
- Input encoding validation (reject malformed Unicode)
- Perplexity checking (flag unusual token sequences)
- Content classification on input (detect adversarial patterns)
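The first two checks can be sketched as a small pre-inference gate. This is a minimal illustration: `count_tokens` is a caller-supplied callable (a real service would use its model's tokenizer), and `MAX_INPUT_TOKENS` is an assumed limit:

```python
MAX_INPUT_TOKENS = 8192  # assumed limit, tune per model context window

def validate_input(text: str, count_tokens) -> tuple[bool, str]:
    """Return (allowed, reason). Rejects malformed Unicode and oversized inputs."""
    # Lone surrogates survive in a Python str but fail strict UTF-8 encoding,
    # which makes encoding a cheap malformed-Unicode check.
    try:
        text.encode("utf-8", errors="strict")
    except UnicodeEncodeError:
        return False, "malformed unicode"
    if count_tokens(text) > MAX_INPUT_TOKENS:
        return False, "input too long"
    return True, "ok"
```

Perplexity checking and adversarial-pattern classification would slot in as additional checks after these cheap structural ones, so expensive classifiers only run on inputs that pass the basics.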
### Output Security
- PII scanning on all responses
- Content safety classification on outputs
- Response size limits
- Watermarking for model output attribution
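PII scanning and response size limits can be combined in a single post-generation filter. The regexes below are simple stand-ins for a real PII scanner (which would cover many more patterns), and `MAX_RESPONSE_CHARS` is an assumed limit:

```python
import re

MAX_RESPONSE_CHARS = 20_000  # assumed cap on response size

# Stand-in patterns; a production scanner covers far more PII types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub_response(text: str) -> str:
    """Truncate oversized responses, then redact matched PII patterns."""
    if len(text) > MAX_RESPONSE_CHARS:
        text = text[:MAX_RESPONSE_CHARS]
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text
```

Content safety classification would run on the scrubbed text as a separate, model-based step.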
### Logging & Monitoring
- Log all requests and responses (with PII redaction)
- Anomaly detection on query patterns
- Alert on extraction indicators (high volume, systematic variation)
- Audit trail for all API key operations
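A structured audit record ties these points together: the API key is hashed rather than stored in plaintext, request/response bodies pass through a redaction step, and sizes are recorded to feed extraction-volume alerting. A sketch, assuming a caller-supplied `redact` function (e.g. the PII scrubber above or any equivalent):

```python
import hashlib
import json
import logging
import time

logger = logging.getLogger("ai_api_audit")

def log_request(api_key: str, prompt: str, response: str, redact) -> dict:
    """Emit one structured, PII-redacted audit record per request."""
    record = {
        "ts": time.time(),
        # Hash the key so logs never hold the raw credential
        "key_hash": hashlib.sha256(api_key.encode()).hexdigest()[:16],
        "prompt": redact(prompt),
        "response": redact(response),
        # Sizes feed anomaly detection (high volume, systematic variation)
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
    logger.info(json.dumps(record))
    return record
```

Downstream anomaly detection then runs over these records, alerting on per-key query volume and suspiciously systematic prompt variation.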