# API Security for AI Endpoints

## AI-Specific API Risks
AI APIs differ from traditional APIs in three ways: every request is computationally expensive (GPU inference), every response may contain generated content that is hard to predict or filter, and the API surface is natural language, so traditional input validation does not apply in the same way.
## Essential Controls

### Authentication & Authorization
- API key or OAuth 2.0 for all endpoints
- Per-user and per-key rate limits (tokens/minute, requests/hour)
- Scope-limited API keys — separate keys for read-only vs. tool-use access
- IP allowlisting for production integrations
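Scope-limited keys can be enforced with a deny-by-default check. A minimal sketch, assuming a hypothetical scope model in which each key carries an explicit set of scopes (e.g. `read` vs. `tool_use`) and each endpoint declares the scope it requires:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApiKey:
    key_id: str
    scopes: frozenset  # e.g. frozenset({"read"}) or frozenset({"read", "tool_use"})

def authorize(key: ApiKey, required_scope: str) -> bool:
    """Deny by default: the key must explicitly hold the required scope."""
    return required_scope in key.scopes

# Example keys (names are illustrative, not a real product's key format)
readonly_key = ApiKey("key-ro-1", frozenset({"read"}))
agent_key = ApiKey("key-agent-1", frozenset({"read", "tool_use"}))
```

The important design choice is that tool-use access is a separate, opt-in scope rather than something every key inherits.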
### Rate Limiting
AI-specific rate limiting should track both request count and token consumption:
| Metric | Why | Threshold Example |
|---|---|---|
| Requests per minute | Prevent basic flooding | 60 RPM per key |
| Input tokens per minute | Prevent context stuffing | 100K tokens/min |
| Output tokens per minute | Prevent expensive generation | 50K tokens/min |
| Cost per hour | Prevent budget exhaustion | $50/hour per key |
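The first two rows of the table can be combined in one sliding-window limiter that budgets requests and input tokens together, rejecting a request before inference if either budget would be exceeded. This is an illustrative sketch, not a production limiter; the thresholds mirror the example column above:

```python
import time
from collections import defaultdict, deque

LIMITS = {"requests": 60, "input_tokens": 100_000}  # per 60-second window
WINDOW_SECONDS = 60

class SlidingWindowLimiter:
    def __init__(self, limits=LIMITS, window=WINDOW_SECONDS):
        self.limits = limits
        self.window = window
        # per-key, per-metric deque of (timestamp, amount) events
        self.events = defaultdict(lambda: defaultdict(deque))

    def _used(self, key, metric, now):
        q = self.events[key][metric]
        while q and q[0][0] <= now - self.window:  # expire old events
            q.popleft()
        return sum(amount for _, amount in q)

    def allow(self, key, input_tokens, now=None):
        """Check both budgets before inference; record usage only if allowed."""
        now = time.monotonic() if now is None else now
        if self._used(key, "requests", now) + 1 > self.limits["requests"]:
            return False
        if self._used(key, "input_tokens", now) + input_tokens > self.limits["input_tokens"]:
            return False
        self.events[key]["requests"].append((now, 1))
        self.events[key]["input_tokens"].append((now, input_tokens))
        return True
```

Output tokens and cost cannot be checked pre-request (generation length is unknown), so those budgets are typically debited after the response completes and enforced on the *next* request.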
### Input Validation
- Maximum input length (token count)
- Input encoding validation (reject malformed Unicode)
- Perplexity checking (flag unusual token sequences)
- Content classification on input (detect adversarial patterns)
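The first two checks can be sketched as a small pre-inference gate. This is a minimal illustration: `count_tokens` is a caller-supplied callable (a real service would use its model's tokenizer), and `MAX_INPUT_TOKENS` is an assumed limit:

```python
MAX_INPUT_TOKENS = 8192  # assumed limit, tune per model context window

def validate_input(text: str, count_tokens) -> tuple[bool, str]:
    """Return (allowed, reason). Rejects malformed Unicode and oversized inputs."""
    # Lone surrogates survive in a Python str but fail strict UTF-8 encoding,
    # which makes encoding a cheap malformed-Unicode check.
    try:
        text.encode("utf-8", errors="strict")
    except UnicodeEncodeError:
        return False, "malformed unicode"
    if count_tokens(text) > MAX_INPUT_TOKENS:
        return False, "input too long"
    return True, "ok"
```

Perplexity checking and adversarial-pattern classification would slot in as additional checks after these cheap structural ones, so expensive classifiers only run on inputs that pass the basics.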
### Output Security
- PII scanning on all responses
- Content safety classification on outputs
- Response size limits
- Watermarking for model output attribution
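PII scanning and response size limits can be combined in a single post-generation filter. The regexes below are simple stand-ins for a real PII scanner (which would cover many more patterns), and `MAX_RESPONSE_CHARS` is an assumed limit:

```python
import re

MAX_RESPONSE_CHARS = 20_000  # assumed cap on response size

# Stand-in patterns; a production scanner covers far more PII types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub_response(text: str) -> str:
    """Truncate oversized responses, then redact matched PII patterns."""
    if len(text) > MAX_RESPONSE_CHARS:
        text = text[:MAX_RESPONSE_CHARS]
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text
```

Content safety classification would run on the scrubbed text as a separate, model-based step.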
### Logging & Monitoring
- Log all requests and responses (with PII redaction)
- Anomaly detection on query patterns
- Alert on extraction indicators (high volume, systematic variation)
- Audit trail for all API key operations
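A structured audit record ties these points together: the API key is hashed rather than stored in plaintext, request/response bodies pass through a redaction step, and sizes are recorded to feed extraction-volume alerting. A sketch, assuming a caller-supplied `redact` function (e.g. the PII scrubber above or any equivalent):

```python
import hashlib
import json
import logging
import time

logger = logging.getLogger("ai_api_audit")

def log_request(api_key: str, prompt: str, response: str, redact) -> dict:
    """Emit one structured, PII-redacted audit record per request."""
    record = {
        "ts": time.time(),
        # Hash the key so logs never hold the raw credential
        "key_hash": hashlib.sha256(api_key.encode()).hexdigest()[:16],
        "prompt": redact(prompt),
        "response": redact(response),
        # Sizes feed anomaly detection (high volume, systematic variation)
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
    logger.info(json.dumps(record))
    return record
```

Downstream anomaly detection then runs over these records, alerting on per-key query volume and suspiciously systematic prompt variation.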