No input filter can reliably block all prompt injection. Natural language is too flexible — any filter that blocks adversarial instructions will also block some legitimate requests. Filters reduce risk but do not eliminate it.
User input
→ Input filter (PII redaction, injection detection)
→ Model inference
→ Output filter (PII scan, safety check, leakage detection)
→ User response
Both filters should run as separate services from the model — if the model is compromised via injection, the output filter still catches dangerous responses.