Responsible Disclosure for AI Vulnerabilities
Why AI Disclosure Is Different
Traditional vulnerability disclosure has mature processes — CVEs, CVSS scoring, coordinated disclosure timelines. AI vulnerability disclosure is still immature, and several factors make it harder:
- No CVE equivalent. There's no standardized identifier system for AI vulnerabilities. A prompt injection affecting GPT-4 typically doesn't get a CVE.
- Reproducibility is probabilistic. The same jailbreak prompt might work 60% of the time because model outputs are sampled. Most traditional vulns are deterministic: they either work or they don't.
- The "fix" is unclear. Patching a prompt injection isn't like patching a buffer overflow. It may require retraining, fine-tuning, or filter updates — and the fix may break other behavior.
- Severity is subjective. A jailbreak that produces mildly inappropriate text and one that exfiltrates user data are both "prompt injection" but have vastly different impact.
- Disclosure can become the exploit. A published jailbreak template needs no adaptation: anyone can copy-paste it and use it immediately. Traditional exploits usually require targeting and technical skill to weaponize.
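Because reproducibility is probabilistic, any report is stronger if it quantifies a measured success rate rather than asserting "it works". A minimal sketch of measuring that rate, where `attempt_exploit` stands in for whatever call exercises the bypass (the flaky lambda below is a hypothetical stand-in, not a real exploit):

```python
import random

def reproduction_rate(attempt_exploit, attempts=20):
    """Run an exploit attempt repeatedly and report the observed success rate.

    attempt_exploit: a zero-argument callable returning True on success.
    Returns (successes, attempts, rate).
    """
    successes = sum(1 for _ in range(attempts) if attempt_exploit())
    return successes, attempts, successes / attempts

# Hypothetical stand-in for a real attempt: succeeds roughly 60% of the time.
random.seed(1)
flaky_jailbreak = lambda: random.random() < 0.6

s, n, rate = reproduction_rate(flaky_jailbreak, attempts=50)
print(f"works ~{rate:.0%} of the time across {n} attempts")
```

Reporting the rate together with the attempt count (as the template later in this guide asks for) lets the vendor distinguish a one-off fluke from a reliable bypass.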
Vendor Disclosure Programs
Major AI Providers
| Provider | Program | URL | Scope |
|---|---|---|---|
| OpenAI | Bug Bounty (via Bugcrowd) | bugcrowd.com/openai | API vulnerabilities, data exposure. Jailbreaks/safety bypasses NOT in scope for bounty but can be reported. |
| Anthropic | Responsible Disclosure | anthropic.com/responsible-disclosure | Security vulnerabilities in systems and infrastructure. Safety issues reported through separate channels. |
| Google (DeepMind) | Google VRP | bughunters.google.com | AI-specific vulnerabilities in Google products. Includes model manipulation, training data extraction. |
| Meta | Bug Bounty + AI Red Team | facebook.com/whitehat | Llama model vulnerabilities, platform AI features. |
| Microsoft | MSRC + AI Red Team | msrc.microsoft.com | Copilot, Azure AI, Bing AI vulnerabilities. |
| Hugging Face | Security reporting | huggingface.co/security | Model hub vulnerabilities, malicious models, infrastructure issues. |
What's Typically In Scope
| Category | Usually In Scope | Usually Out of Scope |
|---|---|---|
| Infrastructure vulns | Yes — SSRF, auth bypass, data exposure | |
| Training data extraction | Yes — PII or sensitive data recovered | General memorization without sensitive content |
| Cross-user data leakage | Yes — accessing another user's data | |
| System prompt extraction | Varies — some treat as informational | Often out of scope for bounty |
| Jailbreaks | No (for bounty) | Report for safety team review |
| Model output quality | No | Hallucinations, factual errors |
| Bias | No (for bug bounty) | Report through responsible AI channels |
How to Report
Step 1: Classify the Finding
| Classification | Description | Urgency |
|---|---|---|
| Security vulnerability | Infrastructure exploit, data exposure, auth bypass | Report immediately via security channel |
| Safety bypass with impact | Jailbreak that enables harmful actions (tool abuse, data exfil) | Report within 24-48 hours |
| Safety bypass without impact | Jailbreak that produces restricted text only | Report at your convenience |
| Prompt injection (indirect) | Third-party content can hijack model behavior | Report within 48 hours; often higher impact than direct jailbreaks |
| Model behavior issue | Bias, hallucination, quality degradation | Report through product feedback channels |
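The classification table above maps directly to a lookup, and encoding it keeps triage decisions consistent across reporters. A sketch with illustrative key names (none of these identifiers come from any vendor program):

```python
# Illustrative mapping of the classification table to reporting urgency.
URGENCY = {
    "security_vulnerability": "report immediately via security channel",
    "safety_bypass_with_impact": "report within 24-48 hours",
    "safety_bypass_without_impact": "report at your convenience",
    "indirect_prompt_injection": "report within 48 hours",
    "model_behavior_issue": "report through product feedback channels",
}

def triage(classification: str) -> str:
    """Look up the recommended reporting urgency for a finding class."""
    try:
        return URGENCY[classification]
    except KeyError:
        raise ValueError(f"unknown classification: {classification!r}")

print(triage("indirect_prompt_injection"))  # report within 48 hours
```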
Step 2: Document the Finding
Include in your report:
## Summary
[One sentence: what the vulnerability is and why it matters]
## Affected System
[Model name, version if known, API or web interface, specific feature]
## Reproduction Steps
1. [Exact steps to reproduce]
2. [Include exact prompts — copy-paste ready]
3. [Note any required preconditions]
## Observed Behavior
[What the model did — include exact output if possible]
## Expected Behavior
[What the model should have done]
## Reproduction Rate
[Approximate percentage: "works ~70% of the time across 20 attempts"]
## Impact Assessment
[What an attacker could achieve with this vulnerability]
[Data at risk, unauthorized actions possible, affected users]
## Suggested Mitigation
[If you have ideas for how to fix it — optional but appreciated]
## Environment
[Date/time of testing, browser/API client used, account type]
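When you're testing many prompt variants, filling the template by hand gets error-prone. A sketch that renders a report from structured fields; the field names are assumptions chosen to match the sections above:

```python
# Assumed field names mirroring the report template sections above.
REPORT_TEMPLATE = """\
## Summary
{summary}

## Affected System
{affected_system}

## Reproduction Steps
{steps}

## Observed Behavior
{observed}

## Expected Behavior
{expected}

## Reproduction Rate
{rate}

## Impact Assessment
{impact}

## Suggested Mitigation
{mitigation}

## Environment
{environment}
"""

def render_report(**fields) -> str:
    """Fill the disclosure template; reproduction steps are numbered from a list."""
    fields["steps"] = "\n".join(
        f"{i}. {step}" for i, step in enumerate(fields["steps"], start=1)
    )
    return REPORT_TEMPLATE.format(**fields)
```

Keeping the prompts and outputs in structured fields also gives you the record the "document everything" guidance below asks for.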
Step 3: Submit Through the Right Channel
- Security vulnerabilities: Use the vendor's security reporting page, not public forums
- Safety issues: Use the dedicated safety reporting mechanism if available
- No response in 5 business days: Send a follow-up. If no response in 15 business days, consider escalating through CERT/CC or the AI Incident Database
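The follow-up windows above are business days, while the 90-day disclosure window in the next step is usually calendar days. A small sketch that computes all three dates from the submission date (the weekend-skipping logic is a simplification that ignores public holidays):

```python
from datetime import date, timedelta

def add_business_days(start: date, days: int) -> date:
    """Advance a date by a number of business days, skipping weekends."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday (0) through Friday (4)
            days -= 1
    return current

def disclosure_schedule(reported: date) -> dict:
    """Key dates after submitting a report: follow up at 5 business days,
    escalate at 15 business days, 90 calendar days for the disclosure window."""
    return {
        "follow_up": add_business_days(reported, 5),
        "escalate": add_business_days(reported, 15),
        "disclosure_window_ends": reported + timedelta(days=90),
    }

print(disclosure_schedule(date(2025, 1, 6)))  # report submitted on a Monday
```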
Step 4: Coordinate Disclosure
- Follow the vendor's stated disclosure timeline (typically 90 days)
- For AI vulns, consider longer timelines — fixes may require retraining
- Don't publish working jailbreak prompts before the vendor has had time to respond
- If publishing research, consider redacting the specific bypass technique while describing the vulnerability class
Disclosure Dos and Don'ts
Do:
- Report through official channels first
- Provide clear reproduction steps
- Assess and communicate real-world impact
- Give the vendor reasonable time to respond
- Document everything for your records
Don't:
- Test on production systems beyond what's needed to confirm the issue
- Access, store, or exfiltrate other users' data during testing
- Publish working exploits before coordinated disclosure
- Overstate severity — "I jailbroke ChatGPT" is different from "I extracted user data"
- Threaten the vendor or demand payment outside of formal bug bounty programs
For Organizations: Building Your Own AI Disclosure Program
If you deploy AI-powered products, you need a process for receiving AI vulnerability reports:
Minimum Requirements
- Dedicated intake channel — separate from traditional security bugs. AI reports need reviewers who understand prompt injection, not just web app vulns.
- Defined scope — clearly state what's in scope (infrastructure, data leakage, injection) and what's not (jailbreaks that only produce text, hallucinations).
- Response SLA — acknowledge receipt within 48 hours, triage within 5 business days.
- AI-specific severity framework — traditional CVSS doesn't capture AI risks well. Define your own:
| Severity | Criteria |
|---|---|
| Critical | Data exfiltration, unauthorized actions, cross-user impact |
| High | Reliable system prompt extraction that exposes credentials, persistent injection |
| Medium | System prompt extraction (no credentials exposed), unreliable jailbreak enabling tool abuse |
| Low | Jailbreak producing restricted text, information disclosure without sensitive data |
| Informational | Theoretical risk, defense recommendations |
- Remediation process — define who triages AI reports, how fixes are tested, and what "fixed" means (is a filter patch sufficient, or does this need retraining?).
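The example severity framework above can be encoded so every report is scored the same way. A sketch; the boolean flag names are assumptions, and the checks mirror the table's ordering from most to least severe:

```python
def classify_severity(*, data_exfil=False, unauthorized_actions=False,
                      cross_user=False, prompt_extraction=False,
                      reveals_credentials=False, persistent_injection=False,
                      tool_abuse=False, restricted_text_only=False):
    """Map finding attributes to the example severity framework.

    Criteria are checked from most to least severe, so a finding that
    matches several rows gets the highest applicable rating.
    """
    if data_exfil or unauthorized_actions or cross_user:
        return "Critical"
    if (prompt_extraction and reveals_credentials) or persistent_injection:
        return "High"
    if prompt_extraction or tool_abuse:
        return "Medium"
    if restricted_text_only:
        return "Low"
    return "Informational"

print(classify_severity(prompt_extraction=True))  # Medium: no credentials exposed
```

A real program would extend the flags with product-specific criteria; the point is that the mapping is explicit and testable rather than left to each triager's judgment.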
Industry Resources
- AI Incident Database (AIID): Tracks real-world AI failures and incidents — useful for understanding impact patterns
- AVID (AI Vulnerability Database): Community effort to catalog AI vulnerabilities with structured reports
- MITRE ATLAS: Use ATLAS technique IDs in your reports for standardized classification
- OWASP LLM Top 10: Reference for categorizing findings