Responsible Disclosure for AI Vulnerabilities

Why AI Disclosure Is Different

Traditional vulnerability disclosure has mature processes — CVEs, CVSS scoring, coordinated disclosure timelines. AI vulnerability disclosure is still immature, and several factors make it harder:

  • No CVE equivalent. There's no standardized identifier system for AI vulnerabilities. A prompt injection affecting GPT-4 doesn't get a CVE.
  • Reproducibility is probabilistic. The same jailbreak prompt might work 60% of the time. Traditional vulns are typically deterministic — they either work or they don't.
  • The "fix" is unclear. Patching a prompt injection isn't like patching a buffer overflow. It may require retraining, fine-tuning, or filter updates — and the fix may break other behavior.
  • Severity is subjective. A jailbreak that produces mildly inappropriate text and one that exfiltrates user data are both "prompt injection" but have vastly different impact.
  • Disclosure can become the exploit. A published jailbreak template needs no adaptation — anyone can copy-paste it. Traditional exploits usually need targeting and technical skill to weaponize.
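Because reproducibility is probabilistic, a report is stronger when it quantifies the success rate. A minimal sketch of measuring one, where `attempt` is a hypothetical stand-in for your own harness that sends the candidate prompt to the target model and checks the response:

```python
import random

def reproduction_rate(attempt, trials=20):
    """Run `attempt` (a zero-argument callable returning True on success)
    `trials` times and return the observed success rate."""
    successes = sum(1 for _ in range(trials) if attempt())
    return successes / trials

# Simulated probabilistic bypass for illustration (succeeds ~70% of the time);
# a real harness would query the model and check for a policy violation.
rng = random.Random(7)
simulated_attempt = lambda: rng.random() < 0.7
rate = reproduction_rate(simulated_attempt, trials=20)
print(f"works ~{rate:.0%} of the time across 20 attempts")
```

Reporting the rate alongside the number of attempts (as in the template later in this page) lets the vendor's triage team distinguish a flaky one-off from a reliable bypass.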

Vendor Disclosure Programs

Major AI Providers

| Provider | Program | URL | Scope |
| --- | --- | --- | --- |
| OpenAI | Bug Bounty (via Bugcrowd) | bugcrowd.com/openai | API vulnerabilities, data exposure. Jailbreaks/safety bypasses NOT in scope for bounty but can be reported. |
| Anthropic | Responsible Disclosure | anthropic.com/responsible-disclosure | Security vulnerabilities in systems and infrastructure. Safety issues reported through separate channels. |
| Google (DeepMind) | Google VRP | bughunters.google.com | AI-specific vulnerabilities in Google products. Includes model manipulation, training data extraction. |
| Meta | Bug Bounty + AI Red Team | facebook.com/whitehat | Llama model vulnerabilities, platform AI features. |
| Microsoft | MSRC + AI Red Team | msrc.microsoft.com | Copilot, Azure AI, Bing AI vulnerabilities. |
| Hugging Face | Security reporting | huggingface.co/security | Model hub vulnerabilities, malicious models, infrastructure issues. |

What's Typically In Scope

| Category | Usually In Scope | Usually Out of Scope |
| --- | --- | --- |
| Infrastructure vulns | Yes — SSRF, auth bypass, data exposure | |
| Training data extraction | Yes — PII or sensitive data recovered | General memorization without sensitive content |
| Cross-user data leakage | Yes — accessing another user's data | |
| System prompt extraction | Varies — some treat as informational | Often out of scope for bounty |
| Jailbreaks | Usually out of scope for bounty | Can be reported for safety team review |
| Model output quality | No | Hallucinations, factual errors |
| Bias | No (for bug bounty) | Report through responsible AI channels |

How to Report

Step 1: Classify the Finding

| Classification | Description | Urgency |
| --- | --- | --- |
| Security vulnerability | Infrastructure exploit, data exposure, auth bypass | Report immediately via security channel |
| Safety bypass with impact | Jailbreak that enables harmful actions (tool abuse, data exfil) | Report within 24-48 hours |
| Safety bypass without impact | Jailbreak that produces restricted text only | Report at your convenience |
| Prompt injection (indirect) | Third-party content can hijack model behavior | Report within 48 hours — higher impact |
| Model behavior issue | Bias, hallucination, quality degradation | Report through product feedback channels |
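The classification above can be sketched as a small triage helper. The flag names and return strings here are illustrative, not any vendor's schema:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    # Flag names are illustrative assumptions, not an established schema.
    infrastructure_exploit: bool = False   # SSRF, auth bypass, data exposure
    enables_harmful_actions: bool = False  # tool abuse, data exfiltration
    indirect_injection: bool = False       # third-party content hijacks the model
    restricted_text_only: bool = False     # jailbreak output is text only

def triage_urgency(f: Finding) -> str:
    """Map a finding onto the urgency tiers from the classification table."""
    if f.infrastructure_exploit:
        return "report immediately via security channel"
    if f.indirect_injection:
        return "report within 48 hours"
    if f.enables_harmful_actions:
        return "report within 24-48 hours"
    if f.restricted_text_only:
        return "report at your convenience"
    return "report through product feedback channels"

print(triage_urgency(Finding(indirect_injection=True)))
# prints: report within 48 hours
```

Note the check order: indirect injection outranks a plain safety bypass here, mirroring the table's "higher impact" annotation.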

Step 2: Document the Finding

Include in your report:

```markdown
## Summary
[One sentence: what the vulnerability is and why it matters]

## Affected System
[Model name, version if known, API or web interface, specific feature]

## Reproduction Steps
1. [Exact steps to reproduce]
2. [Include exact prompts — copy-paste ready]
3. [Note any required preconditions]

## Observed Behavior
[What the model did — include exact output if possible]

## Expected Behavior
[What the model should have done]

## Reproduction Rate
[Approximate percentage: "works ~70% of the time across 20 attempts"]

## Impact Assessment
[What an attacker could achieve with this vulnerability]
[Data at risk, unauthorized actions possible, affected users]

## Suggested Mitigation
[If you have ideas for how to fix it — optional but appreciated]

## Environment
[Date/time of testing, browser/API client used, account type]
```
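If you file many reports, rendering them from structured fields keeps them consistent. A sketch assuming a trimmed-down version of the template above (several sections omitted for brevity; the example finding is invented):

```python
# Trimmed-down version of the report template; extend with the
# remaining sections (Observed/Expected Behavior, Environment, etc.).
REPORT_TEMPLATE = """\
## Summary
{summary}

## Affected System
{system}

## Reproduction Steps
{steps}

## Reproduction Rate
{rate}

## Impact Assessment
{impact}
"""

def render_report(summary, system, steps, rate, impact):
    # `steps` is a list of strings, numbered automatically so the
    # resulting report is copy-paste ready.
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return REPORT_TEMPLATE.format(summary=summary, system=system,
                                  steps=numbered, rate=rate, impact=impact)

report = render_report(
    summary="Indirect prompt injection via retrieved web content",
    system="Hypothetical chat product, web interface, browsing feature",
    steps=["Host a page containing the injected instructions",
           "Ask the assistant to summarize the page",
           "Observe the assistant following the injected instructions"],
    rate="works ~70% of the time across 20 attempts",
    impact="Third-party content can redirect the assistant's tool calls",
)
print(report)
```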

Step 3: Submit Through the Right Channel

  • Security vulnerabilities: Use the vendor's security reporting page, not public forums
  • Safety issues: Use the dedicated safety reporting mechanism if available
  • No response in 5 business days: Send a follow-up. If no response in 15 business days, consider escalating through CERT/CC or the AI Incident Database
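The 5- and 15-business-day windows are easy to track with a small date calculation. A sketch that skips weekends but, as an assumption, ignores public holidays:

```python
from datetime import date, timedelta

def add_business_days(start: date, days: int) -> date:
    """Advance `start` by `days` business days, skipping Sat/Sun.
    Public holidays are ignored in this sketch."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Mon=0 .. Fri=4
            remaining -= 1
    return current

submitted = date(2025, 1, 6)  # a Monday, chosen for illustration
print("follow up after:", add_business_days(submitted, 5))   # 2025-01-13
print("escalate after:", add_business_days(submitted, 15))   # 2025-01-27
```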

Step 4: Coordinate Disclosure

  • Follow the vendor's stated disclosure timeline (typically 90 days)
  • For AI vulns, consider longer timelines — fixes may require retraining
  • Don't publish working jailbreak prompts before the vendor has had time to respond
  • If publishing research, consider redacting the specific bypass technique while describing the vulnerability class

Disclosure Dos and Don'ts

Do:

  • Report through official channels first
  • Provide clear reproduction steps
  • Assess and communicate real-world impact
  • Give the vendor reasonable time to respond
  • Document everything for your records

Don't:

  • Test on production systems beyond what's needed to confirm the issue
  • Access, store, or exfiltrate other users' data during testing
  • Publish working exploits before coordinated disclosure
  • Overstate severity — "I jailbroke ChatGPT" is different from "I extracted user data"
  • Threaten the vendor or demand payment outside of formal bug bounty programs

For Organizations: Building Your Own AI Disclosure Program

If you deploy AI-powered products, you need a process for receiving AI vulnerability reports:

Minimum Requirements

  1. Dedicated intake channel — separate from traditional security bugs. AI reports need reviewers who understand prompt injection, not just web app vulns.
  2. Defined scope — clearly state what's in scope (infrastructure, data leakage, injection) and what's not (jailbreaks that only produce text, hallucinations).
  3. Response SLA — acknowledge receipt within 48 hours, triage within 5 business days.
  4. AI-specific severity framework — traditional CVSS doesn't capture AI risks well. Define your own:
| Severity | Criteria |
| --- | --- |
| Critical | Data exfiltration, unauthorized actions, cross-user impact |
| High | Reliable system prompt extraction with credentials, persistent injection |
| Medium | System prompt extraction (no creds), inconsistent jailbreak with tool abuse |
| Low | Jailbreak producing restricted text, information disclosure without sensitive data |
| Informational | Theoretical risk, defense recommendations |
  5. Remediation process — define who triages AI reports, how fixes are tested, and what "fixed" means (is a filter patch sufficient, or does this need retraining?).
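A severity framework like the one above can be encoded directly in triage tooling. The flag names in this sketch are illustrative assumptions, not a standard schema:

```python
def classify_severity(finding: dict) -> str:
    """Map finding flags onto the severity tiers defined above.
    Keys are illustrative assumptions, not an established schema."""
    if (finding.get("data_exfiltration") or finding.get("unauthorized_actions")
            or finding.get("cross_user_impact")):
        return "Critical"
    if (finding.get("prompt_extraction_with_credentials")
            or finding.get("persistent_injection")):
        return "High"
    if (finding.get("prompt_extraction")
            or finding.get("inconsistent_jailbreak_with_tools")):
        return "Medium"
    if (finding.get("restricted_text_only")
            or finding.get("nonsensitive_disclosure")):
        return "Low"
    return "Informational"

print(classify_severity({"persistent_injection": True}))  # High
print(classify_severity({}))                              # Informational
```

Checking the highest tier first means a finding with multiple flags lands in its worst applicable bucket, which is usually what a triage queue wants.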

Industry Resources

  • AI Incident Database (AIID): Tracks real-world AI failures and incidents — useful for understanding impact patterns
  • AVID (AI Vulnerability Database): Community effort to catalog AI vulnerabilities with structured reports
  • MITRE ATLAS: Use ATLAS technique IDs in your reports for standardized classification
  • OWASP LLM Top 10: Reference for categorizing findings