Automated Vulnerability Research

Current Capabilities

LLMs can assist with (but not fully automate) vulnerability research:

| Task | AI Effectiveness | Notes |
|------|------------------|-------|
| Code review for known patterns | High | SQLi, XSS, buffer overflows; well represented in training data |
| Fuzzing harness generation | Medium-High | Can generate seed inputs and harnesses |
| Binary decompilation analysis | Medium | Understands pseudocode, can identify patterns |
| Exploit development | Low-Medium | Can assist with proofs of concept but struggles with novel techniques |
| Novel vulnerability classes | Low | Still requires human creativity and intuition |

Practical Applications

LLM-Assisted Code Review

Feed source code to a model and ask it to identify security issues:

Review this code for security vulnerabilities. Focus on:
- Input validation
- Authentication/authorization flaws
- Injection vulnerabilities
- Cryptographic weaknesses
- Race conditions

Effective for OWASP Top 10 patterns. Less effective for logic bugs or novel attack chains.
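A prompt like the one above can be assembled programmatically so the same checklist is applied to every file under review. The sketch below is a minimal example; `build_review_prompt` and `REVIEW_CHECKLIST` are hypothetical names, and the actual model call (OpenAI, Anthropic, a local model, etc.) is left to whatever client library you use.

```python
# Hypothetical helper that builds a security-review prompt for one source file.
# The checklist mirrors the prompt shown above; the model call itself is out
# of scope here and depends on your provider's client library.

REVIEW_CHECKLIST = [
    "Input validation",
    "Authentication/authorization flaws",
    "Injection vulnerabilities",
    "Cryptographic weaknesses",
    "Race conditions",
]

def build_review_prompt(source_code: str, language: str = "python") -> str:
    """Assemble a focused security-review prompt for a single source file."""
    focus = "\n".join(f"- {item}" for item in REVIEW_CHECKLIST)
    return (
        "Review this code for security vulnerabilities. Focus on:\n"
        f"{focus}\n\n"
        f"```{language}\n{source_code}\n```\n\n"
        "For each finding, give the line, the vulnerability class, and a "
        "suggested fix. If you find nothing, say so explicitly."
    )

# Example: a classic string-concatenation SQL injection candidate.
prompt = build_review_prompt('query = "SELECT * FROM users WHERE id=" + uid')
```

Asking the model to report "nothing found" explicitly helps distinguish a clean file from a truncated or failed response.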

AI-Generated Fuzzing

Use LLMs to generate intelligent seed inputs for fuzzing:

  1. Feed the model the target's API documentation or interface
  2. Ask it to generate edge cases, boundary values, and malformed inputs
  3. Use these as seeds for a traditional fuzzer (AFL++, LibFuzzer)
  4. Let the fuzzer mutate from the AI-generated seeds
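The steps above can be sketched as a small pipeline. In this sketch `generate_seeds_from_llm` is a stub standing in for the model call; the static seeds it returns only illustrate the kind of output you would prompt for. The corpus layout (one file per seed) is what AFL++ and LibFuzzer expect.

```python
import os
import tempfile

def generate_seeds_from_llm(api_doc: str) -> list[bytes]:
    # Stub for the model call: in practice, prompt an LLM with `api_doc` and
    # ask for edge cases, boundary values, and malformed inputs. These static
    # examples just illustrate the kind of seeds you'd expect back.
    return [
        b"",                                # empty input
        b"0",                               # boundary value
        b"-1",                              # negative boundary
        b"A" * 4096,                        # oversized field
        b'{"id": 1e308}',                   # float overflow in JSON
        b'{"id": "1; DROP TABLE users"}',   # injection-shaped payload
    ]

def write_seed_corpus(seeds: list[bytes], corpus_dir: str) -> int:
    """Write one file per seed, the layout AFL++/LibFuzzer use for a corpus."""
    os.makedirs(corpus_dir, exist_ok=True)
    for i, seed in enumerate(seeds):
        with open(os.path.join(corpus_dir, f"seed_{i:04d}"), "wb") as f:
            f.write(seed)
    return len(seeds)

corpus = os.path.join(tempfile.mkdtemp(), "corpus")
count = write_seed_corpus(
    generate_seeds_from_llm("POST /api/user {id: int}"), corpus
)
```

From there, point the fuzzer at the directory (e.g. `afl-fuzz -i corpus -o findings -- ./target @@`) and let mutation take over.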

Binary Analysis Assistance

Feed decompiled pseudocode to a model for analysis:

  • Rename variables and functions based on inferred purpose
  • Identify known vulnerability patterns in decompiled code
  • Generate hypotheses about function behavior
  • Suggest areas of the binary worth deeper manual analysis
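The first bullet, applying renames, is mechanical once the model has proposed them. A minimal sketch, assuming the model returns an old-name-to-new-name map (the `sub_401000`-style placeholders are typical IDA/Ghidra output; the rename map here is invented for illustration):

```python
import re

def apply_renames(pseudocode: str, renames: dict[str, str]) -> str:
    """Apply LLM-suggested identifier renames to decompiled pseudocode.

    Word-boundary matching avoids corrupting identifiers that merely
    contain a target name as a substring (e.g. `sub_401000_helper`).
    """
    for old, new in renames.items():
        pseudocode = re.sub(rf"\b{re.escape(old)}\b", new, pseudocode)
    return pseudocode

decompiled = "v1 = sub_401000(a1); if (v1 > 0x100) sub_401050(v1);"
renamed = apply_renames(decompiled, {
    "sub_401000": "read_packet_length",
    "sub_401050": "copy_packet_body",   # flagged: is length checked first?
    "v1": "pkt_len",
    "a1": "sock_fd",
})
# renamed: "pkt_len = read_packet_length(sock_fd); if (pkt_len > 0x100) copy_packet_body(pkt_len);"
```

Renamed pseudocode is also a better input for a second review pass: patterns like an unchecked length flowing into a copy are easier for both humans and models to spot.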

Limitations

  • Models can't execute or debug code (without tool use)
  • False positive rate is high for code review
  • Novel vulnerability classes require human insight
  • Models hallucinate vulnerabilities that don't exist
  • Context window limits how much code can be analyzed at once
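The context-window limit in particular can be partly worked around by chunking: split a large file into overlapping windows so each fits the model's budget, at the cost of losing cross-chunk data flow. A sketch, assuming a rough four-characters-per-token estimate (real tokenizers vary):

```python
def chunk_source(source: str, max_tokens: int = 8000,
                 overlap_lines: int = 20) -> list[str]:
    """Split a source file into overlapping chunks that fit a context budget.

    Uses a rough 4-characters-per-token estimate (an assumption; real
    tokenizers differ). Overlapping lines preserve nearby context, such as
    a function signature that would otherwise straddle a chunk boundary.
    """
    max_chars = max_tokens * 4
    lines = source.splitlines(keepends=True)
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in lines:
        if size + len(line) > max_chars and current:
            chunks.append("".join(current))
            current = current[-overlap_lines:]   # carry overlap forward
            size = sum(len(l) for l in current)
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Chunking does not recover cross-file or cross-chunk reasoning, so vulnerabilities spanning multiple functions still need whole-program context or manual review.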