# Automated Vulnerability Research

## Current Capabilities
LLMs can assist with (but not fully automate) vulnerability research:
| Task | AI Effectiveness | Notes |
|---|---|---|
| Code review for known patterns | High | SQLi, XSS, buffer overflows are well represented in training data |
| Fuzzing harness generation | Medium-High | Can generate seed inputs and harnesses |
| Binary decompilation analysis | Medium | Understands pseudocode, can identify patterns |
| Exploit development | Low-Medium | Can assist with proof-of-concept but struggles with novel techniques |
| Novel vulnerability classes | Low | Still requires human creativity and intuition |
## Practical Applications

### LLM-Assisted Code Review
Feed source code to a model and ask it to identify security issues:

```
Review this code for security vulnerabilities. Focus on:
- Input validation
- Authentication/authorization flaws
- Injection vulnerabilities
- Cryptographic weaknesses
- Race conditions
```
Effective for OWASP Top 10 patterns. Less effective for logic bugs or novel attack chains.
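For batch review of many files, the prompt above can be assembled programmatically. A minimal sketch (the function name and focus list are illustrative, not a fixed API):

```python
def build_review_prompt(source: str) -> str:
    """Wrap source code in a security-review prompt.

    The focus areas mirror common review targets; extend the list
    for the codebase under review.
    """
    focus = [
        "Input validation",
        "Authentication/authorization flaws",
        "Injection vulnerabilities",
        "Cryptographic weaknesses",
        "Race conditions",
    ]
    header = "Review this code for security vulnerabilities. Focus on:\n"
    header += "\n".join(f"- {item}" for item in focus)
    # Delimit the code clearly so the model does not confuse
    # instructions with the material under review.
    return f"{header}\n\nCODE:\n{source}"
```

The resulting string is then sent to whichever model API you use; keeping prompt construction separate from transport makes it easy to re-run the same review across providers.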
### AI-Generated Fuzzing
Use LLMs to generate intelligent seed inputs for fuzzing:
- Feed the model the target's API documentation or interface
- Ask it to generate edge cases, boundary values, and malformed inputs
- Use these as seeds for a traditional fuzzer (AFL++, LibFuzzer)
- Let the fuzzer mutate from the AI-generated seeds
### Binary Analysis Assistance
Feed decompiled pseudocode to a model for analysis:
- Rename variables and functions based on inferred purpose
- Identify known vulnerability patterns in decompiled code
- Generate hypotheses about function behavior
- Suggest areas of the binary worth deeper manual analysis
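A cheap complement to model-based triage is a mechanical scan of the decompiled pseudocode for known-dangerous calls, which narrows what you feed to the model or review by hand. A sketch (the call list is illustrative, not exhaustive):

```python
import re

# Known-risky C APIs that commonly surface in decompiled pseudocode.
RISKY_CALLS = {
    "strcpy": "unbounded copy; possible buffer overflow",
    "sprintf": "unbounded format write",
    "gets": "no length limit; always unsafe",
    "system": "command execution; check argument provenance",
}

def flag_risky_calls(pseudocode: str):
    """Return (line_no, call, reason) for each risky call site,
    as a triage list for deeper manual analysis."""
    hits = []
    for n, line in enumerate(pseudocode.splitlines(), 1):
        for call, reason in RISKY_CALLS.items():
            # Word boundary plus '(' avoids matching e.g. strcpy_s.
            if re.search(rf"\b{call}\s*\(", line):
                hits.append((n, call, reason))
    return hits
```

Running this before the model pass keeps prompts focused on the handful of functions that actually contain suspect call sites.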
## Limitations
- Models can't execute or debug code (without tool use)
- False positive rate is high for code review
- Novel vulnerability classes require human insight
- Models hallucinate vulnerabilities that don't exist
- Context window limits how much code can be analyzed at once
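The context-window limit is usually worked around by chunking large files with overlap so functions that straddle a boundary appear whole in at least one chunk. A sketch of that splitting (the chunk and overlap sizes are arbitrary defaults, not model-specific values):

```python
def chunk_source(lines, max_lines=400, overlap=40):
    """Split a source file (as a list of lines) into overlapping
    chunks that each fit a model's context window.

    The overlap region repeats the tail of one chunk at the head of
    the next so boundary-straddling code is seen in full at least once.
    """
    chunks, start = [], 0
    while start < len(lines):
        chunks.append(lines[start:start + max_lines])
        if start + max_lines >= len(lines):
            break
        start += max_lines - overlap
    return chunks
```

Each chunk is reviewed independently, so cross-chunk logic bugs can still be missed; the overlap only mitigates truncation, not the loss of whole-program context.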