Research Papers & Reading List

Essential Papers (Read First)

Paper | Authors | Year | Topic
Intriguing Properties of Neural Networks | Szegedy et al. | 2013 | Adversarial examples discovery
Explaining and Harnessing Adversarial Examples | Goodfellow et al. | 2014 | FGSM attack
Towards Evaluating the Robustness of Neural Networks | Carlini & Wagner | 2017 | C&W attack; broke defensive distillation
Attention Is All You Need | Vaswani et al. | 2017 | Transformer architecture
Not What You've Signed Up For | Greshake et al. | 2023 | Indirect prompt injection
Universal and Transferable Adversarial Attacks on Aligned LMs | Zou et al. | 2023 | GCG jailbreak attack
Ignore This Title and HackAPrompt | Schulhoff et al. | 2023 | Prompt injection taxonomy
Poisoning Web-Scale Training Datasets is Practical | Carlini et al. | 2023 | Web-scale data poisoning
Extracting Training Data from Large Language Models | Carlini et al. | 2021 | Training data memorization
Stealing Machine Learning Models via Prediction APIs | Tramèr et al. | 2016 | Model extraction
BadNets: Identifying Vulnerabilities in the ML Supply Chain | Gu et al. | 2017 | Neural network backdoors
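The FGSM attack listed above (Goodfellow et al., 2014) reduces to one line of math: perturb the input by ε times the sign of the loss gradient with respect to that input. A minimal sketch on a toy logistic-regression "model" — the weights, input, and ε here are illustrative values, not drawn from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, epsilon):
    """One-step FGSM: x_adv = x + epsilon * sign(dL/dx).

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input x is (p - y) * w, where p is the
    predicted probability of the positive class.
    """
    p = sigmoid(w @ x)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

# Toy weights and input (assumptions for illustration only)
w = np.array([2.0, -1.0, 0.5])
x = np.array([0.3, 0.8, -0.2])
y = 1.0  # true label

x_adv = fgsm(x, y, w, epsilon=0.1)

# The perturbation pushes the model's confidence away from the true label
print(sigmoid(w @ x), sigmoid(w @ x_adv))
```

The same sign-of-gradient step applies to deep networks; the only change is that the input gradient comes from backpropagation rather than a closed form.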

Researchers to Follow

  • Nicholas Carlini (Google DeepMind) — adversarial ML, extraction, poisoning
  • Florian Tramèr (ETH Zurich) — model stealing, privacy attacks
  • Battista Biggio (U. Cagliari) — early foundational work on evasion and poisoning attacks
  • Kai Greshake — indirect prompt injection
  • Andy Zou — GCG attack, alignment robustness
  • Zico Kolter (CMU) — certified robustness, adversarial training
  • Dawn Song (UC Berkeley) — AI security across the stack

Frameworks & Standards

Threat Intelligence

  • Microsoft Threat Intelligence AI reports
  • Google Threat Analysis Group AI updates
  • Mandiant / CrowdStrike AI threat reports
  • Anthropic safety research publications
  • OpenAI safety research publications