| Paper | Authors | Year | Key Contribution |
|---|---|---|---|
| Intriguing Properties of Neural Networks | Szegedy et al. | 2013 | Adversarial examples discovery |
| Explaining and Harnessing Adversarial Examples | Goodfellow et al. | 2014 | FGSM attack (see the sketch below the table) |
| Towards Evaluating the Robustness of Neural Networks | Carlini & Wagner | 2017 | C&W attacks; defeated defensive distillation |
| Attention Is All You Need | Vaswani et al. | 2017 | Transformer architecture |
| Not What You've Signed Up For | Greshake et al. | 2023 | Indirect prompt injection |
| Universal and Transferable Adversarial Attacks on Aligned Language Models | Zou et al. | 2023 | GCG jailbreak attack |
| Ignore This Title and HackAPrompt | Schulhoff et al. | 2023 | Prompt injection taxonomy |
| Poisoning Web-Scale Training Datasets is Practical | Carlini et al. | 2023 | Web-scale data poisoning |
| Extracting Training Data from Large Language Models | Carlini et al. | 2021 | Training data memorization |
| Stealing Machine Learning Models via Prediction APIs | Tramèr et al. | 2016 | Model extraction |
| BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain | Gu et al. | 2017 | Neural network backdoors |
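
FGSM is the simplest attack in the table, a single signed-gradient step that approximately maximizes the loss under an L-infinity budget: x_adv = x + ε · sign(∇_x L(x, y)). Below is a minimal PyTorch sketch; the toy linear model, input shape, label, and ε value are illustrative placeholders, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step Fast Gradient Sign Method (Goodfellow et al., 2014).

    Perturbs x by eps in the sign of the loss gradient, the cheapest
    way to approximately maximize the loss under an L-inf constraint.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    # Clamp back to the valid input range (assumed here to be [0, 1]).
    return x_adv.clamp(0.0, 1.0).detach()

# Toy demo: an untrained linear "classifier" on a fake flattened image.
# Model, shapes, label, and eps are hypothetical, chosen for illustration.
model = torch.nn.Linear(784, 10)
x = torch.rand(1, 784)
y = torch.tensor([3])
x_adv = fgsm(model, x, y, eps=0.1)
print((x_adv - x).abs().max())  # at most eps, by construction
```

The single-step structure is what makes FGSM fast and also weak; iterated variants such as PGD apply the same signed-gradient step repeatedly with projection back into the ε-ball.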