Research Papers & Reading List

Essential Papers (Read First)

Paper | Authors | Year | Topic
Intriguing Properties of Neural Networks | Szegedy et al. | 2013 | Adversarial examples discovery
Explaining and Harnessing Adversarial Examples | Goodfellow et al. | 2014 | FGSM attack
Towards Evaluating the Robustness of Neural Networks | Carlini & Wagner | 2017 | C&W attack; broke defensive distillation
Attention Is All You Need | Vaswani et al. | 2017 | Transformer architecture
Not What You've Signed Up For | Greshake et al. | 2023 | Indirect prompt injection
Universal and Transferable Adversarial Attacks on Aligned LMs | Zou et al. | 2023 | GCG jailbreak attack
Ignore This Title and HackAPrompt | Schulhoff et al. | 2023 | Prompt injection taxonomy
Poisoning Web-Scale Training Datasets is Practical | Carlini et al. | 2023 | Web-scale data poisoning
Extracting Training Data from Large Language Models | Carlini et al. | 2021 | Training data memorization
Stealing Machine Learning Models via Prediction APIs | Tramèr et al. | 2016 | Model extraction
BadNets: Identifying Vulnerabilities in the ML Supply Chain | Gu et al. | 2017 | Neural network backdoors
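The FGSM attack listed above (Goodfellow et al., 2014) reduces to one line of math: perturb the input by ε times the sign of the loss gradient with respect to that input. A minimal sketch on a toy logistic-regression "model" — the weights, input, and ε here are illustrative values, not drawn from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, epsilon):
    """One-step FGSM: x_adv = x + epsilon * sign(dL/dx).

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input x is (p - y) * w, where p is the
    predicted probability of the positive class.
    """
    p = sigmoid(w @ x)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

# Toy weights and input (assumptions for illustration only)
w = np.array([2.0, -1.0, 0.5])
x = np.array([0.3, 0.8, -0.2])
y = 1.0  # true label

x_adv = fgsm(x, y, w, epsilon=0.1)

# The perturbation pushes the model's confidence away from the true label
print(sigmoid(w @ x), sigmoid(w @ x_adv))
```

The same sign-of-gradient step applies to deep networks; the only change is that the input gradient comes from backpropagation rather than a closed form.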

Researchers to Follow

  • Nicholas Carlini (Google DeepMind) — adversarial ML, extraction, poisoning
  • Florian Tramèr (ETH Zurich) — model stealing, privacy attacks
  • Battista Biggio (U. Cagliari) — early foundational work on evasion and poisoning attacks
  • Kai Greshake — indirect prompt injection
  • Andy Zou — GCG attack, alignment robustness
  • Zico Kolter (CMU) — certified robustness, adversarial training
  • Dawn Song (UC Berkeley) — AI security across the stack

Frameworks & Standards

Threat Intelligence

  • Microsoft Threat Intelligence AI reports
  • Google Threat Analysis Group AI updates
  • Mandiant / CrowdStrike AI threat reports
  • Anthropic safety research publications
  • OpenAI safety research publications