Deepfakes & Synthetic Media

Types of Synthetic Media

| Type | Technology | Current Quality | Detection Difficulty |
|---|---|---|---|
| Voice cloning | Neural TTS, voice conversion | Very High | Hard |
| Face swap (video) | GAN-based, diffusion-based | High | Medium |
| Full synthetic video | Video diffusion models | Medium-High | Medium |
| Synthetic images | Stable Diffusion, DALL-E, Midjourney | Very High | Hard |
| Text generation | LLMs | Very High | Very Hard |

Voice Cloning Deep Dive

Requirements

  • Sample audio: 3-60 seconds depending on the tool
  • Compute: Consumer GPU or cloud API
  • Cost: Free (open source) to $5-50/month (commercial APIs)

Tools

| Tool | Type | Sample Needed | Quality |
|---|---|---|---|
| ElevenLabs | Commercial API | 30 seconds | Very High |
| Tortoise-TTS | Open source | 5-30 seconds | High |
| VALL-E / VALL-E X | Research | 3 seconds | Very High |
| RVC (Retrieval-Based Voice Conversion) | Open source | 10+ minutes for training | High |
| So-VITS-SVC | Open source | 30+ minutes for training | High |

Attack Scenarios

  • Executive impersonation for wire transfer authorization
  • Bypassing voice-based authentication systems
  • Generating fake audio evidence
  • Vishing at scale — personalized voice calls to hundreds of targets

Defense

| Approach | What It Does | Limitations |
|---|---|---|
| Audio watermarking | Embed imperceptible markers in legitimate audio | Only works for content you generate |
| Liveness detection | Check for signs of real-time human speech | Can be bypassed with high-quality clones |
| Provenance tracking | C2PA/Content Credentials standard | Adoption still early |
| Employee training | Teach verification procedures | Human factor: people still get fooled |
| Callback verification | Always call back on known numbers | Doesn't scale, not always followed |