Confidentiality — Data Leakage & Privacy

AI-Specific Confidentiality Threats

Training Data Leakage

Models memorize portions of their training data and can reproduce them verbatim. This includes PII (names, emails, phone numbers, addresses), credentials (API keys, passwords in code), proprietary content (internal documents, trade secrets), and copyrighted material.

Risk level: High for any model trained on internal data or fine-tuned on proprietary datasets.

System Prompt Exposure

System prompts often contain business logic, API keys, internal URLs, persona instructions, and security rules. Extraction gives attackers a blueprint of the application.

Conversation Data Exposure

Multi-tenant AI systems — where multiple users share the same model deployment — may leak data between users through shared context, caching, or logging failures.

Shadow AI Data Leakage

Employees paste sensitive data into unauthorized AI tools. This is the most common AI confidentiality risk in enterprises today.

| Data Type | Risk Example |
| --- | --- |
| Source code | Developer pastes proprietary code into ChatGPT for debugging |
| Customer data | Support rep pastes customer PII into AI for email drafting |
| Financial data | Analyst uploads earnings data to AI for summarization |
| Legal documents | Attorney pastes contracts into AI for review |
| HR records | HR uploads employee reviews for AI-assisted feedback |

Embedding Inversion

RAG systems store document embeddings in vector databases. Research has shown embeddings can be inverted to approximately reconstruct the original text — meaning the vector database itself is a data leakage risk.

Controls

| Control | Implementation | Effectiveness |
| --- | --- | --- |
| Output DLP | Scan model outputs for PII patterns (SSN, CC, email) before returning to user | Medium — catches known patterns, misses novel ones |
| Input DLP | Scan user inputs and block sensitive data from reaching the model | Medium-High — prevents data exposure to third-party models |
| AI acceptable use policy | Define what data can and cannot be shared with AI tools | Foundational — requires training and enforcement |
| CASB integration | Monitor and control employee access to cloud AI services | High — provides visibility into shadow AI |
| Data classification gates | Only allow models to access data at or below their classification level | High — prevents classification boundary violations |
| Differential privacy | Add mathematical noise during training to prevent memorization | High — but degrades model quality |
| Endpoint controls | Block or monitor clipboard copy to AI web applications | Medium — can be circumvented |
| Audit logging | Log all interactions with AI systems for forensic review | Detective only — doesn't prevent but enables response |
| Token-level filtering | Strip or mask PII from model context before processing | Medium-High — requires robust PII detection |
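To make the output DLP and token-level filtering controls concrete, here is a minimal sketch of a regex-based PII scanner and redactor. The pattern set is a hypothetical illustration covering only the three types the table names (SSN, credit card, email); a production deployment would use a tested PII-detection library rather than hand-rolled patterns, since regexes miss novel or obfuscated formats — exactly the "Medium" effectiveness caveat above.

```python
import re

# Hypothetical pattern set for illustration only. Real deployments
# should use a maintained PII-detection library; regexes like these
# catch known formats but miss novel or obfuscated ones.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scan_output(text: str) -> dict[str, list[str]]:
    """Return detected PII keyed by type (output DLP: scan before returning)."""
    return {
        name: pattern.findall(text)
        for name, pattern in PII_PATTERNS.items()
        if pattern.search(text)
    }

def redact_output(text: str) -> str:
    """Mask matches in place (token-level filtering: strip PII from context)."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{name.upper()}]", text)
    return text
```

The same two functions can sit on either side of the model: `redact_output` on user input before it reaches a third-party model (input DLP), and `scan_output` on responses before they reach the user, with findings forwarded to audit logging.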

Metrics

  • Number of shadow AI tools detected per month
  • PII detection rate in model outputs
  • Percentage of AI interactions covered by DLP
  • Mean time to detect data leakage incidents
  • Employee completion rate for AI acceptable use training