# Supply Chain Security for Models

## The AI Supply Chain
| Component | Source | Risk |
|---|---|---|
| Pre-trained model | Model hub (Hugging Face), vendor API | Backdoor, pickle exploit, license issues |
| Fine-tuning data | Internal data, public datasets, contractors | Poisoning, PII, quality issues |
| Model serving framework | PyTorch, vLLM, TGI, Ollama | Vulnerabilities in inference code |
| Plugins/tools | First-party, third-party, community | Malicious tool, data exfiltration |
| Vector database | Pinecone, Weaviate, ChromaDB, pgvector | Poisoned embeddings, unauthorized access |
| Python dependencies | PyPI packages | Dependency confusion, typosquatting |
## Controls

### Model Artifact Security
- Download models only from verified publishers, pinned to a specific revision
- Verify file hashes against published checksums
- Prefer the safetensors format over pickle-based checkpoints, which can execute arbitrary code on load
- Scan model files with model-specific scanners (e.g., picklescan, ModelScan)
- Document model provenance: source, version, and modification history
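The checksum step above can be sketched with the standard library. The filename and the demo bytes are stand-ins; a real check would compare the downloaded weights against the hash published by the model's source:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large model weights never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Compare against the checksum published alongside the model."""
    return sha256_of(path) == expected_sha256.lower()

# Demo with a stand-in file; real usage points at the downloaded weights.
demo = Path("model.safetensors")
demo.write_bytes(b"fake weights")
expected = hashlib.sha256(b"fake weights").hexdigest()
print(verify_artifact(demo, expected))  # → True while the file is untampered
```

Refusing to load any artifact that fails this check turns a silent substitution into a loud deployment failure.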
### Dependency Management
- Pin all dependency versions
- Use lockfiles (e.g., pip-compile output, poetry.lock)
- Scan dependencies for known vulnerabilities (Snyk, pip-audit)
- Use a private PyPI mirror for production dependencies
- Review new dependency additions before approval
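Pinning can be enforced mechanically. A minimal sketch that flags requirement lines without an exact `==` pin; the sample requirements are illustrative:

```python
import re

# Matches "name==version" with nothing else on the line.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==\S+$")

def unpinned(requirements: str) -> list[str]:
    """Return requirement lines lacking an exact `==` pin."""
    bad = []
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if line and not PINNED.match(line):
            bad.append(line)
    return bad

reqs = """\
torch==2.3.1
transformers>=4.40   # range specifier: not reproducible
vllm
"""
print(unpinned(reqs))  # → ['transformers>=4.40', 'vllm']
```

Failing the build when this list is non-empty keeps floating versions out of production images.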
### Tool and Plugin Security
- Vet all third-party tools before enabling
- Sandbox tool execution environments
- Audit tool permissions (what data can the tool access?)
- Monitor tool call patterns for anomalies
- Maintain an approved tool registry
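An approved tool registry can double as a deny-by-default authorization gate. A minimal sketch; the tool names, scopes, and reviewer fields are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolEntry:
    name: str
    allowed_scopes: frozenset  # data the tool is permitted to touch
    reviewed_by: str           # who vetted it before enabling

# Hypothetical registry contents.
REGISTRY = {
    "web_search": ToolEntry("web_search", frozenset({"public_web"}), "sec-team"),
    "sql_query": ToolEntry("sql_query", frozenset({"analytics_db"}), "sec-team"),
}

def authorize_tool_call(tool: str, scope: str) -> bool:
    """Deny by default: unknown tools and out-of-scope access are rejected."""
    entry = REGISTRY.get(tool)
    return entry is not None and scope in entry.allowed_scopes

print(authorize_tool_call("sql_query", "analytics_db"))  # → True
print(authorize_tool_call("sql_query", "customer_pii"))  # → False
print(authorize_tool_call("shell_exec", "public_web"))   # → False: unregistered
```

Logging every denial from this gate also gives the anomaly-monitoring bullet above a concrete signal to watch.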
## SBOM for AI
Create an AI-specific Software Bill of Materials that includes:
- [ ] Base model name, version, source, hash
- [ ] Fine-tuning dataset source and version
- [ ] Model serving framework and version
- [ ] All Python dependencies with versions
- [ ] System prompt version and change history
- [ ] Tool/plugin list with versions
- [ ] RAG data sources and update schedule
- [ ] Vector database engine and version
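Serialized as JSON, such a record stays diffable and machine-checkable in CI. A minimal sketch; every field name and value below is an illustrative assumption, not an established AI-SBOM standard:

```python
import json
from datetime import date

# Hypothetical AI-SBOM record covering the checklist items above.
sbom = {
    "base_model": {
        "name": "example-org/example-8b",   # illustrative model id
        "source": "huggingface",
        "sha256": "abc123",                 # placeholder artifact hash
    },
    "fine_tuning_data": {"source": "internal-qa-v2", "version": "2024-05"},
    "serving_framework": {"name": "vllm", "version": "0.4.2"},
    "python_dependencies": {"torch": "2.3.1", "transformers": "4.40.0"},
    "system_prompt_version": "r3",
    "tools": [{"name": "web_search", "version": "1.2.0"}],
    "rag_sources": [{"name": "product-docs", "refresh": "daily"}],
    "vector_db": {"engine": "pgvector", "version": "0.7.0"},
    "generated": date.today().isoformat(),
}

print(json.dumps(sbom, indent=2))
```

Regenerating this file on every deploy, and alerting on unexplained diffs, is what makes the SBOM useful during an incident.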