| Use Case | GPU | VRAM | Cost (approx.) |
|---|---|---|---|
| 7-8B models (Llama 3 8B, Mistral 7B) | RTX 4070 Ti | 12GB | $600-800 |
| 13B models (heavily quantized 70B) | RTX 4090 | 24GB | $1,500-2,000 |
| 70B models (full precision) | 2x A100 80GB | 160GB | Cloud rental |
| Fine-tuning (LoRA) | RTX 4090 or A100 | 24-80GB | $1,500+ or cloud |
For getting started, a single RTX 4090 handles most red team use cases.
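The VRAM figures above follow from a simple rule of thumb: weights take parameters × bytes-per-weight, plus headroom for the KV cache and activations. A rough sketch (the 1.2 overhead factor is a ballpark assumption; real usage varies with context length and batch size):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight bytes plus ~20% for KV cache/activations.

    The overhead factor is an assumption, not a measured value.
    """
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# Llama 3 8B at 4-bit quantization fits comfortably in 12GB
print(f"{estimate_vram_gb(8, 4):.1f} GB")    # ~4.8 GB
# Llama 3 70B at 16-bit precision needs multi-GPU or cloud
print(f"{estimate_vram_gb(70, 16):.1f} GB")  # ~168 GB
```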
```shell
# Ollama — simplest option
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3
ollama pull mistral
```
```shell
# vLLM — production API server (OpenAI-compatible)
pip install vllm
python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-8B
```
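vLLM serves an OpenAI-compatible API, so any OpenAI-style client works against it. A minimal stdlib sketch, assuming the server above is running on its default port 8000 (the prompt and `max_tokens` value are placeholders):

```python
import json
import urllib.request

def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 64) -> tuple[str, bytes]:
    """Build a request for vLLM's OpenAI-compatible /v1/completions endpoint.

    Assumes the default server address http://localhost:8000.
    """
    url = "http://localhost:8000/v1/completions"
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return url, json.dumps(payload).encode()

if __name__ == "__main__":
    url, body = build_completion_request("meta-llama/Meta-Llama-3-8B", "Hello")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["text"])
```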
```shell
# llama.cpp — CPU/GPU inference, GGUF format
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
./main -m models/llama-3-8b.Q4_K_M.gguf -p "Hello"
```
```shell
# Axolotl — easiest fine-tuning framework
pip install axolotl
# Configure a LoRA fine-tune in YAML and run
```
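A LoRA fine-tune in Axolotl is driven by a single YAML file passed to its train entrypoint. A minimal sketch: the keys follow Axolotl's config schema, but the dataset path and hyperparameter values here are placeholder assumptions, not recommendations.

```yaml
# Hypothetical minimal LoRA config for Axolotl
base_model: meta-llama/Meta-Llama-3-8B
adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
datasets:
  - path: ./data/examples.jsonl   # placeholder dataset
    type: alpaca
micro_batch_size: 2
num_epochs: 3
learning_rate: 0.0002
output_dir: ./lora-out
```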
```shell
# Hugging Face Transformers + PEFT
pip install transformers peft trl datasets
```
| Model | Why | Size |
|---|---|---|
| Llama 3 8B | Fast, capable, good baseline | ~4.5GB (Q4) |
| Mistral 7B | Strong reasoning, efficient | ~4GB (Q4) |
| Llama 3 70B | Closest to frontier-model behavior | ~40GB (Q4) |
| Mixtral 8x7B | MoE architecture, good balance | ~26GB (Q4) |
□ GPU with 24GB+ VRAM installed and drivers updated
□ CUDA toolkit installed
□ Ollama installed with Llama 3 and Mistral pulled
□ Python environment with transformers, torch, vllm
□ Garak installed for scanning
□ PyRIT installed for orchestration
□ Test target deployed (local chatbot with system prompt)
□ Logging infrastructure (save all inputs and outputs)
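For the last checklist item, logging infrastructure can start as simply as appending every exchange to a JSONL file. A minimal sketch; the file path and field names are arbitrary choices, not a required schema:

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("redteam_log.jsonl")  # arbitrary location

def log_exchange(prompt: str, response: str, model: str,
                 path: Path = LOG_PATH) -> None:
    """Append one prompt/response pair as a timestamped JSON line."""
    record = {
        "ts": time.time(),
        "model": model,
        "prompt": prompt,
        "response": response,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_exchange("test prompt", "test response", "llama3")
```

Append-only JSONL keeps records replayable and greppable, which matters when you need to reconstruct exactly which input produced a given model behavior.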