# Terminology Glossary
Quick reference for AI/ML terms used throughout this book.
| Term | Definition |
|---|---|
| Activation Function | Non-linear function applied to neuron output (ReLU, GELU, sigmoid) |
| Adversarial Example | Input crafted to cause misclassification while appearing normal to humans |
| Alignment | Training a model to behave according to human values and intentions |
| Attention | Mechanism allowing each token to weigh the relevance of every other token |
| Autoregressive | Generating output one token at a time, each conditioned on prior tokens |
| Backpropagation | Algorithm for computing gradients through a neural network |
| BLEU/ROUGE | N-gram overlap metrics for scoring generated text against reference text |
| Chain-of-Thought (CoT) | Prompting technique that elicits step-by-step reasoning |
| Context Window | Maximum number of tokens the model can process at once |
| DPO | Direct Preference Optimization — alternative to RLHF for alignment |
| Embedding | Dense vector representation of a token capturing semantic meaning |
| Epoch | One full pass through the training dataset |
| Few-Shot | Providing examples in the prompt to guide the model |
| Fine-Tuning | Additional training on a specific dataset after pre-training |
| FGSM | Fast Gradient Sign Method — efficient adversarial attack |
| Gradient | Direction and magnitude of steepest ascent in the loss landscape |
| Gradient Descent | Optimization algorithm that follows negative gradients to minimize loss |
| Hallucination | Model generating confident but factually incorrect output |
| Hyperparameter | Training setting not learned from data (learning rate, batch size) |
| Inference | Using a trained model to make predictions |
| In-Context Learning | Model learning from examples provided in the prompt |
| Jailbreak | Technique to bypass model safety training |
| LoRA | Low-Rank Adaptation — efficient fine-tuning method |
| Loss Function | Measures how wrong the model's prediction is |
| LLM | Large Language Model |
| Logits | Raw model output before softmax normalization |
| Membership Inference | Determining if a specific sample was in the training data |
| MLP / FFN | Multi-layer perceptron / Feed-forward network within transformer layers |
| Next-Token Prediction | The training objective: predict the next token given prior context |
| Overfitting | Model memorizes training data, fails to generalize |
| Parameter | A learned weight in the model |
| Perplexity | Metric for how well a model predicts a text sample (lower = better) |
| Positional Encoding | Vector added to embeddings to encode token position in sequence |
| Prompt Injection | Embedding adversarial instructions in model input |
| QLoRA | Quantized LoRA — even more memory-efficient fine-tuning |
| Quantization | Reducing numeric precision of weights (e.g., float32 → int8) to shrink model size and speed up inference |
| RAG | Retrieval-Augmented Generation — model retrieves external docs before responding |
| Reinforcement Learning | Learning through trial and error, guided by a reward signal |
| RLHF | Reinforcement Learning from Human Feedback |
| Self-Attention | Attention mechanism where query, key, value all come from the same sequence |
| Softmax | Function that converts logits to probability distribution summing to 1 |
| System Prompt | Hidden instructions from the developer that set model behavior |
| Temperature | Controls randomness in sampling (0 = deterministic, higher = more random) |
| Token | Sub-word unit that the model processes (not exactly a word or character) |
| Tokenizer | Converts text to token IDs and back |
| Top-k / Top-p | Sampling strategies to control output diversity |
| Transfer Attack | Adversarial example crafted on one model that works on another |
| Transformer | Architecture using self-attention, basis of all modern LLMs |
| Vector Database | Database storing embeddings for similarity search (used in RAG) |
| Weight | Learnable parameter in a neural network |
| Zero-Shot | Model performing a task with no examples, just instructions |
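
Several of the sampling-related terms above (logits, softmax, temperature, top-k) fit together at generation time. A minimal sketch in plain Python illustrates the relationship; the function names here are illustrative, not from any particular library:

```python
# Illustrative sketch: logits -> softmax -> temperature/top-k sampling.
import math
import random

def softmax(logits):
    """Convert raw logits to a probability distribution summing to 1."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from logits using temperature and top-k sampling."""
    if temperature == 0:                 # temperature 0 -> deterministic argmax
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]   # higher T flattens the distribution
    probs = softmax(scaled)
    candidates = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k is not None:                # keep only the k most likely tokens
        candidates = candidates[:top_k]
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Note that an autoregressive model would call something like `sample_next_token` once per generated token, appending each result to the context before predicting the next.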