Terminology Glossary

Quick reference for AI/ML terms used throughout this book.

| Term | Definition |
|------|------------|
| Activation Function | Non-linear function applied to a neuron's output (ReLU, GELU, sigmoid) |
| Adversarial Example | Input crafted to cause misclassification while appearing normal to humans |
| Alignment | Training a model to behave according to human values and intentions |
| Attention | Mechanism allowing each token to weigh the relevance of every other token |
| Autoregressive | Generating output one token at a time, each conditioned on prior tokens |
| Backpropagation | Algorithm for computing gradients through a neural network |
| BLEU / ROUGE | Metrics for evaluating generated text quality |
| Chain-of-Thought (CoT) | Prompting technique that elicits step-by-step reasoning |
| Context Window | Maximum number of tokens the model can process at once |
| DPO | Direct Preference Optimization, an alternative to RLHF for alignment |
| Embedding | Dense vector representation of a token that captures semantic meaning |
| Epoch | One full pass through the training dataset |
| Few-Shot | Providing examples in the prompt to guide the model |
| Fine-Tuning | Additional training on a specific dataset after pre-training |
| FGSM | Fast Gradient Sign Method, an efficient adversarial attack |
| Gradient | Direction and magnitude of steepest ascent in the loss landscape |
| Gradient Descent | Optimization algorithm that follows negative gradients to minimize loss |
| Hallucination | Model generating confident but factually incorrect output |
| Hyperparameter | Training setting not learned from data (learning rate, batch size) |
| Inference | Using a trained model to make predictions |
| In-Context Learning | Model learning from examples provided in the prompt |
| Jailbreak | Technique to bypass a model's safety training |
| LLM | Large Language Model |
| Logits | Raw model output before softmax normalization |
| LoRA | Low-Rank Adaptation, an efficient fine-tuning method |
| Loss Function | Measures how wrong the model's prediction is |
| Membership Inference | Determining whether a specific sample was in the training data |
| MLP / FFN | Multi-layer perceptron / feed-forward network within transformer layers |
| Next-Token Prediction | The training objective: predict the next token given prior context |
| Overfitting | Model memorizes training data and fails to generalize |
| Parameter | A learned weight in the model |
| Perplexity | Metric for how well a model predicts a text sample (lower is better) |
| Positional Encoding | Vector added to embeddings to encode each token's position in the sequence |
| Prompt Injection | Embedding adversarial instructions in model input |
| QLoRA | Quantized LoRA, an even more memory-efficient fine-tuning method |
| Quantization | Reducing model precision (e.g., float32 → int8) to shrink the model and speed up inference |
| RAG | Retrieval-Augmented Generation: the model retrieves external documents before responding |
| Reinforcement Learning | Learning by trial and error from a reward signal |
| RLHF | Reinforcement Learning from Human Feedback |
| Self-Attention | Attention mechanism where the queries, keys, and values all come from the same sequence |
| Softmax | Function that converts logits to a probability distribution summing to 1 |
| System Prompt | Hidden instructions from the developer that set model behavior |
| Temperature | Controls randomness in sampling (0 = deterministic; higher = more random) |
| Token | Sub-word unit that the model processes (not exactly a word or a character) |
| Tokenizer | Converts text to token IDs and back |
| Top-k / Top-p | Sampling strategies that control output diversity |
| Transfer Attack | Adversarial example crafted on one model that also fools another |
| Transformer | Architecture built on self-attention; the basis of all modern LLMs |
| Vector Database | Database storing embeddings for similarity search (used in RAG) |
| Weight | Learnable parameter in a neural network |
| Zero-Shot | Model performing a task with no examples, just instructions |
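To make the Softmax and Temperature entries concrete, here is a minimal sketch in plain Python; the logit values are made up for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution.

    Dividing logits by the temperature before softmax flattens
    (T > 1) or sharpens (T < 1) the distribution; as T approaches 0
    sampling becomes a deterministic argmax.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # arbitrary example logits
probs = softmax(logits)                      # sums to 1, ordered like the logits
sharp = softmax(logits, temperature=0.5)     # lower T concentrates probability mass
```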
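The Top-k / Top-p entry can be sketched as filters applied to a softmax distribution before sampling; this is an illustrative toy over a hypothetical four-token vocabulary, not a production sampler:

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Nucleus (top-p) filtering: keep the smallest set of tokens whose
    cumulative probability reaches p, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

probs = [0.5, 0.3, 0.15, 0.05]   # toy distribution over 4 tokens
small_k = top_k_filter(probs, 2)  # mass redistributed over the top 2 tokens
small_p = top_p_filter(probs, 0.8)  # 0.5 + 0.3 reaches 0.8, same 2 tokens survive
```

Both filters trade diversity for quality: smaller k or p restricts sampling to the model's most confident continuations.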
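The Gradient and Gradient Descent entries in action, as a tiny sketch on a one-dimensional loss; the quadratic here is just an example, not how real training losses look:

```python
def gradient_descent(grad, x, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a loss."""
    for _ in range(steps):
        x -= lr * grad(x)   # follow the negative gradient downhill
    return x

# loss(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x=0.0)
```

The learning rate `lr` is a hyperparameter in the glossary's sense: too small and convergence is slow, too large and the iterates overshoot or diverge.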
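Finally, a sketch of the Perplexity entry, assuming we already have the probability the model assigned to each actual next token (the values below are invented for illustration):

```python
import math

def perplexity(token_probs):
    """Perplexity is the exponential of the average negative log-probability
    the model assigned to each actual next token (lower means the model
    predicted the text better)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

confident = perplexity([0.9, 0.8, 0.95])  # tokens the model predicted well
surprised = perplexity([0.2, 0.1, 0.3])   # tokens the model found unlikely
```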