Glossary & Concept Index

Agent

An AI system that can plan, call tools, observe results, and continue through a task instead of returning one immediate answer.

Chapter 6

Alignment

Training and product work that steers a model toward helpful, honest, safe, and controllable behavior.

Chapter 2

Attention

The Transformer mechanism that lets each token weigh which other tokens matter most for interpreting it.

Chapter 1

Context Window

The immediate working memory available to the model: the prompt, retrieved documents, tool outputs, and prior messages it can inspect now.

Chapter 3

Context Engineering

Designing what evidence, instructions, memory, examples, and tool results enter the model's working context so attention is spent on the right material.

Chapter 1

Diffusion

A generative media approach that starts from noise and repeatedly removes predicted noise until an image, video, or audio sample emerges.

Chapter 5

Embedding

A vector representation that places text, images, or other data into a mathematical space where similar meanings sit near each other.

Chapter 3

Eval

A repeatable test that checks whether an AI system behaves correctly on cases that matter for the product.

Chapter 8

Groundedness

The degree to which an answer is supported by the evidence the system retrieved or was given.

Chapter 8

Guardrail

A product-level check or constraint around the model, such as input filtering, output validation, citation checks, or tool permission rules.

Chapter 8

LLM-as-Judge

Using a model to grade another model's output with a rubric. It is useful for scale, but it needs calibration because judges can be biased.

Chapter 8

Mixture of Experts

A sparse architecture where a router activates only a few specialized expert networks for each token.

Chapter 4

Prompt Injection

An attack where untrusted text tries to override the system's instructions, often through retrieved documents, web pages, or user input.

Chapter 8

Query, Key, Value

The three vectors used by attention: what a token seeks, what other tokens offer, and the content that gets blended into the result.

Chapter 1

Retrieval-Augmented Generation

A pattern where the system retrieves relevant external evidence, places it in context, and asks the model to answer from that evidence.

Chapter 3

Reranking

A second retrieval step that reorders candidate chunks by relevance before they are placed into the prompt.

Chapter 3

Test-Time Compute

Extra inference-time work spent on planning, searching, checking, or tool use before the final answer is returned.

Chapter 7

Tool Call

A structured request from the model to the host application, such as searching, calculating, reading a file, sending a message, or updating a record.

Chapter 6