AI terms in plain English
Tokens, context windows, RAG, MCP, fine-tuning — every AI term you keep meeting, defined honestly without the jargon. Searchable, opinionated, free.
How to use this glossary
Search for a specific term, or click a category pill to focus on one area. Click any term to read the full definition with examples and links to related compares and skills.
We define terms the way someone shipping AI work actually uses them — not the textbook version. If a term is contested or vague, we say so.
Foundations
13 terms. Core LLM concepts: tokens, context windows, prompts, hallucinations, frontier models.
AGI / ASI
AGI (Artificial General Intelligence) is AI matching humans across most cognitive tasks. ASI (Artificial Super Intelligence) exceeds humans across all of them. Both terms are contested and there's no agreed definition of when either has arrived.
Context window
The maximum amount of text (measured in tokens) a model can consider at one time — the prompt plus its response.
Frontier model
The leading-edge models at any given moment — defined by capability, not parameter count. In April 2026 that's GPT-5, Claude Opus 4.7, Gemini 3 Pro, and a handful of others.
Hallucination
When an LLM produces fluent, confident-sounding output that is factually wrong or fabricated — fake citations, invented APIs, made-up case law, plausible-but-false biographies.
LLM (Large Language Model)
A neural network trained on huge amounts of text to predict the next token. The engine behind ChatGPT, Claude, Gemini, and most modern AI tools.
Open source AI
Per OSI's 2024 Open Source AI Definition: weights, training code, and enough information about training data for someone to reproduce the model. Most "open" models in 2026 don't actually qualify.
Open weights
A model whose trained weight files are publicly downloadable, so anyone can run it locally or fine-tune it. Doesn't mean the training data or training code is public.
Prompt
The input you send to an LLM. In a chat interface, it's your message; in an API call, it's the full text the model sees, including any system instructions and conversation history.
System prompt
A persistent instruction that defines the model's persona, rules, and behavior for an entire conversation — separate from the user's individual messages.
Temperature
A sampling parameter controlling randomness in an LLM's output. 0 is deterministic and conservative; ~1 is creative and varied; above 1, output starts to become incoherent.
Token
The atomic unit an LLM reads and writes — usually a sub-word fragment. "The" is one token; "tokenization" is three.
Tokenizer
The component that splits raw text into tokens before the model sees it. Different models use different tokenizers, so the same word can cost more on one model than another.
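The effect of different vocabularies can be shown with a toy greedy longest-match tokenizer (a simplified stand-in for real learned tokenizers like BPE or SentencePiece; the vocabularies below are made up for illustration):

```python
def tokenize(text, vocab):
    """Greedy longest-match sub-word tokenizer (a toy stand-in for BPE).

    Real tokenizers are learned from data; this just shows why the same
    word can split into different token counts under different vocabularies.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocab entry that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: fall back to a single-char token.
            tokens.append(text[i])
            i += 1
    return tokens

# One vocabulary splits "tokenization" into three tokens, another into two.
tokenize("tokenization", {"token", "iz", "ation"})   # ["token", "iz", "ation"]
tokenize("tokenization", {"token", "ization"})       # ["token", "ization"]
```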
Top-p (nucleus sampling)
An alternative or complement to temperature: sample only from the smallest set of tokens whose cumulative probability exceeds p (e.g. 0.9). Cuts off the long tail of unlikely tokens dynamically.
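Temperature and top-p are just arithmetic on the model's output distribution. A minimal sketch of both, operating on a made-up token-to-score dict rather than any real model API:

```python
import math
import random

def sample(logits, temperature=1.0, top_p=1.0, rng=None):
    """Toy next-token sampler: temperature scaling, then nucleus filtering."""
    rng = rng or random.Random(0)
    if temperature == 0:
        # Temperature 0 means greedy: always pick the most likely token.
        return max(logits, key=logits.get)
    # Softmax with temperature: lower T sharpens, higher T flattens.
    scaled = {t: s / temperature for t, s in logits.items()}
    m = max(scaled.values())
    exp = {t: math.exp(s - m) for t, s in scaled.items()}
    z = sum(exp.values())
    probs = {t: e / z for t, e in exp.items()}
    # Nucleus (top-p): keep the smallest high-probability set whose mass >= p.
    kept, mass = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # Sample from the surviving tokens, renormalized.
    total = sum(p for _, p in kept)
    r = rng.random() * total
    for tok, p in kept:
        r -= p
        if r <= 0:
            return tok
    return kept[-1][0]
```

With a dominant token, a tight top-p cuts the tail entirely: `sample({"the": 5.0, "a": 2.0}, top_p=0.5)` can only ever return `"the"`.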
Architecture
7 terms. How models are built: Transformers, attention, MoE, diffusion, multimodal.
Attention
The mechanism that lets a model decide which parts of its input matter most for each token it produces. Every Transformer is built around it.
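The core computation is small enough to sketch in a few lines. This is scaled dot-product attention on plain Python lists, assuming toy 2-dimensional vectors; real implementations batch it into matrix multiplies on GPUs:

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention: for each query, score every key,
    softmax the scores, and return the weighted average of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Dot-product similarity, scaled by sqrt(d) for numerical stability.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Softmax the scores into attention weights that sum to 1.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Output is the attention-weighted mix of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs
```

A query pointing at the first key gets an output dominated by the first value: the model has "attended" mostly to that position.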
Decoder-only model
A Transformer that just predicts the next token, autoregressively. The architecture used by GPT, Claude, Llama, Gemini, and basically every modern chat model.
Diffusion model
A generative model that learns to reverse a noising process. Start with pure noise, denoise it step by step, end up with an image. The mechanism behind Stable Diffusion, Flux, Midjourney, and most modern image generators.
Encoder-decoder
The original Transformer architecture: an encoder that reads the input and a decoder that generates the output, connected by cross-attention. Still used for translation and some summarization, mostly extinct for chat.
Mixture of Experts (MoE)
A sparse Transformer variant that routes each token to only a few "expert" sub-networks instead of running the whole model. Lets you scale total parameters way up without scaling cost proportionally.
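The routing idea can be sketched for a single token. Here `experts` are stand-in functions and `gate_scores` plays the role of the learned router; real MoE layers do this per token inside every MoE block:

```python
import math

def moe_layer(x, experts, gate_scores, k=2):
    """Toy sparse MoE routing: run only the top-k experts for this token.

    Total parameters grow with len(experts); compute grows only with k.
    """
    # Pick the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: -gate_scores[i])[:k]
    # Softmax over just the selected scores to get mixing weights.
    m = max(gate_scores[i] for i in top)
    w = [math.exp(gate_scores[i] - m) for i in top]
    z = sum(w)
    # Weighted sum of the k expert outputs; the others never run.
    return sum((wi / z) * experts[i](x) for wi, i in zip(w, top))
```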
Transformer
The neural network architecture behind every modern LLM. Introduced in 2017, it processes sequences using self-attention instead of recurrence.
Vision-Language Model (VLM)
A model that processes both images and text in a unified way. Show it a screenshot, a chart, or a photo, and it can describe, analyze, or answer questions about what it sees.
Training & adaptation
11 terms. Pre-training, fine-tuning, RLHF, DPO, LoRA, quantization, distillation.
Constitutional AI
Anthropic's alignment technique where the model critiques and revises its own outputs against a written set of principles, reducing the need for human preference labels.
Distillation
Training a smaller "student" model to mimic a larger "teacher" model, getting most of the teacher's quality at a fraction of the size and cost.
DPO (Direct Preference Optimization)
A simpler alternative to RLHF that trains directly on preference pairs without needing a separate reward model or reinforcement learning loop.
Fine-tuning
Continuing training on a pre-trained model with a smaller, curated dataset to adapt it to a specific task, domain, or style.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that trains tiny adapter layers and freezes the base model, cutting fine-tuning cost by 100x or more.
Post-training
The umbrella term for everything done to a model after pre-training — SFT, RLHF, DPO, safety tuning, persona work — where most modern model differentiation actually happens.
Pre-training
The first and most expensive training phase, where a model learns general language and world knowledge from trillions of tokens of text.
Quantization
Reducing the numerical precision of a model's weights — for example FP16 down to INT8 or INT4 — so it runs on smaller, cheaper hardware.
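The basic scale-and-round idea fits in a few lines. A sketch of symmetric INT8 quantization on a plain list of weights; real systems quantize per-channel or per-block, often down to INT4:

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] via a single scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; the rounding error is the quality cost."""
    return [qi * scale for qi in q]
```

Round-tripping a weight vector shows the trade: each value now fits in one byte instead of two or four, at the price of a small reconstruction error.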
RLAIF (RL from AI Feedback)
RLHF where the human ranker is replaced (or augmented) by an AI judge, making preference training cheap enough to run at massive scale.
RLHF (Reinforcement Learning from Human Feedback)
A post-training method where humans rank model outputs and the model learns a reward signal from those rankings, then optimizes against it.
Synthetic data
Training data generated by AI models rather than collected from humans — increasingly the dominant data source in modern post-training.
Inference & reasoning
8 terms. Chain of thought, reasoning models, test-time compute, ReAct, in-context learning.
Chain of thought (CoT)
A prompting technique that asks the model to reason step-by-step before giving a final answer, dramatically improving accuracy on math, logic, and multi-step problems.
Few-shot / Zero-shot / N-shot
Prompting styles defined by how many examples you include before the actual task: zero examples (zero-shot), a few (few-shot), or N specific examples (N-shot).
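The difference is purely in prompt assembly. A sketch, using an illustrative Input/Output layout rather than any provider's required format:

```python
def build_prompt(task, examples, query):
    """Assemble an N-shot prompt: instruction, worked examples, then the query.

    With examples=[] this is zero-shot; with a handful, few-shot.
    """
    parts = [task]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    # End at "Output:" so the model completes the answer for the real query.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)
```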
In-context learning
The phenomenon where an LLM "learns" a new task from examples shown in the prompt, without any gradient updates or training — the model adapts purely from context.
ReAct (Reasoning + Acting)
An agent pattern where the model interleaves thinking and tool use in a loop: reason about what to do, take an action, observe the result, reason again.
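The loop itself is simple. In this sketch, `llm` and `tools` are stand-ins you supply: `llm(transcript)` returns either `("act", tool_name, tool_input)` or `("answer", text)`. Real agents parse these decisions out of the model's generated text instead:

```python
def react_loop(llm, tools, question, max_steps=5):
    """Minimal ReAct loop: reason, act, observe, repeat."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        decision = llm("\n".join(transcript))             # reason
        if decision[0] == "answer":
            return decision[1]
        _, name, arg = decision
        observation = tools[name](arg)                    # act
        transcript.append(f"Action: {name}({arg})")
        transcript.append(f"Observation: {observation}")  # observe, then loop
    return None
```

The key property is that each observation lands back in the transcript, so the next reasoning step can react to what the tool actually returned.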
Reasoning model
A class of LLM trained to think for an extended period — sometimes minutes — before producing a final answer, trading latency for much higher accuracy on hard problems.
Self-consistency
Running the same prompt multiple times with sampling and taking the majority answer — a simple way to boost reasoning accuracy by trading more compute for better results.
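As code, this is a sampling loop plus a vote. `generate` below is a stand-in for a sampled (temperature > 0) model call:

```python
from collections import Counter

def self_consistency(generate, prompt, n=5):
    """Sample the same prompt n times and return the majority answer."""
    answers = [generate(prompt) for _ in range(n)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```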
Test-time compute
The principle that you can make a model smarter by spending more compute at inference time — not just at training time — by having it think longer, sample more, or search more.
Tree of thought (ToT)
A reasoning technique where the model branches into multiple candidate thoughts at each step, evaluates them, and prunes — exploring a tree of reasoning paths instead of one chain.
RAG & retrieval
8 terms. Vector databases, semantic search, chunking, reranking, hybrid search.
Chunking
Splitting source documents into smaller pieces before embedding so each chunk is small enough to retrieve precisely and fit in the prompt.
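A minimal sketch of fixed-size chunking with overlap. Production code usually counts tokens rather than characters and prefers sentence or heading boundaries, but the mechanics are the same:

```python
def chunk(text, size=500, overlap=50):
    """Split text into size-character chunks, each overlapping the last.

    The overlap keeps context that straddles a boundary retrievable
    from both neighboring chunks.
    """
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```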
Embedding model
A model that turns a piece of text (or image, or audio) into a fixed-length vector, where similar inputs produce similar vectors. Different from the LLM that generates answers.
Hybrid search
Combining semantic (embedding) search with keyword search (BM25) and fusing the results — the production-default retrieval strategy in 2026.
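One common fusion method (not the only one) is Reciprocal Rank Fusion. Given the two ranked result lists, each document scores the sum of 1/(k + rank) across the lists it appears in; k = 60 is the conventional constant:

```python
def rrf_fuse(keyword_ranked, vector_ranked, k=60):
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc in enumerate(ranking, start=1):
            # Documents near the top of either list score higher.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both searches beats one ranked well by only one, which is exactly the behavior you want from hybrid retrieval.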
Knowledge cutoff
The date after which a model has no training data. Anything that happened after the cutoff, the model either doesn't know or hallucinates with confidence.
RAG (Retrieval-Augmented Generation)
A pattern that grounds an LLM in your data: retrieve the most relevant chunks of text at query time, paste them into the prompt, then let the model answer.
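The retrieve-and-paste step is small enough to sketch. Here `chunks` is a list of `(text, embedding)` pairs; in a real pipeline the embeddings would come from an embedding model, and the returned prompt would go to the LLM:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def rag_prompt(query, query_vec, chunks, top_k=2):
    """Rank chunks by similarity to the query embedding and build the prompt."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    context = "\n\n".join(text for text, _ in ranked[:top_k])
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"
```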
Reranking
A second-pass model that takes the top-K results from initial retrieval and reorders them by true relevance to the query.
Semantic search
Search by meaning instead of keyword overlap — uses embedding models to find documents that are conceptually similar to the query, even with no shared words.
Vector database
A specialized database for storing embeddings (high-dimensional vectors) and finding the nearest neighbors to a query vector — fast.
Agents & tools
16 terms. AI agents, tool use, MCP, computer use, Skills, custom GPTs, memory, multi-agent.
.cursorrules / repo instruction files
Plain-text files (.cursorrules, AGENTS.md, CLAUDE.md) at the repo root that tell coding agents how your codebase works — conventions, dependencies, gotchas.
AI agent
A system where an LLM autonomously chains reasoning and tool calls in a loop to accomplish a goal — instead of just answering one question.
AI memory
Persistent context an AI assistant carries across separate conversations — your preferences, past projects, recurring facts — so you don't have to re-explain yourself every time.
AI team topology
The pattern of using multiple AI models for different roles in a single workflow — typically architect, builder, reviewer — instead of one model doing everything.
Artifact
Claude's term for generated content rendered as a separate, persistent object — code, a document, a chart, a webpage — instead of just inline chat text.
Build-more, verify-more
A practical principle for AI-assisted work: as builds get faster with AI, verification has to scale with them — otherwise quality degrades silently.
Computer use
Agents that control a real desktop — moving the mouse, typing, reading the screen — instead of calling structured APIs.
Cross-model handoff
Passing context, decisions, and intermediate work from one AI model to another — the connective tissue in a multi-model workflow.
Custom GPT
OpenAI's user-built assistants — a base GPT model plus custom instructions, knowledge files, and optional API actions, packaged as a shareable bot.
MCP (Model Context Protocol)
An open protocol — introduced by Anthropic in late 2024 — for connecting AI models to external tools, data sources, and services through a standard interface.
Model verification
Using a second AI model to check the first model's output — a structural defense against hallucinations and confident-wrong answers.
Multi-agent system
A setup where multiple specialized LLM agents collaborate — one plans, another codes, another reviews — instead of one model trying to do everything.
Orchestration
Coordinating multiple LLM calls, tools, retrievals, and conditional logic into a working AI pipeline — the plumbing layer between "raw model" and "shipped product."
Skill (Claude Skills)
Installable markdown files that give Claude reusable, named capabilities — like "write a PR description" or "format a sales email" — without retraining or fine-tuning.
Tool use / Function calling
Letting an LLM call external functions — search the web, run code, query a database — by emitting a structured function call the runtime executes for it.
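A toy round of function calling. This sketch assumes `llm` returns either plain text or a JSON string like `{"tool": "add", "args": {...}}`; real providers return structured tool-call objects rather than raw JSON, but the shape of the loop is the same:

```python
import json

def run_with_tools(llm, tools, prompt):
    """One round of function calling: if the model emits a tool call,
    execute it and feed the result back for a final answer."""
    reply = llm(prompt)
    try:
        call = json.loads(reply)
    except (ValueError, TypeError):
        return reply  # plain answer, no tool needed
    # The runtime, not the model, actually executes the function.
    result = tools[call["tool"]](**call["args"])
    # Feed the result back so the model can produce a final answer.
    return llm(f"{prompt}\n\nTool result: {result}")
```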
Workflow (in AI)
A pre-defined sequence of LLM calls and tool steps with fixed control flow — the developer decides the order, not the model. Distinct from an agent, which lets the model wander.
Modalities
6 terms. Text-to-image, text-to-video, ControlNet, image LoRA, latent space, OCR.
ControlNet
A Stable Diffusion technique that lets you condition image generation on a structural input — a depth map, pose skeleton, edge map, or scribble — alongside the text prompt.
Image LoRA
A small fine-tuning file (typically 50-300MB) that bends a base diffusion model toward a specific style, character, or subject. Stack them at inference to combine effects.
Latent space
The compressed numerical representation that diffusion models actually work in — not pixels, but a smaller learned embedding. The "Latent" in Stable Diffusion's name.
OCR (Optical Character Recognition)
Extracting machine-readable text from images of text — scanned documents, photos of receipts, screenshots, PDFs. Increasingly handled by general vision-language models instead of dedicated OCR engines.
Text-to-image
Generative AI that turns a text prompt into an image. The category covers tools like Midjourney, DALL-E, Stable Diffusion, Flux, and Ideogram.
Text-to-video
Generative AI that turns a text prompt (or an image) into a short video clip. The leading systems in 2026 are Sora, Runway, Pika, Veo, and Kling.
Safety, eval & ops
14 terms. Benchmarks, jailbreaks, prompt injection, alignment, red teaming, latency, pricing.
Alignment
The technical problem of getting AI systems to actually do what humans want — follow intent, respect values, and avoid harmful behavior.
API (in AI context)
Programmatic access to a model — the HTTP endpoint you POST a prompt to and get a completion back. This is what you actually integrate.
Benchmark
A standardized test of model capability — same questions, same scoring, run across many models so you can rank them.
Eval / evaluation
The broader practice of measuring model output quality on tasks you actually care about — usually a custom test suite specific to your app.
Guardrails
Runtime safety filters wrapped around an LLM's input and output to catch bad prompts and bad responses before they escape.
Inference provider
A third party that hosts open-weight models behind an API so you don't have to run GPUs yourself. Often cheaper than self-hosting at small or medium scale.
Jailbreak
A prompt or technique that bypasses an LLM's safety training to get outputs the model would normally refuse.
Latency
Time from sending a request to receiving the response. For LLMs, split into time-to-first-token (TTFT) and total completion time.
MMLU (Massive Multitask Language Understanding)
A 57-subject multiple-choice benchmark covering everything from US history to abstract algebra. The most-cited general knowledge test of the LLM era.
Prompt injection
An attack where malicious instructions hidden in user input or external content hijack an LLM's behavior.
Rate limit
A provider-imposed cap on how many requests or tokens you can use per minute or day. The thing that breaks production at scale.
Red teaming
Proactive adversarial testing of an AI system before release — paid attackers try to break it so the lab can fix issues.
Self-hosting
Running open-weight models on your own infrastructure instead of calling a hosted API.
Token cost
How API providers price model usage — dollars per million tokens, with input tokens cheaper than output tokens.
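The arithmetic is worth making explicit, since prices are quoted per million tokens. A sketch with illustrative placeholder prices, not any provider's real rates:

```python
def cost_usd(input_tokens, output_tokens, in_per_m, out_per_m):
    """Dollars for one call, given per-million-token input/output prices."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# Example: 10k input tokens at $3/M plus 2k output tokens at $15/M.
cost_usd(10_000, 2_000, 3.0, 15.0)  # 0.06
```

Note how output tokens dominate the bill even at a fifth of the volume — a common surprise when long responses meet per-token pricing.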