Context window
Also known as: context length, token window
The maximum amount of text, measured in tokens, that a model can consider at one time: the prompt plus its response.
What it means
A context window is the budget of tokens an LLM can hold in working memory for a single conversation or request. Everything counts: your prompt, any documents you paste in, the model's reply, and (in chat) prior turns the model still remembers.
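To make that accounting concrete, here is a minimal sketch in Python. It assumes a rough heuristic of one token per four English characters; the helper names, the 4,000-token reply reservation, and the 200k default are illustrative assumptions, not any model's real API (exact counts come from each model's tokenizer).

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per English token.
    # Real tokenizers give exact, model-specific counts.
    return max(1, len(text) // 4)

def fits_in_window(prompt: str, reserved_for_reply: int = 4_000,
                   context_window: int = 200_000) -> bool:
    # The prompt and the reply share one budget, so reserve
    # space for the response before sending the request.
    return estimate_tokens(prompt) + reserved_for_reply <= context_window

The key design point is the reservation: if the prompt alone fills the window, the model has no room left to answer.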
Larger windows let you paste in long documents, big code repositories, or extended chat history. But context isn't free: bigger windows mean higher latency, higher cost per request, and, for many models, degraded recall in the middle of the window. "Lost in the middle" is a real failure mode in which models pay less attention to information buried in the center of a long input.
Context window sizes vary by model and tier. As of 2026, ChatGPT Plus is around 32k tokens, Claude defaults to 200k (1M on enterprise), and Gemini exposes 1M+ tokens. A token is roughly 0.75 words for English, so 200k tokens ≈ 150,000 words ≈ a 600-page novel.
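As a sanity check on that arithmetic, a short sketch using the same rough averages (0.75 words per token, plus an assumed 250 words per printed page):

WORDS_PER_TOKEN = 0.75   # rough English average, as above
WORDS_PER_PAGE = 250     # assumed typical page length

def tokens_to_pages(tokens: int) -> float:
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(tokens_to_pages(200_000))  # 600.0 -- the "600-page novel" figure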
Example
You paste a 50-page PDF (~25,000 tokens) into Claude and ask questions about it. The PDF, your question, and the response all share the 200k context window.
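A sketch of the bookkeeping for that example, reusing the heuristic above. The 25,000-token PDF and the 200k window come from the example; the question size is a made-up placeholder.

pdf_tokens = 25_000       # the 50-page PDF from the example
question_tokens = 50      # hypothetical short question
window = 200_000          # Claude's default window, per the figures above

remaining = window - (pdf_tokens + question_tokens)
print(remaining)  # 174950 tokens left for the reply and later turns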
Why it matters
The context window determines what kinds of work a model can do without chunking. If you analyze long documents, work in big codebases, or maintain long conversations, window size is one of the most important differences between models, sometimes more important than raw model quality. When a document does exceed the window, the usual fallback is chunking, sketched below.
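A naive character-based chunking sketch, again leaning on the assumed 4-characters-per-token heuristic; real pipelines usually split on paragraph or section boundaries rather than raw character offsets.

def chunk_text(text: str, max_tokens: int = 8_000) -> list[str]:
    # Split a long document into window-sized pieces so each
    # piece fits the model's context budget on its own.
    max_chars = max_tokens * 4
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]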