Chain of thought (CoT)
Also known as: CoT, step-by-step reasoning, chain-of-thought prompting
A prompting technique that asks the model to reason step-by-step before giving a final answer, dramatically improving accuracy on math, logic, and multi-step problems.
What it means
Chain of thought is the simple, almost dumb-sounding observation that LLMs do better on hard problems when you make them show their work. Instead of asking "What is 17 × 24?" and hoping the model nails it, you ask "What is 17 × 24? Think step by step." The model then writes out the intermediate steps (17 × 20 = 340, 17 × 4 = 68, 340 + 68 = 408) and lands on the right answer far more often. The original 2022 paper showed large gains from few-shot prompts with worked examples, and a follow-up that year found that simply appending "let's think step by step" was sometimes enough to double accuracy on reasoning benchmarks.
Why it works isn't fully understood, but the working theory is that LLMs can only do so much computation per token they emit. Forcing the model to write out steps gives it more "thinking tokens" to use — each intermediate token is another forward pass through the network. Skipping straight to the answer is like asking a human to do long division in their head; writing it out is using scratch paper.
CoT started as a prompt trick you typed manually. By 2024-2025 it was baked into the models themselves. Reasoning models like OpenAI's o1, GPT-5 thinking, Claude Opus thinking, and DeepSeek-R1 do CoT internally — they generate long hidden reasoning chains before producing the visible answer, and they're trained specifically to make those chains useful. You don't need to ask anymore; the model thinks first by default.
Example
Prompt: "A juggler has 16 balls. Half are golf balls, and half of those are blue. How many blue golf balls? Think step by step." The model writes: "16 balls total. Half are golf balls = 8. Half of those are blue = 4. Answer: 4." Without the CoT instruction, smaller models often guess wrong.
Why it matters
CoT is the single most important inference-time technique of the LLM era. It's the bridge between models that confidently produce wrong answers and models that actually reason. Even with reasoning models doing it automatically, understanding CoT helps you debug why a model failed and write better prompts for non-reasoning models (which are still cheaper and faster).