Inference & reasoning
Reasoning model
Also known as: thinking model, extended thinking, inference-time reasoning
A class of LLMs trained to think for an extended period (sometimes minutes) before producing a final answer, trading latency for much higher accuracy on hard problems.
What it means
A reasoning model is an LLM that's been specifically trained to do extended thinking before answering. When you send it a prompt, it doesn't just generate a reply — it first generates a long hidden (or semi-hidden) chain of thought, often thousands of tokens, where it explores approaches, checks its work, backtracks, and revises. Only after this internal deliberation does it emit the visible answer. OpenAI's o1 and o3, GPT-5 thinking, Claude Opus and Sonnet thinking modes, DeepSeek-R1, and Gemini's thinking variants are all reasoning models.
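As a concrete sketch of that deliberate-then-answer shape, here is what a reasoning response looks like through the Anthropic Python SDK, where the thinking (or a summary of it) comes back as its own content block before the visible answer. The model name, prompt, and token budgets are illustrative, not prescriptive.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model name
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)

# With thinking enabled, the content list holds the deliberation first,
# then the visible answer as a separate text block.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:300], "...")
    elif block.type == "text":
        print("[answer]", block.text)
```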
The training trick is reinforcement learning on problems with verifiable answers (math, code, logic puzzles) where a checker can automatically score whether the model reached the right result. Correct answers earn reward, so longer, more careful reasoning gets reinforced wherever it helps, and the model learns to think harder on harder problems. The result is a step-change in performance on competition math, coding, and scientific reasoning, while costing little extra on easy queries (the model also learns to keep its thinking short when it isn't needed).
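A minimal sketch of what "verifiable answer" means in practice: a reward function that an RL loop (PPO, GRPO, or similar) can maximize. The `extract_final_answer` helper and the "Answer:" output convention are hypothetical, not from any specific lab's recipe.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Hypothetical helper: assume the model is prompted to end with 'Answer: <value>'."""
    match = re.search(r"Answer:\s*(.+)", completion)
    return match.group(1).strip() if match else None

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Score 1.0 if the model's checkable final answer matches the known one.

    An RL trainer maximizes this signal; long, careful chains of thought
    are reinforced only indirectly, because they win this reward more
    often on hard problems.
    """
    predicted = extract_final_answer(completion)
    return 1.0 if predicted == gold_answer else 0.0

# A correct completion earns the reward, a wrong one doesn't.
assert verifiable_reward("... so x = 7. Answer: 7", "7") == 1.0
assert verifiable_reward("... Answer: 12", "7") == 0.0
```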
The trade-off is latency and cost. A reasoning model might take 30 seconds to several minutes to answer, and you pay for all the hidden thinking tokens. For "what's the capital of France" this is wasted compute; for "find the bug in this 500-line function" it's worth every token. By 2026, most frontier APIs let you toggle thinking on a per-request basis, so you can pick the right tool for each task.
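A hedged sketch of that per-request toggle, again via the Anthropic SDK: the same prompt with and without the thinking parameter, so you can see the latency and token cost you are buying. The model name and budget are illustrative, and other providers expose similar switches.

```python
import time
import anthropic

client = anthropic.Anthropic()
prompt = [{"role": "user", "content": "Find the bug in this 500-line function: ..."}]

def timed_call(**kwargs):
    # Returns (wall-clock seconds, billed output tokens) for one request.
    start = time.time()
    response = client.messages.create(
        model="claude-sonnet-4-5", max_tokens=16000, messages=prompt, **kwargs
    )
    return time.time() - start, response.usage.output_tokens

# Fast and cheap: thinking off (the default when the parameter is omitted).
fast_secs, fast_tokens = timed_call()

# Slow and careful: thinking on, paying for up to 8k hidden reasoning tokens.
slow_secs, slow_tokens = timed_call(thinking={"type": "enabled", "budget_tokens": 8000})

print(f"no thinking: {fast_secs:.1f}s, {fast_tokens} output tokens")
print(f"thinking:    {slow_secs:.1f}s, {slow_tokens} output tokens")
```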
Example
Ask Claude Opus with extended thinking to debug a tricky concurrency issue. It spends 90 seconds generating reasoning (considering race conditions, walking through execution orders, ruling out hypotheses) before pointing to the exact line and explaining why it deadlocks. A non-reasoning model would guess in 3 seconds and probably miss it.
Why it matters
Reasoning models are the biggest capability jump since GPT-4. They turn LLMs from pattern-matchers into systems that can actually work through novel problems. If you're picking a model for hard tasks (engineering, research, complex analysis), the reasoning vs. non-reasoning distinction matters more than which lab made the model.