Chain-of-thought prompting: when reasoning out loud changes the output

Why telling a model to "think step by step" works, and when it doesn't. Zero-shot CoT vs few-shot CoT, what tasks benefit most, and the cases where it actively slows you down.

7 min read·Updated Jun 14, 2026

You have a model that can answer questions. You also have a model that can show its work while answering questions. These are not the same model in practice — even though they're the same weights.

Chain-of-thought prompting is the technique that turns one into the other. Understanding when it helps (and when it doesn't) is one of the most useful things you can know about getting reliable output from any LLM.

What chain-of-thought actually is

A standard prompt asks a model to jump straight to an answer. A chain-of-thought prompt asks the model to reason through the problem before committing to one.

The canonical trigger is: "Think step by step."

That's it. Three words that, added to a prompt, can change a wrong answer into a correct one — specifically for problems that require multi-step reasoning.

Here's why it works: language models generate tokens one at a time, left to right. Each token is conditioned on what came before it. When you force a model to produce a reasoning chain before the final answer, that reasoning is now in the context window and can inform the answer. Without it, the model is jumping straight from question to conclusion, and anything that needed intermediate steps gets skipped or compressed into a guess.

Zero-shot CoT vs few-shot CoT

There are two ways to trigger chain-of-thought reasoning:

Zero-shot CoT — you instruct the model to reason out loud without giving it any examples:

Q: A train leaves at 9am traveling at 60 mph. Another leaves the same station at 11am
   traveling at 90 mph. At what time does the second train catch up to the first?

Think step by step before answering.

Few-shot CoT — you include one or more worked examples in your prompt, demonstrating what a reasoning chain should look like:

Q: If I have 3 apples and buy 4 more, then give 2 away, how many do I have?
A: Start with 3. Add 4 → 7. Give away 2 → 5. Answer: 5.

Q: A store sells items at cost plus 40% markup. If cost is $25, what's the sale price?
A: [model continues the pattern]

Zero-shot is easier to write. Few-shot is more reliable when output format matters — the model learns to structure its reasoning the way your examples demonstrate.

When it actually helps

Chain-of-thought makes a meaningful difference for tasks that require:

  • Multi-step arithmetic or algebra — any calculation where intermediate steps matter
  • Logical deduction — "given A and B, what follows?" chains
  • Planning problems — tasks where sequence matters and skipping steps produces wrong answers
  • Complex comparisons — weighing multiple factors with interactions
  • Code debugging — tracing through what code actually does line by line

In these cases, CoT can move a model from wrong answers to correct ones with no other changes.

When it doesn't help (or actively hurts)

CoT is not a universal improvement. It can be neutral or counterproductive for:

Simple factual lookups. "What year was the Eiffel Tower built?" does not benefit from step-by-step reasoning. The model either knows or doesn't.

Creative tasks. Asking a model to reason step-by-step before writing a poem interrupts the generation with scaffolding that the task doesn't need.

Short-answer classification. "Is this email spam or not-spam?" does not need a reasoning chain — and adding one can cause the model to talk itself into a wrong conclusion.

Latency-sensitive production pipelines. Reasoning chains mean more tokens, which means more cost and higher latency. At scale, that matters.

The pattern to internalize: CoT helps when the process of getting to the answer is itself what produces the right answer. It doesn't help when the answer is either known or unknowable.

Prompting patterns

Minimal zero-shot trigger:

[Your question here]

Think through this step by step before giving your final answer.

Slightly more structured:

[Your question here]

Reason through the problem. Then, on a new line, state your final answer clearly.

Forcing visible separation (useful when you need to parse output):

[Your question here]

First, work through the reasoning in a <thinking> block.
Then give your answer in an <answer> block.

Few-shot with demonstrated format:

Q: [example question]
Reasoning: [step 1] → [step 2] → [step 3]
Answer: [example answer]

Q: [your actual question]

The failure modes to know

Confident wrong chains. CoT can produce a well-structured reasoning path that leads to an incorrect conclusion. The output looks more reliable because it shows work — but the work can contain errors. Don't mistake a long reasoning chain for a correct one.

Reasoning that contradicts the answer. Models sometimes produce a correct chain but then state a different final answer. Watch for the last sentence, not the middle paragraphs.

Circular reasoning. The model restates the question as a premise, then derives the answer from that restatement. Looks like reasoning; isn't.

If correctness matters, treat the reasoning chain as auditable, not authoritative. The chain is useful as a diagnostic — when you see where the reasoning broke, you can fix the prompt.

Get the next guide when it lands

One email on Sunday with new /learn guides, tool updates, and a couple of links worth reading.