Tree of thought (ToT)
Also known as: ToT, tree-of-thoughts, branching reasoning
A reasoning technique where the model branches into multiple candidate thoughts at each step, evaluates them, and prunes — exploring a tree of reasoning paths instead of one chain.
What it means
Tree of thought generalizes chain of thought from a single linear sequence of steps to a tree. At each reasoning step, the model proposes several alternative next thoughts, then evaluates them — usually by self-scoring or by another model judging promise — and either expands the most promising branches or prunes dead ends. It's effectively breadth-first (beam) or depth-first search over the space of reasoning chains, with the LLM acting as both the generator and the evaluator. The original 2023 paper (Yao et al.) showed ToT substantially outperforming CoT on tasks like Game of 24 and creative writing, where reaching a good answer requires exploration and backtracking.
The cost is real. ToT can use 10x-100x more tokens than CoT for the same query because you're generating and scoring many branches. The implementation complexity is also non-trivial: you need a search controller, evaluation prompts, branching strategies, and termination conditions. For most production systems this overhead doesn't pay off — you can usually get similar gains from a reasoning model (which does internal exploration) or self-consistency (which is much simpler) at a fraction of the engineering cost.
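The controller described above — propose, evaluate, prune, repeat — can be sketched as a small beam search. This is a hypothetical minimal sketch, not the paper's implementation: in a real system `propose` and `score` would wrap LLM calls (and account for most of the token cost); here they are plain callables so the loop is self-contained.

```python
from typing import Callable, List, Optional

def tree_of_thought(
    propose: Callable[[List[str]], List[str]],   # candidate next thoughts for a chain
    score: Callable[[List[str]], float],         # how promising a partial chain looks
    is_solution: Callable[[List[str]], bool],
    beam_width: int = 3,
    max_depth: int = 4,
) -> Optional[List[str]]:
    """Breadth-first (beam) search over chains of thoughts."""
    frontier: List[List[str]] = [[]]             # start from the empty chain
    for _ in range(max_depth):
        candidates = []
        for chain in frontier:
            for thought in propose(chain):       # branch: expand each chain
                new_chain = chain + [thought]
                if is_solution(new_chain):
                    return new_chain
                candidates.append(new_chain)
        # prune: keep only the beam_width most promising branches
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
        if not frontier:
            return None                          # every branch was a dead end
    return None
```

In an LLM-backed variant, `propose` would prompt the model for several alternative next steps and `score` would be a second prompt asking a model to rate the partial chain; the 10x-100x token overhead comes from those two calls running at every node of the tree.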
In 2026, tree of thought is mostly an academic and research technique. Production systems use chain of thought, reasoning models, or self-consistency. The ideas behind ToT — branching, evaluating, pruning — are alive and well, but they're now baked into reasoning model training (which learns to backtrack on its own) and into agent search loops (where the "branches" are tool-call sequences). You'll see the ToT name occasionally in papers and demos; you'll rarely see it in shipped products.
Example
Game of 24: given numbers 4, 9, 10, 13, reach 24 using arithmetic. CoT picks one strategy and often fails. ToT branches at each step (try 13-9 first? try 10/4 first?), evaluates partial states, abandons branches that can't reach 24, and explores promising ones. Solves it where CoT can't — but uses dozens of LLM calls.
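The branching-and-pruning structure is easy to see if you replace the LLM with a deterministic search. The sketch below is hypothetical (not the paper's code): it exhaustively explores the same tree ToT would search for Game of 24 — each step combines two remaining numbers with an operation, each dead end is abandoned — using exact `Fraction` arithmetic so division doesn't introduce float error.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple

def solve24(nums: List[Fraction], steps: Tuple[str, ...] = ()) -> Optional[Tuple[str, ...]]:
    """Depth-first search over partial states; returns the steps taken, or None."""
    if len(nums) == 1:
        return steps if nums[0] == 24 else None  # leaf: did this branch reach 24?
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        # branch: every way to combine a and b into one new number
        branches = [(a + b, f"{a}+{b}"), (a * b, f"{a}*{b}"),
                    (a - b, f"{a}-{b}"), (b - a, f"{b}-{a}")]
        if b != 0:
            branches.append((a / b, f"{a}/{b}"))
        if a != 0:
            branches.append((b / a, f"{b}/{a}"))
        for value, desc in branches:
            result = solve24(rest + [value], steps + (desc,))
            if result is not None:
                return result                    # a branch succeeded
    return None                                  # prune: abandon this subtree
```

With four numbers the tree has only a few thousand leaves, so exhaustive search is trivial. ToT matters when the branching factor is large or the steps are free-form text — there, an LLM evaluator prunes most branches instead of visiting them all.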
Why it matters
Tree of thought is a useful concept even if you never implement it: it's the cleanest illustration of why search at inference time helps, and why reasoning models work. But practically, don't reach for ToT first. Try a reasoning model. Try self-consistency. ToT is the heavy artillery for cases those don't solve, and most projects never need it.