All terms
Inference & reasoning
Self-consistency
Also known as: majority voting, sampling-based inference, consensus decoding
Running the same prompt multiple times with sampling and taking the majority answer — a simple way to boost reasoning accuracy by trading more compute for better results.
What it means
Self-consistency is the embarrassingly simple idea that if you run a model on the same problem several times and it tends to give the same answer, that answer is more likely to be right. You sample N completions (usually with temperature > 0 so you get variety), extract the final answer from each, and take the most common one. The wrong answers tend to disagree with each other; the right answer tends to converge. On math benchmarks, going from 1 sample to 40 samples can lift accuracy by 10-20 percentage points, even with no other changes.
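The sample-extract-vote loop is short enough to sketch in full. Here's a minimal version, assuming a hypothetical `sample_fn` that calls your model with temperature > 0 and returns a completion string; the answer-extraction regex is a placeholder for whatever final-answer format your prompts enforce:

```python
from collections import Counter
import re

def self_consistency(prompt, sample_fn, n=10):
    """Sample n completions and return the majority final answer."""
    answers = []
    for _ in range(n):
        completion = sample_fn(prompt)
        # Extract the trailing numeric answer; real systems use a
        # stricter output format (e.g. "Final answer: <x>")
        match = re.search(r"(-?\d+(?:\.\d+)?)\s*$", completion.strip())
        if match:
            answers.append(match.group(1))
    if not answers:
        return None
    # Wrong answers scatter; the right one tends to repeat
    return Counter(answers).most_common(1)[0][0]
```

Note that voting happens over extracted answers, not whole completions: two chains of reasoning can disagree on every intermediate step and still count as agreeing if they land on the same final answer.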
It's the simplest example of a broader idea: best-of-N sampling. The same pattern shows up everywhere — generate N candidate completions, score them somehow (majority vote, a verifier model, executing code and checking results, a reward model), pick the best. Self-consistency is the unsupervised version where the "scorer" is just counting votes. Best-of-N with a learned verifier is more powerful but more complex. Both are forms of test-time compute: spend more inference to get better answers.
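The whole best-of-N family fits one tiny skeleton; only the scorer changes. A sketch, with `sample_fn` and `score_fn` as stand-ins for your model call and your chosen scorer:

```python
def best_of_n(prompt, sample_fn, score_fn, n=5):
    """Generate n candidates and keep the one the scorer likes best.

    score_fn can be anything: a learned verifier, a reward model,
    a harness that executes generated code against tests, or (for
    self-consistency) a vote count over extracted answers.
    """
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=score_fn)
```

Swapping the scorer is where the power and complexity trade off: counting votes is free but only works for discrete, comparable answers, while a learned verifier can rank free-form completions at the cost of training and running another model.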
In practice, self-consistency lives in the background. Production systems use it for high-stakes reasoning tasks (code generation, math, structured extraction) where the cost of more samples is worth the accuracy gain. Reasoning models do something like an internal version automatically — exploring branches and converging — which means classic self-consistency adds less than it used to on top of o1-class models. But when you're using a cheap, fast model and need reasoning model accuracy, sampling 5-10 times and voting is often the cheapest way to close most of the gap.
Example
You have a math problem and a fast non-reasoning model. Run it 10 times with temperature 0.7. Six runs say "42," two say "40," one says "44," one says "37." You return 42 with reasonable confidence. Cost: 10x a single call. Accuracy: often comparable to a much more expensive reasoning model.
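The tally for that vote distribution is a one-liner with the standard library's `Counter`:

```python
from collections import Counter

# The ten sampled answers from the example above
answers = ["42"] * 6 + ["40"] * 2 + ["44", "37"]

counts = Counter(answers)
answer, votes = counts.most_common(1)[0]
print(answer, votes / len(answers))  # → 42 0.6
```

The vote share (0.6 here) doubles as a rough confidence signal: a 6/10 majority is more trustworthy than a 3/10 plurality, and some systems fall back to a slower model when no answer clears a threshold.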
Why it matters
Self-consistency is one of the highest-leverage techniques you can apply without changing models. It's also a foundational building block for understanding modern inference patterns: every best-of-N, verifier-guided generation, or Monte Carlo reasoning system is a relative of self-consistency. Knowing when to reach for it (verifiable answers, stochastic models, tolerance for extra latency and cost) is a useful instinct.