Top-p (nucleus sampling)
Also known as: nucleus sampling, top-p sampling
An alternative or complement to temperature: sample only from the smallest set of tokens whose cumulative probability exceeds p (e.g. 0.9). Cuts off the long tail of unlikely tokens dynamically.
What it means
Top-p sampling, a.k.a. nucleus sampling, sorts the next-token candidates by probability, highest first, and keeps the smallest set whose cumulative probability reaches p. At p=0.9, you sample from the smallest set of tokens that together cover at least 90% of the probability mass. Everything in the long tail is discarded entirely, and the probabilities of the surviving tokens are renormalized to sum to 1 before sampling.
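The filtering step can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function name top_p_filter and the toy probabilities are assumptions made up for the example.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose mass reaches p, then renormalize."""
    # Rank candidates from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    nucleus = []
    cumulative = 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:  # the nucleus now covers at least p of the mass
            break

    # Rescale the survivors so they form a valid distribution again.
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}


# Hypothetical next-token probabilities, invented for illustration.
toy_probs = {"mat": 0.5, "floor": 0.2, "couch": 0.15, "chair": 0.1, "zebra": 0.05}
kept = top_p_filter(toy_probs, p=0.9)
print(sorted(kept))  # "zebra" falls in the discarded tail
```

With these toy numbers the first four tokens add up to 0.95, which is the first point at which the running total reaches 0.9, so the fifth token is dropped.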
The advantage over top-k (which always keeps the top K tokens regardless of distribution) is that top-p adapts to context. When the model is highly confident — say, the next token after "The capital of France is" — top-p might only consider 1-2 tokens. When the model is uncertain — say, the start of a creative story — top-p might keep dozens. This produces more natural variation than top-k's fixed cutoff.
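To make the contrast with top-k concrete, the sketch below counts how many tokens each method keeps for a peaked distribution and a flat one. The helper names and both distributions are invented for illustration.

```python
def nucleus_size(probs, p):
    """Number of tokens top-p keeps: the shortest prefix of the ranked list with mass >= p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)


def top_k_size(probs, k):
    """Number of tokens top-k keeps: always k, unless fewer candidates exist."""
    return min(k, len(probs))


# Hypothetical distributions: one confident prediction, one wide-open choice.
confident = [0.97, 0.01, 0.01, 0.005, 0.005]
uncertain = [0.04] * 25

print(nucleus_size(confident, 0.9), top_k_size(confident, 5))  # 1 5
print(nucleus_size(uncertain, 0.9), top_k_size(uncertain, 5))  # 23 5
```

Top-k keeps five candidates in both cases, which is too many when the model is confident and too few when it is not. The top-p cutoff moves with the shape of the distribution.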
Top-p and temperature interact, and most APIs let you set both. A common production setting is temperature 0.7, top-p 0.95: a temperature below 1 mildly sharpens the distribution toward the likeliest tokens (values above 1 flatten it), and top-p truncates the tail. Setting both aggressively (high temp + low p) is usually not what you want; pick one as your main lever. For near-deterministic output, set temperature to 0; decoding then always picks the single most likely token, so top-p becomes irrelevant. For creative work, lean on temperature first and use top-p as a safety net to prevent the occasional very-low-probability token from making output incoherent.
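The sketch below shows one way the two settings can be combined in a single sampling step. It assumes temperature is applied to the logits first and top-p second, which is a common ordering but not a universal one. The function names, the logits, and the treatment of temperature 0 as plain argmax are all assumptions for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [value / temperature for value in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}


def sample_token(logits, temperature=0.7, top_p=0.95, rng=random):
    """Sample one token: temperature scaling first, then top-p truncation."""
    if temperature == 0:
        # Greedy decoding: top_p has no effect here.
        return max(logits, key=logits.get)

    probs = softmax_with_temperature(logits, temperature)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    tokens, weights, cumulative = [], [], 0.0
    for token, prob in ranked:
        tokens.append(token)
        weights.append(prob)
        cumulative += prob
        if cumulative >= top_p:
            break

    # random.choices accepts unnormalized weights, so no rescaling is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical logits, invented for illustration.
toy_logits = {"mat": 3.0, "floor": 2.0, "couch": 1.0, "zebra": -4.0}
print(sample_token(toy_logits, temperature=0))  # mat
print(sample_token(toy_logits, rng=random.Random(0)))
```

Passing a seeded random.Random instance as rng makes a run reproducible, which helps when comparing settings side by side.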
Example
After "The cat sat on the" at top-p 0.9, the model might consider {mat, floor, couch, chair, windowsill}: a handful of tokens covering most of the probability mass. After "The chemical formula for water is" at top-p 0.9, perhaps only {H} survives, because that single token already carries more than 90% of the probability mass.
Why it matters
Top-p is a more thoughtful default than top-k for most generative use cases, because its cutoff adapts to how confident the model is. If you only learn one sampling parameter, learn temperature; if you learn two, learn top-p. Together they cover almost all the practical control you'll want over creativity vs. coherence.