Post-training
Also known as: alignment training, instruction tuning pipeline
The umbrella term for everything done to a model after pre-training (SFT, RLHF, DPO, safety tuning, persona work), and the stage where most modern model differentiation actually happens.
What it means
Pre-training builds a base model with raw knowledge. Post-training is everything you do afterward to turn that base model into a usable product. The standard pipeline in 2026 looks roughly like this: supervised fine-tuning on instruction-following data, preference optimization (DPO or RLHF/RLAIF) for helpfulness and tone, safety training (refusals, jailbreak resistance, constitutional principles), and finally persona/character tuning so the model "feels like" GPT or Claude or Gemini rather than a generic assistant.
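To make the preference-optimization step concrete, here is a minimal sketch of the DPO loss in plain PyTorch. The function name and the toy tensors are illustrative assumptions; the formula itself is the one from the DPO paper (Rafailov et al., 2023).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy being trained or the frozen
    reference model. beta controls how far the policy may drift from
    the reference.
    """
    # Implicit reward: how much more the policy likes each response
    # than the reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch: log-probs for two preference pairs (invented numbers).
policy_chosen = torch.tensor([-12.3, -8.1])
policy_rejected = torch.tensor([-14.0, -7.9])
ref_chosen = torch.tensor([-13.0, -8.5])
ref_rejected = torch.tensor([-13.5, -8.0])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

The appeal of DPO over classic RLHF is visible here: no reward model and no RL loop, just a classification-style loss over preference pairs, which is why it has become the default first choice in many post-training stacks.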
The big shift of the last two years is how much post-training matters relative to pre-training. Pre-training has hit diminishing returns on raw scale, while post-training has exploded in sophistication. Most of the gap between a mediocre open model and a great one (same parameter count, same base data) comes down to who has better post-training pipelines, better synthetic data generation, and better evaluation harnesses. This is why DeepSeek-V3 base and Llama 4 base are competitive on raw IQ, yet the polished, helpful chat experience still favors closed labs that pour enormous effort into post-training.
Post-training is also where every model's "personality" lives. Why does Claude apologize so much? Why does ChatGPT default to bullet points? Why does Gemini lean cautious? Those are post-training decisions, not architectural ones, and they can be changed in days rather than months. When a lab "ships an update" without retraining the base, it is almost always shipping a post-training refresh.
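Persona is not a hidden switch; it is encoded in the supervised examples the model learns to imitate. Here is a minimal sketch of what one persona-bearing SFT record can look like, using the common role/content chat format; the field convention and the content are illustrative assumptions, since labs' internal data formats are not public.

```python
# One persona-bearing SFT record in the widely used "messages" format
# (the same convention Hugging Face chat templates consume).
persona_example = {
    "messages": [
        {"role": "system",
         "content": "You are a concise assistant. Prefer short prose "
                    "over bullet points. Acknowledge uncertainty."},
        {"role": "user", "content": "Is P equal to NP?"},
        {"role": "assistant",
         "content": "Nobody knows. It is the most famous open problem "
                    "in computer science, and most researchers suspect "
                    "the answer is no."},
    ]
}
# Thousands of curated examples like this are what give a model its
# default tone. Change the data, re-run SFT, and the "personality"
# shifts without touching the base model.
```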
Example
Llama 4 8B Base and Llama 4 8B Instruct have the same architecture and the same pre-training. The difference between "unhinged text completer" and "capable chat assistant" is entirely post-training: SFT, DPO, safety filtering, and persona tuning layered on top.
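A rough sketch of how you could observe this difference yourself with the Hugging Face transformers library. The checkpoint IDs below are hypothetical placeholders; substitute any real base/instruct pair that shares one pre-training run.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint IDs: swap in the base/instruct pair you
# actually want to compare.
BASE = "meta-llama/Llama-base-checkpoint"
INSTRUCT = "meta-llama/Llama-instruct-checkpoint"

prompt = "What are three uses for a paperclip?"

# Base model: a raw text completer. It continues the string, which may
# mean rambling, repeating the question, or drifting off-topic.
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)
ids = tok(prompt, return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=80)[0]))

# Instruct model: same architecture and pre-training, but post-trained
# to treat the input as a request and answer it directly.
tok = AutoTokenizer.from_pretrained(INSTRUCT)
model = AutoModelForCausalLM.from_pretrained(INSTRUCT)
ids = tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt",
)
print(tok.decode(model.generate(ids, max_new_tokens=80)[0]))
```

Note that the instruct call goes through a chat template while the base call does not: the template itself is a post-training artifact, baked in alongside the SFT and preference tuning.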
Why it matters
Post-training is where AI labs compete in 2026. Pre-training is largely commoditized: anyone with a billion dollars and a year can build a respectable base model. What separates products is the post-training stack: data quality, preference pipelines, safety methodology, evaluation. If you're choosing between models for a serious project, you're mostly choosing between post-training philosophies.