Post-training
Also known as: alignment training, instruction tuning pipeline
The umbrella term for everything done to a model after pre-training (SFT, RLHF, DPO, safety tuning, persona work), and the stage where most modern model differentiation actually happens.
What it means
Pre-training builds a base model with raw knowledge. Post-training is everything you do afterward to turn that base model into a usable product. The standard pipeline in 2026 looks roughly like this: supervised fine-tuning on instruction-following data, preference optimization (DPO or RLHF/RLAIF) for helpfulness and tone, safety training (refusals, jailbreak resistance, constitutional principles), and finally persona/character tuning so the model "feels like" GPT or Claude or Gemini rather than a generic assistant.
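To make the preference-optimization step concrete, here is a minimal sketch of the DPO loss in plain PyTorch. The function name and the toy tensors are illustrative assumptions; the formula itself is the one from the DPO paper (Rafailov et al., 2023).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy being trained or the frozen
    reference model. beta controls how far the policy may drift from
    the reference.
    """
    # Implicit reward: how much more the policy likes each response
    # than the reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch: log-probs for two preference pairs (invented numbers).
policy_chosen = torch.tensor([-12.3, -8.1])
policy_rejected = torch.tensor([-14.0, -7.9])
ref_chosen = torch.tensor([-13.0, -8.5])
ref_rejected = torch.tensor([-13.5, -8.0])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

The appeal of DPO over classic RLHF is visible here: no reward model and no RL loop, just a classification-style loss over preference pairs, which is why it has become the default first choice in many post-training stacks.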
The big shift of the last two years is how much post-training matters relative to pre-training. Pre-training has hit diminishing returns on raw scale, while post-training has exploded in sophistication. Most of the gap between a mediocre open model and a great one (same parameter count, same base data) comes down to who has better post-training pipelines, better synthetic data generation, and better evaluation harnesses. This is why DeepSeek-V3 base and Llama 4 base are competitive on raw IQ, yet the polished, helpful chat experience still favors closed labs that pour enormous effort into post-training.
Post-training is also where every model's "personality" lives. Why does Claude apologize so much? Why does ChatGPT default to bullet points? Why does Gemini lean cautious? Those are post-training decisions, not architectural ones, and they can be changed in days rather than months. When a lab "ships an update" without retraining the base, it is almost always shipping a post-training refresh.
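Persona is not a hidden switch; it is encoded in the supervised examples the model learns to imitate. Here is a minimal sketch of what one persona-bearing SFT record can look like, using the common role/content chat format; the field convention and the content are illustrative assumptions, since labs' internal data formats are not public.

```python
# One persona-bearing SFT record in the widely used "messages" format
# (the same convention Hugging Face chat templates consume).
persona_example = {
    "messages": [
        {"role": "system",
         "content": "You are a concise assistant. Prefer short prose "
                    "over bullet points. Acknowledge uncertainty."},
        {"role": "user", "content": "Is P equal to NP?"},
        {"role": "assistant",
         "content": "Nobody knows. It is the most famous open problem "
                    "in computer science, and most researchers suspect "
                    "the answer is no."},
    ]
}
# Thousands of curated examples like this are what give a model its
# default tone. Change the data, re-run SFT, and the "personality"
# shifts without touching the base model.
```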
Example
Llama 4 8B Base and Llama 4 8B Instruct have the same architecture and the same pre-training. The difference between "unhinged text completer" and "capable chat assistant" is entirely post-training: SFT, DPO, safety filtering, and persona tuning layered on top.
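A rough sketch of how you could observe this difference yourself with the Hugging Face transformers library. The checkpoint IDs below are hypothetical placeholders; substitute any real base/instruct pair that shares one pre-training run.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint IDs: swap in the base/instruct pair you
# actually want to compare.
BASE = "meta-llama/Llama-base-checkpoint"
INSTRUCT = "meta-llama/Llama-instruct-checkpoint"

prompt = "What are three uses for a paperclip?"

# Base model: a raw text completer. It continues the string, which may
# mean rambling, repeating the question, or drifting off-topic.
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)
ids = tok(prompt, return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=80)[0]))

# Instruct model: same architecture and pre-training, but post-trained
# to treat the input as a request and answer it directly.
tok = AutoTokenizer.from_pretrained(INSTRUCT)
model = AutoModelForCausalLM.from_pretrained(INSTRUCT)
ids = tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt",
)
print(tok.decode(model.generate(ids, max_new_tokens=80)[0]))
```

Note that the instruct call goes through a chat template while the base call does not: the template itself is a post-training artifact, baked in alongside the SFT and preference tuning.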
Why it matters
Post-training is where AI labs compete in 2026. Pre-training is largely commoditized: anyone with a billion dollars and a year can build a respectable base model. What separates products is the post-training stack: data quality, preference pipelines, safety methodology, evaluation. If you're choosing between models for a serious project, you're mostly choosing between post-training philosophies.