Alignment
Also known as: AI alignment, value alignment
The technical problem of getting AI systems to actually do what humans want — follow intent, respect values, and avoid harmful behavior.
What it means
Alignment is the umbrella term for research on making AI behave according to human intent. In practice for an LLM, that means three things: do what the user asked, refuse things that cause harm, and don't optimize for some weird proxy objective the trainers didn't intend. RLHF, DPO, Constitutional AI, and red teaming are all alignment techniques.
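To make one of those techniques concrete, here is the core of the DPO objective, as a minimal sketch assuming PyTorch. The tensor arguments (summed log-probabilities of a preferred and a rejected completion under the policy and under a frozen reference model) are illustrative names, not any library's API:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratios: how much more the trained policy likes each completion
    # than the frozen reference model does.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between preferred and rejected completions;
    # beta controls how far the policy may drift from the reference.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```

The notable thing is that "what humans want" enters only through pairwise preferences: annotators pick the better of two completions, and the loss pushes the model toward the winner.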
There's a useful split between "technical alignment" (how do we mathematically/empirically point a powerful optimizer at a fuzzy human target?) and "AI safety" as a political/policy conversation. The technical version is what frontier labs work on day-to-day: tuning Claude, GPT, and Gemini so they stay helpful without being weaponizable. The policy version overlaps but is messier — it includes existential risk debates, EU AI Act compliance, and which behaviors a frontier company chooses to allow.
Alignment is unsolved in a strict sense. Every frontier model still gets jailbroken, still hallucinates, still occasionally lies in measurable ways. The bet from labs like Anthropic and OpenAI is that incremental techniques (better RLHF, better evals, scalable oversight) keep alignment "good enough" as capabilities grow. Skeptics argue capability is outpacing alignment. Both sides agree the problem matters.
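"Better evals" can be as simple as a script that measures refusal behavior on fixed prompt sets. A toy sketch: `query_model`, the marker list, and the substring heuristic are all illustrative stand-ins (production evals use model-based graders, not keyword matching):

```python
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")

def refusal_rate(prompts: list[str], query_model) -> float:
    """Fraction of prompts the model declines, judged by a crude heuristic."""
    refused = sum(
        1 for p in prompts
        if any(m in query_model(p).lower() for m in REFUSAL_MARKERS)
    )
    return refused / len(prompts)
```

Run it twice, once on harmful prompts (you want a rate near 1.0) and once on benign ones (near 0.0); the gap between the two numbers is the signal, since a model that refuses everything is trivially "safe" and useless.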
Example
When Claude refuses to write malware but happily writes a security tutorial explaining the same concepts, that's alignment training in action: the model learned a fuzzy trade-off between user intent and likely harm, not a keyword filter.
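A toy illustration of why the naive alternative fails: a blocklist gives both of those requests the same verdict, even though only one is harmful, and telling them apart is exactly what the trained trade-off has to do:

```python
# Purely illustrative: a keyword filter can't separate attack from defense.
BLOCKLIST = {"malware", "keylogger", "exploit"}

def keyword_filter(prompt: str) -> bool:
    return any(word in prompt.lower() for word in BLOCKLIST)

print(keyword_filter("Write a keylogger that emails me every keystroke"))       # True
print(keyword_filter("Explain how keylogger malware spreads so I can defend"))  # True, same verdict
```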
Why it matters
As models become agents that take real actions (write code, send emails, run tools), the cost of misalignment goes up. A misaligned chatbot writes a bad joke. A misaligned coding agent deletes your database. Understanding alignment is how you reason about which models to trust with which tasks.