Constitutional AI
Also known as: CAI, constitutional training
Anthropic's alignment technique where the model critiques and revises its own outputs against a written set of principles, reducing the need for human preference labels.
What it means
Constitutional AI (CAI) is Anthropic's answer to the bottleneck of human-rated preference data. Instead of paying humans to rank thousands of outputs, you give the model a written "constitution" — a set of natural-language principles like "be helpful but don't help with weapons synthesis" or "respect user autonomy" — and ask the model itself to critique its outputs against those principles and rewrite the bad ones. The revised outputs become training data. RLHF's "human says A is better than B" turns into "the constitution implies A is better than B."
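To make that loop concrete, here's a minimal sketch in Python. Everything in it is hypothetical: generate stands in for whatever completion call your stack uses, and the two principles are paraphrases for illustration, not text from Anthropic's actual constitution.

```python
# A rough sketch of the supervised critique-and-revise loop, not
# Anthropic's actual pipeline. `generate` is a hypothetical stand-in
# for any chat-completion call; the principles are illustrative.

CONSTITUTION = [
    "Choose the response that is most helpful without aiding weapons synthesis.",
    "Choose the response that best respects the user's autonomy.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to the model being trained."""
    raise NotImplementedError

def critique_and_revise(user_prompt: str) -> dict:
    """Produce one supervised fine-tuning example from one prompt."""
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Principle: {principle}\n"
            f"Prompt: {user_prompt}\nResponse: {response}\n"
            "Critique the response against the principle."
        )
        response = generate(
            f"Prompt: {user_prompt}\nResponse: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response to fully address the critique."
        )
    # The revised output, not the original draft, becomes training data.
    return {"prompt": user_prompt, "completion": response}
```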
In practice CAI is a two-stage process. First, supervised: the model generates responses, critiques them against the constitution, revises them, and is fine-tuned on those revisions. Second, RL: the model ranks pairs of its own outputs against constitutional principles, and those AI-generated preferences train a reward model, essentially RLAIF with an explicit rulebook. Anthropic publishes the constitution Claude is trained against, which is unusually transparent compared to OpenAI's or Google's safety training, both of which are largely opaque.
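The RL stage can be sketched the same way. This reuses the hypothetical generate and CONSTITUTION from the sketch above; the point is that the preference label comes from the model reading a principle, not from a human rater.

```python
import random

def ai_preference_pair(user_prompt: str) -> tuple[str, str]:
    """Label a (chosen, rejected) pair for reward-model training.

    Reuses the hypothetical `generate` and `CONSTITUTION` defined in
    the previous sketch; the judge is the model itself, not a human.
    """
    resp_a = generate(user_prompt)
    resp_b = generate(user_prompt)
    principle = random.choice(CONSTITUTION)
    verdict = generate(
        f"Principle: {principle}\n"
        f"Prompt: {user_prompt}\n(A) {resp_a}\n(B) {resp_b}\n"
        "Which response better follows the principle? Answer A or B."
    )
    # The winner becomes `chosen`, the loser `rejected`; these pairs
    # train the reward model that drives the RL step.
    if verdict.strip().upper().startswith("A"):
        return resp_a, resp_b
    return resp_b, resp_a
```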
The honest take: CAI is more a methodology and a marketing position than a radically different result. Every frontier lab now uses some mix of human preferences, AI feedback, and explicit principles in post-training. Anthropic's contribution was making the rules legible and showing that alignment can be scaled more cheaply than pure RLHF. The downside is that the model is only as well-aligned as the constitution is well-written, and writing a good constitution is itself a hard, contested problem.
Example
Claude is trained with a constitution that draws on the UN Declaration of Human Rights, Apple's terms of service, and Anthropic's own safety principles. When Claude declines a borderline request, it's usually a constitutional principle being applied, not a hardcoded keyword filter.
Why it matters
Constitutional AI matters because it's the one major alignment technique whose rules are public. If you want to know why Claude refuses certain prompts, you can read the constitution. That transparency is rare in frontier AI and is why CAI gets disproportionate attention in policy discussions about how to govern model behavior.