All terms
Foundations
Open source AI
Also known as: OSAI, OSAID, OSI Open Source AI Definition
Per OSI's 2024 Open Source AI Definition: weights, training code, and enough information about the training data for someone to reproduce the model. Most "open" models in 2026 don't actually qualify.
What it means
In October 2024, the Open Source Initiative (the org that defines what "open source" means for software) published version 1.0 of the Open Source AI Definition (OSAID). To qualify as open source, an AI system must provide: (1) the model weights, (2) the training and inference code under an OSI-approved license, and (3) sufficient information about the training data — including data sources, processing, and access methods — that a skilled person could reproduce a substantially equivalent model.
The data requirement is the hard part. OSI deliberately did not require publishing the training dataset itself, because much of it is copyrighted or licensed. But "data information" must be detailed enough to actually reproduce, not just hand-wave. By that standard, almost no major "open" model in 2026 is OSI-compliant. Llama publishes weights but not training data details. DeepSeek and Qwen are similar. Mistral's open releases are closer but still incomplete. The OSI maintains a rolling list of compliant systems; it's short.
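The three OSAID requirements amount to a simple conjunction: a release qualifies only if all of weights, OSI-licensed code, and reproducible data information are present. A minimal sketch of that test, where the `Release` type, its field names, and the per-model flags are illustrative assumptions rather than any official schema or ruling:

```python
from dataclasses import dataclass

# Hypothetical checklist mirroring OSAID v1.0's three requirements.
# Field names and the example classifications are illustrative, not official.
@dataclass
class Release:
    name: str
    weights_available: bool        # (1) model weights published
    code_osi_licensed: bool        # (2) training/inference code under an OSI-approved license
    data_info_reproducible: bool   # (3) data information detailed enough to reproduce the model

def osaid_compliant(r: Release) -> bool:
    # All three must hold; open weights alone is not enough.
    return r.weights_available and r.code_osi_licensed and r.data_info_reproducible

# Flags reflect this article's characterization of each release, not a formal audit.
olmo = Release("OLMo", weights_available=True, code_osi_licensed=True, data_info_reproducible=True)
llama = Release("Llama 4", weights_available=True, code_osi_licensed=False, data_info_reproducible=False)

print(osaid_compliant(olmo))   # True
print(osaid_compliant(llama))  # False
```

The point of the conjunction is that "open weights" satisfies only the first clause, which is why so few releases pass.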
The distinction matters because "open source" carries weight. Calling something open source signals reproducibility, auditability, and community ownership. When labs call open-weights models "open source," they're trading on that reputation without delivering its substance — which is why OSI pushed for a strict definition. In practice, "open weights" is what most people in 2026 actually mean when they say "open source AI," and the looser usage isn't going away. But for compliance, procurement, and serious evaluation, the difference is concrete: open weights gets you a runnable model; open-source AI gets you a reproducible one.
Example
OLMo from AI2 is one of the few projects that aims for full OSAID compliance — weights, training code, data documentation, and intermediate checkpoints all public. By contrast, Llama 4 is open-weights but explicitly not open source under OSI's definition.
Why it matters
If your org has a procurement, compliance, or audit requirement for "open source" software, an open-weights model probably doesn't satisfy it. Knowing the difference avoids burning a six-month vendor review on a model that doesn't meet the definition. It also matters for trust: a fully open-source model can be independently audited for training-data contamination and bias in ways open-weights models can't.