Open-source AI

Open-source AI, explained

"Open" is one of the most slippery words in AI. It can mean weights you can download, code you can fork, or a whole agent stack you can self-host — and the difference matters when you pick a model, license, or product.

What "open" actually means in AI

Three things get called "open," and only one of them matches the original open-source definition.

Open weights means the trained model file is published — you can download it, run it on your own hardware, and fine-tune it. Training code, training data, and evaluation scripts are usually not released. This is what almost every "open-source" model from a major lab actually is. Llama, Qwen, DeepSeek, Mistral, Gemma, Kimi K2.6, and OpenAI's gpt-oss all fall here.

Open source, in the strict sense, means weights + training code + training data + a permissive license. Very few labs do this. Allen AI's OLMo and EleutherAI's Pythia are the canonical examples — useful mostly for research, not competitive on capability with the open-weights flagships.

Open tools / open agents is the third bucket: the application layer. The source code for the program that uses an LLM to do work — agents, IDEs, orchestrators — is published. OpenClaw, Cline, Aider, OpenHands, n8n. The model they call out to is usually closed.

The three flavors of open

Open weights

The model file is public. Training data isn't.

You can download a .safetensors or .gguf file and run inference on your own GPU or via a hosting provider. Fine-tuning is allowed. Reproducing the model from scratch isn't — the training data and recipe stay proprietary. License terms vary widely and matter (see pitfalls below).

This is what people usually mean when they say "open-source AI" in 2026.
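
A rough sketch of that download workflow, using Hugging Face's CLI. The repo id below is illustrative, not a real listing; check the actual model card, and pick a quant filter your hardware can hold:

# Illustrative repo id: substitute the real one from the model card
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash-GGUF \
  --include "*Q4_K_M*" --local-dir ./models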

Open source (strict)

Weights + code + training data + permissive license.

Everything you need to reproduce the model is published. Allen AI's OLMo family and EleutherAI's Pythia are the standard examples. Useful for academic research, alignment work, and educational projects. Not competitive on raw capability with the open-weights flagships listed below — the groups that publish everything are research labs working with a fraction of the big labs' training budgets.

Use it when: you need to study how a model was actually built or run experiments that require a fully-known training pipeline.
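
One concrete payoff: Pythia publishes intermediate training checkpoints as repo revisions, so you can study the model mid-training rather than only at the end. A minimal sketch (revision names follow Pythia's stepN branch convention; verify the exact list against the repo):

# Pull the checkpoint saved at training step 3000
huggingface-cli download EleutherAI/pythia-1b \
  --revision step3000 --local-dir ./pythia-1b-step3000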

Open tools / open agents

Apps built on top of LLMs, with full source.

The application around the LLM is open: agent loops, IDEs, orchestrators, integrations. The model these tools call is usually closed (Claude, GPT, Gemini), but the harness, prompts, tool definitions, and execution logic are all readable, forkable, and self-hostable.

Examples: OpenClaw, Cline, Aider, n8n. See the full lineup in the tools section below.
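
"Self-hostable" is literal here. n8n, for instance, ships a Docker image; a minimal way to stand it up locally (image path is as their docs give it at the time of writing; verify before relying on it):

# Expose the editor on :5678 and persist data in a named volume
docker run -it --rm -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  docker.n8n.io/n8nio/n8n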

Models that matter (May 2026)

Five families ship the open-weight flagships you'll actually pick from. Almost all of them are sparse Mixture-of-Experts models — total parameters are huge, but only a small slice activates per token, so inference cost stays reasonable. Kimi K2.6, for example, holds 1T parameters in memory but spends per-token compute on only the 32B that activate.

DeepSeek V4 (Pro / Flash)

DeepSeek

Params: 1.6T total / 49B active (Pro); 284B / 13B (Flash)
Context: 1M tokens
License: MIT

Best at: Coding (93.5 LiveCodeBench), best price-to-capability ratio for self-hosted

Llama 4 (Scout / Maverick)

Meta

Params: 400B / 17B active (Maverick); Scout is multimodal with 10M-token context
Context: 10M tokens (Scout) / 1M (Maverick)
License: Llama 4 Community License (700M MAU clause)

Best at: Long-context multimodal (Scout); broad ecosystem support; safest enterprise default

Qwen 3.5 / 3.6 Plus

Alibaba

Params: 397B / 17B active
Context: 1M tokens
License: Apache 2.0

Best at: Agentic coding, multilingual (esp. Chinese), tool use

Kimi K2.6

Moonshot AI

Params: 1T total / 32B active
Context: 256K tokens
License: Modified MIT

Best at: Long-horizon agentic coding; ties GPT-5.5 on SWE-Bench Pro at ~80% lower cost

gpt-oss-120b / gpt-oss-20b

OpenAI

Params: 117B / 5.1B active (120b); 21B (20b)
Context: 128K tokens
License: Apache 2.0

Best at: Reasoning (near o4-mini); 120b runs on a single 80GB GPU, 20b on 16GB

The leaderboard moves every month. For live rankings, check the Artificial Analysis leaderboard or LMArena.

Where to run them

Three paths, ordered by setup cost. Pick based on volume, privacy needs, and how comfortable you are with GPU plumbing.

1. Local

Your laptop or desktop

Quantized models, your own hardware, zero API cost. Models in the 7B-30B range fit on consumer hardware; bigger ones need beefy Macs (M3/M4 Ultra with 64-128GB unified memory) or a discrete GPU with 24GB+ VRAM.

  • Ollama — easiest, CLI-first
  • LM Studio — desktop app, GGUF browser
  • llama.cpp — what most of these run on under the hood

2. Managed inference

Pay per token, no GPU

Same OpenAI-compatible API as the closed labs. You bring keys, they run the GPUs. Best balance of cost and operational simplicity for production.

  • Together AI — broad model selection
  • Groq — sub-100ms latency on smaller models
  • Fireworks AI — fast + fine-tuning
  • DeepInfra — cheapest per token
  • OpenRouter — one key, many providers

3. Self-hosted GPU

Your own H100/A100 fleet

Lowest cost per token at high volume, full data control, but real ops work. For most teams, only worth it if you're burning >$10k/mo on inference or have hard data-residency requirements.

  • vLLM — production-grade serving (sketch below)
  • SGLang — fast structured outputs
  • TensorRT-LLM — NVIDIA-optimized

Local · install + run a model
# macOS
brew install ollama

# Pull + chat with gpt-oss-20b
# (16 GB RAM is enough)
ollama run gpt-oss:20b

# Or run DeepSeek V4 Flash if your
# GPU has the headroom
ollama run deepseek-v4:flash

Managed · call Together's API
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
      {"role":"user","content":"hi"}
    ]
  }'
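
And to round out the third path, what minimal vLLM serving looks like. This is a sketch, not a recipe: the model id is a placeholder, and the tensor-parallel size depends on how many GPUs you actually have:

Self-hosted · serve with vLLM
# Install, then expose an OpenAI-compatible endpoint on :8000
pip install vllm

# Placeholder model id; shard across your real GPU count
vllm serve Qwen/Qwen3.5-Plus --tensor-parallel-size 8
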
  • When you're prototyping → Ollama. No account, zero cost, runs offline.
  • When you ship to real users → Together or OpenRouter. Pay-per-token, OpenAI-compatible API, no GPU ops (OpenRouter version sketched below).
  • When the API bill clears $10k/mo → revisit self-hosted vLLM. Below that threshold, the ops cost dwarfs the savings.
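
The OpenRouter path looks nearly identical to the Together call above — one key fronting many providers. The model slug below is a guess at their naming scheme, so check the OpenRouter catalog for the real one:

Managed · same request via OpenRouter
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-pro",
    "messages": [
      {"role":"user","content":"hi"}
    ]
  }'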

Open-source AI agents and tools

The application layer is where most of the practical action is in 2026. These tools are open-source — you can read every prompt, every tool definition, every loop — even when the model behind them is closed.

OpenClaw


Autonomous personal AI agent · MIT-licensed

The fastest-growing open-source project in GitHub history. Runs locally and acts through messaging apps — you DM your assistant on WhatsApp, Telegram, Signal, or Discord, and it carries out tasks across your files, terminal, calendar, and inbox with persistent memory. Plugs into Claude, GPT, or DeepSeek as the brain.

For the full list and head-to-head comparisons, see /tools and /compare.

Open vs frontier — when each wins

If your priority is… → pick:

  • Lowest cost per token at scale → Open weights (DeepSeek V4 Flash, Qwen 3.5)
  • Hardest reasoning, agentic reliability → Frontier (Claude Opus, GPT-5)
  • Long-horizon agentic coding → Open: Kimi K2.6 or Qwen 3.6 Plus
  • Data residency / no third-party processor → Open weights, self-hosted
  • Fine-tune on your domain data → Open weights (LoRA on Llama / Qwen)
  • On-device / offline (laptop, edge) → Open: gpt-oss-20b, Qwen small, Gemma 4
  • Best safety guardrails out of the box → Frontier (Claude, GPT-5)
  • No vendor lock-in → Open weights via OpenRouter
  • Tool-use / function-calling reliability → Frontier still ahead, gap shrinking

Common pitfalls

  • Treating "open" as one license. Llama's license bans use by anyone with >700M MAU and includes acceptable-use clauses. Apache 2.0 (Qwen, Gemma, Mistral, gpt-oss) is genuinely permissive. MIT (DeepSeek) is the most generous. Read the license before you build a product on the model.
  • Assuming open-weights is always behind frontier. Not true for coding in 2026 — Kimi K2.6 ties GPT-5.5 on SWE-Bench Pro, DeepSeek V4 Pro leads LiveCodeBench. Frontier still wins on the hardest reasoning benchmarks and on tool-use reliability, but the gap is task-specific, not uniform.
  • "Free" means you pay the GPU bill. Self-hosting a 400B-parameter model isn't free — it's a $30k+ box plus power. Managed inference (Together, Groq, Fireworks) is cheaper than the API-equivalent frontier model but still costs real money per token. Compare token rates honestly.
  • Quantization quality loss is real. A 4-bit quant of a 70B model isn't the same model — capability degrades, especially at long context and on reasoning. For benchmarks, run the quant you'll deploy, not the full-precision reference (see the sketch after this list).
  • Open agents inherit prompt-injection risk. An agent like OpenClaw with broad permissions (inbox, terminal, calendar) is exactly the high-blast-radius surface attackers target. Sandbox aggressively, audit any third-party skills, and read what you install.
  • Confusing inference-cheap with training-cheap. MoE architectures keep inference affordable, but pre-training a frontier-class open-weights model still costs tens of millions of dollars. The reason you're getting these for free is that someone else paid for training and chose to release the weights — not because LLMs got cheap to build.
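
On that quantization point, one way to keep yourself honest is to produce the exact quant you plan to ship and benchmark that artifact. With llama.cpp the conversion step looks roughly like this; file names here are placeholders:

# Convert a full-precision GGUF to 4-bit; benchmark the output,
# not the f16 original
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M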
