Open-source AI, explained
"Open" is one of the most slippery words in AI. It can mean weights you can download, code you can fork, or a whole agent stack you can self-host — and the difference matters when you pick a model, license, or product.
What "open" actually means in AI
Three things get called "open" and only one of them matches the original open-source definition.
Open weights means the trained model file is published — you can download it, run it on your own hardware, and fine-tune it. Training code, training data, and evaluation scripts are usually not released. This is what almost every "open-source" model from a major lab actually is: Llama, Qwen, DeepSeek, Mistral, Gemma, Kimi K2.6, and OpenAI's gpt-oss all fall here.
Open source, in the strict sense, means weights + training code + training data + a permissive license. Very few labs do this. Allen AI's OLMo and EleutherAI's Pythia are the canonical examples — useful mostly for research, not competitive on capability with the open-weights flagships.
Open tools / open agents is the third bucket: the application layer. Source code for the program that uses an LLM to do work — agents, IDEs, orchestrators. OpenClaw, Cline, Aider, OpenHands, n8n. The model they call out to is usually closed.
The three flavors of open
Open weights
The model file is public. Training data isn't.
You can download a .safetensors or .gguf file and run inference on your own GPU or via a hosting provider. Fine-tuning is allowed. Reproducing the model from scratch isn't — the training data and recipe stay proprietary. License terms vary widely and matter (see pitfalls below).
This is what people usually mean when they say "open-source AI" in 2026.
Open source (strict)
Weights + code + training data + permissive license.
Everything you need to reproduce the model is published. Allen AI's OLMo family and EleutherAI's Pythia are the standard examples. Useful for academic research, alignment work, and educational projects. Not competitive on raw capability with the open-weights flagships listed below — the groups that publish everything tend to be research labs working with far smaller training budgets than the commercial players.
Use it when: you need to study how a model was actually built or run experiments that require a fully-known training pipeline.
Open tools / open agents
Apps built on top of LLMs, with full source.
The application around the LLM is open: agent loops, IDEs, orchestrators, integrations. The model these tools call is usually closed (Claude, GPT, Gemini), but the harness, prompts, tool definitions, and execution logic are all readable, forkable, and self-hostable.
Examples: OpenClaw, Cline, Aider, n8n. See the full lineup in the tools section below.
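Under the hood, most of these tools share one core pattern: a loop that sends the conversation plus tool definitions to a model, executes whatever tool call comes back, and feeds the result in for the next turn. A minimal sketch of that loop — `call_model` and the reply shape here are simplified stand-ins, not any specific tool's actual protocol:

```python
import json

def run_agent(call_model, tools, user_message, max_steps=10):
    """Minimal agent loop: call the model, run requested tools, repeat.

    `call_model` stands in for any chat-completion backend. It returns
    either {'content': ...} (final answer) or {'tool': ..., 'args': ...}
    (a tool call the harness should execute).
    """
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "tool" not in reply:
            return reply["content"]  # plain text means we're done
        # Execute the requested tool and feed the result back to the model.
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"
```

The point of open tools is that this loop — plus the prompts and tool schemas around it — is exactly the part you can read and modify, whichever model sits behind `call_model`.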
Models that matter (May 2026)
Five families ship the open-weight flagships you'll actually pick from. Almost every one is a sparse Mixture-of-Experts — total parameters are huge, but only a small slice activates per token, so inference cost stays reasonable.
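The back-of-envelope arithmetic makes the MoE trade-off concrete: per-token compute scales with *active* parameters (a common rough rule is ~2 FLOPs per parameter per generated token), while memory to hold the weights scales with *total* parameters. A sketch using DeepSeek V4 Pro's published sizes from the table below:

```python
# Rough MoE arithmetic. The "2 FLOPs per parameter per token" figure is a
# standard forward-pass approximation, not a measured number.

def flops_per_token(active_params):
    return 2 * active_params

def weight_memory_gb(total_params, bits=8):
    return total_params * bits / 8 / 1e9

# DeepSeek V4 Pro: 1.6T total parameters, 49B active per token.
dense_equiv = flops_per_token(1.6e12)  # if every parameter ran per token
moe_actual = flops_per_token(49e9)

print(f"compute saved vs dense: {dense_equiv / moe_actual:.0f}x")
print(f"weights at 8-bit: {weight_memory_gb(1.6e12):.0f} GB")
```

So inference compute drops roughly 30x versus a dense model of the same size, but you still need on the order of 1.6 TB of memory just to hold the weights — which is why these flagships live on multi-GPU servers, not laptops.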
DeepSeek V4 (Pro / Flash)
DeepSeek
- Params: 1.6T total / 49B active (Pro); 284B / 13B (Flash)
- Context: 1M tokens
- License: MIT
Best at: Coding (93.5 LiveCodeBench), best price-to-capability ratio for self-hosted
Llama 4 (Scout / Maverick)
Meta
- Params: 400B / 17B active (Maverick); Scout is multimodal with 10M-token context
- Context: 10M tokens (Scout) / 1M (Maverick)
- License: Llama 4 Community License (700M MAU clause)
Best at: Long-context multimodal (Scout); broad ecosystem support; safest enterprise default
Qwen 3.5 / 3.6 Plus
Alibaba
- Params: 397B / 17B active
- Context: 1M tokens
- License: Apache 2.0
Best at: Agentic coding, multilingual (esp. Chinese), tool use
Kimi K2.6
Moonshot AI
- Params: 1T total / 32B active
- Context: 256K tokens
- License: Modified MIT
Best at: Long-horizon agentic coding; ties GPT-5.5 on SWE-Bench Pro at ~80% lower cost
gpt-oss-120b / gpt-oss-20b
OpenAI
- Params: 117B / 5.1B active (120b); 21B (20b)
- Context: 128K tokens
- License: Apache 2.0
Best at: Reasoning (near o4-mini); 120b runs on a single 80GB GPU, 20b on 16GB
The leaderboard moves every month. For live rankings, check the Artificial Analysis leaderboard or LMArena.
Where to run them
Three paths, ordered by setup cost. Pick based on volume, privacy needs, and how comfortable you are with GPU plumbing.
1. Local
Your laptop or desktop
Quantized models, your own hardware, zero API cost. Models in the 7B-30B range fit on consumer hardware; bigger models need beefy Macs (M3/M4 Ultra with 64-128GB unified memory) or a discrete GPU with 24GB+ VRAM.
- Ollama — easiest, CLI-first
- LM Studio — desktop app, GGUF browser
- llama.cpp — what most of these run on under the hood
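A quick "will it fit?" estimate before downloading: weight-file size is roughly parameters × bits-per-weight / 8, plus headroom for the KV cache and runtime buffers. This sketch uses a flat ~20% overhead as an assumption — real GGUF quants mix bit widths, so treat it as a sanity check, not a guarantee:

```python
# Rough memory-fit check for local inference. `params_b` is billions of
# parameters; `overhead` (assumed 20%) covers KV cache and buffers.

def fits(params_b, quant_bits, mem_gb, overhead=1.2):
    need_gb = params_b * quant_bits / 8 * overhead  # billions of params -> GB
    return need_gb <= mem_gb, round(need_gb, 1)

print(fits(20, 4, 16))  # gpt-oss-20b at 4-bit on a 16 GB machine
print(fits(70, 4, 24))  # a 70B model at 4-bit on a 24 GB GPU
```

This is why 20B-class models are the sweet spot for 16 GB laptops, while 70B at 4-bit already overflows a single 24 GB consumer GPU.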
2. Managed inference
Pay per token, no GPU
Same OpenAI-compatible API as the closed labs. You bring keys, they run the GPUs. Best balance of cost and operational simplicity for production.
- Together AI — broad model selection
- Groq — sub-100ms latency on smaller models
- Fireworks AI — fast + fine-tuning
- DeepInfra — cheapest per token
- OpenRouter — one key, many providers
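Because these providers all speak the same OpenAI-style chat schema, switching between them is usually just a different base URL and model string. A sketch of the request shape — the base URLs below match the providers' public endpoints as best I know them, but verify against each provider's docs before relying on them:

```python
import json

# Assumed base URLs for two of the providers above; confirm in their docs.
PROVIDERS = {
    "together": "https://api.together.xyz/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def chat_request(provider, model, prompt):
    """Build the request body an OpenAI-compatible client would send."""
    return {
        "url": PROVIDERS[provider] + "/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = chat_request("together", "deepseek-ai/DeepSeek-V4-Pro", "hi")
print(json.dumps(req, indent=2))
```

The practical upshot: code written against one provider ports to another by changing two strings, which is the main defense against vendor lock-in in this tier.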
3. Self-hosted GPU
Your own H100/A100 fleet
Lowest cost per token at high volume, full data control, but real ops work. For most teams, only worth it if you're burning >$10k/mo on inference or have hard data-residency requirements.
- vLLM — production-grade serving
- SGLang — fast structured outputs
- TensorRT-LLM — NVIDIA-optimized
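The $10k/mo threshold above is a rule of thumb, but you can sanity-check it for your own numbers: divide the monthly amortized cost of the hardware by the managed per-token price to find the break-even volume. The figures in this sketch are illustrative assumptions, not quotes:

```python
# Toy break-even calculation: owned GPU box vs managed per-token pricing.

def breakeven_tokens_per_month(box_cost_per_month, price_per_mtok):
    """Monthly token volume where owned hardware matches the managed price."""
    return box_cost_per_month / price_per_mtok * 1e6

# Assumed: $4,000/mo amortized hardware + power, $0.40 per million tokens.
tokens = breakeven_tokens_per_month(4000, 0.40)
print(f"break-even: {tokens / 1e9:.0f}B tokens/month")
```

Below that volume the managed API is cheaper before you even count the engineering time spent running vLLM, which is why the threshold is set where it is.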
```shell
# macOS
brew install ollama

# Pull + chat with gpt-oss-20b
# (16 GB RAM is enough)
ollama run gpt-oss:20b

# Or run DeepSeek V4 Flash if your
# GPU has the headroom
ollama run deepseek-v4:flash
```

```shell
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
      {"role": "user", "content": "hi"}
    ]
  }'
```

- When you're prototyping → Ollama. No account, zero cost, runs offline.
- When you ship to real users → Together or OpenRouter. Pay-per-token, OpenAI-compatible API, no GPU ops.
- When the API bill clears $10k/mo → revisit self-hosted vLLM. Below that threshold, the ops cost dwarfs the savings.
Open-source AI agents and tools
The application layer is where most of the practical action is in 2026. These tools are open-source — you can read every prompt, every tool definition, every loop — even when the model behind them is closed.
OpenClaw
FeaturedAutonomous personal AI agent · MIT-licensed
The fastest-growing open-source project in GitHub history. Runs locally and acts through messaging apps — you DM your assistant on WhatsApp, Telegram, Signal, or Discord, and it carries out tasks across your files, terminal, calendar, and inbox with persistent memory. Plugs into Claude, GPT, or DeepSeek as the brain.
Cline
VS Code extension. Autonomous coding agent that reads, edits, and runs commands. Bring-your-own LLM key.
Aider
Terminal-native pair programmer. Edits files via git commits; pairs especially well with Claude or DeepSeek.
Continue
VS Code / JetBrains extension. Inline chat + autocomplete with any model, including local Ollama.
OpenHands (formerly OpenDevin)
Autonomous web-and-code agent. Runs in a sandbox container; can browse, write code, run tests.
OpenCode
Open-source terminal coding agent in the Claude Code mold, by SST. Plug in any model.
n8n
Visual workflow builder. AI nodes for orchestrating LLM calls inside larger automations. Self-hostable.
For the full list and head-to-head comparisons, see /tools and /compare.
Open vs frontier — when each wins
| If your priority is… | Pick |
|---|---|
| Lowest cost per token at scale | Open weights (DeepSeek V4 Flash, Qwen 3.5) |
| Hardest reasoning, agentic reliability | Frontier (Claude Opus, GPT-5) |
| Long-horizon agentic coding | Open: Kimi K2.6 or Qwen 3.6 Plus |
| Data residency / no third-party processor | Open weights, self-hosted |
| Fine-tune on your domain data | Open weights (LoRA on Llama / Qwen) |
| On-device / offline (laptop, edge) | Open: gpt-oss-20b, Qwen small, Gemma 4 |
| Best safety guardrails out of the box | Frontier (Claude, GPT-5) |
| No vendor lock-in | Open weights via OpenRouter |
| Tool-use / function-calling reliability | Frontier still ahead, gap shrinking |
Common pitfalls
- Treating "open" as one license. Llama's license bans use by anyone with >700M MAU and includes acceptable-use clauses. Apache 2.0 (Qwen, Gemma, Mistral, gpt-oss) is genuinely permissive. MIT (DeepSeek) is the most generous. Read the license before you build a product on the model.
- Assuming open-weights is always behind frontier. Not true for coding in 2026 — Kimi K2.6 ties GPT-5.5 on SWE-Bench Pro, DeepSeek V4 Pro leads LiveCodeBench. Frontier still wins on the hardest reasoning benchmarks and on tool-use reliability, but the gap is task-specific, not uniform.
- "Free" means you pay the GPU bill. Self-hosting a 400B-parameter model isn't free — it's a $30k+ box plus power. Managed inference (Together, Groq, Fireworks) is cheaper than the API-equivalent frontier model but still costs real money per token. Compare token rates honestly.
- Quantization quality loss is real. A 4-bit quant of a 70B model isn't the same model — capability degrades, especially at long context and on reasoning. For benchmarks, run the quant you'll deploy, not the full-precision reference.
- Open agents inherit prompt-injection risk. An agent like OpenClaw with broad permissions (inbox, terminal, calendar) is exactly the high-blast-radius surface attackers target. Sandbox aggressively, audit any third-party skills, and read what you install.
- Confusing inference-cheap with training-cheap. MoE architectures keep inference affordable, but pre-training a frontier-class open-weights model still costs tens of millions of dollars. The reason you're getting these for free is that someone else paid for training and chose to release the weights — not because LLMs got cheap to build.
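The training-cost point can be made concrete with the standard 6·N·D approximation for pre-training compute (N = parameters, D = training tokens). Every number below — GPU throughput, utilization, hourly rate, and the dense 400B / 15T-token example — is a loose assumption for illustration, not a lab's actual bill:

```python
# Back-of-envelope pre-training cost via the 6*N*D FLOPs rule.
# All rates are assumptions; MoE models spend FLOPs only on active
# parameters per token, which lowers this estimate considerably.

def training_cost_usd(params, tokens, flops_per_gpu_s=1e15,
                      gpu_hour_usd=2.5, utilization=0.4):
    flops = 6 * params * tokens                      # total training compute
    gpu_hours = flops / (flops_per_gpu_s * utilization) / 3600
    return gpu_hours * gpu_hour_usd

# Hypothetical dense 400B-parameter model trained on 15T tokens.
print(f"${training_cost_usd(400e9, 15e12):,.0f}")
```

Even under generous assumptions the result lands in the tens of millions — which is the whole point of the pitfall above: cheap inference does not imply cheap creation.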