Open-source AI, explained
"Open" is one of the most slippery words in AI. It can mean weights you can download, code you can fork, or a whole agent stack you can self-host — and the difference matters when you pick a model, license, or product.
What "open" actually means in AI
Three things get called "open" and only one of them matches the original open-source definition.
Open weights means the trained model file is published — you can download it, run it on your own hardware, and fine-tune it. Training code, training data, and evaluation scripts are usually not released. This is what almost every "open-source" model from a major lab actually is: Llama, Qwen, DeepSeek, Mistral, Gemma, Kimi K2.6, and OpenAI's gpt-oss all fall here.
Open source, in the strict sense, means weights + training code + training data + a permissive license. Very few labs do this. Allen AI's OLMo and EleutherAI's Pythia are the canonical examples — useful mostly for research, not competitive on capability with the open-weights flagships.
Open tools / open agents is the third bucket: the application layer. Source code for the program that uses an LLM to do work — agents, IDEs, orchestrators. OpenClaw, Cline, Aider, OpenHands, n8n. The model they call out to is usually closed.
The three flavors of open
Open weights
The model file is public. Training data isn't.
You can download a .safetensors or .gguf file and run inference on your own GPU or via a hosting provider. Fine-tuning is allowed. Reproducing the model from scratch isn't — the training data and recipe stay proprietary. License terms vary widely and matter (see pitfalls below).
This is what people usually mean when they say "open-source AI" in 2026.
Open source (strict)
Weights + code + training data + permissive license.
Everything you need to reproduce the model is published. Allen AI's OLMo family and EleutherAI's Pythia are the standard examples. Useful for academic research, alignment work, and educational projects. Not competitive on raw capability with the open-weights flagships listed below — the groups that publish everything tend to be research labs working with far smaller training budgets than the commercial players.
Use it when: you need to study how a model was actually built or run experiments that require a fully-known training pipeline.
Open tools / open agents
Apps built on top of LLMs, with full source.
The application around the LLM is open: agent loops, IDEs, orchestrators, integrations. The model these tools call is usually closed (Claude, GPT, Gemini), but the harness, prompts, tool definitions, and execution logic are all readable, forkable, and self-hostable.
Examples: OpenClaw, Cline, Aider, n8n. See the full lineup in the tools section below.
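Under the hood, most of these tools share one core pattern: a loop that sends the conversation plus tool definitions to a model, executes whatever tool call comes back, and feeds the result in for the next turn. A minimal sketch of that loop — `call_model` and the reply shape here are simplified stand-ins, not any specific tool's actual protocol:

```python
import json

def run_agent(call_model, tools, user_message, max_steps=10):
    """Minimal agent loop: call the model, run requested tools, repeat.

    `call_model` stands in for any chat-completion backend. It returns
    either {'content': ...} (final answer) or {'tool': ..., 'args': ...}
    (a tool call the harness should execute).
    """
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "tool" not in reply:
            return reply["content"]  # plain text means we're done
        # Execute the requested tool and feed the result back to the model.
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"
```

The point of open tools is that this loop — plus the prompts and tool schemas around it — is exactly the part you can read and modify, whichever model sits behind `call_model`.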
Models that matter (May 2026)
Five families ship the open-weight flagships you'll actually pick from. Almost every one is a sparse Mixture-of-Experts — total parameters are huge, but only a small slice activates per token, so inference cost stays reasonable.
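The back-of-envelope arithmetic makes the MoE trade-off concrete: per-token compute scales with *active* parameters (a common rough rule is ~2 FLOPs per parameter per generated token), while memory to hold the weights scales with *total* parameters. A sketch using DeepSeek V4 Pro's published sizes from the table below:

```python
# Rough MoE arithmetic. The "2 FLOPs per parameter per token" figure is a
# standard forward-pass approximation, not a measured number.

def flops_per_token(active_params):
    return 2 * active_params

def weight_memory_gb(total_params, bits=8):
    return total_params * bits / 8 / 1e9

# DeepSeek V4 Pro: 1.6T total parameters, 49B active per token.
dense_equiv = flops_per_token(1.6e12)  # if every parameter ran per token
moe_actual = flops_per_token(49e9)

print(f"compute saved vs dense: {dense_equiv / moe_actual:.0f}x")
print(f"weights at 8-bit: {weight_memory_gb(1.6e12):.0f} GB")
```

So inference compute drops roughly 30x versus a dense model of the same size, but you still need on the order of 1.6 TB of memory just to hold the weights — which is why these flagships live on multi-GPU servers, not laptops.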
DeepSeek V4 (Pro / Flash)
DeepSeek
- Params: 1.6T total / 49B active (Pro); 284B / 13B (Flash)
- Context: 1M tokens
- License: MIT
Best at: Coding (93.5 LiveCodeBench), best price-to-capability ratio for self-hosted
Llama 4 (Scout / Maverick)
Meta
- Params: 400B / 17B active (Maverick); Scout is multimodal with 10M-token context
- Context: 10M tokens (Scout) / 1M (Maverick)
- License: Llama 4 Community License (700M MAU clause)
Best at: Long-context multimodal (Scout); broad ecosystem support; safest enterprise default
Qwen 3.5 / 3.6 Plus
Alibaba
- Params: 397B / 17B active
- Context: 1M tokens
- License: Apache 2.0
Best at: Agentic coding, multilingual (esp. Chinese), tool use
Kimi K2.6
Moonshot AI
- Params: 1T total / 32B active
- Context: 256K tokens
- License: Modified MIT
Best at: Long-horizon agentic coding; ties GPT-5.5 on SWE-Bench Pro at ~80% lower cost
gpt-oss-120b / gpt-oss-20b
OpenAI
- Params: 117B / 5.1B active (120b); 21B (20b)
- Context: 128K tokens
- License: Apache 2.0
Best at: Reasoning (near o4-mini); 120b runs on a single 80GB GPU, 20b on 16GB
The leaderboard moves every month. For live rankings, check the Artificial Analysis leaderboard or LMArena.
Where to run them
Three paths, ordered by setup cost. Pick based on volume, privacy needs, and how comfortable you are with GPU plumbing.
1. Local
Your laptop or desktop
Quantized models, your own hardware, zero API cost. Models in the 7B-30B range fit on consumer hardware; bigger models need beefy Macs (M3/M4 Ultra with 64-128GB unified memory) or a discrete GPU with 24GB+ VRAM.
- Ollama — easiest, CLI-first
- LM Studio — desktop app, GGUF browser
- llama.cpp — what most of these run on under the hood
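A quick "will it fit?" estimate before downloading: weight-file size is roughly parameters × bits-per-weight / 8, plus headroom for the KV cache and runtime buffers. This sketch uses a flat ~20% overhead as an assumption — real GGUF quants mix bit widths, so treat it as a sanity check, not a guarantee:

```python
# Rough memory-fit check for local inference. `params_b` is billions of
# parameters; `overhead` (assumed 20%) covers KV cache and buffers.

def fits(params_b, quant_bits, mem_gb, overhead=1.2):
    need_gb = params_b * quant_bits / 8 * overhead  # billions of params -> GB
    return need_gb <= mem_gb, round(need_gb, 1)

print(fits(20, 4, 16))  # gpt-oss-20b at 4-bit on a 16 GB machine
print(fits(70, 4, 24))  # a 70B model at 4-bit on a 24 GB GPU
```

This is why 20B-class models are the sweet spot for 16 GB laptops, while 70B at 4-bit already overflows a single 24 GB consumer GPU.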
2. Managed inference
Pay per token, no GPU
Same OpenAI-compatible API as the closed labs. You bring keys, they run the GPUs. Best balance of cost and operational simplicity for production.
- Together AI — broad model selection
- Groq — sub-100ms latency on smaller models
- Fireworks AI — fast + fine-tuning
- DeepInfra — cheapest per token
- OpenRouter — one key, many providers
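Because these providers all speak the same OpenAI-style chat schema, switching between them is usually just a different base URL and model string. A sketch of the request shape — the base URLs below match the providers' public endpoints as best I know them, but verify against each provider's docs before relying on them:

```python
import json

# Assumed base URLs for two of the providers above; confirm in their docs.
PROVIDERS = {
    "together": "https://api.together.xyz/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def chat_request(provider, model, prompt):
    """Build the request body an OpenAI-compatible client would send."""
    return {
        "url": PROVIDERS[provider] + "/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = chat_request("together", "deepseek-ai/DeepSeek-V4-Pro", "hi")
print(json.dumps(req, indent=2))
```

The practical upshot: code written against one provider ports to another by changing two strings, which is the main defense against vendor lock-in in this tier.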
3. Self-hosted GPU
Your own H100/A100 fleet
Lowest cost per token at high volume, full data control, but real ops work. For most teams, only worth it if you're burning >$10k/mo on inference or have hard data-residency requirements.
- vLLM — production-grade serving
- SGLang — fast structured outputs
- TensorRT-LLM — NVIDIA-optimized
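The $10k/mo threshold above is a rule of thumb, but you can sanity-check it for your own numbers: divide the monthly amortized cost of the hardware by the managed per-token price to find the break-even volume. The figures in this sketch are illustrative assumptions, not quotes:

```python
# Toy break-even calculation: owned GPU box vs managed per-token pricing.

def breakeven_tokens_per_month(box_cost_per_month, price_per_mtok):
    """Monthly token volume where owned hardware matches the managed price."""
    return box_cost_per_month / price_per_mtok * 1e6

# Assumed: $4,000/mo amortized hardware + power, $0.40 per million tokens.
tokens = breakeven_tokens_per_month(4000, 0.40)
print(f"break-even: {tokens / 1e9:.0f}B tokens/month")
```

Below that volume the managed API is cheaper before you even count the engineering time spent running vLLM, which is why the threshold is set where it is.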
```shell
# macOS
brew install ollama

# Pull + chat with gpt-oss-20b
# (16 GB RAM is enough)
ollama run gpt-oss:20b

# Or run DeepSeek V4 Flash if your
# GPU has the headroom
ollama run deepseek-v4:flash
```

```shell
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
      {"role": "user", "content": "hi"}
    ]
  }'
```

- When you're prototyping → Ollama. No account, zero cost, runs offline.
- When you ship to real users → Together or OpenRouter. Pay-per-token, OpenAI-compatible API, no GPU ops.
- When the API bill clears $10k/mo → revisit self-hosted vLLM. Below that threshold, the ops cost dwarfs the savings.
Open-source AI agents and tools
The application layer is where most of the practical action is in 2026. These tools are open-source — you can read every prompt, every tool definition, every loop — even when the model behind them is closed.
OpenClaw
FeaturedAutonomous personal AI agent · MIT-licensed
The fastest-growing open-source project in GitHub history. Runs locally and acts through messaging apps — you DM your assistant on WhatsApp, Telegram, Signal, or Discord, and it carries out tasks across your files, terminal, calendar, and inbox with persistent memory. Plugs into Claude, GPT, or DeepSeek as the brain.
Cline
VS Code extension. Autonomous coding agent that reads, edits, and runs commands. Bring-your-own LLM key.
Aider
Terminal-native pair programmer. Edits files via git commits; pairs especially well with Claude or DeepSeek.
Continue
VS Code / JetBrains extension. Inline chat + autocomplete with any model, including local Ollama.
OpenHands (formerly OpenDevin)
Autonomous web-and-code agent. Runs in a sandbox container; can browse, write code, run tests.
OpenCode
Open-source terminal coding agent in the Claude Code mold, by SST. Plug in any model.
n8n
Visual workflow builder. AI nodes for orchestrating LLM calls inside larger automations. Self-hostable.
For the full list and head-to-head comparisons, see /tools and /compare.
Open vs frontier — when each wins
| If your priority is… | Pick |
|---|---|
| Lowest cost per token at scale | Open weights (DeepSeek V4 Flash, Qwen 3.5) |
| Hardest reasoning, agentic reliability | Frontier (Claude Opus, GPT-5) |
| Long-horizon agentic coding | Open: Kimi K2.6 or Qwen 3.6 Plus |
| Data residency / no third-party processor | Open weights, self-hosted |
| Fine-tune on your domain data | Open weights (LoRA on Llama / Qwen) |
| On-device / offline (laptop, edge) | Open: gpt-oss-20b, Qwen small, Gemma 4 |
| Best safety guardrails out of the box | Frontier (Claude, GPT-5) |
| No vendor lock-in | Open weights via OpenRouter |
| Tool-use / function-calling reliability | Frontier still ahead, gap shrinking |
Common pitfalls
- Treating "open" as one license. Llama's license bans use by anyone with >700M MAU and includes acceptable-use clauses. Apache 2.0 (Qwen, Gemma, Mistral, gpt-oss) is genuinely permissive. MIT (DeepSeek) is the most generous. Read the license before you build a product on the model.
- Assuming open-weights is always behind frontier. Not true for coding in 2026 — Kimi K2.6 ties GPT-5.5 on SWE-Bench Pro, DeepSeek V4 Pro leads LiveCodeBench. Frontier still wins on the hardest reasoning benchmarks and on tool-use reliability, but the gap is task-specific, not uniform.
- "Free" means you pay the GPU bill. Self-hosting a 400B-parameter model isn't free — it's a $30k+ box plus power. Managed inference (Together, Groq, Fireworks) is cheaper than the API-equivalent frontier model but still costs real money per token. Compare token rates honestly.
- Quantization quality loss is real. A 4-bit quant of a 70B model isn't the same model — capability degrades, especially at long context and on reasoning. For benchmarks, run the quant you'll deploy, not the full-precision reference.
- Open agents inherit prompt-injection risk. An agent like OpenClaw with broad permissions (inbox, terminal, calendar) is exactly the high-blast-radius surface attackers target. Sandbox aggressively, audit any third-party skills, and read what you install.
- Confusing inference-cheap with training-cheap. MoE architectures keep inference affordable, but pre-training a frontier-class open-weights model still costs tens of millions of dollars. The reason you're getting these for free is that someone else paid for training and chose to release the weights — not because LLMs got cheap to build.
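The training-cost point can be made concrete with the standard 6·N·D approximation for pre-training compute (N = parameters, D = training tokens). Every number below — GPU throughput, utilization, hourly rate, and the dense 400B / 15T-token example — is a loose assumption for illustration, not a lab's actual bill:

```python
# Back-of-envelope pre-training cost via the 6*N*D FLOPs rule.
# All rates are assumptions; MoE models spend FLOPs only on active
# parameters per token, which lowers this estimate considerably.

def training_cost_usd(params, tokens, flops_per_gpu_s=1e15,
                      gpu_hour_usd=2.5, utilization=0.4):
    flops = 6 * params * tokens                      # total training compute
    gpu_hours = flops / (flops_per_gpu_s * utilization) / 3600
    return gpu_hours * gpu_hour_usd

# Hypothetical dense 400B-parameter model trained on 15T tokens.
print(f"${training_cost_usd(400e9, 15e12):,.0f}")
```

Even under generous assumptions the result lands in the tens of millions — which is the whole point of the pitfall above: cheap inference does not imply cheap creation.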