
LLM (Large Language Model)

Also known as: large language model, foundation model, language model

A neural network trained on huge amounts of text to predict the next token. The engine behind ChatGPT, Claude, Gemini, and most modern AI tools.

What it means

An LLM is a transformer-based neural network with billions to trillions of parameters, trained on a large slice of the internet (plus books, code, and curated data) to predict the next token given the previous ones. Everything else (chat, coding, reasoning, agents) is built on top of that one trick.

"LLM" is not the same as "AI" or "chatbot." AI is the broad field. A chatbot is a product surface (ChatGPT, Claude.ai). An LLM is the underlying model. GPT-5 is an LLM; ChatGPT is the product wrapping it, with system prompts, tools, memory, and UI. Confusing the three is the most common mistake in AI conversations.

Scale matters because capabilities emerge non-linearly. A 1B-parameter model can autocomplete sentences. A 70B model can write coherent essays. A frontier-tier model (Claude Opus 4.7, GPT-5, Gemini 3 Pro) can write production code, reason through ambiguous problems, and operate tools, capabilities that didn't exist at smaller scales. This "emergent capability" framing is contested in research circles, but the practical result is real: the gap between a 7B local model and a frontier model in 2026 is enormous.

LLMs are stochastic, not deterministic: the same prompt can produce different outputs. They don't "know" things; they pattern-match against training data. They hallucinate, struggle with arithmetic, and have no memory between sessions unless you bolt one on. Understanding what an LLM is and isn't is the difference between using AI well and being constantly disappointed by it.
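
To make "predict the next token" concrete, here is a minimal sketch using the open-source Hugging Face transformers library with the small GPT-2 checkpoint (chosen only because it is freely downloadable; the prompt, sampling settings, and output format are illustrative, not part of any specific product). It prints the model's probability distribution over the next token, then samples a few continuations to show why identical prompts can yield different outputs.

```python
# Minimal sketch of next-token prediction with a small open model.
# Assumes the `torch` and `transformers` packages are installed and that
# the "gpt2" checkpoint can be downloaded; frontier models do the same
# thing at vastly larger scale.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The model's only real output: a probability distribution over the next token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}: {p.item():.3f}")

# Sampling from that distribution is why the same prompt can produce
# different continuations on different runs (stochastic, not deterministic).
for _ in range(3):
    out = model.generate(
        **inputs,
        max_new_tokens=8,
        do_sample=True,
        temperature=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```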

Example

Claude Opus 4.7, GPT-5, Gemini 3 Pro, DeepSeek V4, Llama 4, and Mistral Large 3 are all LLMs. They differ in scale, training data, fine-tuning approach, and serving infrastructure — but the underlying architecture is broadly the same.

Why it matters

Almost every AI product you'll use in 2026 is a thin wrapper around an LLM. Knowing what the model is doing under the hood — predicting tokens, not retrieving facts — explains its strengths (fluency, pattern recognition) and its failure modes (hallucinations, arithmetic errors, knowledge cutoffs).
