All terms
Foundations
Token
Also known as: BPE token, subword token
The atomic unit an LLM reads and writes — usually a sub-word fragment. "The" is one token; "tokenization" is three.
What it means
A token is the smallest piece of text an LLM processes. It's not a word and not a character — it's a sub-word fragment chosen by the model's tokenizer during training. Common words ("the", "and", "is") are single tokens. Rarer words get broken into pieces: "tokenization" might split into "token" + "iz" + "ation".
A useful rule of thumb for English: 1 token ≈ 0.75 words, or 1 word ≈ 1.3 tokens, so 1,000 tokens is roughly 750 words, or about 4 paragraphs. Code tokenizes differently — whitespace, brackets, and punctuation each tend to be their own tokens, so code is more token-dense than prose. Non-English languages, especially CJK and many low-resource languages, often tokenize much worse, sometimes needing 2-4x more tokens for the same meaning.
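You can check these ratios yourself with a tokenizer library. The sketch below assumes OpenAI's tiktoken package and its cl100k_base encoding; exact counts vary by tokenizer and model, so treat the numbers as illustrative.

```python
# pip install tiktoken  (assumes OpenAI's tiktoken package is available)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prose = "A token is the smallest piece of text an LLM processes."
code = "def add(a, b):\n    return a + b"

for label, text in [("prose", prose), ("code", code)]:
    tokens = enc.encode(text)          # list of integer token ids
    words = len(text.split())
    # Typical English prose lands near the ~1.3 tokens/word rule of thumb;
    # code usually runs noticeably higher because punctuation gets its own tokens.
    print(f"{label}: {words} words -> {len(tokens)} tokens "
          f"({len(tokens) / words:.2f} tokens/word)")
```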
Tokens are the unit of everything that matters commercially: context windows are measured in tokens, API pricing is per million tokens (input and output priced separately), and rate limits are tokens per minute. When a model "costs $3/M input, $15/M output," that's per million tokens. A 50-page PDF is roughly 25k tokens. A long Claude conversation can quietly burn 100k+ tokens before you notice.
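To make the pricing concrete, here is a minimal cost calculation using the illustrative $3/$15 per-million rates quoted above; real prices differ by model and provider.

```python
# Illustrative rates from the text above; real prices vary by model and provider.
INPUT_PRICE_PER_M = 3.00    # dollars per million input tokens
OUTPUT_PRICE_PER_M = 15.00  # dollars per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call at the rates above."""
    return (input_tokens * INPUT_PRICE_PER_M +
            output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 50-page PDF (~25k tokens) in, a 1,000-token summary out:
print(f"${request_cost(25_000, 1_000):.4f}")  # about $0.09 at these rates
```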
Example
In OpenAI's tokenizer, "Hello, world!" = 4 tokens ("Hello", ",", " world", "!"). The phrase "antidisestablishmentarianism" tokenizes into about 6 pieces, while "the" is just 1.
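To see a split like this yourself, a tokenizer library will print the exact pieces. This sketch assumes tiktoken with the cl100k_base encoding; other tokenizers split differently, so the piece counts are illustrative.

```python
import tiktoken  # assumes OpenAI's tiktoken package is installed

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello, world!", "antidisestablishmentarianism", "the"]:
    ids = enc.encode(text)
    # decode_single_token_bytes shows the exact byte string behind each token id.
    pieces = [enc.decode_single_token_bytes(i).decode("utf-8", errors="replace")
              for i in ids]
    print(f"{text!r}: {len(ids)} tokens -> {pieces}")
```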
Why it matters
Tokens are how you reason about cost, latency, and context limits. Pasting a 200k-token document into a model with a 32k-token window will fail. And each chat turn resends the entire prior conversation, so all earlier tokens are reprocessed on every new message, which is why long chats get expensive fast.
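The replay effect compounds: since every turn resends the full history as input, total input tokens grow roughly quadratically with conversation length. A rough sketch of that accounting, with made-up per-turn sizes:

```python
# Rough sketch of the chat-replay effect; per-turn sizes are made up for illustration.
turns = [(300, 500), (200, 400), (250, 600)]  # (user tokens, assistant tokens) per turn

history = 0             # tokens accumulated in the conversation so far
total_input_billed = 0  # input tokens processed (and billed) across all requests

for i, (user_tokens, assistant_tokens) in enumerate(turns, start=1):
    # Each request resends the full history plus the new user message as input.
    total_input_billed += history + user_tokens
    history += user_tokens + assistant_tokens
    print(f"turn {i}: history {history} tokens, "
          f"cumulative input billed {total_input_billed}")
```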