What is an LLM, really?
The mental model that fixes most prompting confusion: why models hallucinate, what context windows are, and how tokens map to cost.
You've probably been using ChatGPT or Claude for a while now. It feels like it understands you. It doesn't.
An LLM is a giant prediction machine. Its only job is to predict the next word in a sequence (strictly, the next token, but we'll get to that). That's it. It looks at everything you've written so far, asks itself "what word probably comes next?", picks one, and repeats. It does this thousands of times in a few seconds. The result feels like a conversation.
That's the whole trick.
This sounds reductive, but it explains almost every weird thing the model does. If you understand it, you understand prompting.
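If you want to see the loop with no magic in it, here's a toy version in Python. It predicts the next word by counting which word follows which in a tiny made-up corpus. A real LLM does the same job with a neural network, tokens instead of words, and vastly more text, but the generate loop has the same shape.

```python
import random
from collections import Counter, defaultdict

# Toy next-word model: count which word tends to follow which
# in a tiny corpus. The corpus here is made up for illustration.
corpus = "the capital of france is paris . the capital of italy is rome .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(prompt: str, steps: int = 4) -> str:
    words = prompt.split()
    for _ in range(steps):
        dist = counts[words[-1]]               # "what probably comes next?"
        if not dist:
            break
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights)[0])  # pick one...
    return " ".join(words)                     # ...and repeat

print(generate("the capital of"))
# e.g. "the capital of italy is rome ."
# It can also print "the capital of france is rome ." -- a hallucination,
# because the model only knows local patterns, not facts.
```

Notice that the toy model can already produce confident nonsense. That's the next section.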
Hallucinations, explained
The model doesn't have a database of facts. It has patterns. "Capital of France" almost always continues with "Paris" in its training data, so when you ask, it says Paris. But "the third paper Anthropic published in 2023" might not have a strong pattern. So it fills in something plausible. That's a hallucination. It isn't lying. It just doesn't know it's wrong.
The fix: give the model the facts inside your prompt. If you paste the actual paper list, it will work from that. If you don't, it will guess.
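In practice the fix is just string assembly: the facts ride along inside the prompt. A sketch, with the list itself elided:

```python
# Ungrounded: the model falls back on training patterns, so it may guess.
asking = "What was the third paper Anthropic published in 2023?"

# Grounded: the facts travel inside the prompt, so the model works from them.
papers = "...your verified list of 2023 papers, pasted here..."
showing = (
    "Here is a list of papers Anthropic published in 2023:\n\n"
    f"{papers}\n\n"
    "Which one was published third?"
)
```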
Context windows
The model only sees what's in front of it right now. Each conversation has a context window: a buffer that caps how much text the model can hold in mind at once. Claude's window fits roughly a 500-page novel. ChatGPT's, less.
When you exceed that, the model starts forgetting the beginning of the conversation. This is also why "tell me more" sometimes makes the answer worse. You're using window space without adding useful new material.
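Here's a minimal sketch of why the beginning falls off rather than the end, assuming a fixed token budget and a crude characters-to-tokens estimate (the budget and the trimming policy are illustrative, not any particular product's behavior):

```python
MAX_TOKENS = 200_000  # a Claude-sized window; many models have less

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough rule of thumb: ~4 characters per token

def trim_history(turns: list[str], budget: int = MAX_TOKENS) -> list[str]:
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):      # walk backward from the newest turn
        used += estimate_tokens(turn)
        if used > budget:
            break                     # everything older is simply dropped
        kept.append(turn)
    return list(reversed(kept))       # restore chronological order
```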
Tokens and cost
When the model reads your text, it chops it into chunks called tokens. One token is roughly 3 to 4 characters of English. "Hello, world" is 3 tokens. Pricing on every model is measured in tokens, not words. If you paste a 50-page document and ask one question, you're sending 20,000+ tokens. Each one costs money.
This is why compressing your prompt has a real economic effect, not just a stylistic one.
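You can check the counts yourself with tiktoken, OpenAI's open-source tokenizer (Claude uses its own tokenizer, but the numbers come out similar). The per-token price below is a made-up round figure; substitute your provider's real one.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Hello, world")
print(len(tokens))                         # 3
print([enc.decode([t]) for t in tokens])   # ['Hello', ',', ' world']

# Back-of-envelope cost for a 50-page paste, at an assumed rate of
# $3 per million input tokens.
doc_tokens = 20_000
print(f"~${doc_tokens * 3 / 1_000_000:.2f} per question")  # ~$0.06
```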
A better way to ask
Stop thinking "ask the AI a question." Start thinking "show the AI a document and let it autocomplete the answer."
When you write:
Write me a cold email to a CFO at a 200-person SaaS company.
you're asking. When you write:
Here are three cold emails I've sent before that got replies.
[pastes them]
Write a fourth one in the same style for a CFO at a 200-person SaaS company.
you're showing. Same task, very different output. The second prompt primes the model with examples, so it autocompletes from a much richer starting point.
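If you're working through an API instead of a chat box, the "showing" version is just string assembly plus one call. A sketch using Anthropic's Python SDK, with the example emails elided and the model name illustrative:

```python
from anthropic import Anthropic  # pip install anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholders -- paste your real emails that got replies.
examples = ["Email 1: ...", "Email 2: ...", "Email 3: ..."]

prompt = (
    "Here are three cold emails I've sent before that got replies.\n\n"
    + "\n\n".join(examples)
    + "\n\nWrite a fourth one in the same style for a CFO "
    "at a 200-person SaaS company."
)

message = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative; pick a current model
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```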
Quick recap
It predicts. It doesn't know.
Context is everything. Stuff your prompt with the relevant material.
Cost scales with tokens. Long prompts and long answers both add up.
The next guide in this pillar covers five prompt patterns that work in any model.
Next in this pillar
How to prompt: 5 patterns that work in any model