What is an LLM, really?
The mental model that fixes most prompting confusion: why models hallucinate, what context windows are, and how tokens map to cost.
You've probably been using ChatGPT or Claude for a while now. It feels like it understands you. It doesn't.
An LLM is a giant prediction machine. Its only job is to predict the next word in a sequence (strictly, the next token, but we'll get to that). That's it. It looks at everything you've written so far, asks itself "what word probably comes next?", picks one, and repeats. It does this thousands of times in a few seconds. The result feels like a conversation.
That's the whole trick.
This sounds reductive, but it explains almost every weird thing the model does. If you understand it, you understand prompting.
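If you want to see the loop with no magic in it, here's a toy version in Python. It predicts the next word by counting which word follows which in a tiny made-up corpus. A real LLM does the same job with a neural network, tokens instead of words, and vastly more text, but the generate loop has the same shape.

```python
import random
from collections import Counter, defaultdict

# Toy next-word model: count which word tends to follow which
# in a tiny corpus. The corpus here is made up for illustration.
corpus = "the capital of france is paris . the capital of italy is rome .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(prompt: str, steps: int = 4) -> str:
    words = prompt.split()
    for _ in range(steps):
        dist = counts[words[-1]]               # "what probably comes next?"
        if not dist:
            break
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights)[0])  # pick one...
    return " ".join(words)                     # ...and repeat

print(generate("the capital of"))
# e.g. "the capital of italy is rome ."
# It can also print "the capital of france is rome ." -- a hallucination,
# because the model only knows local patterns, not facts.
```

Notice that the toy model can already produce confident nonsense. That's the next section.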
Hallucinations, explained
The model doesn't have a database of facts. It has patterns. "Capital of France" almost always continues with "Paris" in its training data, so when you ask, it says Paris. But "the third paper Anthropic published in 2023" might not have a strong pattern. So it fills in something plausible. That's a hallucination. It isn't lying. It just doesn't know it's wrong.
The fix: give the model the facts inside your prompt. If you paste the actual paper list, it will work from that. If you don't, it will guess.
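In practice the fix is just string assembly: the facts ride along inside the prompt. A sketch, with the list itself elided:

```python
# Ungrounded: the model falls back on training patterns, so it may guess.
asking = "What was the third paper Anthropic published in 2023?"

# Grounded: the facts travel inside the prompt, so the model works from them.
papers = "...your verified list of 2023 papers, pasted here..."
showing = (
    "Here is a list of papers Anthropic published in 2023:\n\n"
    f"{papers}\n\n"
    "Which one was published third?"
)
```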
Context windows
The model only sees what's in front of it right now. Each conversation has a context window: a buffer that caps how much text the model can hold in mind at once. Claude's window fits roughly a 500-page novel. ChatGPT's, less.
When you exceed that, the model starts forgetting the beginning of the conversation. This is also why "tell me more" sometimes makes the answer worse. You're using window space without adding useful new material.
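Here's a minimal sketch of why the beginning falls off rather than the end, assuming a fixed token budget and a crude characters-to-tokens estimate (the budget and the trimming policy are illustrative, not any particular product's behavior):

```python
MAX_TOKENS = 200_000  # a Claude-sized window; many models have less

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough rule of thumb: ~4 characters per token

def trim_history(turns: list[str], budget: int = MAX_TOKENS) -> list[str]:
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):      # walk backward from the newest turn
        used += estimate_tokens(turn)
        if used > budget:
            break                     # everything older is simply dropped
        kept.append(turn)
    return list(reversed(kept))       # restore chronological order
```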
Tokens and cost
When the model reads your text, it chops it into chunks called tokens. One token is roughly 3 to 4 characters of English. "Hello, world" is 3 tokens. Pricing on every model is measured in tokens, not words. If you paste a 50-page document and ask one question, you're sending 20,000+ tokens. Each one costs money.
This is why compressing your prompt has a real economic effect, not just a stylistic one.
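You can check the counts yourself with tiktoken, OpenAI's open-source tokenizer (Claude uses its own tokenizer, but the numbers come out similar). The per-token price below is a made-up round figure; substitute your provider's real one.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Hello, world")
print(len(tokens))                         # 3
print([enc.decode([t]) for t in tokens])   # ['Hello', ',', ' world']

# Back-of-envelope cost for a 50-page paste, at an assumed rate of
# $3 per million input tokens.
doc_tokens = 20_000
print(f"~${doc_tokens * 3 / 1_000_000:.2f} per question")  # ~$0.06
```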
A better way to ask
Stop thinking "ask the AI a question." Start thinking "show the AI a document and let it autocomplete the answer."
When you write:
Write me a cold email to a CFO at a 200-person SaaS company.
you're asking. When you write:
Here are three cold emails I've sent before that got replies.
[pastes them]
Write a fourth one in the same style for a CFO at a 200-person SaaS company.
you're showing. Same task, very different output. The second prompt primes the model with examples, so it autocompletes from a much richer starting point.
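If you're working through an API instead of a chat box, the "showing" version is just string assembly plus one call. A sketch using Anthropic's Python SDK, with the example emails elided and the model name illustrative:

```python
from anthropic import Anthropic  # pip install anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholders -- paste your real emails that got replies.
examples = ["Email 1: ...", "Email 2: ...", "Email 3: ..."]

prompt = (
    "Here are three cold emails I've sent before that got replies.\n\n"
    + "\n\n".join(examples)
    + "\n\nWrite a fourth one in the same style for a CFO "
    "at a 200-person SaaS company."
)

message = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative; pick a current model
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```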
Quick recap
It predicts. It doesn't know.
Context is everything. Stuff your prompt with the relevant material.
Cost scales with tokens. Long prompts and long answers both add up.
The next guide in this pillar covers five prompt patterns that work in any model.
Next in this pillar
How to prompt: 5 patterns that work in any model