Prompt chaining: how to break complex tasks into reliable steps

Why one long prompt fails where a chain of short ones succeeds. How to pass outputs as inputs, where to put the split points, and when chaining is overkill.

7 min read·Updated Jun 14, 2026

There's a ceiling on what a single prompt can reliably do. Not a hard limit: you can ask a model anything in one message. But as task complexity grows, the probability of getting the whole thing right in one shot drops. This is not a bug. It follows from how autoregressive generation works: each token is conditioned on everything generated before it, so an early misstep shapes everything that follows.

Prompt chaining is the response to that ceiling. Instead of one large prompt, you write a pipeline: the output of each step becomes the input for the next. Each step only has to work reliably on its own; no single prompt has to get everything right at once.

Why single prompts fail on complex tasks

A single long prompt that asks a model to do many things at once creates several failure modes:

Instruction drift. The model follows early instructions closely and later ones loosely; adherence tends to slip as the instruction set grows.

Mixed-quality outputs. Even if most of the task is done well, one weak step in the middle poisons what comes after it. There's nowhere to catch the failure.

No checkpoints. You can't inspect intermediate state. If the output is wrong, you don't know which part of the task failed.

Context contamination. Early outputs influence later generations in ways you didn't intend — a wrong assumption made in step 1 propagates through steps 2, 3, and 4.

Chaining solves these by making each step smaller and observable.

The basic pattern

Prompt chaining means:

  1. Write a prompt that does one well-defined thing
  2. Take its output
  3. Insert that output into the next prompt as context
  4. Repeat

In practice, this is often as simple as copy-pasting output into the next message in a conversation, or — if you're building a pipeline — concatenating strings in code.
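In code, the four-step pattern above reduces to a loop that substitutes each output into the next template. This is a minimal sketch, assuming a `model` callable that takes a prompt string and returns text; `call_model` is a hypothetical offline stand-in, not a real provider API, and the `{previous}` placeholder is an illustrative convention.

```python
from typing import Callable

# Hypothetical stand-in for a real model call. Replace with your provider's client.
def call_model(prompt: str) -> str:
    first_line = prompt.splitlines()[0] if prompt else ""
    return f"<output for: {first_line}>"

def run_chain(templates: list[str], model: Callable[[str], str] = call_model) -> list[str]:
    """Run prompt templates in order. The placeholder {previous} in each template
    is replaced with the output of the step before it (empty for the first step)."""
    outputs: list[str] = []
    previous = ""
    for template in templates:
        prompt = template.replace("{previous}", previous)  # insert prior output as context
        previous = model(prompt)
        outputs.append(previous)
    return outputs

if __name__ == "__main__":
    for out in run_chain(["List three facts about cookies.",
                          "Summarize these facts:\n{previous}"]):
        print(out)
```

Returning every intermediate output, not just the last one, is what makes the chain inspectable: you can look at any step when the final result is wrong.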

Example: research → analysis → draft

Step 1 prompt:

I'm writing an article about the decline of third-party cookies.
List the five most significant technical reasons cookies are being phased out.
Output as a numbered list, one sentence per item.

Step 2 prompt (uses step 1 output):

Here are five reasons third-party cookies are being phased out:
[paste step 1 output]

For each reason, identify: (a) who is most affected, and (b) the leading technical alternative.
Use the same numbered format.

Step 3 prompt (uses step 2 output):

Using this analysis:
[paste step 2 output]

Write the opening two paragraphs of an article for a technical marketing audience.
Tone: clear, not breathless. No "the world is changing" openers.

Each step has a narrow job. Each output can be inspected and corrected before it feeds the next step.
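The three cookie prompts can run as one loop with a checkpoint between steps. In this sketch, `model` is a hypothetical callable (prompt in, text out) and `review` is a hypothetical hook where a person or a validation function can inspect and correct each output before it feeds the next step. The `{previous}` placeholder stands in for "[paste step N output]".

```python
from typing import Callable

STEP1 = (
    "I'm writing an article about the decline of third-party cookies.\n"
    "List the five most significant technical reasons cookies are being phased out.\n"
    "Output as a numbered list, one sentence per item."
)
STEP2 = (
    "Here are five reasons third-party cookies are being phased out:\n"
    "{previous}\n\n"
    "For each reason, identify: (a) who is most affected, and (b) the leading technical alternative.\n"
    "Use the same numbered format."
)
STEP3 = (
    "Using this analysis:\n"
    "{previous}\n\n"
    "Write the opening two paragraphs of an article for a technical marketing audience.\n"
    'Tone: clear, not breathless. No "the world is changing" openers.'
)

def run_with_checkpoints(model: Callable[[str], str],
                         review: Callable[[int, str], str]) -> str:
    """Run the three steps in order, passing each (possibly corrected) output forward."""
    previous = ""
    for step, template in enumerate([STEP1, STEP2, STEP3], start=1):
        output = model(template.replace("{previous}", previous))
        # Checkpoint: inspect or correct this output before it feeds the next step.
        previous = review(step, output)
    return previous
```

A `review` that simply returns its input gives a fully automatic chain; one that prompts a human or runs a format check gives you the quality gates described in the next section.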

Where to put the split points

Split a task when:

  • Output format changes. The first step returns data; the second step transforms or summarizes it. Different output formats need different instructions.

  • Quality gates matter. If step 1 output is wrong, you want to fix it before step 2 runs. You can't do that in a single prompt.

  • You need to inject human judgment. Maybe step 1 generates options, a human picks one, and step 2 executes on the chosen option. Chaining enables that loop.

  • The model needs to reason before acting. Extract information first, then make a decision based on what was extracted. Asking for both at once usually produces a thinner, less complete extraction.

Keep each step focused on one kind of operation: research, transform, analyze, write, classify. Mixed-mode prompts — "research AND analyze AND write" — are where reliability drops.
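A quality gate can be as simple as a format check between steps. The sketch below assumes step 1 was asked for a numbered list of known length, as in the cookie example; `passes_gate`, `gated_step`, and the three-attempt retry limit are illustrative choices, and `model` is a hypothetical prompt-in, text-out callable.

```python
import re
from typing import Callable

# A line that starts with "1.", "2.", ... followed by some text.
NUMBERED_ITEM = re.compile(r"^\s*\d+\.\s+\S")

def passes_gate(output: str, expected_items: int) -> bool:
    """Quality gate: the output must be a numbered list with exactly `expected_items` items."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    return len(lines) == expected_items and all(NUMBERED_ITEM.match(ln) for ln in lines)

def gated_step(model: Callable[[str], str], prompt: str,
               expected_items: int, max_attempts: int = 3) -> str:
    """Re-run a step until its output passes the gate, so bad output never reaches the next step."""
    for _ in range(max_attempts):
        output = model(prompt)
        if passes_gate(output, expected_items):
            return output
    raise ValueError(f"output failed the quality gate after {max_attempts} attempts")
```

Raising instead of passing along a failed output is deliberate: a loud stop at step 1 is cheaper to debug than a subtly wrong draft at step 3. The same spot is where a human review could slot in.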

When chaining is overkill

Not every task needs a chain. Single-prompt approaches are simpler and faster when:

  • The task is genuinely one operation: "write a subject line for this email"
  • You don't need to inspect intermediate outputs
  • Latency matters and each API call adds meaningful delay
  • The task is short enough that instruction drift isn't a factor

The rule of thumb: if you can explain the task in one sentence with one clear output format, one prompt probably works. If the explanation requires "and then" or "based on that", a chain is worth considering.

Practical patterns

The review loop — generate, then critique, then revise:

Step 1: Generate [output]
Step 2: "Review the above output. List any factual errors, weak arguments, or missing elements."
Step 3: "Revise [output] based on this critique: [step 2 output]"
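In a chat, "the above output" is already in context. In a pipeline where each call carries no conversation history, the draft has to be pasted into the critique prompt explicitly. A minimal sketch, assuming a hypothetical `model` callable; the `rounds` parameter is an illustrative addition for repeating critique and revision.

```python
from typing import Callable

def review_loop(model: Callable[[str], str], task: str, rounds: int = 1) -> str:
    """Generate a draft, then alternate critique and revision `rounds` times."""
    draft = model(task)  # Step 1: generate
    for _ in range(rounds):
        # Step 2: critique (the draft is included because the call has no memory)
        critique = model(
            "Review the output below. List any factual errors, weak arguments, "
            f"or missing elements.\n\n{draft}"
        )
        # Step 3: revise using both the draft and the critique
        draft = model(
            f"Revise this output:\n\n{draft}\n\nbased on this critique:\n\n{critique}"
        )
    return draft
```

One round is usually the place to start; each extra round costs two more calls, and a critique step asked to find problems will tend to find some even in a good draft.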

The extraction step — pull structured data before working with it:

Step 1: "Extract all action items from this transcript. Format as: [owner] will [task] by [date]."
Step 2: "For each action item, draft a follow-up email to the owner."
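The payoff of the extraction step is that step 2 works on structured data rather than raw transcript. This sketch parses lines in step 1's "[owner] will [task] by [date]" format with a regular expression and builds one follow-up prompt per item. The regex, `ActionItem`, and both function names are hypothetical; the sketch assumes the model followed the format and silently skips lines that don't match, which you might prefer to log.

```python
import re
from dataclasses import dataclass

# "<owner> will <task> by <date>", with an optional trailing period.
ACTION_ITEM = re.compile(r"^(?P<owner>.+?) will (?P<task>.+) by (?P<date>.+?)\.?$")

@dataclass
class ActionItem:
    owner: str
    task: str
    date: str

def parse_action_items(text: str) -> list[ActionItem]:
    """Turn step 1's output into structured items, ignoring non-matching lines."""
    items = []
    for line in text.splitlines():
        match = ACTION_ITEM.match(line.strip().lstrip("-*• "))  # tolerate bullet prefixes
        if match:
            items.append(ActionItem(**match.groupdict()))
    return items

def follow_up_prompts(items: list[ActionItem]) -> list[str]:
    """Step 2: one narrow prompt per action item."""
    return [
        f"Draft a short follow-up email to {item.owner} about their commitment "
        f"to {item.task} by {item.date}."
        for item in items
    ]
```

Running step 2 once per item, instead of asking for every email in one response, keeps each prompt small and makes a bad email easy to regenerate on its own.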

The branching chain — one classification step routes to different downstream prompts:

Step 1: "Classify this support ticket: billing, technical, or general."
Step 2a (if billing): "Draft a billing response..."
Step 2b (if technical): "Draft a technical escalation..."
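Routing is easiest to keep reliable when the classifier's answer is normalized and checked against a fixed set of labels. A minimal sketch, assuming a hypothetical `model` callable; the `general` route text and the fall-back-to-general behavior are illustrative assumptions, since the original only spells out the billing and technical branches.

```python
from typing import Callable

ROUTES = {
    "billing": "Draft a billing response to this ticket:\n\n{ticket}",
    "technical": "Draft a technical escalation for this ticket:\n\n{ticket}",
    "general": "Draft a general support reply to this ticket:\n\n{ticket}",  # assumed route
}

def classify(model: Callable[[str], str], ticket: str) -> str:
    """Step 1: classify, then normalize the label (case, whitespace, trailing period)."""
    label = model(
        "Classify this support ticket as exactly one word: billing, technical, or general.\n\n"
        + ticket
    ).strip().lower().rstrip(".")
    # Fall back to the general route if the classifier returns anything unexpected.
    return label if label in ROUTES else "general"

def handle_ticket(model: Callable[[str], str], ticket: str) -> tuple[str, str]:
    """Step 2: run only the downstream prompt that matches the label."""
    label = classify(model, ticket)
    return label, model(ROUTES[label].replace("{ticket}", ticket))
```

Returning the label alongside the reply makes misroutes visible in logs, which is where branching chains tend to fail quietly.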

The parallel fan-out — run multiple step-1 variants, merge into step 2:

Step 1a: "Draft a formal version of this message."
Step 1b: "Draft a casual version of this message."
Step 2: "Given these two versions, write a final version that keeps the structure and precision of [1a] but the warmer tone of [1b]."
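Because steps 1a and 1b don't depend on each other, they can run concurrently, which offsets some of the latency a chain adds. This sketch uses the standard library's `ThreadPoolExecutor`, a reasonable fit for I/O-bound API calls; `model` is a hypothetical callable that must be safe to call from multiple threads, and the merge prompt wording is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def fan_out_and_merge(model: Callable[[str], str], message: str) -> str:
    variants = {
        "formal": f"Draft a formal version of this message:\n\n{message}",
        "casual": f"Draft a casual version of this message:\n\n{message}",
    }
    # Steps 1a and 1b are independent, so submit both before waiting on either.
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        futures = {name: pool.submit(model, prompt) for name, prompt in variants.items()}
        drafts = {name: future.result() for name, future in futures.items()}
    # Step 2: merge the two drafts in a single follow-up call.
    merge_prompt = (
        "Given these two versions, write a final version that keeps the structure "
        "and precision of the formal one but the warmer tone of the casual one.\n\n"
        f"Formal:\n{drafts['formal']}\n\nCasual:\n{drafts['casual']}"
    )
    return model(merge_prompt)
```

Labeling each draft inside the merge prompt ("Formal:", "Casual:") matters more than it looks: without labels, the merge step has to guess which version is which.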

What to watch for

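Chain length compounds errors. If failures are independent, a 10-step chain where each step succeeds 90% of the time runs cleanly end to end only about 35% of the time (0.9^10 ≈ 0.35), which means roughly a 65% chance that at least one step fails. Keep chains short unless you have human review at key joints.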
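The arithmetic is worth running for your own numbers. This assumes, as above, that each step fails independently with the same probability; real chains often violate that, since a bad step 1 makes later failures more likely.

```python
def chain_success(p_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    return p_step ** steps

p_all = chain_success(0.9, 10)
print(f"all steps succeed: {p_all:.1%}")         # all steps succeed: 34.9%
print(f"at least one failure: {1 - p_all:.1%}")  # at least one failure: 65.1%
```

Cutting the same chain to 5 steps raises the clean-run probability to about 59%, which is the quantitative case for short chains.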

Context window limits. If you're stuffing full outputs from each step into the next prompt, long chains can hit token limits. Summarize or extract rather than passing everything wholesale.
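One way to avoid passing everything wholesale is a size check at each handoff that inserts a summarization step only when needed. A minimal sketch: the 4-characters-per-token figure is a rough rule of thumb for English text, not an exact count, and `model`, `rough_token_count`, and `compact_handoff` are hypothetical names.

```python
from typing import Callable

def rough_token_count(text: str) -> int:
    # Rough rule of thumb: about 4 characters per token for English text.
    # Use your provider's tokenizer when the budget actually matters.
    return (len(text) + 3) // 4

def compact_handoff(model: Callable[[str], str], output: str, budget_tokens: int) -> str:
    """Pass a step's output through unchanged if it fits the budget; otherwise run an
    extra summarization step and pass the condensed version instead."""
    if rough_token_count(output) <= budget_tokens:
        return output
    max_words = int(budget_tokens * 0.75)  # ~0.75 words per token, same rough heuristic
    return model(
        f"Condense the text below to at most {max_words} words. Keep every fact, "
        f"name, and number a later step might need.\n\n{output}"
    )
```

Summarizing adds a call and can drop details, so where you can, prefer extracting exactly the fields the next step uses over a general summary.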

Format stability. Each step's output needs to be parseable by the next step's prompt. If step 1 sometimes returns bullet points and sometimes paragraphs, step 2 will behave inconsistently. Either constrain step 1's format tightly or make step 2 format-agnostic.
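Constraining step 1's format tightly is easier to enforce when the output is machine-checkable. This sketch asks for a JSON object, validates it with the standard `json` module, and retries on malformed output before anything reaches step 2. The `items` key, the retry count, and `model` are assumptions; if your provider offers a JSON or structured-output mode, it can replace the prompt instruction.

```python
import json
from typing import Callable

def json_list_step(model: Callable[[str], str], prompt: str,
                   key: str = "items", max_attempts: int = 3) -> list[str]:
    """Run a step that must return {"<key>": [strings]}; retry until it parses and validates."""
    instruction = f'{prompt}\n\nRespond with only a JSON object of the form {{"{key}": ["..."]}}.'
    for _ in range(max_attempts):
        raw = model(instruction)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Not JSON at all (e.g. bullet points): ask again.
        items = data.get(key) if isinstance(data, dict) else None
        if isinstance(items, list) and all(isinstance(item, str) for item in items):
            return items  # Same shape every time, so step 2 never sees a surprise format.
    raise ValueError(f"no valid JSON after {max_attempts} attempts")
```

Step 2 can then be written against a plain Python list instead of parsing free text.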
