Coding with AI

How to actually work with AI when building software

The biggest mistake people make with AI-assisted dev in 2026 is treating it like "use Claude for everything" or "use ChatGPT for everything." The better mental model is a team of models — different tools for different roles. This page sorts out the layers, the workflows, and the decisions.

Choose your mode

Most of what follows depends on what you're actually doing. Find the row that fits — not the one that sounds impressive — and read the rest of the page through that lens. These aren't filters; they're orientation.

Just starting

First time letting AI write code you keep. You're still learning what good output looks like.

  • One tool, one model. No orchestration.
  • Read every diff before accepting.
  • Skip agents and teams until you feel the limit.

Solo / side projects

Shipping your own stuff. No teammates to coordinate with. Speed matters more than process.

  • IDE tool plus a terminal agent for bigger jobs.
  • One repo instruction file, kept short.
  • Tests on the parts you'd hate to break.

Real production repo

Code your users depend on. Mistakes cost real money or trust. You can't skip review.

  • Architect + Builder workflow, written plans.
  • A second model verifies every diff.
  • CI checks gate the merge, not the human.

Team lead

You're setting the defaults other engineers will inherit. Consistency beats cleverness.

  • One blessed IDE, one blessed agent. Document why.
  • Repo instruction files committed and reviewed.
  • Verification model wired into CI, not optional.

Recommended starter stacks

Concrete picks for each mode above. Not the only valid answers — but answers that actually work today, and that you can change later without rewriting everything. Skip the ones that don't apply yet.

Starter

  • IDE: Cursor.
  • Terminal agent: none yet — add later.
  • Foundation model: whatever ships in the IDE's default plan.
  • Repo file: .cursorrules — 20 lines, your real conventions.
  • Verification: your eyes. Read the diff. That's the practice.

Solo builder

  • IDE: Cursor (or Cline if you prefer open-source).
  • Terminal agent: Claude Code for multi-file work.
  • Foundation model: Claude Sonnet for builds, Opus for plans.
  • Repo file: CLAUDE.md + .cursorrules pointing at it.
  • Verification: tests on critical paths. Skim the diff yourself.

Production engineer

  • IDE: Cursor for inline edits, chat-in-editor.
  • Terminal agent: Claude Code as the builder; Codex as the reviewer (or vice versa — a different model is the point).
  • Foundation model: two providers. Don't let the same model build and verify.
  • Repo file: AGENTS.md as canonical; CLAUDE.md + .cursorrules point at it.
  • Verification: CI checks + reviewer model on every PR. Human review on auth, payments, data.

Team / lead

  • IDE: one blessed pick across the team — Cursor or Cline. Not both.
  • Terminal agent: Claude Code as default; Codex available for verification runs.
  • Foundation model: at least two, contractually. Builder and reviewer must differ.
  • Repo file: AGENTS.md owned and PR-reviewed; treated like docs that ship.
  • Verification: reviewer model in CI, blocking merge. Human review for architecture and high-stakes paths only.
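
One concrete way to make "owned and PR-reviewed" stick — assuming GitHub; the CODEOWNERS mechanism is real, but the team handle below is an invented placeholder — is to route every change to the instruction files through the people who own the defaults:

# .github/CODEOWNERS
# Instruction-file changes need sign-off from whoever owns the defaults.
/AGENTS.md      @your-org/eng-leads
/CLAUDE.md      @your-org/eng-leads
/.cursorrules   @your-org/eng-leads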

Move up a row when the current one stops fitting — usually because a hallucinated change made it past you. Don't adopt the heaviest stack on day one. The overhead only pays off when the risk it covers is real.

The stack layers

AI-assisted dev happens in four layers. You don't have to use all of them, but understanding what each layer does helps you choose tools deliberately instead of by hype.

1. Foundation model

The brain. The LLM behind everything else. Examples: Claude Opus, GPT-5, Gemini Pro, Kimi, DeepSeek. You usually don't pick this directly — it's chosen by the layer above.

2. IDE / editor integration

Where you actually type code. Examples: Cursor, GitHub Copilot, Windsurf, Continue, Cline. This layer handles autocomplete, inline edits, and chat-in-editor.

3. Terminal / agent layer

For longer tasks the IDE can't handle on its own. Examples: Claude Code, Codex, Kimi Code, Aider. Runs as a CLI, integrates with git, can drive multi-file changes and CI.

4. Verification / review

A second model whose only job is to check the first model's output. Could be the same product running with a different prompt, or a different model entirely. Catches what the builder missed.

Most teams start with layers 1+2. As work gets bigger, they add layer 3. Once they've been bitten by a hallucinated PR or two, they add layer 4.

Models as a team, not a tool

The 2026 default for serious work isn't "pick the best AI and use it for everything." It's splitting the work across roles, like a small engineering team. The pattern people converge on:

Role | Job | What it rewards
Architect | "What should we build?" Plans, structures, decides tradeoffs. | Careful reasoning. Big context. Slow is fine.
Builder | "Build it." Implements, tests, ships. | Speed. Tool integration. Repo awareness.
Reviewer | "Find what's wrong." Skeptical, fresh-eyes critique. | Independence from the builder's biases.
Specialist | Narrow expertise (security review, perf, docs). | Domain depth, not generality.

The point isn't to use four different products. It's to think in roles. You can play all four roles with one product if you switch prompts. You can play them with two or three different models. The structure matters more than the brand.

See the glossary on AI team topology and model verification for the underlying concepts.
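
If you run all four roles through one product, the switch is mostly the prompt. A minimal sketch — the wording below is illustrative, not canonical:

Architect: "Propose a plan for <feature>: goals, constraints, chosen
approach, rejected alternatives, non-goals. Do not write code."

Builder: "Implement PLAN-<feature>.md exactly. Note any deviation and
the reason in IMPLEMENTATION.md. Run the tests before finishing."

Reviewer: "You have not seen this codebase before. Read the diff and
PLAN-<feature>.md. List problems only — no praise, no summary."

Specialist: "Review this diff for security issues only: injection,
authorization gaps, secrets in code, unsafe deserialization."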

Three workflow shapes

Most AI-assisted dev work fits one of three shapes. Pick deliberately — using the wrong shape is where most velocity gets lost.

1. Solo builder

One model in one tool. You prompt, it builds, you review with your own eyes.

When it's right: small features, prototypes, throwaway code. Adding more layers is overhead you don't need.

2. Architect + Builder

One model plans, a different one (or the same one with different instructions) implements. The plan is written down — usually as a PLAN.md — before code is written.

When it's right: features that touch multiple files or services. Anything where "how should we structure this?" isn't obvious. Refactors.

3. Full team (Architect → Builder → Reviewer)

Three roles, ideally three models. Architect writes the plan. Builder implements. A different model reads the diff with no context about how it was built and looks for problems.

When it's right: code shipping to production. Anything touching auth, payments, data integrity. Anywhere a confident hallucination is expensive.

Handoff patterns

The thing that breaks multi-model workflows isn't the models — it's the handoffs. People paste 50k tokens of chat history into the next model and wonder why output quality drops. The fix is structured handoffs: small, deliberate documents that capture decisions, not conversation.

A typical handoff chain

  1. Architect → Builder: Architect produces PLAN.md — goals, constraints, chosen approach, rejected alternatives, non-goals. Builder reads only this, not the architect's reasoning trace.
  2. Builder → Reviewer: Builder produces a diff plus an IMPLEMENTATION.md — what was built, what deviated from the plan, why. Reviewer reads PLAN.md + diff + IMPLEMENTATION.md.
  3. Reviewer → Builder (loop): Reviewer writes REVIEW.md — issues, questions, blocking concerns. Builder addresses them and loops back. Stop when reviewer is satisfied.

Each handoff doc is short — under 2k tokens, ideally. The discipline is in summarizing, not in writing more.
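
For scale, here is what a filled-in PLAN.md might look like for a small feature. The feature and every detail below are invented for illustration:

# PLAN: rate-limit the password-reset endpoint

Status: accepted

## Goals
- Cap reset requests at 5/hour per account and per source IP.

## Constraints
- No new infrastructure; reuse the existing Redis instance.

## Chosen approach
- Sliding-window counter in Redis, keyed on account ID and IP.

## Rejected alternatives
- Fixed-window counter: allows bursts at window boundaries.
- In-memory counter: resets on deploy, wrong across replicas.

## Non-goals
- CAPTCHA, device fingerprinting, any UI change.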

Files every repo should have

The single highest-ROI thing you can add to a codebase for AI-assisted work is a repo instruction file at the root. It tells every coding agent how your code actually works — conventions, dependencies, gotchas — so they stop relearning it every session.

The standard files

  • CLAUDE.md — Read by Claude Code. Project context, conventions, what NOT to do.
  • AGENTS.md — Generic agent instruction file, increasingly read by multiple tools.
  • .cursorrules — Cursor-specific. If you use Cursor, this is what it reads.
  • .continue/ — Continue config + checks. CI-friendly.

A good repo file is short (a few hundred lines max), specific (real examples, not principles), and actively maintained. Bad ones are aspirational and rot. See the glossary on repo instruction files for what good ones look like.

What good and bad ones look like

The difference isn't length — it's signal. A useful AGENTS.md is boring and specific. A useless one reads like a careers page.

High signal
# Acme Dashboard

Stack: Next.js 16 (App Router) + Postgres 16 + Resend.
Deployed to Vercel. Postgres on Neon (preview branches per PR).

## Commands
- npm run dev          # localhost:3000
- npm run typecheck    # must pass before commit
- npm test             # vitest, watch mode by default
- npm run db:migrate   # drizzle, never edit migrations by hand

## Architecture
- App Router only. No /pages directory.
- Server Components by default. Add 'use client' only when needed.
- DB access lives in lib/db/*. Never import drizzle in components.
- All emails go through lib/email/send.ts (wraps Resend).

## Tests
- Co-located: foo.ts -> foo.test.ts.
- New server actions require a test. UI components do not.
- Coverage gate: 70% on lib/, no gate on app/.

## Never
- Don't add a new ORM. We use drizzle.
- Don't introduce client-side data fetching for first paint.
- Don't catch errors silently — log via lib/log.ts and rethrow.

## Style
- Tailwind utility classes, no CSS modules.
- Named exports only. No default exports outside route files.
- Dates as ISO strings at the boundary; Date objects internally.

Low signal
# Welcome to Acme

Acme is a next-generation platform empowering teams to
unlock productivity through delightful experiences. Our
mission is to build software people love.

## Our values
- We value clean code.
- We believe in excellence.
- We move fast and care deeply about quality.
- Communication is key.

## Getting started
Clone the repo and follow the README. Make sure you have
Node installed. Install dependencies and you should be
good to go!

## Coding guidelines
- Write good, readable code.
- Follow best practices.
- Write tests where appropriate.
- Keep functions small and focused.
- Comment your code where it makes sense.
- Be a good citizen of the codebase.

## Architecture
We use a modern stack with industry-standard tools. The
frontend talks to the backend, which talks to the database.
For more details ask in #engineering.

## Notes
TODO: update this doc — last edited 14 months ago.

Test: hand it to a model that's never seen the codebase. If it can't answer "what command runs the tests?" or "where does DB code live?" from the file alone, it's the bad version.

Template gallery

Starting points for the files above. Copy, paste at the root of your repo, then strip what doesn't apply and fill in the rest. Templates are intentionally terse — a long file that nobody reads is worse than a short one that everyone does.

AGENTS.md

Generic multi-tool repo instruction file.

# Project: <name>

Stack: <framework> + <db> + <key services>.
Deployed to <host>. Source of truth: this file.

## Commands
- <dev command>        # local dev server
- <test command>       # must pass before commit
- <typecheck command>  # must pass before commit

CLAUDE.md

Claude Code-specific instructions and loop.

# Claude Code instructions

You are working in a real production repo. Read this file fully
before touching code. When in doubt, ask before editing.

## Stack
<framework> + <db> + <key services>. AGENTS.md is canonical — read it first.

.cursorrules

Cursor-specific editing rules and tone.

# .cursorrules

Stack: <framework> + <db> + <key services>.
Read AGENTS.md at repo root before any non-trivial edit.

## Editing
- Prefer small, reviewable diffs over sweeping rewrites.
- Match existing conventions; don't introduce new patterns unprompted.

PLAN.md

Architect → Builder handoff template.

# PLAN: <feature name>

Author: <model + role>
Date: <YYYY-MM-DD>
Status: draft | accepted | implemented

## Goals
## Constraints
## Chosen approach
## Rejected alternatives
## Non-goals

IMPLEMENTATION.md

Builder → Reviewer handoff template.

# IMPLEMENTATION: <feature name>

Author: <model + role>
Date: <YYYY-MM-DD>
Plan: link to PLAN-<feature>.md

## What was built
## Deviations from the plan (and why)

REVIEW.md

Reviewer → Builder feedback template.

# REVIEW: <feature name>

Reviewer: <model + role>
Date: <YYYY-MM-DD>
Implementation: link to IMPL-<feature>.md
Verdict: approve | request-changes | block

## Blocking issues
## Questions
## Non-blocking suggestions

These are starting scaffolds, not finished docs. The first edit pass — deleting lines that don't apply to your stack — is where the file becomes useful.

Verification loops

The most underrated practice in AI-assisted dev right now. AI lets you build 5x faster — but reviewing a 500-line diff still takes 500 lines of attention. If verification doesn't scale with build velocity, error rates rise silently. The principle is build-more, verify-more.

Three verification layers worth wiring in:

Automated checks (cheap, always-on)

Tests, linters, type-checks, security scans. Run on every PR. These never get skipped — they're fast and need no human attention.

Model verification (medium effort, high signal)

A second model reviews the diff with no priors. Different model from the builder, ideally. Catches confident-wrong claims and missed edge cases. See model verification.
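
What the reviewer is told matters as much as which model runs it. A sketch of a reviewer prompt — wording illustrative, adapt to your stack:

You are reviewing a diff for a codebase you have never seen.
Attached: PLAN-<feature>.md, IMPLEMENTATION.md, and the diff.

- Flag anything in the diff that contradicts the plan.
- Flag claims in IMPLEMENTATION.md the diff does not support.
- Flag missing error handling, missing tests, silent behavior changes.
- Do not summarize. Do not praise. Output issues only, most severe first.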

Human review (expensive, save it)

Still essential for high-stakes code (auth, payments, data integrity) and for the architecture decisions automated tools can't evaluate. Don't use human attention for things automation could catch.

When each level is required

Verification scales with risk, not effort. The right amount is the minimum that catches the failures you care about — and the floor rises fast once real users, real money, or real data are on the line.

Columns, left to right: automated checks · model review · independent verifier · human review.
Rows, roughly lowest floor to highest: throwaway prototype / demo → internal tool → production feature → auth / payments / data integrity → database migration / schema change.

Read the rows as floors, not ceilings. A throwaway demo can absolutely get human review if you have time — but production code without an independent verifier is shipping on faith.

Escalation rules

The matrix above is the default. These are the overrides — situations where the normal verification level is not enough, regardless of how confident the builder model sounds. Treat them as hard rules, not suggestions. If a diff trips one, the gate closes until the extra verification is done.

If auth, sessions, or password handling is touched

Human review required. No exceptions for "small" auth changes — those are the dangerous ones.

If migration is destructive (DROP COLUMN, RENAME, NOT NULL on existing rows)

Require an explicit, written rollback plan attached to the PR. Test it on a copy of prod before merge. (A sketch of such a plan follows these rules.)

If diff exceeds ~400 lines

Require a structured PLAN.md before merge. Big diffs without a plan are how scope creep ships unreviewed.

If code touches payments, billing, or money flow

Independent verifier model AND human review. Different model from the builder. Money bugs do not fail loud.

If editing CI/CD or deploy pipelines

Human review required, plus a tested rollback. A broken deploy pipeline blocks every fix that comes after it.

If touching customer data export, deletion, or PII

Human review plus a security checklist. Get this wrong once and you owe regulators an explanation.

These rules exist because confident-wrong is the AI failure mode that costs the most. The builder will tell you the migration is safe. The builder is not the one paying for the rollback.
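
For the destructive-migration rule above, the rollback plan doesn't need to be long — it needs to be executable. A sketch, with every specific invented for illustration:

# ROLLBACK: drop users.legacy_token

1. Pre-migration snapshot taken and verified restorable on a prod copy
   (record the backup ID here).
2. To roll back: ALTER TABLE users ADD COLUMN legacy_token text;
   then backfill from the snapshot via a restore script kept in the PR.
3. Decision window: 24 hours — after that the snapshot rotates out
   and forward-fix is the only option.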

Picking your setup

There's no single right answer. The right setup depends on what you build, how often, and how much risk you can absorb. A rough sorting:

If you're… | Start here
Just exploring AI dev | One IDE-integrated tool. Solo-builder workflow.
Shipping side projects | IDE tool + a terminal agent for bigger tasks. Add CLAUDE.md or .cursorrules.
Working in a real production codebase | Architect + Builder workflow. Repo instruction file. Automated CI checks on every PR.
Building anything mission-critical | Full team workflow. Verification model on every PR. Human review on auth/payments/data.
Cost-sensitive / OSS | Open-source tooling (Cline, Continue, OpenHands, Aider) + BYOK. Run verification only on critical paths.

For specific tool comparisons see /compare. For tools by category see /tools. For Claude-specific deep-dive see /agents.

Where it ships

Most AI builders include their own one-click deploy — Lovable, Bolt, v0, Replit, Base44, and Emergent will all ship what they generate without any external service. That's the right call for prototypes and demos. The platforms below are where teams land when they outgrow the built-in hosting: to own the deploy, extend the stack, control costs at scale, or wire into infra they already run.

The two categories to evaluate: hosting & deploy, and database & backend.

Once you're going external, pick by fit with your existing workflow — not feature lists. Most AI-generated apps will run on any of these; the right answer is whichever one matches how you already deploy, monitor, and pay.

Common mistakes

The mistakes change as you get more practice. What trips up someone in their first week is different from what blows up a team six months in. Sorted by where you are.

Beginner — your first weeks with AI

  • Trusting generated code blindly. Confident-sounding output isn't the same as correct output. Read every line before you commit it, especially early on when you're still calibrating what the model gets wrong.
  • Using one model for everything. Same problem as one engineer doing every job: you get blind spots. Split roles where the work matters.
  • Overcomplicating too early. You don't need agents, orchestration, or a verification model on day one. Start with one tool, one prompt, one file. Add layers only when you feel the pain they fix.
  • Reaching for orchestration when one call would do. Sometimes one model and one prompt is the right tool. Don't add agents and teams to problems that don't need them.

Intermediate — a few months in

  • Skipping the repo instruction file. Every session re-explaining your codebase to a fresh agent is wasted tokens and inconsistent results. 30 minutes writing CLAUDE.md saves hours per week.
  • Pasting chat history as a handoff. The next model doesn't need 50k tokens of conversation — it needs the conclusion plus the constraints. Write a short, structured handoff.
  • Dumping giant context into the prompt. More tokens isn't more signal. A 200k-token paste of your repo buries the part that matters and degrades output. Curate context the way you'd brief a new hire.
  • No clear handoff docs between roles. If your architect, builder, and reviewer aren't reading the same short artifacts (PLAN.md, IMPLEMENTATION.md, REVIEW.md), they're each guessing at the others' intent. Write the docs.
  • Asking the builder to verify itself. Self-review is theater — same model, same biases, same blind spots. The verifier needs to be different from the builder.

Advanced — running AI workflows on a team

  • Building 5x faster without verifying 5x more. Velocity gains compound, but so do undetected errors. If verification doesn't scale with build speed, error rates rise silently. Wire in checks before you ship.
  • Unclear ownership of review. "The model reviewed it" isn't a name on a PR. Someone human is accountable for what ships — decide who, write it down, and don't let model verification quietly absorb that responsibility.
  • Too much autonomy on high-risk surfaces. Auth, payments, data integrity, infra changes — these are not places to let an agent merge on green CI. Gate the blast radius; agents propose, humans approve.
  • Vendor lock-in through undocumented habits. If your team's workflow only works because everyone happens to use the same IDE, the same model, the same prompt tricks — you have a dependency you didn't choose. Write the conventions down so they survive a tool swap.

Go deeper