Catching AI-generated bugs before they ship
Hallucinated APIs, missing edge cases, security holes. How to review AI code without being a senior engineer.
AI-generated code looks right. That's the problem.
When a model writes you a function, it writes one that fits the shape of the request. The variable names are sensible. The structure looks like code you'd write yourself. It compiles. It even runs on the happy path. Confidence and correctness are not the same thing, and the gap between them is where bugs hide.
You don't need to be a senior engineer to catch most of these. You just need to know what to look for.
Hallucinated APIs
The model will sometimes invent a method that doesn't exist, or use parameters that were renamed two versions ago. It looks like real code because it's pattern-matched to real code.
// Looks fine. The method doesn't exist.
const result = await stripe.customers.findOrCreate({ email })
If you don't recognize a function name, search the actual docs before trusting it. "Search the docs" is the cheapest debugging step you have.
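Besides the docs, you can ask the runtime itself whether a name exists. A minimal sketch in Python, using the standard library (`datetime.date.parse` is a plausible-sounding method that does not exist; it stands in for a hallucinated name):

```python
import datetime

# Real method: exists on datetime.date since Python 3.7
assert hasattr(datetime.date, "fromisoformat")

# Plausible-sounding but invented, the hallmark of a hallucinated API
assert not hasattr(datetime.date, "parse")
```

Two lines in a REPL won't tell you a method's semantics, but they will tell you in seconds whether it exists at all.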
Missing edge cases
Models write for the input you described, not the inputs that will actually show up. Empty arrays, null values, network failures, weird unicode, timezones across DST boundaries. None of that is in the prompt, so none of it is in the output.
# Crashes when items is empty
return sum(x.price for x in items) / len(items)
The fix is asking explicitly: "what happens if the input is empty, null, or huge?"
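Here is what the guarded version of that snippet might look like. `average_price` and `Item` are hypothetical names added to make the sketch self-contained; the point is the explicit empty-input branch the original lacked:

```python
from dataclasses import dataclass

@dataclass
class Item:
    price: float

def average_price(items):
    # The original divided by len(items) unconditionally.
    # Decide the empty-input behavior explicitly instead of crashing.
    if not items:
        return 0.0
    return sum(x.price for x in items) / len(items)

assert average_price([]) == 0.0
assert average_price([Item(4.0), Item(6.0)]) == 5.0
```

Whether the right fallback is `0.0`, `None`, or an exception is a product decision, but it should be a decision, not a `ZeroDivisionError` in production.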
Security holes
This is the category that bites hardest. Unparameterized SQL queries. API keys logged to the console. Auth checks skipped on the server because the frontend already hides the button. Unsafe deserialization on user input. The model will happily produce all of these.
# Classic SQL injection
cursor.execute(f"SELECT * FROM users WHERE email = '{email}'")
Never trust AI-generated code that touches auth, secrets, or a database without a second pass focused only on security.
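The fix for the injection above is a parameterized query, where the driver treats the input as data rather than SQL. A minimal demonstration with the standard library's `sqlite3` (its placeholder is `?`; other drivers use `%s` or named parameters, but the principle is identical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice@example.com')")

# Hostile input that would break out of an f-string query
email = "alice@example.com' OR '1'='1"

# Parameterized: the whole string is matched literally as a value
rows = conn.execute(
    "SELECT * FROM users WHERE email = ?", (email,)
).fetchall()

assert rows == []  # the injection attempt matches nothing
```

With the f-string version from above, that same input would return every row in the table.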
Outdated patterns
Models train on a snapshot of the internet. If a library changed its API last year, the model might still write the old version. React class components, deprecated Node crypto methods, OpenAI SDK calls from the v3 era. It's not wrong in the sense of broken; it's wrong in the sense of "this is how we did it in 2022."
Check the version in your package.json against what the model wrote. If they disagree, the model loses.
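The same problem shows up in Python itself. A concrete example: `datetime.utcnow()` is deprecated as of Python 3.12, yet models still emit it constantly because years of training data use it. The current idiom is an explicitly timezone-aware timestamp:

```python
from datetime import datetime, timezone

# Outdated pattern models still write (deprecated since Python 3.12):
#   datetime.utcnow()   # returns a naive datetime with no tzinfo

# Current replacement: timezone-aware from the start
now = datetime.now(timezone.utc)
assert now.tzinfo is timezone.utc
```

The old call still runs, which is exactly why it slips through review: nothing breaks until naive and aware datetimes get compared somewhere downstream.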
Looks right, isn't
The hardest class. Code that passes a glance review and breaks subtly when run. Off-by-one errors. Wrong comparison operators. A loop that mutates the array it's iterating over. A regex that matches 99% of cases. These are the bugs that ship.
The only reliable defense is running the code on real inputs, not just reading it.
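A toy illustration of the mutate-while-iterating class, in Python. Both versions read as "remove every 2", and only running them reveals the difference:

```python
# Looks right, isn't: mutating the list being iterated
items = [1, 2, 2, 3]
for x in items:
    if x == 2:
        items.remove(x)
# Removing shifts indices mid-loop, so the second 2 is skipped
assert items == [1, 2, 3]  # not the [1, 3] you'd expect

# Correct: build a new list instead of mutating in place
items = [1, 2, 2, 3]
items = [x for x in items if x != 2]
assert items == [1, 3]
```

The buggy version produces no error and even works on lists without adjacent duplicates, which is precisely what makes it a glance-review survivor.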
A review checklist you can actually use
- Read every line. If you're going to merge it, you're responsible for it. "The AI wrote it" is not a postmortem.
- Search the docs for any function or method name you don't recognize. Two minutes saves two hours.
- Run the code on edge inputs. Empty, null, very large, unicode, negative numbers.
- Grep the diff for password, secret, api_key, token. Make sure none are hardcoded or logged.
- If it touches a database, confirm queries are parameterized.
- If it touches auth, confirm the check happens server-side.
- Ask the model to review its own output. (See below.)
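The secrets grep from the checklist is easy to automate. A minimal sketch (`flag_secret_lines` is a hypothetical helper, and the pattern list is a starting point, not an exhaustive one):

```python
import re

# Secret-ish names to flag in added lines; extend for your codebase
SECRET_RE = re.compile(r"password|secret|api_key|token", re.IGNORECASE)

def flag_secret_lines(diff_text):
    # Only inspect lines the diff adds (they start with '+')
    return [
        line for line in diff_text.splitlines()
        if line.startswith("+") and SECRET_RE.search(line)
    ]

diff = "+API_KEY = 'sk-live-123'\n+color = 'blue'"
assert flag_secret_lines(diff) == ["+API_KEY = 'sk-live-123'"]
```

A hit isn't automatically a problem, because referencing a secret is fine. Hardcoding or logging one is not, and the grep tells you exactly which lines to stare at.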
The self-review prompt
After the model writes code, paste it back and ask:
Below is code you just wrote. Act as a skeptical senior engineer reviewing it for a production deploy. List:
1. Any function, method, or parameter that may not exist or may be deprecated.
2. Edge cases the code does not handle (empty, null, very large, unicode, concurrency).
3. Security issues (injection, exposed secrets, missing auth, unsafe deserialization).
4. Anything that looks correct at a glance but would fail when run.
Be specific. Quote the line. If it's fine, say "no issues found" for that category.
[paste code]
You'll be surprised how often it catches its own bugs. The same model that wrote the code can often spot what's wrong with it, because reviewing is a different task than generating.
Make it write the tests
This is the highest-leverage habit in the entire pillar. After the model writes a function, ask it to write tests for that function. Then run them. Then paste any failures back and have it fix them.
Write tests for the function above. Cover the happy path plus empty input, null input, very large input, and any error conditions. Use [your test framework].
Tests force the code to be runnable, force you to look at edge cases, and force the model to confront its own assumptions. A function with passing tests it wrote itself is dramatically more trustworthy than a function it just claimed works.
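For the earlier averaging snippet, the generated tests might look something like this. Plain assertions rather than a framework, to keep the sketch self-contained; `average_price` and `Item` mirror the hypothetical fix from the edge-cases section:

```python
from dataclasses import dataclass

@dataclass
class Item:
    price: float

def average_price(items):
    # Guarded version of the snippet from the edge-cases section
    if not items:
        return 0.0
    return sum(x.price for x in items) / len(items)

# Happy path
assert average_price([Item(10.0), Item(20.0)]) == 15.0
# Empty input -- the exact case the original snippet crashed on
assert average_price([]) == 0.0
# Very large value, checking for silent precision surprises
assert average_price([Item(1e12)]) == 1e12
```

Notice the empty-input test: it encodes the bug the model originally wrote, so a regression can never sneak back in unnoticed.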
Wrapping Pillar 3
Across this pillar you've gone from "AI is a chat tool" to "AI builds software in your repo." You picked your tooling, you scoped projects so the model doesn't drift, and now you have a checklist to keep bad code from shipping.
Pillar 4 is about owning what you've built. Hosting, deployment, monitoring, costs, and the boring infrastructure that turns a working prototype into something you can run as a real product. Build is the fun half. Ship is the half that pays.
Next in this pillar
Working with AI on a codebase you didn't write