
AI Output Verifier

Audits AI-generated code for hallucinated APIs, made-up signatures, and silent rewrites.

What it does

Takes AI-generated code (yours or someone else's) and verifies the parts the model is most likely to invent: function signatures, library imports, schema fields, type definitions. Flags lines that need human checking and tells you how to verify each one. Calibrated to catch the failures unique to AI-assisted code, not general code review.

When to use

  • You're reviewing a PR that was mostly AI-written
  • You wrote AI-assisted code and want a sanity check before committing
  • A diff feels suspicious and you want to find the lines the AI invented

When not to use

  • Code you wrote yourself line-by-line — failure modes are different
  • Style or formatting concerns — use a linter

Install

Download the .zip, then unzip into your Claude skills folder.

mkdir -p ~/.claude/skills
unzip ~/Downloads/ai-output-verifier.zip -d ~/.claude/skills/

# Restart Claude Code session.
# Skill is now available — Claude will use it when relevant.

SKILL.md
---
name: ai-output-verifier
description: Use when reviewing AI-generated code for hallucinations, invented APIs, or silent behavior changes. Triggers on "verify this AI code", "did the AI make this up", or pasted AI-assisted diffs.
---

# AI Output Verifier

LLMs don't know what they don't know. When information is missing, they fill the gap with something statistically plausible — a function that doesn't exist, an import path that's wrong, a schema field renamed last week. The code compiles. It runs. Sometimes it even seems to work.

This skill verifies the parts AI is most likely to invent.

## Required inputs

Before auditing, confirm you have:

1. **The AI-generated code** (file or diff)
2. **Real source of truth** for any external surfaces the code touches:
   - Schema (Prisma, Drizzle, raw SQL, etc.)
   - Type definitions for libraries it imports
   - API contracts (OpenAPI, gRPC schemas)
3. **The package.json / lockfile** so we can check actual installed versions

If source-of-truth files aren't pasted, ask before auditing. Auditing against assumed APIs is exactly the failure mode this skill exists to prevent.

## Audit order

### 1. Imports
- Does every imported module exist in package.json / requirements.txt / equivalent?
- Is the imported symbol actually exported by that module at this version? (Default vs named, renamed exports, version-specific exports.)
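One way to mechanize this check, sketched in Python with the stdlib `importlib` (the same idea works for npm packages via `node -e "require.resolve('pkg')"`). The version lookup assumes the module name matches the installed distribution name, which is not always true:

```python
import importlib
import importlib.metadata

def verify_import(module, symbol=None):
    """Check that a module resolves and (optionally) exports a symbol."""
    try:
        mod = importlib.import_module(module)
    except ImportError as e:
        return f"✗ {module}: {e}"
    if symbol is not None and not hasattr(mod, symbol):
        return f"✗ {module}.{symbol}: module exists but does not export this symbol"
    try:
        version = importlib.metadata.version(module)
    except importlib.metadata.PackageNotFoundError:
        version = "stdlib or unknown"  # stdlib modules have no distribution metadata
    return f"✓ {module}{'.' + symbol if symbol else ''} ({version})"

print(verify_import("json", "loads"))      # real module, real export
print(verify_import("json", "load_many"))  # hallucinated export
```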

### 2. Library function calls
- Does the function exist with that name?
- Does the signature match — argument count, types, optional parameters?
- Common hallucinations:
  - Methods that "sound like they should exist" (`findManyAndCount`, `upsertMany`)
  - Methods from one library applied to another (Mongoose syntax on Prisma)
  - Async/sync mismatches (await on a non-promise, missing await on a promise)
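The signature check can also be mechanized. A Python sketch using `inspect.signature`, which binds a proposed call against the real parameter list; note that some C builtins are not introspectable and would raise `ValueError` instead:

```python
import inspect
import re

def check_call(func, *args, **kwargs):
    """Return True if the arguments would bind to the function's real signature."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
        return True
    except TypeError:  # wrong arity, unknown keyword, etc.
        return False

print(check_call(re.sub, r"\d+", "#", "a1b2", flags=re.I))      # True — real parameter
print(check_call(re.sub, r"\d+", "#", "a1b2", globally=True))   # False — invented kwarg
```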

### 3. Database / schema references
- Every column referenced exists
- Every table referenced exists
- Type matches (string vs int, nullable vs not)
- Joins reference real foreign keys
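For SQL-backed schemas, the column check can run against the live database rather than anyone's memory of it. A sketch using SQLite's `PRAGMA table_info` (other engines expose the same data through `information_schema.columns`); the `users.username` column here is a hypothetical hallucination:

```python
import sqlite3

def missing_columns(conn, table, columns):
    """Return the referenced columns that do not exist in the live schema."""
    real = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [c for c in columns if c not in real]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# The AI-written query referenced users.username — does it exist?
print(missing_columns(conn, "users", ["id", "email", "username"]))  # ['username']
```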

### 4. Internal API references
- Every function/type imported from the user's own codebase exists and is exported
- Signatures match the call sites

### 5. Silent rewrites
- Did the AI "improve" something it shouldn't have?
- Look for: renamed variables, shifted control flow, "cleaner" error handling that swallows errors that used to surface, removed validation, changed defaults
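Silent rewrites are easiest to surface as a raw diff between the code you gave the AI and what came back. A small `difflib` sketch with hypothetical before/after snippets, showing an error handler that quietly started swallowing failures:

```python
import difflib

original = """try:
    send(payload)
except NetworkError:
    raise
"""
rewritten = """try:
    send(payload)
except Exception:
    pass
"""

# unified_diff surfaces each changed line for human review
for line in difflib.unified_diff(original.splitlines(), rewritten.splitlines(), lineterm=""):
    print(line)
```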

## Output format

```
## Verified
- ✓ [item] — [how verified]

## Suspicious — needs human check
- ⚠ [line N]: [what concerns you]
  - To verify: [specific action — grep, link, type check]
  - If wrong: [what breaks]

## Confirmed wrong
- ✗ [line N]: [what's wrong]
  - Evidence: [the schema/type/signature that contradicts it]
  - Fix: [minimal correction]
```

## Anti-patterns

- Approving without checking — "looks reasonable" is the failure mode
- Trusting one AI to fact-check another's output when both have the same training cutoff and gaps
- Rewriting before verifying — fixing the wrong thing is twice the work
- Stopping at the first issue — there are usually clusters

## Tone

- Specific. Quote the line. Show the actual schema/type that contradicts.
- Confident on the wrongs, calibrated on the maybes. Don't say "probably wrong" when you can grep and find out.
- Brief. The reader is debugging — they don't need narration.

Example prompts

Once installed, try these prompts in Claude:

  • Verify this Prisma query the AI wrote against my actual schema. [paste schema] [paste query]
  • Audit this file for hallucinated functions or imports. The AI wrote the whole thing. [paste file]

Related prompts

Don't want to install a skill? These prompts in /prompts cover similar ground for one-shot use: