What AI is good at, and what it still gets wrong

A blunt capability map. The categories of work where AI is reliable, the categories where it bluffs, and the in-between where it works if you verify.

7 min read·Updated May 25, 2026

AI is fluent. That is the trap. Fluent output sounds confident even when it is wrong, and most people calibrate to confidence before content. This guide is the blunt version of where AI is reliable, where it bluffs, and where it works only if you check.

If you are starting out, this is the map that prevents the first few avoidable mistakes.

Categories of work

Think about AI's reliability across three categories: high-trust, conditional-trust, and low-trust. The category depends on what kind of output the task produces and how easy it is to verify.

High-trust: where AI is genuinely reliable

These are tasks where AI is good enough that you can usually use the output with light editing.

Transformation of text you provide. Summarize this document. Translate this paragraph. Convert this list into a table. Rewrite this email in a more direct tone. The model has the source in front of it; it is not inventing facts, just reshaping what is there. Errors are usually stylistic, not factual.

Explanation of well-known concepts. Explain how Stripe webhooks work. What is OAuth? How does a vector database differ from a regular database? These topics are well represented in training data, and the model can produce a useful explanation. You should still verify before quoting specifics, but the structure of the answer is usually right.

First drafts of creative work. Outlines, brainstorms, naming ideas, draft emails, slide bullet points. The output is rarely the final version, but it gets you past the blank-page problem in seconds.

Code in well-known languages and frameworks. Python, JavaScript, TypeScript, common libraries, common patterns. The model has seen millions of examples and produces syntactically correct, usually-functional code. Errors tend to be in the seams — wrong library version, hallucinated method names, missed edge cases — which is why review still matters. See Catching AI-generated bugs.

Structured rewriting. Convert a transcript into meeting notes. Pull action items out of a Slack thread. Extract a list of names from an email chain. AI handles these well because the work is mostly identification and reformatting.

Tutoring on familiar topics. "Explain this like I am five." "What does this acronym mean." "Walk me through what a HashMap does." The model is patient, infinitely available, and good at restating things in different ways.

Conditional-trust: AI works if you verify

These tasks are where AI is genuinely useful, but the output can be silently wrong in ways that matter. You use it with a verification step.

Research summaries. AI can produce a coherent summary of a topic, but it will sometimes invent quotes, misattribute sources, or confidently state numbers it half-remembered. Treat the output as a draft and verify any factual claim before quoting it. AI-search tools (Perplexity, ChatGPT with browsing, Gemini with search) reduce but do not eliminate this.

Writing that includes facts. A blog post, a customer email, a sales claim, a press release. The structure and prose are usually good; the embedded facts ("we have 50,000 customers", "this was launched in 2023", "the regulation took effect last March") need checking.

Code that calls APIs or libraries. The skeleton is usually right; the specifics (method names, parameter order, version compatibility) are where AI invents. Run it. Let the linter or type checker flag names that do not exist. Check the docs for any unfamiliar method. See How to verify AI output before you trust it.
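One cheap check for an unfamiliar method is to ask the installed library itself rather than the model. The following is a minimal sketch using Python's standard `importlib` and `inspect` modules. The helper name `describe_callable` is hypothetical, and the `json` lookups are only illustrative.

```python
import importlib
import inspect


def describe_callable(module_name: str, attr_name: str) -> str:
    """Report whether module_name.attr_name exists in the installed version."""
    module = importlib.import_module(module_name)
    if not hasattr(module, attr_name):
        return f"{module_name}.{attr_name}: not found in the installed version"
    obj = getattr(module, attr_name)
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        # Some built-in or non-callable objects do not expose a signature.
        signature = " (signature unavailable)"
    return f"{module_name}.{attr_name}{signature}"


print(describe_callable("json", "dumps"))        # a real function
print(describe_callable("json", "dump_pretty"))  # a made-up name
```

This only confirms that a name exists and shows its parameters. It says nothing about whether the call does what the AI claimed, so running the code is still the real test.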

Analysis of data you provide. If you paste a spreadsheet and ask AI to analyze it, the prose answer will sound confident. The numbers in the prose will sometimes be wrong. Always re-derive the key numbers yourself or with a code interpreter.
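Re-deriving a number usually takes only a few lines. The sketch below assumes a small CSV with hypothetical `region` and `revenue` columns (the data and the claimed total are made up for illustration) and recomputes the figures an AI summary might quote, using only Python's standard library.

```python
import csv
import io
import math
from statistics import mean

# Hypothetical data standing in for the spreadsheet pasted into the chat.
CSV_TEXT = """region,revenue
north,1200
south,950
east,1430
west,1020
"""


def recompute(csv_text: str, column: str) -> dict:
    """Recompute count, total, and mean for one numeric column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return {"count": len(values), "total": sum(values), "mean": mean(values)}


def matches(claimed: float, actual: float, rel_tol: float = 0.005) -> bool:
    """True if the claimed figure is within a relative tolerance of the recomputed one."""
    return math.isclose(claimed, actual, rel_tol=rel_tol)


stats = recompute(CSV_TEXT, "revenue")
claimed_total = 4800.0  # hypothetical figure quoted in the AI's prose answer

print(stats)
print("total matches claim:", matches(claimed_total, stats["total"]))
```

Here the recomputed total is 4600, so the claimed 4800 fails the check. The tolerance is an arbitrary choice; for exact figures such as invoice totals, compare for equality instead.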

Decision frameworks for unfamiliar domains. AI can produce a coherent-sounding framework for "how to price a B2B SaaS product" or "how to negotiate a salary offer." Some frameworks are decent; some are generic; some are confidently wrong. Cross-check anything important against a real source.

Low-trust: where AI is unreliable and you should not lean on it

These are tasks where AI fails often enough, or badly enough, that the output can do more harm than good in serious work.

Current events, recent news, recent prices, recent product features. Every model has a training cutoff, and without a search tool it does not know what happened last week. If it answers anyway, it is extrapolating from older data. Even AI search tools, which can browse, sometimes misread the live page or fabricate details that are not on it.

Precise calculations and arithmetic. A language model is not a calculator. It will sometimes get $14,500 × 0.07 wrong by a meaningful amount because it is generating digits, not computing. Use a calculator, a spreadsheet, or a tool with code execution for any number that matters.
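The multiplication above is a one-liner for any tool that actually computes. The following is a minimal Python sketch using the standard `decimal` module, so the currency value is not affected by binary floating-point rounding. The variable names are illustrative.

```python
from decimal import Decimal

# Compute the figure instead of asking a language model to generate its digits.
amount = Decimal("14500")
rate = Decimal("0.07")

result = amount * rate
print(result)  # 1015.00
```

A spreadsheet cell or a chat tool with code execution does the same job. What matters is that the digits come from a computation rather than from text generation.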

Legal, medical, financial, or regulatory advice. The problem is not that AI cannot generate plausible text on these topics; it can. The problem is that the consequences of being subtly wrong are high, and verifying the answer requires a real expert anyway. Use it for "explain this concept to me so I can have a smarter conversation with my lawyer", not as a substitute for the lawyer.

Citations and source attribution. AI hallucinates citations more often than almost any other category. It will produce author names, paper titles, page numbers, and journal references that look real and are not. If you need a real source, find one yourself.

Personal claims about specific people. AI will sometimes confidently state biographical facts about real people that are completely wrong. Never trust an AI biography of a public figure without checking.

Anything where being almost right is worse than being wrong. If "close enough" causes a customer call, a production bug, a misfiled tax return, or a fight — verify it.

Why fluent ≠ correct

The deepest mistake new AI users make is calibrating to confidence. The model is trained to produce text that sounds right. It does not have a separate "uncertainty" signal that adjusts the prose. So "Stripe was founded in 2010" and "Stripe was founded in 2007" sound equally confident, and one of them is wrong.

The fix is structural, not stylistic. Treat AI output according to where the information came from:

  • If it is based on something the AI can see in the conversation (a document you pasted, a question about your own code), trust it more.
  • If it is something the AI is remembering from training (a fact about the world, a price, a date), verify it.

This is the habit that separates people who use AI safely from people who get burned by it.

A short calibration checklist before you ship AI output

Before you send, post, commit, or publish anything AI wrote:

  1. Did it invent any specific name, number, date, or URL? Check each one.
  2. Did it cite a source? Open the source and check the claim is in it.
  3. Did it use any code, API, or library? Run it or look it up.
  4. Did it make a claim that, if wrong, would embarrass you or hurt someone? Get a second source.
  5. Did it sound exactly right? Especially check that one.

The first four are mechanical. The fifth is calibration: "this is suspiciously well-written" is often a sign that the model produced a confident-sounding answer to a question whose answer it does not actually know.
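The first checklist item can be partly mechanized. The sketch below pulls URLs, years, and other numbers out of a draft so that each one can be checked by hand. The regular expressions are deliberately rough assumptions: they ignore names entirely and may over-match or under-match, so the function lists candidates and verifies nothing. The function name and sample draft are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://[^\s)>\]]+")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
NUMBER_RE = re.compile(r"\$?\b\d+(?:,\d{3})*(?:\.\d+)?%?")


def claims_to_check(text: str) -> dict:
    """List URLs, years, and other numbers that a human should verify."""
    urls = URL_RE.findall(text)
    # Remove URLs first so digits inside them are not reported as numbers.
    stripped = URL_RE.sub(" ", text)
    years = YEAR_RE.findall(stripped)
    numbers = [n for n in NUMBER_RE.findall(stripped) if n not in years]
    return {"urls": urls, "years": years, "numbers": numbers}


draft = (
    "We have 50,000 customers, launched in 2023, and grew 7% last year. "
    "See https://example.com/report2024 for details."
)

for kind, items in claims_to_check(draft).items():
    print(kind, items)
```

The output is a to-do list, not a verdict. Each extracted item still needs a human to open the source and confirm it.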

What this means in practice

You do not need to verify everything. That defeats the speed benefit.

The right mental model is this: most AI output is text you would have written anyway, just faster. For that category (drafts, restructures, summaries, explanations of familiar topics), light editing is enough. The verify-heavy category is smaller: facts, numbers, citations, current events, recent product details, code that calls real APIs.

People who internalize this get most of the speed benefit AI offers and avoid most of the failure modes. People who do not either (a) verify nothing and eventually ship something wrong publicly, or (b) verify everything and lose most of the speed.


Read next: How to verify AI output before you trust it for the practical checklist by content type.