
Computer use

Also known as: browser agent, GUI agent, desktop agent

Agents that control a real desktop — moving the mouse, typing, reading the screen — instead of calling structured APIs.

What it means

Computer use is a class of agentic AI in which the model sees a screenshot of a real desktop or browser, decides where to click and what to type, and emits low-level UI actions (mouse_move, click, type, scroll). It's how you automate software that has no API: legacy enterprise apps, finicky web forms, anything behind a login wall.

Anthropic shipped Computer Use with Claude 3.5 Sonnet in October 2024, the first frontier model with this capability. OpenAI followed with Operator in early 2025, and Google, Adept, and others have variants. By 2026 every major lab has a computer-use offering, and the loop is roughly the same: take a screenshot, ask the model what to do next, execute that action in a sandbox VM, repeat.

It works, but it's slow, expensive, and brittle compared to API-based tools. A task that takes a structured tool call 200ms can take a computer-use agent 30+ seconds because of screenshot tokens and per-step reasoning. Pixel-level UIs change. Pop-ups break flows. Captchas exist. Unattended completion rates on real-world tasks are still in the 40-70% range on benchmarks like OSWorld and WebArena, depending on difficulty.

Where it shines: back-office automation where the alternative is hiring someone, demos that need to look like a human using the app, and bridging to systems with no API. Where it doesn't: anything time-sensitive, anything you can call a real API for, and anything that needs to be 99.9% reliable.
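
The loop is simple enough to sketch. Below is a minimal, vendor-neutral version in Python; the Model and Sandbox interfaces, the next_action call, and the Action fields are illustrative assumptions for this sketch, not any lab's actual SDK.

```python
# Minimal sketch of the screenshot -> model -> action loop described above.
# Model, Sandbox, and Action are hypothetical stand-ins, not a real vendor API.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Action:
    kind: str                # "mouse_move", "click", "type", "scroll", or "done"
    x: int | None = None     # screen coordinates for mouse actions
    y: int | None = None
    text: str | None = None  # payload for "type" actions


class Model(Protocol):
    def next_action(self, task: str, screenshot: bytes, history: list[Action]) -> Action: ...


class Sandbox(Protocol):
    def screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


def run_computer_use(task: str, model: Model, vm: Sandbox, max_steps: int = 50) -> list[Action]:
    """Loop: screenshot the VM, ask the model for one UI action, execute it, repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = vm.screenshot()                         # the model only sees pixels
        action = model.next_action(task, screenshot, history)
        if action.kind == "done":                            # model signals completion
            break
        vm.execute(action)                                   # click/type/scroll in the sandbox
        history.append(action)
    return history
```

Each iteration uploads a full screenshot and waits on model reasoning, which is where the latency and cost gap versus a single structured tool call comes from.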

Example

You ask an Operator session to "find the cheapest flight from SFO to NYC next Tuesday on United.com and screenshot the booking page." It opens Chrome, navigates to united.com, fills the form, sorts by price, and screenshots — all by clicking pixels.
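
Concretely, the action trace such a session emits might look something like the following. The schema and coordinates are invented for illustration; they are not Operator's real output format.

```python
# Hypothetical action trace for the flight-search task above.
trace = [
    {"kind": "click",  "x": 412, "y": 88},   # focus the "From" field
    {"kind": "type",   "text": "SFO"},
    {"kind": "click",  "x": 638, "y": 88},   # focus the "To" field
    {"kind": "type",   "text": "NYC"},
    {"kind": "click",  "x": 840, "y": 88},   # open the date picker
    {"kind": "click",  "x": 512, "y": 304},  # pick next Tuesday
    {"kind": "click",  "x": 940, "y": 140},  # submit the search
    {"kind": "scroll", "dy": -400},          # scroll to the sort control
    {"kind": "click",  "x": 260, "y": 210},  # sort results by price
    {"kind": "screenshot"},                  # capture the booking page
]
```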

Why it matters

Computer use is the bridge between agentic AI and the long tail of software that has no API. It's also the most visceral demo of what 'agent' means — you literally watch the model use the same tools you do. State of the art in 2026 is real but slow; expect rapid improvement as labs optimize visual reasoning and action latency.
