New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

A new benchmark shows that Claude Mythos and GPT-5.5 can develop real browser exploits autonomously. The result suggests these models can carry out complex exploit development without human intervention, which raises concerns about security and ethical use.

More in Agents
Anthropic’s long-sidelined Fable 5 is greenlit to return
Anthropic just greenlit the return of Fable 5, which had been sidelined for a while. The company is moving forward with development of this AI agent, potentially expanding its offerings in autonomous systems.

Claude Science is Anthropic’s newest flagship product
Anthropic just launched Claude Science, its newest flagship product, focused on scientific tasks. The tool aims to assist researchers by generating insights and automating data analysis, making scientific workflows more efficient.

ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration
Hugging Face just launched ScarfBench, a benchmark for AI agents that migrate enterprise Java frameworks. It lets developers evaluate agent performance on real-world migration tasks.

Anthropic launches Claude Sonnet 5 as a cheaper way to run agents
Anthropic just launched Claude Sonnet 5, a cheaper way to run AI agents. The lower cost makes agent-based solutions more affordable for developers to implement.