Research · The Decoder · July 3, 2026

UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do

The UK's AI Security Institute finds that standard benchmarks systematically underestimate what AI agents can do. Existing evaluations may therefore understate the real capabilities of these systems.

More in Research

Research · MIT Technology Review · 13h

A device that revives eyeballs from dead donors could make eye transplants possible

Researchers have developed a device that can revive eyeballs from deceased donors, which could make eye transplants feasible for people who need their vision restored.

Research · The Decoder · 20h

GPT and Claude failed Bridgewater's finance tests because the right answers were never public

GPT and Claude failed Bridgewater's finance tests because the correct answers were never publicly available. The result highlights the limits of AI models when a task depends on proprietary knowledge.

Research · Latent Space · 1d

AIEWF Daily Dispatch: The great loops debate and the state of AI engineering

AI engineers are debating the effectiveness of different loop structures in AI programming. This discussion could lead to more efficient coding practices and improved AI performance.

Research · AWS Machine Learning · 1d

Best practices for multi-turn reinforcement learning in Amazon SageMaker AI

AWS has shared best practices for multi-turn reinforcement learning in Amazon SageMaker AI. The guidance is meant to help developers optimize models for interactive environments.