Research · The Decoder · July 3, 2026

UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do

The UK's AI Security Institute reports that standard benchmarks systematically underestimate the capabilities of AI agents. As a result, existing evaluations may understate what these systems can actually do.
