GPT-5.5 Tops New AI Agents Benchmark, Beats Claude
VentureBeat
A new benchmark called Agents' Last Exam (ALE), developed by UC Berkeley researchers together with more than 300 experts, aims to measure how well AI agents perform real-world professional tasks. OpenAI's GPT-5.5 posted the highest score, a 24.0% pass rate, ahead of Anthropic's Claude Fable 5 at 22.0%. ALE evaluates agents across five layers (reasoning, perception, orchestration, tool use, and runtime) using realistic workflows drawn from 55 industries. Unlike earlier benchmarks, which were prone to 'cheating' and grading problems, ALE is designed to prevent data contamination and to evaluate complex, long-horizon tasks rigorously. With the top pass rate below 25%, the results indicate that even the leading models still have significant room for improvement.
Tags
ai
product
Original Source
VentureBeat — venturebeat.com