GPT-5.5 Tops New AI Agents Benchmark, Beats Claude

VentureBeat·June 10, 2026 at 11:16 PM

A new benchmark called Agents' Last Exam (ALE), developed by UC Berkeley researchers and more than 300 experts, aims to measure how well AI agents perform real-world professional tasks. OpenAI's GPT-5.5 achieved the highest score, a 24.0% pass rate, ahead of Anthropic's Claude Fable 5 at 22.0%. ALE evaluates agents across five layers (reasoning, perception, orchestration, tool use, and runtime) using realistic workflows from 55 industries. Unlike previous benchmarks, which were prone to 'cheating' and grading issues, ALE is designed to prevent data contamination and to evaluate agents rigorously on complex, long-horizon tasks. The results indicate that even the top models still have significant room for improvement.
