
Person: "eric-wallace"

o3 solves AIME, GPQA, Codeforces, makes 11 years of progress in ARC-AGI and 25% in FrontierMath

o3 o3-mini o1-mini gpt-3 gpt-4o o1 openai benchmarking math reasoning model-performance inference-speed cost-efficiency alignment safety-testing sama eric-wallace

OpenAI announced the o3 and o3-mini models with record-setting benchmark results. These include a jump from 2% to 25% on the FrontierMath benchmark and a score of 87.5% on the ARC-AGI reasoning benchmark, a result that represents about 11 years of progress on the GPT-3 to GPT-4o scaling curve. The o1-mini model shows superior inference efficiency compared to o3-full, promising significant cost reductions on coding tasks. The announcement was accompanied by community discussion, an open call for safety-testing applications, and detailed analyses. Sam Altman (sama) highlighted the unusual cost-performance tradeoff, and Eric Wallace shared insights on the deliberative alignment strategy behind the o-series models.

© 2026 • AINews

