Company: "nyu"

Feb 04, 2025

o3 o3-mini-high o3-deep-research-mini openai google-deepmind nyu uc-berkeley hku reinforcement-learning benchmarking inference-speed model-performance reasoning test-time-scaling agent-design sama danhendrycks ethan-mollick dan-shipper

OpenAI released the full version of the o3 agent, with a new Deep Research variant showing significant improvements on the HLE benchmark and achieving SOTA results on GAIA. The release includes an "inference time scaling" chart demonstrating rigorous research, though some criticism arose over public test set results. The agent is described as "extremely simple" and is currently limited to 100 queries/month, with a higher-rate version planned. Reception has been mostly positive, with some skepticism. Separately, advances in reinforcement learning were highlighted, including a simple test-time scaling technique called budget forcing that improved reasoning on math competitions by 27%. Researchers from Google DeepMind, NYU, UC Berkeley, and HKU contributed to these findings. The original Gemini Deep Research team will participate in the upcoming AI Engineer NYC event.

