Topic: "program-synthesis"
not much happened today
gemini-3.1-pro gpt-5.5 opus-4.7-xhigh agent-moderncolbert google-deepmind lighton nous-research research-benchmarks math medical-benchmarks agentic-systems program-synthesis retrieval-augmentation training-optimization superoptimization scaling-laws training-efficiency gpu-optimization attention-mechanisms soohak polynoamial torchcompiled leloykun che_shr_cat jjitsev omarsar0
Research-level reasoning benchmarks are advancing, with 439 new math problems from 64 mathematicians and expanded medical evaluation in Medmarks v1.0, which covers 30 benchmarks and 61 models. Google DeepMind's AI Co-Mathematician achieves 48% on FrontierMath Tier 4, while Gemini 3.1 Pro improves physics benchmark scores significantly. GPT-5.5 high/xhigh outperforms Opus 4.7 xhigh on program synthesis tasks. Retrieval benchmarks favor smaller models like LightOn's Agent-ModernColBERT, which has 149M parameters. Training optimization advances include SOAP/Muon-style updates that reduce the number of training steps, and a Lean4-to-TileLang superoptimizer that achieves a 1.8× speedup on A100 GPUs. Scaling laws are being reconsidered, with arguments for measuring in bytes rather than tokens. New training-time efficiency methods like Lighthouse Attention provide subquadratic training wrappers that can be removed before deployment.
Gemini 2.5 Computer Use preview beats Sonnet 4.5 and OAI CUA
gemini-2.5 gpt-5-pro glm-4.6 codex google-deepmind openai microsoft anthropic zhipu-ai llamaindex mongodb agent-frameworks program-synthesis security multi-agent-systems computer-use-models open-source moe developer-tools workflow-automation api vision reasoning swyx demishassabis philschmid assaf_elovic hwchase17 jerryjliu0 skirano fabianstelzer blackhc andrewyng
Google DeepMind released a new Gemini 2.5 Computer Use model for browser and Android UI control, evaluated by Browserbase. OpenAI showcased GPT-5 Pro, new developer tools including Codex with Slack integration, and agent-building SDKs at Dev Day. Google DeepMind's CodeMender automates security patching for large codebases. Microsoft introduced an open-source Agent Framework for multi-agent enterprise systems. AI community discussions highlight advances in agent orchestration, program synthesis, and UI control. The GLM-4.6 update from Zhipu features a large Mixture-of-Experts model with 355B parameters.