Person: "ofirpress"
GPT-5 Codex launch and OpenAI's quiet rise in Agentic Coding
gpt-5-codex qwen3-next-80b openai alibaba together-ai nvidia agentic-ai software-engineering long-context mixture-of-experts model-optimization cuda-acceleration inference-efficiency routing task-adaptive-thinking sama swyx omarsar0 ofirpress
OpenAI released GPT-5-Codex, an agentic coding model optimized for long-running software engineering tasks, with dynamic task-adaptive thinking, multi-hour autonomy, and improved code quality. It achieves 51% accuracy on an unreleased large-refactor benchmark and integrates deeply with developer tools like Xcode. Meanwhile, Alibaba launched Qwen3-Next-80B, a hybrid MoE model with native long-context support (262k tokens, extensible to 1M+), targeting efficient reasoning and repository-scale code analysis, with support from Together AI and NVIDIA through CUDA-accelerated attention. Observers note a trend toward hybrid SSM + MoE architectures, driven by an emphasis on efficiency and scaling in both Chinese and US training regimes. Community discussions highlight the importance of variable compute and routing for inference efficiency and quality.
OpenAI rolls out GPT-5 and GPT-5 Thinking to >1B users worldwide; -mini and -nano help claim Pareto Frontier
gpt-5 gpt-5-mini gpt-5-nano claude-4.1-sonnet claude-4.1-opus openai cursor_ai jetbrains microsoft notion perplexity_ai factoryai model-architecture context-windows pricing-models coding long-context prompt-engineering model-benchmarking model-integration tool-use reasoning sama scaling01 jeffintime embirico mustafasuleyman cline lmarena_ai nrehiew_ ofirpress sauers_
OpenAI launched GPT-5, a unified system featuring a fast main model and a deeper thinking model coordinated by a real-time router, supporting up to 400K context length, with aggressive pricing that reclaims the Pareto Frontier of Intelligence. The rollout includes variants like gpt-5-mini and gpt-5-nano with significant cost reductions, and integrations with products such as ChatGPT, Cursor AI, JetBrains AI Assistant, Microsoft Copilot, Notion AI, and Perplexity AI. Benchmarks show GPT-5 performing strongly in coding and long-context reasoning, roughly matching Claude 4.1 Sonnet/Opus on SWE-bench Verified. The launch was accompanied by a GPT-5 prompting cookbook and notable community discussions on pricing and performance.
not much happened today
aria o1-preview o1-mini gemini-1.5-pro gemini-1.5-flash gemini-1.5 claude-3.5-sonnet rhymes-ai openai anthropic google meta-ai-fair oxylabs multimodality mixture-of-experts long-context retrieval-augmented-generation benchmarking software-engineering llm-evaluation prompt-engineering web-scraping python production-applications mervenoyann osanseviero dbrxmosaicai ylecun ofirpress clefourrier omarsar0 rohanpaul_ai svpino finbarrtimbers _philschmid
Rhymes AI released Aria, a new 25.3B-parameter multimodal MoE model supporting text, code, image, and video, with a 64k-token context window and an Apache-2.0 license. OpenAI's o1-preview and o1-mini models show consistent improvement over Anthropic's models and Google Gemini 1.5 Pro/Flash on long-context RAG benchmarks up to 128k tokens, while Google Gemini 1.5 models excel at extreme context lengths up to 2 million tokens. Meta AI expanded its rollout to 21 countries with new language support but remains unavailable in the EU. The one-year anniversary of the SWE-bench benchmark for software engineering tasks was celebrated, alongside the introduction of SWE-bench Multimodal. New AI tools include OxyCopilot by Oxylabs for web scraping, Taipy for Python-based production apps, and Latitude for prompt engineering. Industry insights highlight changing AI funding dynamics and OpenAI's strategic focus on consumer products like ChatGPT. "all recaps done by Claude 3.5 Sonnet, best of 4 runs."