Model: "codex"
ElevenLabs $500m Series D at $11B, Cerebras $1B Series H at $23B, Vibe Coding -> Agentic Engineering
gemini-3 claude codex google openai github microsoft deepmind agent-frameworks model-deployment benchmarking cost-optimization software-development async-processing gpu-acceleration coding-agents user-adoption game-theory workflow-integration sama sundarpichai reach_vb
Google's Gemini 3 is being integrated widely, including a new Chrome side panel and Nano Banana UX features, with rapid adoption and a 78% reduction in unit serving costs. The Gemini app reached 750M+ MAU in Q4 2025, nearing ChatGPT's user base. Google is also benchmarking AI "soft skills" through games like Poker and Chess in the Kaggle Game Arena. Meanwhile, coding agents are converging in IDEs: VS Code launched Agent Sessions supporting Claude and Codex agents with features like parallel subagents and integrated browsers. GitHub Copilot now allows agent choice between Claude and OpenAI Codex for async backlog clearing. OpenAI reports 1M+ active users for Codex with expanded integration surfaces, though some users request better GPU support. The coding-agent ecosystem is professionalizing with community platforms like OpenClaw and tooling such as ClawHub and CLI updates. "Gemini 3 adoption faster than any other model" and "VS Code as home for coding agents" highlight major industry shifts.
OpenAI Codex App: death of the VSCode fork, multitasking worktrees, Skills Automations
codex openai agent-based-systems parallel-processing software-testing developer-workflows automation product-feedback-loop neurosymbolic-ai benchmarking sama reach_vb gdb skirano embirico ajambrosino thsottiaux nbaschez yuchenj_uw badlogicgames random_walker
OpenAI launched the Codex app on macOS as a dedicated agent-native command center for coding, featuring multiple agents in parallel, built-in worktrees for conflict isolation, skills for reusable bundles, and scheduled automations. The app emphasizes developer workflows like Plan mode for upfront task decomposition and is gaining positive adoption signals from insiders including @sama. There is movement towards ecosystem standardization of skills folders, signaling early conventions in agent tooling. Codex also exemplifies a "self-improving" product feedback loop combining humans and agents. Recommended practices for working with coding agents include a "test-first" approach to bug fixes, the "conductor" model where one developer manages 5-10 agents in parallel, and a neurosymbolic framing that attributes coding agents' success to software's verifiability and symbolic tooling. Skepticism remains about productivity studies that do not reflect agentic workflows.
not much happened today
claude-3 codex gemini gpt-5.2-pro anthropic openai google sakana-ai cursor baseten epoch-ai-research deepmind benchmarking reasoning continual-learning reinforcement-learning model-performance agentic-ai security model-training sama fchollet shane_legg demishassabis
Anthropic launches "Claude in Excel Pro" with enhanced features. OpenAI reveals upcoming Codex agent loop and cybersecurity measures. Google boosts Gemini App quotas and partners with Sakana AI for advanced AI Scientist projects in Japan. Cursor introduces Agent Skills for dynamic context focus. GPT-5.2 Pro achieves 31% on FrontierMath Tier 4, showing significant benchmark progress. Baseten raises $300M at a $5B valuation targeting high-performance inference. Discussions highlight math benchmarks as indicators of AI capability, uneven AGI progress, and the importance of reasoning and continual learning as future frontiers. Notable figures include Sam Altman, François Chollet, Shane Legg, and Demis Hassabis.
ChatGPT starts testing ads on free tier + new $8/mo Go plan in the US
chatgpt-go codex openai ollama ads monetization memory agent-orchestration human-in-the-loop cli-tools context-length workflow-optimization sama sam_altman fidjissimo scaling01 tomwarren embirico adamdotdev ollama thsottiaux lateinteraction dbreunig
OpenAI announced the ChatGPT Go tier at $8/month and began testing ads in the US free tier, emphasizing that ads will not influence responses and will be clearly labeled. The update includes memory improvements and a "very fast Codex" feature teased by Sam Altman. The Codex CLI ecosystem now supports open-weight models with improved context length. Discussions highlight the importance of keeping a human in the loop for reliable agent orchestration, and the advantages of file interfaces over traditional retrieval-augmented generation.
not much happened today
kimi-k2 qwen3-next nemotron-nano-2 granite-4.0 gpt-4.5 copilot codex vllm perplexity-ai ibm anthropic graphiti claude cursor-ai microsoft mixture-of-experts model-integration cloud-computing hybrid-models benchmarking agent-systems memory-persistence semantic-search code-retrieval context-length-optimization tool-use evaluation-frameworks software-development scaling01 cedric_chee aravsrinivas omarsar0 _avichawla pierceboggan jo_parkhurst jyangballin ofirpress ml_angelopoulos
Kimi-K2 Reasoner has been integrated into vLLM and will soon be supported by SGLang, featuring a massive 1.2 trillion parameter MoE configuration. Perplexity AI released research on cloud-portable trillion-parameter MoE kernels optimized for AWS EFA, with potential integration into vLLM. IBM's vLLM team formalized hybrid dense and sparse expert models, supporting models like Qwen3-Next, Nemotron Nano 2, and Granite 4.0. Kimi-K2 reportedly scores 77% on GPQA Diamond, outperforming GPT-4.5 at 71.4%, though this is unverified.
Anthropic published a guide on efficient tool-heavy agent systems using MCP patterns, drastically reducing context tokens by ~98.7%. Graphiti MCP demonstrated shared memory across apps like Claude Desktop and Cursor for persistent agent memory. VS Code introduced an "Agent sessions" feature to unify agent management, including Copilot and Codex. Cursor AI improved coding accuracy via semantic search and code retrieval embeddings. New evaluation frameworks like CodeClash and LMArena assess agent and coding model performance in realistic multi-round tasks and occupation-tagged leaderboards.
Gemini 2.5 Computer Use preview beats Sonnet 4.5 and OAI CUA
gemini-2.5 gpt-5-pro glm-4.6 codex google-deepmind openai microsoft anthropic zhipu-ai llamaindex mongodb agent-frameworks program-synthesis security multi-agent-systems computer-use-models open-source moe developer-tools workflow-automation api vision reasoning swyx demishassabis philschmid assaf_elovic hwchase17 jerryjliu0 skirano fabianstelzer blackhc andrewyng
Google DeepMind released a new Gemini 2.5 Computer Use model for browser and Android UI control, evaluated by Browserbase. OpenAI showcased GPT-5 Pro, new developer tools including Codex with Slack integration, and agent-building SDKs at DevDay. Google DeepMind's CodeMender automates security patching for large codebases. Microsoft introduced an open-source Agent Framework for multi-agent enterprise systems. AI community discussions highlight agent orchestration, program synthesis, and UI control advancements. Zhipu's GLM-4.6 update is a large Mixture-of-Experts model with 355B parameters.
OpenAI Realtime API GA and new `gpt-realtime` model, 20% cheaper than 4o
gpt-realtime gpt-4o-realtime grok-code-fast-1 codex mai-1-preview mai-voice-1 gemini-cli openai xai microsoft google speech-to-speech instruction-following function-calling telephony webrtc voice-agents multilingual-switching voice-control benchmarks coding-models ide-integration developer-tools model-updates swyx juberti omarsar0 reach_vb pbbakkum skcd42 mohitreddy13 cline kevinweil gdb sama _philschmid
OpenAI launched the gpt-realtime model and brought the Realtime API to GA, featuring advanced speech-to-speech capabilities, new voices (Cedar, Marin), image input, SIP telephony, and a ~20% price cut. Benchmarks show improvements over gpt-4o-realtime on BigBench and ComplexFuncBench. xAI introduced Grok Code Fast 1, a speed-optimized coding model integrated with popular IDEs, while OpenAI Codex received major upgrades for local and cloud development workflows. Google’s Gemini CLI improved multi-editor support, and new models like Microsoft MAI-1-preview and MAI-Voice-1 were announced. "The new all-in-one WebRTC API removes the ephemeral token step and supports video on the same connection," highlighting enhanced developer tooling.
OpenAI updates Codex, VSCode Extension that can sync tasks with Codex Cloud
codex stepwiser gemini-2.5-flash nemotron-cc-math jet-nemotron openai facebook-ai-fair google-deepmind nvidia process-reward-modeling reinforcement-learning chain-of-thought spatial-reasoning multi-image-fusion developer-tools code-review ide-extension cli cloud-computing model-efficiency jaseweston tesatory benjamindekr tokumin fabianstelzer officiallogank
OpenAI Codex has launched a new IDE Extension integrating with VS Code and Cursor, enabling seamless local and cloud task handoff, sign-in via ChatGPT plans, an upgraded CLI, and GitHub code review automation. Facebook AI researchers introduced StepWiser, a process-level reward model that improves reasoning and training through chunk-by-chunk evaluation, achieving SOTA on ProcessBench. Google DeepMind's Gemini 2.5 Flash Image model showcases advanced spatial reasoning, multi-image fusion, and developer tools including a browser extension for image remixing. NVIDIA revealed efficiency data on Nemotron-CC-Math (133B) and Jet-Nemotron models.
not much happened today
seedance-1.0 codex claude-code kling-2.1 veo-3 bytedance morph-labs huggingface deeplearning.ai figure-ai langchain sakana-ai video-generation autoformalization ai-assisted-coding api-design context-engineering reinforcement-learning ai-evals hypernetworks model-fine-tuning foundation-models andrew_ng hwchase17 adcock_brett clementdelangue akhaliq jxmnop hamelhusain sh_reya
ByteDance showcased an impressive state-of-the-art video generation model called Seedance 1.0 without releasing it, while Morph Labs announced Trinity, an autoformalization system for Lean. Hugging Face Transformers deprecated TensorFlow/JAX support. Andrew Ng of DeepLearning.AI highlighted the rise of the GenAI Application Engineer role, emphasizing skills in AI building blocks and AI-assisted coding tools like Codex and Claude Code. Engineering teams are increasingly testing API designs against LLMs for usability. Figure AI's CEO stressed speed as a key competitive advantage, and LangChain introduced the concept of Context Engineering for AI agents. Reinforcement learning on LLMs shows transformative potential, and the community values AI evals and data work. Sakana AI released Text-to-LoRA, a hypernetwork method for generating task-specific LoRA adapters from natural language, enabling efficient model customization. The video generation race heats up with ByteDance's Seed-based model praised for quality, challenging American labs, alongside models like Kling 2.1 and Veo 3.
not much happened today
codex claude-4-opus claude-4-sonnet gemini-2.5-pro gemini-2.5 qwen-2.5-vl qwen-3 playdiffusion openai anthropic google perplexity-ai bing playai suno hugging-face langchain-ai qwen mlx assemblyai llamacloud fine-tuning model-benchmarking text-to-video agentic-ai retrieval-augmented-generation open-source-models speech-editing audio-processing text-to-speech ultra-low-latency multimodality public-notebooks sama gdb kevinweil lmarena_ai epochairesearch reach_vb wightmanr deeplearningai mervenoyann awnihannun jordirib1 aravsrinivas omarsar0 lioronai jerryjliu0 nerdai tonywu_71 _akhaliq clementdelangue _mfelfel
OpenAI rolled out Codex to ChatGPT Plus users, adding internet access with fine-grained controls, and improved memory features for free users. Anthropic's Claude 4 Opus and Sonnet models lead coding benchmarks, while Google's Gemini 2.5 Pro and Flash models gain recognition with new audio capabilities. Qwen 2.5-VL and Qwen 3 quantizations are noted for versatility and support. Bing Video Creator launched globally, enabling text-to-video generation, and Perplexity Labs sees increased demand for travel search. New agentic AI tools and RAG innovations include LlamaCloud and FedRAG. Open-source releases include Holo-1 for web navigation and PlayAI's PlayDiffusion for speech editing. Audio and multimodal advances feature Suno's music editing upgrades, Google's native TTS in 24+ languages, and Universal Streaming's ultra-low latency speech-to-text. Google NotebookLM now supports public notebooks. Highlights include "Codex's internet access brings tradeoffs, with explicit warnings about risk" and "Gemini 2.5 Pro is cited as a daily driver by users".
not much happened today
chatgpt o3 o4 bagel-7b medgemma acereason-nemotron-14b codex gemini openai bytedance google nvidia sakana-ai-labs deep-learning-ai gemini agenticseek anthropic agentic-systems multimodality reasoning code-generation prompt-engineering privacy ethical-ai emergence synthetic-data speech-instruction-tuning low-resource-languages humor scaling01 mervenoyann sakananailabs _philschmid omarsar0 teortaxestex andrewlampinen sedielem cis_female
OpenAI plans to evolve ChatGPT into a super-assistant by 2025 with models like o3 and o4 enabling agentic tasks and supporting a billion users. Recent multimodal and reasoning model releases include ByteDance's BAGEL-7B, Google's MedGemma, and NVIDIA's ACEReason-Nemotron-14B. The Sudoku-Bench Leaderboard highlights ongoing challenges in AI creative reasoning. In software development, OpenAI's Codex aids code generation and debugging, while Gemini's Context URL tool enhances prompt context. AgenticSeek offers a local, privacy-focused alternative for autonomous agents. Ethical concerns are raised about AGI development priorities and Anthropic's alignment with human values. Technical discussions emphasize emergence in AI and training challenges, with humor addressing misconceptions about Gemini 3.0 and async programming in C. A novel synthetic speech training method enables instruction tuning of LLMs without real speech data, advancing low-resource language support.