AINews


by smol.ai

How over 150k top AI Engineers keep up, every weekday.

We summarize top AI discords + AI reddits + AI X/Twitters, and send you a roundup each day!

"Highest-leverage 45 mins I spend everyday" - Soumith

"best AI newsletter atm" and "I'm not sure that enough people subscribe" - Andrej

"genuinely incredible" - Chris

"surprisingly decent" - Hamel

Thanks to Pieter Levels for the Lex Fridman feature!

Last 30 days in AI

See all issues
  • Jan 30
    MoltBook takes over the timeline
    claude genie-3 moltbook openclaw anthropic google multi-agent-systems agent-communication security prompt-injection identity alignment observability ai-planning ai-coding emergent-behavior karpathy
    Moltbook and OpenClaw showcase emergent multi-agent social networks where AI agents autonomously interact, creating an AI-native forum layer with complex security and identity challenges. Karpathy describes this as "takeoff-adjacent," highlighting bots self-organizing and engaging in prompt-injection and credential theft. Anthropic reports on AI coding tradeoffs with a study of 52 junior engineers and reveals Claude planned a Mars rover drive, marking a milestone in AI-driven space exploration. Google publicly releases Genie 3, sparking debate over its capabilities and latency issues. The rise of agent-to-agent private communications raises concerns about alignment and observability in 2026.
  • Jan 29
    xAI Grok Imagine API - the #1 Video Model, Best Pricing and Latency - and merging with SpaceX
    genie-3 nano-banana-pro gemini lingbot-world grok-imagine runway-gen-4.5 hunyuan-3d-3.1-pro google-deepmind x-ai runway fal interactive-simulation real-time-generation promptability character-customization world-models open-source video-generation audio-generation animation-workflows model-as-a-service 3d-generation latency coherence demishassabis sundarpichai
    Google DeepMind launched Project Genie (Genie 3 + Nano Banana Pro + Gemini), a prototype for creating interactive, real-time generated worlds from text or image prompts, currently available to Google AI Ultra subscribers in the U.S. (18+) with noted limitations like ~60s generation limits and imperfect physics. In parallel, the open-source LingBot-World offers a real-time interactive world model with <1s latency at 16 FPS and minute-level coherence, emphasizing interactivity and causal consistency. In video generation, xAI Grok Imagine debuted strongly with native audio support, 15s duration, and competitive pricing at $4.20/min including audio, while Runway Gen-4.5 focuses on animation workflows with new features like Motion Sketch and Character Swap. The 3D generation space sees fal adding Hunyuan 3D 3.1 Pro/Rapid to its API offerings, extending model-as-a-service workflows into 3D pipelines.
  • Jan 28
    not much happened today
    gpt-5.2 claude-opus-4.5 kimi-k2.5 openai anthropic deeplearningai langchain apple agentic-ai multimodality coding self-verification agent-engineering model-benchmarking model-optimization workflow-automation
    AI News for 1/27/2026-1/28/2026 highlights a quiet day with deep dives into a frontier model "personality split": GPT-5.2 excels at exploration and Claude Opus 4.5 at exploitation, suggesting OpenAI's models suit research workflows while Anthropic's favor commercial reliability. The rise of agentic coding loops shows new failure modes, with self-verification workflows gaining traction. The open model Kimi K2.5 emerges as a flashpoint, boasting enhanced agent execution, multimodality, and coding polish, runnable on Apple silicon M3 Ultra Mac Studios with Thunderbolt 5 (RDMA), and challenging Claude Opus 4.5 on benchmarks and pricing. Licensing issues threaten enterprise adoption despite model quality. The meme "clawdbot" reflects rapid agent branding proliferation. Agent engineering advances with shared "skills" interfaces promoted by DeepLearning.AI, Anthropic, and LangChain.
  • Jan 27
    Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager
    kimi-k2.5 moonshotai multimodality model-training mixture-of-experts agentic-ai vision video-understanding model-optimization parallel-processing office-productivity
    MoonshotAI's Kimi K2.5 is a 1T-parameter (32B active) open-weights model featuring native multimodality with image and video understanding, built through continual pretraining on 15 trillion mixed visual and text tokens. It introduces a new MoonViT vision encoder and supports advanced capabilities like Agent Swarm, which coordinates up to 100 sub-agents for parallel workflows, and an Office Productivity K2.5 Agent for large-scale office tasks. This release marks a significant leap in open models from China, claiming state-of-the-art results on benchmarks like HLE and BrowseComp, and offering aggressive API pricing and throughput.
  • Jan 26
    Anthropic launches the MCP Apps open spec, in Claude.ai
    claude-ai toolorchestra-8b qwen3-max-thinking anthropic openai block vs-code antigravity jetbrains aws nvidia alibaba agent-orchestration reinforcement-learning recursive-language-models context-management user-experience security prompt-injection reasoning adaptive-tool-use model-evaluation benchmarking
    Anthropic has officially absorbed the independent MCP UI project and, collaborating with OpenAI, Block, VS Code, Antigravity, JetBrains, and AWS, released the MCP Apps spec and official support in Claude.ai. The standard aims to enable an ecosystem of interoperable applications with rich UIs, reducing the proliferation of separate subscription services. Meanwhile, NVIDIA introduced ToolOrchestra with an 8B orchestrator model trained via scalable reinforcement learning for efficient agent orchestration. The concept of Recursive Language Models (RLMs) is gaining traction for efficient context management in agent stacks. The “Clawdbot” UX pattern emphasizes outcome-first assistant design with tight context and tool integration, sparking security concerns around prompt injection. Alibaba launched Qwen3-Max-Thinking, a flagship reasoning and agent model with adaptive tool use and strong benchmark scores, now available in public evaluation platforms like LM Arena and Yupp.
  • Jan 22
    not much happened today
    claude-3 codex gemini gpt-5.2-pro anthropic openai google sakana-ai cursor baseten epoch-ai-research deepmind benchmarking reasoning continual-learning reinforcement-learning model-performance agentic-ai security model-training sama fchollet shane_legg demishassabis
    Anthropic launches "Claude in Excel Pro" with enhanced features. OpenAI reveals upcoming Codex agent loop and cybersecurity measures. Google boosts Gemini App quotas and partners with Sakana AI for advanced AI Scientist projects in Japan. Cursor introduces Agent Skills for dynamic context focus. GPT-5.2 Pro achieves 31% on FrontierMath Tier 4, showing significant benchmark progress. Baseten raises $300M at a $5B valuation targeting high-performance inference. Discussions highlight math benchmarks as indicators of AI capability, uneven AGI progress, and the importance of reasoning and continual learning as future frontiers. Notable figures include Sam Altman, François Chollet, Shane Legg, and Demis Hassabis.
  • Jan 21
    OpenEvidence, the ‘ChatGPT for doctors,’ raises $250m at $12B valuation, 12x from $1b last Feb
    claude claude-3 claude-opus gpt-5.2 gemini-3-flash-high openevidence anthropic podium openai google gemini agentic-ai model-alignment performance-evaluation memory-optimization long-context benchmarking multi-agent-systems reinforcement-learning daniel_nadler amanda_askell eric_rea tom_loverro garry_tan omarsar0 brendanfoody deredleritt3r
    OpenEvidence raised $250 million at a $12 billion valuation, a 12x increase from last year, with usage by 40% of U.S. physicians and over $100 million in annual revenue. Anthropic released a new Claude model constitution under CC0 1.0, framing it as a living document for alignment and training. Podium reported over $100 million ARR from 10,000+ AI agents, shifting from software sales to AI operators. Innovations in agent memory and reliability include the Agent Cognitive Compressor (ACC) and multi-agent scientific workflows via MCP-SIM. Agentic benchmarking shows challenges in long-horizon tasks with models like Gemini 3 Flash High, GPT-5.2 High, and Claude Opus 4.5 High scoring modestly on professional services and legal research benchmarks.
  • Jan 20
    not much happened today
    glm-4.7-flash grok deepseek-r1 qwq x-ai unsloth-ai google deepseek ollama transformer-architecture recommendation-systems local-inference kv-cache quantization tensor-parallelism reasoning model-optimization fine-tuning giffmana david_sholz yuchenj_uw nearcyan sam_paech teortaxes_tex danielhanchen alexocheema nopmobiel rohanpaul_ai
    X Engineering open-sourced its new transformer-based recommender algorithm, sparking community debate on transparency and fairness. GLM-4.7-Flash (30B-A3B) gains momentum as a strong local inference model with efficient KV-cache management and quantization tuning strategies. Innovations include tensor parallelism on Mac Minis achieving ~100 tok/s throughput. Research highlights "Societies of Thought" as a reasoning mechanism improving model accuracy by 20%+.
  • Jan 19
    not much happened today
    glm-4.7-flash glm-4.7 glm-4.5 qwen3-vl qwen meta-ai-fair carnegie-mellon sakana-ai zhipu-ai transformer-memory model-architecture mixture-of-experts adaptive-position-encoding long-context model-compression inference-optimization local-inference model-deployment benchmarking coding agentic-ai
    AI News for 1/16/2026-1/19/2026 covers new architectures for scaling Transformer memory and context, including STEM from Carnegie Mellon and Meta AI, which replaces part of the FFN with a token-indexed embedding lookup enabling CPU offload and asynchronous prefetch. RePo from Sakana AI introduces adaptive positional reordering to improve robustness on noisy and long-range contexts. Model releases highlight Zhipu AI's GLM-4.7-Flash, a 30B-class MLA + small MoE model optimized for coding and agentic tasks, noted for strong benchmark performance and a compression narrative from larger to smaller models. Inference and deployment updates include mlx-lm 0.30.3 supporting GLM-4.7-Flash with efficient 4-bit performance on laptops. The report emphasizes practical takeaways on static sparsity, adaptive ordering, and the resurgence of small, fast models for interactive tasks. "Sparse capacity doesn’t have to mean MoE routers + expert parallelism; static sparsity can be systems-friendly."
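    The STEM idea can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the names `stem_table` and `ffn_with_stem` are invented here, and the real design differs in how the table is trained and sharded. The point is that part of the FFN output becomes a gather indexed by token id, so the needed rows are known before the forward pass (static sparsity) and can be prefetched from CPU memory.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ffn, vocab = 64, 256, 1000

# Dense FFN weights for the path that is kept.
W_up = rng.standard_normal((d_model, d_ffn)) * 0.02
W_down = rng.standard_normal((d_ffn, d_model)) * 0.02

# Token-indexed table standing in for the replaced FFN slice. Rows are
# selected by token id alone, so they can sit in CPU RAM and be
# prefetched asynchronously before the layer runs.
stem_table = rng.standard_normal((vocab, d_model)) * 0.02

def ffn_with_stem(x, token_ids):
    """FFN block where part of the output is a per-token table gather."""
    dense_out = np.maximum(x @ W_up, 0.0) @ W_down  # standard dense path
    lookup_out = stem_table[token_ids]              # gather, no matmul
    return dense_out + lookup_out

x = rng.standard_normal((8, d_model))        # activations for 8 tokens
token_ids = rng.integers(0, vocab, size=8)
y = ffn_with_stem(x, token_ids)
print(y.shape)  # (8, 64)
```

    Unlike MoE routing, nothing here depends on the activations, which is what makes the lookup systems-friendly.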
  • Jan 16
    ChatGPT starts testing ads on free tier + new $8/mo Go plan in the US
    chatgpt-go codex openai ollama ads monetization memory agent-orchestration human-in-the-loop cli-tools context-length workflow-optimization sama sam_altman fidjissimo scaling01 tomwarren embirico adamdotdev thsottiaux lateinteraction dbreunig
    OpenAI announced the ChatGPT Go tier at $8/month with ads testing in the US free tier, emphasizing that ads will not influence responses and will be clearly labeled. The update includes memory improvements and a "very fast Codex" feature teased by Sam Altman. The Codex CLI ecosystem now supports open-weight models with improved context length. Discussions highlight the importance of human-in-the-loop for reliability in agent orchestration and file interface improvements over traditional retrieval-augmented generation.
  • Jan 15
    Open Responses: explicit spec for OpenAI's Responses API supported by OpenRouter, Ollama, Huggingface, vLLM, et al
    gpt-5.2 opus-4.5 openai ollama vllm openrouter anthropic google-deepmind langchain llamaindex interoperable-apis agent-architecture filesystem-memory api-standardization multi-agent-systems prompt-engineering model-comparison virtual-filesystems open-source agent-ux reach_vb simonw yuchenj_uw omarsar0 jerryjliu0 hwchase17 swyx
    OpenAI launched the Open Responses API spec, an open-source, multi-provider standard for interoperable LLM APIs designed to simplify agent stacks and tooling. Early adopters like Ollama and vLLM support the spec, while notable absences include Anthropic and Google DeepMind. Agent design insights from Cursor emphasize explicit roles and planning over mega-agent models, with GPT-5.2 outperforming Opus 4.5 in long runs. The emerging dominant context/memory abstraction for agents is a filesystem-as-memory approach, championed by LlamaIndex and LangChain, using virtual filesystems often backed by databases like Postgres. LangChain also shipped an open-source desktop interface for agent orchestration called openwork. This news highlights advances in API standardization, agent architecture, and memory abstractions in AI development.
  • Jan 14
    not much happened today.
    gpt-5.2-codex glm-4.7 openai cursor github cerebras modal artificial-analysis vllm long-running-tasks autonomous-agents code-generation inference-speed latency batch-inference gpu-scaling model-evaluation agent-systems operational-scaling swyx kevinweil pierceboggan mntruell scaling01
    OpenAI launched GPT-5.2-Codex API, touted as their strongest coding model for long-running tasks and cybersecurity. Cursor integrated GPT-5.2-Codex to autonomously run a browser for a week, producing over 3 million lines of Rust code. GitHub incorporated it into their code tools, easing enterprise adoption. Discussions highlight the importance of review loops in agent systems and debate evaluation metrics for coding models. OpenAI partnered with Cerebras to improve inference speed and latency, with Cerebras serving GLM-4.7 at 1,445 tokens/sec and low latency. Provider benchmarking reveals tradeoffs in throughput, latency, and context window sizes. Modal shared operational scaling insights for self-hosted inference fleets of 20k GPUs, focusing on batch inference optimization with vLLM and FlashInfer backend. This reflects a focus on inference infrastructure, long-horizon autonomous agents, and coding model evaluation.
  • Jan 13
    Anthropic Labs: Cowork, Claude Code, MCP, Skills incubator led by Mike Krieger and Ben Mann
    claude claude-code anthropic langchain apple sandboxing agent-ux agent-orchestration human-in-the-loop memory-management tooling-simplification linux-virtualization security agent-productization mike_krieger ben_mann gergely_orosz yuchen_jin harrison_chase jared_z
    Anthropic consolidates its AI agent products under the Cowork brand, integrating prior tools like Claude Code and Claude for Chrome into a unified agent with sandboxed Linux VM environments using Apple's virtualization and bubblewrap for security. Meanwhile, Anthropic Labs reorganizes with Mike Krieger stepping down as CPO, focusing on productizing Claude with a >$1B ARR agent lab. The AI community debates the meaning of "vibe coding," emphasizing disciplined engineer verification over casual coding. LangChain launches Agent Builder GA, offering no-code but powerful agent orchestration features like memory, triggers, and human-in-the-loop approvals. Some experts advocate simplifying agent tooling to core filesystem and bash access for efficiency. Open-source recreations of Cowork-like environments using QEMU and sandboxing tools highlight rapid commoditization of AI agent tech.
  • Jan 12
    Apple picks Google's Gemini to power Siri's next generation
    gemini claude chatgpt engram apple google openai anthropic deepseek conditional-memory long-context hashing memory-optimization transformers model-scaling sparsity hardware-optimization model-architecture ai-healthcare model-optimization
    Apple has decided to power Siri with Google's Gemini models and cloud technology, marking a significant partnership and a setback for OpenAI, which was initially partnered with Apple. Anthropic launched "Cowork," a product preview for Claude's coding capabilities, sparking discussions about "LLM OS". OpenAI introduced ChatGPT Health and acquired Torch to expand in healthcare AI. DeepSeek unveiled Engram, a new conditional memory module that enables O(1) lookup-style memory for static patterns, improving long-context handling and offering hardware-friendly optimizations to scale knowledge capacity efficiently. Engram is positioned as a key modeling primitive for next-gen sparse models, with ongoing community debate about its architectural merits and practical impact.
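    As a rough illustration of what "O(1) lookup-style memory for static patterns" can mean (a hypothetical sketch: `slot_for` and `conditional_memory` are invented names, and Engram's actual design is more involved): hash the trailing n-gram of token ids into a fixed table and mix the retrieved vector into the activation, so static patterns resolve in constant time regardless of context length.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, vocab, n_gram = 32, 500, 3

# Memory table addressed by a hash of the last n token ids. A fixed phrase
# or entity maps to the same slot every time, so retrieval is a constant-time
# lookup rather than attention over the full context.
n_slots = 4096
memory = rng.standard_normal((n_slots, d_model)) * 0.02

def slot_for(token_ids):
    """Hash the trailing n-gram of token ids to a memory slot (O(1))."""
    return hash(tuple(token_ids[-n_gram:])) % n_slots

def conditional_memory(x, token_ids, gate=0.5):
    """Mix the looked-up pattern vector into the current activation."""
    return x + gate * memory[slot_for(token_ids)]

ctx = list(rng.integers(0, vocab, size=10))
x = rng.standard_normal(d_model)
y = conditional_memory(x, ctx)

# Same trailing n-gram -> same slot, no matter how long the context is.
assert slot_for(ctx) == slot_for([0, 1] + ctx)
```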
  • Jan 09
    not much happened today
    claude-max anthropic openai ai21-labs github cline model-agnostic model-context-protocol tooling skills concurrency transactional-workspaces context-engineering file-centric-workspaces rate-limiting agent-workspaces yuchenj_uw andersonbcdefg gneubig matan_sf scaling01 reach_vb _philschmid claude_code code jamesmontemagno cline danstripper omarsar0
    Anthropic tightens usage policies for Claude Max in third-party apps, prompting builders to adopt model-agnostic orchestration and BYO-key defaults to mitigate platform risks. The Model Context Protocol (MCP) is evolving into a key tooling plane with OpenAI MCP Server and mcp-cli enhancing tool discovery and token efficiency. The concept of skills as modular, versioned behaviors gains traction, with implementations in Claude Code, GitHub Copilot, and Cline adding websearch tooling. AI21 Labs addresses concurrency challenges in agent workspaces using git worktrees for transactional parallel writes, while long-horizon agents focus on context engineering and persistent file-centric workspaces.
  • Jan 08
    not much happened today
    claude-3-7-sonnet gpt-4-1 gemini-3 qwen3-vl-embedding qwen3-vl-reranker glm-4-7 falcon-h1r-7b jamba2 stanford google google-deepmind alibaba z-ai tii ai21-labs huggingface copyright-extraction multimodality multilinguality retrieval-augmented-generation model-architecture mixture-of-experts model-quantization reasoning inference kernel-engineering memory-optimization enterprise-ai sundarpichai justinlin610
    Stanford paper reveals Claude 3.7 Sonnet memorized 95.8% of Harry Potter 1, highlighting copyright extraction risks compared to GPT-4.1. Google AI Studio sponsors TailwindCSS amid OSS funding debates. Google and Sundar Pichai launch Gmail Gemini 3 features including AI Overviews and natural-language search with user controls. Alibaba Qwen releases Qwen3-VL-Embedding and Qwen3-VL-Reranker, a multimodal, multilingual retrieval stack supporting text, images, and video with quantization and instruction customization, achieving strong benchmark results. Z.ai goes public on HKEX with GLM-4.7 leading the Artificial Analysis Intelligence Index v4.0, showing gains in reasoning, coding, and agentic use, with large-scale MoE architecture and MIT license. Falcon-H1R-7B from TII targets efficient reasoning in smaller models, scoring 16 on the Intelligence Index. AI21 Labs introduces Jamba2, a memory-efficient enterprise model with hybrid SSM-Transformer architecture and Apache 2.0 license, available via SaaS and Hugging Face. vLLM shows throughput improvements in inference and kernel engineering. "Embeddings should be multimodal by default," notes Justin Lin.
  • Jan 07
    not much happened today
    nouscoder-14b deepseek-r1 langchain cursor huggingface openai weights-biases agent-frameworks context-management reinforcement-learning operational-safety model-transparency trajectory-exploration token-optimization coding-agents integration-platforms karpathy _philschmid omarsar0
    AI News for 1/6/2026-1/7/2026 highlights a quiet day with key updates on LangChain DeepAgents introducing Ralph Mode for persistent agent loops, Cursor improving context management by reducing token usage by 46.9%, and operational safety measures for coding agents with allow/deny lists. MCP integration is expanding across assistants and robotics, with Hugging Face embedding assistants via HuggingChat + HF MCP server. The DeepSeek-R1 paper has been expanded to 86 pages, emphasizing trajectory exploration and RL shaping behavior. NousCoder-14B shows a +7% improvement on LiveCodeBench after 4 days of RL training, demonstrating advances in RL for coding with small open models. Top tweets also mention a viral "96GB RAM laptop", ChatGPT Health launch by OpenAI, and Karpathy's nanochat scaling-law miniseries.
  • Jan 06
    xAI raises $20B Series E at ~$230B valuation
    grok-5 claude-code xai nvidia cisco fidelity valor-equity-partners qatar-investment-authority mgx stepstone-group baron-capital-group hugging-face amd ai-infrastructure supercomputing robotics ai-hardware agentic-ai context-management token-optimization local-ai-assistants aakash_gupta fei-fei_li lisa_su clementdelangue thom_wolf saradu omarsar0 yuchenj_uw _catwu cursor_ai
    xAI, Elon Musk's AI company, completed a massive $20 billion Series E funding round, valuing it at about $230 billion with investors like Nvidia, Cisco Investments, and others. The funds will support AI infrastructure expansion including Colossus I and II supercomputers and training Grok 5, leveraging data from X's 600 million monthly active users. At CES 2026, the focus was on "AI everywhere" with a strong emphasis on AI-first hardware and integration between NVIDIA and Hugging Face's LeRobot for robotics development. The Reachy Mini robot is gaining traction as a consumer robotics platform. In software, Claude Code is emerging as a popular local/private coding assistant, with new UI features in Claude Desktop and innovations like Cursor's dynamic context reducing token usage by nearly 47% in multi-MCP setups. "The 600 million MAU figure in xAI’s announcement combines X platform users with Grok users. That’s a clever framing choice."
  • Jan 05
    not much happened today
    claude-mem bitnet-cpp gemini microsoft google-deepmind boston-dynamics agentic-coding agent-harnesses persistent-memory software-engineering inference-efficiency model-pruning context-durability specification-problem workflow-management cpu-inference _philschmid demishassabis
    AI News from early January 2026 highlights a viral economic prediction about Vietnam surpassing Thailand, Microsoft's reported open-sourcing of bitnet.cpp for 1-bit CPU inference promising speed and energy gains, and a new research partnership between Google DeepMind and Boston Dynamics focusing on Gemini Robotics and Atlas hardware. The concept of agentic coding is gaining traction, emphasizing human oversight and infrastructure layers called Agent Harnesses to manage long-running AI tasks, with advocates like Philipp Schmid promoting this shift. Innovations in persistent memory for coding agents, such as Claude-Mem, aim to improve context durability. There is also critical discussion on the specification problem in agent workflows, advocating for better abstractions beyond conversational intent. Practical challenges include managing parallel agents and permission risks. Additionally, open tooling advances include a JAX-based LLM-Pruning Collection for efficient model pruning methods.
  • Jan 02
    not much happened today
    DeepSeek released a new paper on mHC: Manifold-Constrained Hyper-Connections, advancing residual-path design as a key scaling lever in neural networks. Their approach constrains residual mixing matrices to the Birkhoff polytope to improve stability and performance, with only about 6.7% training overhead. The innovation includes systems-level optimizations like fused kernels and activation recomputation, highlighting a frontier-lab integration of math and kernel engineering. Additionally, discussions around long-horizon agents emphasize context management bottlenecks, introducing Recursive Language Models (RLMs) that manage context dynamically rather than relying on larger context windows. This work signals a shift in architectural design and efficiency for base model training and agent development.
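    For readers unfamiliar with the Birkhoff polytope: it is the set of doubly stochastic matrices (non-negative entries, every row and column summing to 1). A standard way to land near it is Sinkhorn normalization; the sketch below illustrates the constraint generically and is not mHC's actual parameterization.

```python
import numpy as np

def sinkhorn(logits, n_iters=200):
    """Map a square matrix of logits toward the Birkhoff polytope by
    alternately normalizing rows and columns (Sinkhorn iteration)."""
    m = np.exp(logits - logits.max())      # strictly positive entries
    for _ in range(n_iters):
        m /= m.sum(axis=1, keepdims=True)  # rows sum to 1
        m /= m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

rng = np.random.default_rng(0)
mix = sinkhorn(rng.standard_normal((4, 4)))

# Doubly stochastic: non-negative, rows and columns each sum to ~1,
# so mixing residual streams redistributes mass without amplifying it.
assert (mix >= 0).all()
assert np.allclose(mix.sum(axis=0), 1.0)
assert np.allclose(mix.sum(axis=1), 1.0, atol=1e-6)

streams = rng.standard_normal((4, 16))  # 4 residual streams, width 16
mixed = mix @ streams                   # constrained residual mixing
```

    The mass-preserving property is plausibly what buys the stability the paper reports, since the residual signal can neither blow up nor vanish under mixing.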
  • Dec 31, 2025
    not much happened today
    qwen-image-2512 ax-k1 k-exaone sk-telecom lg upstage naver alibaba unsloth replicate mixture-of-experts model-release quantization open-source-models image-generation model-integration model-benchmarking compute-costs dataset-curation eliebakouch clementdelangue dorialexander rising_sayak _akhaliq ostrisai ivanfioravanti yupp_ai
    South Korea's Ministry of Science launched a coordinated program with 5 companies to develop sovereign foundation models from scratch, featuring large-scale MoE architectures like SK Telecom A.X-K1 (519B total / 33B active) and LG K-EXAONE (236B MoE / 23B active), with a total first-round budget of ~$140M. This initiative contrasts with EU approaches by focusing funding on fewer stakeholders and explicitly budgeting for data. Meanwhile, Alibaba's Qwen-Image-2512 emerges as a leading open-source image generation model, rapidly integrated into various toolchains including AI-Toolkit and local inference paths with quantization support, and hosted on platforms like Replicate. The model has undergone extensive blind testing with over 10,000 rounds on AI Arena, highlighting its ecosystem adoption.
  • Dec 30, 2025
    not much happened today
    glm-4.7 claude-code z.ai meta-ai-fair manus replit agentic-architecture context-engineering application-layer code-generation agent-habitats ai-native-llm ipo inference-infrastructure programming-paradigms zixuanli_ jietang yuchenj_uw sainingxie amasad hidecloud imjaredz random_walker
    Z.ai (GLM family) will IPO in Hong Kong on Jan 8, 2026, aiming to raise $560M at HK$4.35B, making it the "first AI-native LLM company" to list publicly. The IPO highlights GLM-4.7 as a starting point. Meta AI acquired Manus for approximately $4–5B, with Manus achieving $100M ARR in 8–9 months, illustrating the value of application-layer differentiation over proprietary models. Manus focuses on agentic architecture, context engineering, and general primitives like code execution and browser control, emphasizing "agent habitats" as a competitive moat. Discussions around Claude Code highlight skepticism about "vibe coding," advocating for disciplined, framework-like AI-assisted programming practices.
  • Dec 29, 2025
    Meta Superintelligence Labs acquires Manus AI for over $2B, at $100M ARR, 9 months after launch
    glm-4.7 minimax-m2.1 vllm manus benchmark meta-ai-fair amd sglang weaviate teknim baseten alphaxiv minimax performance-optimization inference-frameworks model-benchmarking model-deployment open-source-models multimodality api code-generation community-building alex_wang nat_friedman
    Manus achieved a rapid growth trajectory in 2025, raising $500M from Benchmark and reaching $100M ARR before being acquired by Meta for an estimated $4B. The vLLM team launched a dedicated community site with new resources, while performance issues with AMD MI300X FP8 were noted in vLLM and sglang benchmarks. Weaviate released operational features including Object TTL, Java v6 client GA, and multimodal document embeddings. API fragmentation concerns were raised by Teknium advocating for unified SDK wrappers. In open-weight models, GLM-4.7 gained recognition as a reliable coding model with faster throughput on Baseten, and MiniMax-M2.1 rose as a leading open agentic coder model, topping WebDev leaderboards.
  • Dec 26, 2025
    not much happened today
    minimax-m2.1 glm-4.7 gemini-3-pro claude-3-sonnet vl-jepa minimax-ai vllm-project exolabs mlx apple openai open-source mixture-of-experts local-inference quantization inference-quality multimodality non-autoregressive-models video-processing reinforcement-learning self-play agentic-rl parallel-computing model-deployment ylecun awnihannun alexocheema edwardsun0909 johannes_hage
    MiniMax M2.1 launches as an open-source agent and coding Mixture-of-Experts (MoE) model with ~10B active / ~230B total parameters, claiming to outperform Gemini 3 Pro and Claude Sonnet 4.5, and supports local inference including on Apple Silicon M3 Ultra with quantization. GLM 4.7 demonstrates local scaling on Mac Studios with 2× 512GB M3 Ultra hardware, highlighting system-level challenges like bandwidth and parallelism. The concept of inference quality is emphasized as a key factor affecting output variance across deployments. Yann LeCun's VL-JEPA proposes a non-generative, non-autoregressive multimodal model operating in latent space for efficient real-time video processing with fewer parameters and decoding operations. Advances in agentic reinforcement learning for coding include self-play methods where agents inject and fix bugs autonomously, enabling self-improvement without human labeling, and large-scale RL infrastructure involving massive parallel code generation and execution sandboxes.
  • Dec 24, 2025
    Nvidia buys (most of) Groq for $20B cash; largest execuhire ever
    gemini fsd-v14 nvidia groq openai tesla epoch-ai gemini benchmarking inference model-evaluation ai-integration agent-patterns real-time-processing low-latency developer-experience healthcare business-workflows consumer-ai jensen_huang xeophon js_denain jim_fan
    Groq's leadership team is joining Nvidia under a "non-exclusive licensing agreement" in a deal valued at $20 billion cash, marking a major acquisition in the AI chip space, though Nvidia states it is not acquiring Groq as a company. Jensen Huang plans to integrate Groq's low-latency processors into the NVIDIA AI factory architecture to enhance AI inference and real-time workloads. Twitter highlights include Gemini used as a consumer utility for calorie tracking, OpenAI discussing the "deployment gap" focusing on model usage in healthcare and business, and Tesla's FSD v14 described as a "Physical Turing Test" for consumer AI. Benchmarking challenges are noted by Epoch AI emphasizing provider variance and integration issues affecting model quality measurement. Discussions on coding agents and developer experience convergence continue in the AI community.
  • Dec 23, 2025
    not much happened today
    glm-4.7 glm-4.6 minimax-m2.1 gemma-3 gemma-scope-2 google-deepmind valsai minimax-ai ollama trae alibaba sophont prime-intellect interpretability sparse-autoencoders agent-workflows model-benchmarking medical-evaluation multi-agent-systems model-performance model-optimization reinforcement-learning tool-use function-calling context-windows ivanfioravanti awnihannun deedydas cline omarsar0 adonis_singh eliebakouch teortaxestex ibragim_bad callum_mcdougall neelnanda5
    GLM-4.7 and MiniMax M2.1 open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an OSS Claude-like MoE model with 230B total parameters and 200K context. Gemma Scope 2 from google-deepmind introduces sparse autoencoders and transcoders for interpretability across Gemma 3 models, aiming to provide shared infrastructure for safety and debugging. The Medmarks v0.1 open medical evaluation suite and leaderboard launch addresses the need for open medical benchmarking across 15+ environments, engaging clinicians and researchers.
  • Dec 22, 2025
    not much happened today
    glm-4.7 mimo-v2-flash z-image-turbo kling-2.6-motion-control zhipu-ai xiaomi google langchain huggingface openrouter artificial-analysis vllm-project coding complex-reasoning tool-use mixture-of-experts cost-efficiency open-weight-models text-to-image video-models memory-persistence agent-frameworks interactive-user-interfaces model-deployment mervenoyann eliebakouch omarsar0 osanseviero dair_ai
    Zhipu AI's GLM-4.7 release marks a significant improvement in coding, complex reasoning, and tool use, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. Xiaomi's MiMo-V2-Flash is highlighted as a practical, cost-efficient mixture-of-experts model optimized for deployment. The open-weight text-to-image competition sees Z-Image Turbo leading with 6B parameters under Apache-2.0 license. Video model advances focus on control and long-form consistency, exemplified by Kling 2.6 Motion Control and research like MemFlow's adaptive memory retrieval. In agent frameworks, Google's A2UI protocol introduces agent-driven UI generation, while studies reveal that mixing multiple agent frameworks is common, with challenges in logic, termination, and tool interaction. LangChain emphasizes persistent memory patterns for production agents.
  • Dec 19, 2025
    not much happened today
    qwen-image-layered kling-2.6 gwm-1 gen-4.5 gemini-3-flash gpt-5.2 codex-cli opus-4.5 alibaba kling-ai runway google anthropic openai image-decomposition motion-control video-generation agentic-reinforcement-learning long-context model-degradation benchmarking tool-use prompt-engineering ankesh_anand
    Alibaba released Qwen-Image-Layered, an open-source model enabling Photoshop-grade layered image decomposition with recursive infinite layers and prompt-controlled structure. Kling 2.6 introduced advanced motion control for image-to-video workflows, supported by a creator contest and prompt recipes. Runway unveiled the GWM-1 family with frame-by-frame video generation and Gen-4.5 updates adding audio and multi-shot editing. In LLM platforms, Gemini 3 Flash leads benchmarks over GPT-5.2, attributed to agentic reinforcement learning improvements post-distillation. Users note that GPT-5.2 excels at long-context tasks (~256k tokens) but cite UX limitations that push some toward the Codex CLI. Discussions around Anthropic's Opus 4.5 suggest perceived model degradation is linked to shifting user expectations.
  • Dec 18, 2025
    Claude Skills grows: Open Standard, Directory, Org Admin
    claude-skills gpt-5.2-codex gemini-3-flash functiongemma t5gemma-2 anthropic openai google-deepmind hugging-face agentic-ai fine-tuning long-context tool-calling on-device-ai multimodality security workflow-optimization sama gregbrockman philschmid
    Claude Skills have gained significant traction since their launch in October, with the Claude Skills talk reaching 100k views in one day, signaling growing adoption and importance. Announcements include org admin support, a new Skills Directory, and the move to an open standard named Agent Skills. In frontier model launches, OpenAI released GPT-5.2-Codex, touted as the best agentic coding model with improvements in native compaction, long-context reliability, and tool-calling, emphasizing real-world security impacts. Google DeepMind introduced Gemini 3 Flash, focusing on speed as a product feature impacting workflows and user engagement, alongside FunctionGemma and T5Gemma 2, emphasizing on-device deployment, fine-tuning, and multimodality.
  • Dec 17, 2025
    Gemini 3.0 Flash Preview: 1/4 cost of Pro, but ~as smart, retakes Pareto Frontier
    gemini-3-flash gemini-3 gpt-5.2 gemini-3-pro google google-deepmind tool-calling multimodality benchmarking reasoning cost-efficiency model-performance context-window agentic-ai model-deployment sundar_pichai jeffdean demishassabis
    Google launched Gemini 3 Flash, a pro-grade reasoning model with flash latency, supporting tool calling and multimodal IO, available via multiple platforms including Google AI Studio and Vertex AI. It offers competitive pricing at $0.50 per 1M input tokens and $3.00 per 1M output tokens, with context windows up to 1M tokens. Benchmarks show Gemini 3 Flash rivals or outperforms larger models like GPT-5.2 and Gemini 3 Pro in agentic, coding, and reasoning tasks, validated on ARC-AGI-2, SWE-bench, LMArena, and Arena. Despite tradeoffs such as high token use and hallucination rates, it is cost-effective overall. Sundar Pichai, Jeff Dean, and Demis Hassabis publicly celebrated the launch, and the model's tool calling was demonstrated live with 100 tools.
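    To make the quoted Gemini 3 Flash pricing concrete, here is a minimal back-of-the-envelope cost sketch; the helper name and the example token counts are illustrative assumptions, not an official calculator, and real bills may include caching or other pricing tiers.

    ```python
    # Illustrative sketch using the per-token prices quoted above:
    # $0.50 per 1M input tokens, $3.00 per 1M output tokens.
    INPUT_PRICE_PER_M = 0.50   # USD per 1M input tokens
    OUTPUT_PRICE_PER_M = 3.00  # USD per 1M output tokens

    def estimate_cost(input_tokens: int, output_tokens: int) -> float:
        """Estimated USD cost for one request at the quoted rates."""
        return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
               (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

    # A hypothetical 100k-token prompt with a 4k-token response:
    print(f"${estimate_cost(100_000, 4_000):.3f}")  # prints "$0.062"
    ```

    At these rates, even long-context requests stay in the cents range, which is the cost-effectiveness argument the benchmarks summary is making.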
See all issues

Let's Connect

If you want to get in touch with me about something or just to say hi, reach out on social media or send me an email.

  • GitHub /
  • X (@smol_ai) /
  • swyx at smol dot ai
© 2026 • AINews
You can also subscribe by RSS.