Frozen AI News archive

Kimi K2 - SOTA Open MoE proves that Muon can scale to 15T tokens/1T params

**Moonshot AI** has released **Kimi K2**, a **1 trillion parameter** Mixture-of-Experts model trained on **15.5 trillion tokens** using the new **MuonClip** optimizer, achieving state-of-the-art results on benchmarks like **SWE-Bench Verified (65.8%)** and **TAU2 (58.4%)**. The model is competitive with **GPT-4.1** and **Sonnet 4** on non-thinking tasks and is available under a **modified MIT license**. Meanwhile, **xAI** announced **Grok-4**, noted for its "LEAST censored frontier model" status and strong long-context performance but criticized for rushed post-training. **Mistral AI** updated its **Devstral 2507** models with improved performance and cost efficiency. The community is excited about the potential of the **MuonClip** optimizer, which may surpass the long-standing AdamW optimizer.

MuonClip is all you need?

AI News for 7/10/2025-7/11/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (226 channels, and 8321 messages) for you. Estimated reading time saved (at 200wpm): 647 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

A lot of folks are excited about the Windsurf-OpenAI deal falling through (something we did NOT see coming), but fortunately we have a more technical story to headline today:

The relatively stealthy Chinese lab Moonshot AI (backed by Alibaba and Tencent, one of the AI Tigers alongside DeepSeek, Zhipu, MiniMax, and 01) has burst onto the scene with Kimi K2, which by many metrics seems to be a far better base model than DeepSeek V3 (and presumably would do very well when scaled to a reasoning model). Coming in at 1T parameters, this would also be the largest SOTA Open model released since the ChatGPT wave (we think? corrections welcome), which is very notable coming on the back of a new SOTA Closed LLM yesterday.

The model is great and does well on pelicans, but researchers in the LLM community are more excited about MuonClip, the modified Muon optimizer proposed and scaled by Moonshot, which produced perhaps one of the most beautiful loss curves in Machine Learning history.

The long-standing AdamW may finally have met its match. Congrats to the team.


Quick plug for our friends at Weights&Biases - join swyx and friends at the Agent Protocols Hackathon in SF this weekend and win a robot dog! **SIGN UP NOW IF YOU'RE IN SF.**


AI Twitter Recap

New Model Releases & Performance

New AI Techniques & Research

AI Infrastructure, Tooling, & Developer Experience

Company & Industry News

Broader Commentary

Humor & Memes


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Kimi K2 MoE Model Release and Community Reactions

2. New Model and Benchmark Launches: IBM Granite 4.0 and Google MedGemma 27B

3. llama.cpp GPU and Hardware Support Enhancements

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Grok's Alignment with Elon Musk's Political Views

2. Major New AI Model and Feature Launches (Grok 4, GPT-5, Kontext Presets/Komposer)

3. AI in the Real World: Industry Impact, Job Disruption, and Privacy Concerns


AI Discord Recap

A summary of Summaries of Summaries by X.ai Grok-4

Theme 1: Grok 4 Sparks Hype and Gripes

Theme 2: Kimi K2 Model Drops with Massive Params

Theme 3: Quantization Tricks Squeeze Model Performance

Theme 4: AI Agents Gear Up for Complex Tasks

Theme 5: Hardware Hustles for LLM Efficiency


Discord: High level Discord summaries

Perplexity AI Discord


LMArena Discord


OpenAI Discord


Unsloth AI (Daniel Han) Discord


Cursor Community Discord


OpenRouter (Alex Atallah) Discord


LM Studio Discord


HuggingFace Discord


GPU MODE Discord


Nous Research AI Discord


Yannick Kilcher Discord


Eleuther Discord


Latent Space Discord


aider (Paul Gauthier) Discord


MCP (Glama) Discord


Notebook LM Discord


Manus.im Discord


Cohere Discord


Torchtune Discord


Modular (Mojo 🔥) Discord


DSPy Discord


LlamaIndex Discord


Nomic.ai (GPT4All) Discord


Gorilla LLM (Berkeley Function Calling) Discord


tinygrad (George Hotz) Discord


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.



Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (1 messages):

Social Media Announcement


Perplexity AI ▷ #general (1233 messages🔥🔥🔥):

Kingfall model, Grok 4 Performance, Comet Browser, O3 Pro, Next big thing


Perplexity AI ▷ #sharing (3 messages):

Chevrolet, Blender addon, Management analysis


Perplexity AI ▷ #pplx-api (1 messages):

Non-deterministic Models, Buggy Playground, API Reference


LMArena ▷ #general (1108 messages🔥🔥🔥):

Early Access APIs, Model with Tools vs No Tools, Grok 4 heavy on coding, Kimi K2 benchmarks, LLMs leaning on tools for logic/math stuff


OpenAI ▷ #ai-discussions (800 messages🔥🔥🔥):

MCP SuperAssistant, Grok 4, Gemini 3, NNC architecture, Financial AI audits


OpenAI ▷ #gpt-4-discussions (6 messages):

GPT-4o Model Degradation, Custom GPT limitations, GPT-4o vs GPT-4o mini


OpenAI ▷ #prompt-engineering (34 messages🔥):

GPT-4o-mini TPD Limit, Persona Features Control Emergent Misalignment, Exploring Consciousness in LLMs Survey, Human Personality Controls Behavior, LLM Output sentence formatting


OpenAI ▷ #api-discussions (34 messages🔥):

GPT-4o Mini TPD Limit, Persona Features Control Emergent Misalignment, Exploring Consciousness in LLMs, Human Personality Controls Behavior, Writing long articles in ChatGPT


Unsloth AI (Daniel Han) ▷ #general (476 messages🔥🔥🔥):

Multi-GPU support with Unsloth, Model Intercommunication Techniques, Unsloth and LoRA Models, Moonshot AI's Kimi K2 Instruct Model, Training AI for Bodo Language


Unsloth AI (Daniel Han) ▷ #off-topic (82 messages🔥🔥):

Text-to-Speech LLMs, Grok 4, AGI benchmarks, Memory in AI, Reasoning in AI


Unsloth AI (Daniel Han) ▷ #help (70 messages🔥🔥):

Orpheus TTS issues, Multi-GPU ETA, Datasets version problems, Gradients checkpoints, Bodo Language Model


Unsloth AI (Daniel Han) ▷ #research (56 messages🔥🔥):

Reka AI's Quantization, Gemini Deep Research, AI OS Dev Study, Kimi-K2-Base, GPT 4.5


Unsloth AI (Daniel Han) ▷ #unsloth-bot (25 messages🔥):

Unsloth on Kaggle with 2xT4, device_map = balanced, Close Discord Threads, Embedding training precision error, SFTTrainer and CPT usage


Cursor Community ▷ #general (581 messages🔥🔥🔥):

Linux commands on Windows, Cursor Tweet on X, Auto Agent, New Pricing, Grok 4


Cursor Community ▷ #background-agents (18 messages🔥):

Cursor GitHub App Installation Issues, Disable Port Forwarding in Cursor, Node Version Management in Remote Workspace, Automatic Port Forwarding Prevention


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Cypher Alpha, Kimi K2, Moonshot, Novita, Parasail


OpenRouter (Alex Atallah) ▷ #general (412 messages🔥🔥🔥):

Grok 3 mini endpoints, OpenRouter Credit Issues, Prompt Optimization, Image Token Double Counting, Grok 4 Rate Limits


OpenRouter (Alex Atallah) ▷ #new-models (5 messages):

Switchpoint Router, $/mtok Pricing


OpenRouter (Alex Atallah) ▷ #discussion (11 messages🔥):

Mistral deep research model, Amazon & Anthropic AI alliance, Microsoft & OpenAI partnership, Devstral Medium Pricing, Translation models


LM Studio ▷ #general (94 messages🔥🔥):

Qwen3-4b 4bit, LM Studio stuttering, Falcon H1 Issues, LM Studio Autorunning, Hunyuan Troubleshooting


LM Studio ▷ #hardware-discussion (94 messages🔥🔥):

VRAM Importance vs Generation, Multi-GPU Setups and PSU Configurations, CPU vs GPU for LLM Performance, DDR Generations Impact, GDDR vs DDR


HuggingFace ▷ #announcements (1 messages):

Gemma 3n, SmolLM3, responses.js, EoMT, Sentence Transformers v5


HuggingFace ▷ #general (94 messages🔥🔥):

Supergrok access, Quantized model inference speed, Inference providers pricing, AI agent moderator bot on Discord, HF account deletion


HuggingFace ▷ #i-made-this (6 messages):

2DOF Arm Sim Feedback, ModelNet40 Accuracy, Codaco App Launch, Legml-1, Python-backend template


HuggingFace ▷ #agents-course (7 messages):

AI Agent Initialization, HF Course Certificate, Tools for Image/Audio, Agents Course Structure, Prompt for One-Word Answer


GPU MODE ▷ #general (11 messages🔥):

Tensor Layout Visualization, CUDA & GPU Programming Books, Meetup Advertisement


GPU MODE ▷ #triton (1 messages):

Triton Kernel Padding, Sequence Length Optimization, Memory Management in Triton


GPU MODE ▷ #cuda (9 messages🔥):

Nsight Compute Debugging, NCCL Hangs with cudaMemcpy, GEMM Kernel Optimization on H100


GPU MODE ▷ #torch (15 messages🔥):

Mapping Kernels, torch_compile_debug, AOT Graphs, Memory Usage, Activation Checkpointing


GPU MODE ▷ #beginner (2 messages):

Nvidia Development Tools, Loop Tiling Optimization, Memory Access Parallelism


GPU MODE ▷ #irl-meetup (1 messages):

AI Conference, San Francisco, September 17-18, Networking Opportunities, AI Trends


GPU MODE ▷ #rocm (4 messages):

AMD bank conflicts, NVIDIA bank conflicts, L1 cache performance


GPU MODE ▷ #liger-kernel (3 messages):

Prof. Dao's new project, Liger performance, RMSNorm bandwidth optimization, Softmax optimization


GPU MODE ▷ #self-promotion (2 messages):

GPU Optimization, GPU Trading, AI Compute Infrastructure, Thunder Compute's VS Code Extension


GPU MODE ▷ #🍿 (1 messages):

LLM Kernel optimization, Fine tuning LLMs


GPU MODE ▷ #thunderkittens (1 messages):

Float32 matrix transpose, tile op, transpose_sep, ThunderKittens


GPU MODE ▷ #submissions (4 messages):

H100 speed, B200 speed, MI300 speed, trimul leaderboard


GPU MODE ▷ #factorio-learning-env (39 messages🔥):

v3 Release, OpenAI Credits, Task Stopping, Meeting


GPU MODE ▷ #cutlass (14 messages🔥):

CuteDSL Limitations, Dynamic Values in CuteDSL, Tensor Allocation in CuteDSL, tensor core performance


Nous Research AI ▷ #general (79 messages🔥🔥):

Grok-4 reasoning and knowledge, Self-play during training, Deep-Hermes distillation to 14B, Brain Algorithms vs AI Algorithms, Qwen-14B


Nous Research AI ▷ #ask-about-llms (17 messages🔥):

Temp = 0 Variety, Avoiding Doom Loops, HIPAA Compliance, Kaida and Storywriter repos, litellm Differences


Nous Research AI ▷ #research-papers (1 messages):

superbear12: https://arxiv.org/abs/2507.02778


Nous Research AI ▷ #interesting-links (1 messages):

Liquid Foundation Models v2, Generative AI models


Yannick Kilcher ▷ #general (77 messages🔥🔥):

LLMs Death, Explainable Networks, Energy Consumption, Capitalist Market Dynamics, Facial Recognition Research


Yannick Kilcher ▷ #paper-discussion (4 messages):

EnergyMatching implementation, EnergyMatching paper discussion


Yannick Kilcher ▷ #ml-news (18 messages🔥):

Cyborg Bees, Mistral incremental improvement vs licensing, BrowserOS, METR's AI evaluation, Kimi-K2-Instruct


Eleuther ▷ #general (15 messages🔥):

LLM Safety Testing, Inference Cost Decline, Anthropic's LLM Neuron Activation Tracing, 1-bit LLMs, Decentralized Compute


Eleuther ▷ #research (30 messages🔥):

LLMs and Em Dashes, ByteDance MoE Kernels, Tokenizer-Free Models, N-Simplicial Attention


Eleuther ▷ #lm-thunderdome (33 messages🔥):

Mixed Precision arg for HFLMs, Harness Evaluation Speed, Loading Models with Correct Dtype, Softmax Defaulting to Float32, Mixed Precision PR


Eleuther ▷ #gpt-neox-dev (7 messages):

WandB project, NGC container, NVIDIA H100 PCIe GPUs, RoPE_Pct


Latent Space ▷ #ai-general-chat (66 messages🔥🔥):

Groq valuation, Buying Subreddits, Reddit deep research agent, Grok-4 rate limit, AI generated videos


Latent Space ▷ #ai-announcements (1 messages):

swyxio: special double podcast this week! https://x.com/latentspacepod/status/1943774304166195402


aider (Paul Gauthier) ▷ #general (48 messages🔥):

Grok 4 coding ability, Kimi K2 Model, Copilot request limits, Aider console logs


aider (Paul Gauthier) ▷ #questions-and-tips (8 messages🔥):

aider and ollama, models for architect mode, leaderboards, aider in local language


MCP (Glama) ▷ #general (30 messages🔥):

MCP Superassistant, Malware Injection, MCP Server Posting, Multiple MCP Servers, FastMCP Reverse Proxy


MCP (Glama) ▷ #showcase (8 messages🔥):

MCPJam inspector fix, MCP client for Elicitation, Aidderall MCP server, Neurabase MCP server hosting


Notebook LM ▷ #use-cases (3 messages):

Quantitative Data Analysis, PDF Export, Trending Topics, Excel Data Extraction, Image Uploads


Notebook LM ▷ #general (21 messages🔥):

Audio Overviews, Image Uploading, Latex Rendering, Code Writing Prompts, Chat History Disappearance


Manus.im Discord ▷ #general (17 messages🔥):

SafeScan QR Launch, Manus Feature Suggestions, Subscription question, Registration error, Michael Seibel compliment


Cohere ▷ #🧵-general-thread (8 messages🔥):

Session locations, New office


Cohere ▷ #👋-introduce-yourself (3 messages):

Introductions, Monocular Depth Estimation, Knowledge Distillation, PyTorch


Torchtune ▷ #dev (5 messages):

Efficient CE, GRPO Sync


Torchtune ▷ #papers (5 messages):

small batches vs large batches, optim-in-bwd support, optimal batch sizes


Modular (Mojo 🔥) ▷ #general (5 messages):

Assembly coding in Mojo, Tracking Modular Community Events


Modular (Mojo 🔥) ▷ #mojo (2 messages):

Mojo MAX Tutorial, Custom Ops Matmul


DSPy ▷ #papers (2 messages):

Infer-Retrieve-Rank (IReRa), label classification, xmc.dspy GitHub repository, DSPy compatibility


DSPy ▷ #general (4 messages):

Prompt Optimization, Context Engineering with DSPy, MiProV2 Errors, Base64 Images


LlamaIndex ▷ #blog (3 messages):

Snowflake data agents, LeSearch agent, NotebookLlama features


LlamaIndex ▷ #general (2 messages):

Cloudflare AI Gateway, Automatic LLM Fallback, LlamaIndex Integration


Nomic.ai (GPT4All) ▷ #general (3 messages):

Multi-modal Models, Gemma 3, Architectural Floor Plan Feedback


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (3 messages):

vllm, sglang, Llama 3B vs 8B


tinygrad (George Hotz) ▷ #general (2 messages):

PatternMatcher, UPat -> UPat rules, Egraph rewrite rules, Turing completeness