Frozen AI News archive

Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves PyTorch

**Moonshot AI** launched **Kimi K2 Thinking**, a **1 trillion parameter** mixture-of-experts (MoE) model with **32 billion active parameters**, a **256K context window**, and native **INT4 quantization-aware training**. It achieves state-of-the-art results on benchmarks like **HLE (44.9%)** and **BrowseComp (60.2%)**, and on agentic tool use it can chain **200-300 sequential tool calls**. The model ships with **vLLM** support and OpenAI-compatible APIs, and is available on platforms like Arena, Baseten, and Yupp. Early user reports note some API instability under launch load. Meanwhile, **Google** announced the **TPU v7 (Ironwood)**, claiming a **10× peak performance improvement** over TPU v5p, aimed at training and agentic inference for models like **Gemini**. **Apple** added support for M5 Neural Accelerators in llama.cpp for inference acceleration.


Open Weights is all you need?

AI News for 11/5/2025-11/6/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (200 channels, and 5907 messages) for you. Estimated reading time saved (at 200wpm): 479 minutes. Our new website is now up with full metadata search and a beautiful vibe-coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Chatter has been high for a while as Kimi was prepping the open-source ecosystem for this release, but the benchmarks are the surprising thing: for the first time, an open model claims to beat SOTA closed models (GPT-5, Claude Sonnet 4.5 Thinking) on several major benchmarks:

[Figure: bar graph of Kimi K2 Thinking's performance across various AI benchmarks, highlighting its state-of-the-art results]

Even more encouraging, Artificial Analysis found another SOTA result in their independent testing:

[Figure: bar graph of Kimi K2 Thinking's performance on the Tau2 Bench Telecom agentic tool-use benchmark]

It is early days, but vibe checks are good.

There's no paper, but the model card has a few more details on the native INT4 training and on the 200-300-step tool-calling capability that the 256K context window makes room for.
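The model card is the place for Moonshot's actual recipe. As a rough illustration only, here is a generic sketch of what "INT4 quantization-aware training" usually means: in the forward pass, weights are fake-quantized (rounded to signed 4-bit integer codes, then scaled back to floats) so the model learns to tolerate INT4 precision. The symmetric per-group scaling, the group size, and the helper name `fake_quant_int4` are our assumptions, not details from the release.

```python
# Generic sketch of symmetric INT4 fake quantization, as used in QAT.
# ASSUMPTIONS: per-group max-abs scaling and group_size are illustrative,
# not Moonshot's published scheme; fake_quant_int4 is a hypothetical name.

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit integer range


def fake_quant_int4(weights, group_size=32):
    """Quantize each group to INT4 codes, then dequantize back to floats.

    Returns (dequantized_weights, int_codes, scales). In QAT the forward pass
    uses the dequantized weights; the backward pass typically treats the
    rounding as identity (straight-through estimator).
    """
    deq, codes, scales = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group onto the positive INT4 limit.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Round to the nearest code and clamp into the 4-bit range.
            q = max(INT4_MIN, min(INT4_MAX, round(w / scale)))
            codes.append(q)
            deq.append(q * scale)
    return deq, codes, scales


weights = [0.7, -0.35, 0.1, 0.0, -0.02, 0.5, -0.69, 0.33]
deq, codes, scales = fake_quant_int4(weights, group_size=4)
print(codes)
print([round(x, 3) for x in deq])
```

A real QAT setup would run this inside the training framework with a straight-through estimator so gradients flow past the non-differentiable `round`; this stdlib-only version just shows the numerics and the per-element error bound of half a scale step.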

Congrats Kimi/Moonshot!!


AI Twitter Recap

Moonshot AI’s Kimi K2 Thinking: open‑weights 1T INT4 reasoning MoE, long‑horizon tools


New AI silicon and inference stack updates (TPU v7, Apple M‑series, adaptive decoding)


Agent frameworks, wallets, and managed RAG


Research and benchmarks: memorization vs. generalization; agent/data‑science evals


Developer tools and media models


People and orgs


Top tweets (by engagement)


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Kimi K2 Thinking Model Release

2. DroidRun AI Tool Discussion

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. XPeng Humanoid Robot Insights

2. Google Ironwood AI Chip Launch

3. OpenAI GPT-5.1 Source Code Leak


AI Discord Recap

A summary of Summaries of Summaries by gpt-5

1. Moonshot's Kimi K2 Thinking: Agentic Reasoning Hits Production

2. Benchmarks, Leaderboards, and 'Who’s Winning' Meta

3. GPU Systems: FP4 Tricks, Real Bandwidth, and Triton Tactics

4. Research & Libraries: Linear Maps, Numerics, and New Video Diffusion

5. Ecosystem Moves: Siri Rumors, Agent Cookouts, and Real‑Time Query Editing


Discord: High level Discord summaries

LMArena Discord


Perplexity AI Discord


LM Studio Discord


Unsloth AI (Daniel Han) Discord


Cursor Community Discord


GPU MODE Discord


Moonshot AI (Kimi K-2) Discord


OpenRouter Discord


Modular (Mojo 🔥) Discord


OpenAI Discord


Latent Space Discord


Yannick Kilcher Discord


HuggingFace Discord


Nous Research AI Discord


Eleuther Discord


DSPy Discord


tinygrad (George Hotz) Discord


aider (Paul Gauthier) Discord


MCP Contributors (Official) Discord


Manus.im Discord


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (1066 messages🔥🔥🔥):

MovementLabs AI, GPT-5, Gemini 3 Pro vs Lithiumflow, Genie 3, Open Source Chinese LLMs


LMArena ▷ #announcements (1 messages):

Kimi-k2-thinking model, LMArena updates


Perplexity AI ▷ #general (1074 messages🔥🔥🔥):

Bounty payments, AdBlock on Comet, YouTube Ads, Comet Browser for Linux, Best AI for Coding


Perplexity AI ▷ #pplx-api (4 messages):

API Documentation, API Usage


LM Studio ▷ #general (155 messages🔥🔥):

vast.ai rental, uv for Python versions, GPTs agents learning, longer token context history, Intel llm-scaler


LM Studio ▷ #hardware-discussion (802 messages🔥🔥🔥):

3090 vs 3080 benchmarks, multi-GPU setups, OpenRouter API, EPYC, AMD Radeon™ AI PRO R9700


Unsloth AI (Daniel Han) ▷ #general (323 messages🔥🔥):

Unsloth for diffusion models, Masking Issues, Qwen3 Model Tuning Nightmares, Compute Metrics Functions, Colab TPU support


Unsloth AI (Daniel Han) ▷ #introduce-yourself (4 messages):

LLM Training Principles, Research Paper Recommendations, Workflow Automation


Unsloth AI (Daniel Han) ▷ #off-topic (48 messages🔥):

Parakeet models vs Whisper, Model Compliance & Truth, Human-Level TTS Training, Model Uncensoring, Coordinates of speech bubbles


Unsloth AI (Daniel Han) ▷ #help (53 messages🔥):

TRL's RLOO trainer REINFORCE implementation, Qwen3-coder-30b on 5080 GPU slow API calls, GPT-OSS-120B quantization issue, Granite 4.0 Hybrid models issues, Adding new tokens to Qwen/Qwen3-4B-Instruct-2507
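For readers new to RLOO (REINFORCE Leave-One-Out): each prompt gets k sampled completions, and each completion's baseline is the mean reward of the other k-1 completions. Below is a minimal stdlib sketch of that advantage computation; it is the textbook formula, not TRL's implementation, and `rloo_advantages` is a hypothetical helper name.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for k samples drawn from the same prompt.

    Each sample's baseline is the mean reward of the *other* k-1 samples,
    so a sample is only credited for beating its siblings.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # Baseline for sample i = (total - r_i) / (k - 1).
    return [r - (total - r) / (k - 1) for r in rewards]


# Four hypothetical completions for one prompt, scored 1 (correct) or 0 (wrong).
advantages = rloo_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

The advantages within a prompt group sum to zero; a REINFORCE-style loss would then weight each completion's log-probability by its advantage.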


Cursor Community ▷ #general (285 messages🔥🔥):

Cursor Model Composer Limits, Cursor App Crashes, Cursor's 'Auto' Mode Pricing, Cursor and Grok Code Costs, Drag and Drop Issues


Cursor Community ▷ #background-agents (8 messages🔥):

Cursor 2.0, internal error, base64 image, Cursor agent API


GPU MODE ▷ #general (67 messages🔥🔥):

mattpharr joins discord, FP4 kernel, Nvidia interview, Blackwell PTX ISA, Datacrunch CUDA support


GPU MODE ▷ #triton-gluon (21 messages🔥):

Triton kernel recompilation, Gluon examples with comms, Expressing swizzling in Gluon, Replicating Triton JIT in C++, Triton C++ Bridge


GPU MODE ▷ #cuda (21 messages🔥):

Memory Bandwidth Saturation, CUDA Kernel Tuning, Triton Stream Execution
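Bandwidth-saturation questions usually come down to simple arithmetic: effective bandwidth is bytes read plus bytes written, divided by kernel time, then compared against the hardware peak. A small sketch, where the element count, the timing, and the peak figure are hypothetical placeholders rather than numbers from the discussion:

```python
def effective_bandwidth_gbs(num_elements, bytes_per_element, elapsed_ms,
                            reads_per_element=1, writes_per_element=1):
    """Effective memory bandwidth in GB/s (1 GB = 1e9 bytes) for a streaming kernel."""
    bytes_moved = num_elements * bytes_per_element * (reads_per_element + writes_per_element)
    return bytes_moved / (elapsed_ms * 1e-3) / 1e9


# Hypothetical measurement: a copy kernel over 2**28 float32 values took 1.0 ms.
achieved = effective_bandwidth_gbs(2**28, 4, elapsed_ms=1.0)
assumed_peak_gbs = 3000.0  # hypothetical spec-sheet peak; substitute your GPU's number
print(f"{achieved:.1f} GB/s, {100 * achieved / assumed_peak_gbs:.1f}% of assumed peak")
```

Counting reads and writes separately matters: a copy moves twice its input size, so forgetting the writes halves the reported bandwidth.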


GPU MODE ▷ #torch (26 messages🔥):

torch.AcceleratorError, CUDA error recovery, Kernel Benchmarking, BackendBench multiprocessing eval, Soumith Chintala


GPU MODE ▷ #cool-links (7 messages):

Numerical Stability, GEMM correctness checks, LLM generated kernels, fp16 vs fp32 numerics
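One reason GEMM correctness checks need precision-aware tolerances is accumulation error: a half-precision running sum stops growing once the increment falls below half the spacing between representable values. The sketch below emulates fp16 and fp32 accumulators with Python's `struct` half (`'e'`) and single (`'f'`) formats; it illustrates the general effect and is not code from any of the linked posts.

```python
import struct


def round_to(fmt, x):
    """Round a Python float to IEEE half ('e') or single ('f') precision."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def accumulate(values, fmt):
    """Sum values, rounding the running total to the given precision at each step."""
    acc = 0.0
    for v in values:
        acc = round_to(fmt, acc + round_to(fmt, v))
    return acc


def rel_error(result, reference):
    return abs(result - reference) / abs(reference)


ones = [1.0] * 4096
fp16_sum = accumulate(ones, "e")  # half-precision accumulator
fp32_sum = accumulate(ones, "f")  # single-precision accumulator
print(fp16_sum, fp32_sum, rel_error(fp16_sum, 4096.0))
```

Summing 4096 ones stalls at 2048 in fp16, because 2049 is not representable and rounds back to 2048 under round-half-to-even, while fp32 reaches 4096 exactly. This is why GEMM kernels commonly take fp16 or bf16 inputs but accumulate in fp32.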


GPU MODE ▷ #jobs (1 messages):

Hiring, Engineering Positions, Low-level Developers, AI System Performance


GPU MODE ▷ #beginner (7 messages):

PyTorch/vllm on AMD AI PCs, Image to Image ViT Optimization, 1D Convolution Kernel for Tensara Problem


GPU MODE ▷ #jax-pallas-mosaic (1 messages):

jax.experimental, gpu collective_matmul_mgpu.py


GPU MODE ▷ #torchao (2 messages):

Accelerated Sparse Computation


GPU MODE ▷ #off-topic (1 messages):

Milk Couch


GPU MODE ▷ #intel (1 messages):

oneAPI 2025.3.1, Intel Fortran Compiler


GPU MODE ▷ #metal (2 messages):

Candle Framework, Metal backend


GPU MODE ▷ #self-promotion (3 messages):

Bit Counting, Geometric Series, SSE Popcount, CUDA Intrinsics
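For context on bit counting: the classic SWAR ("SIMD within a register") popcount below is the software fallback that hardware instructions such as x86 `POPCNT` or CUDA's `__popc` intrinsic replace. It is a generic sketch, not the code from the linked post; `popcount32` is a hypothetical name.

```python
def popcount32(x):
    """Count set bits in a 32-bit value using parallel bit-field sums."""
    x &= 0xFFFFFFFF
    x = x - ((x >> 1) & 0x55555555)                 # per-2-bit field counts
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)  # per-4-bit field counts
    x = (x + (x >> 4)) & 0x0F0F0F0F                 # per-byte counts
    # Multiplying by 0x01010101 sums all four bytes into the top byte.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24


for value in (0x0, 0xFFFFFFFF, 0xDEADBEEF):
    print(hex(value), popcount32(value))
```

In Python 3.10+ `int.bit_count()` does the same job; the bit tricks matter in C, CUDA, or anywhere a native instruction is unavailable.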


GPU MODE ▷ #submissions (1 messages):

vectorsum_v2, A100, B200, H100, L4


GPU MODE ▷ #hardware (8 messages🔥):

DGX Spark experiences, GDDR chip replacement, DGX Spark as Datacenter Proxy, SM120 GPU in DGX Spark, Strix Halo vs DGX Spark


GPU MODE ▷ #tpu (1 messages):

Profile Collection Strategies, Reducing Profile Duration, Debugging Function Calls


GPU MODE ▷ #amd-competition (12 messages🔥):

Website bug reports, Ranking fixes, Submission validity, Grand prize winner


GPU MODE ▷ #cutlass (12 messages🔥):

sum reduce kernel in cutedsl, TMA assumptions, tv-layout data partitioning, CuTe DSL functionality, PTX instruction wrapper


GPU MODE ▷ #mojo (1 messages):

Mojo Kernel Boilerplate for Competitions, Mojo Competition Submission Structure


GPU MODE ▷ #singularity-systems (2 messages):

picograd, tinygrad, eager mode, PatternMatcher abstraction, pedagogical perspective


GPU MODE ▷ #multi-gpu (10 messages🔥):

KernelBench for Multi-GPU, NCCL vs NVSHMEM, NCCL4Py preview


GPU MODE ▷ #opencl-vulkan (4 messages):

Compute Pipelines on Android, Slang's Drawbacks


GPU MODE ▷ #helion (1 messages):

t_cc: Thanks for the quick fix!


GPU MODE ▷ #nvidia-competition (30 messages🔥):

NCU profiling with Popcorn, 50x series consumer cards and nvfp4/tensor core gen 5 support, Optimizing scores on every GPU vs. one best GPU for the hackathon, Regional eligibility for Indian participants, Kernel testing platform without a GPU


Moonshot AI (Kimi K-2) ▷ #announcements (1 messages):

Kimi K2 Thinking Model, HLE Benchmark, BrowseComp Benchmark, Agentic Search, 256K Context Window


Moonshot AI (Kimi K-2) ▷ #general-chat (214 messages🔥🔥):

Kimi K2 Thinking, GPT-5 Comparison, OpenRouter vs Direct API, INT4 Quantization, Agentic Mode


OpenRouter ▷ #announcements (1 messages):

Kimi K2 Thinking, MoonShot AI, Test-time scaling, Agentic performance


OpenRouter ▷ #app-showcase (5 messages):

image generation failures, cat girl images


OpenRouter ▷ #general (176 messages🔥🔥):

OpenRouter downtime, Qwen3 Rate Limits, GPT-5 Image Mini Issues, Apple using Google AI for Siri, DeepSeek OCR integration


OpenRouter ▷ #new-models (2 messages):



OpenRouter ▷ #discussion (26 messages🔥):

Tiger Data Agent Cookout, Claude Prompt Jailbreak, GPT Model Censorship, OpenAI Codex Update, OpenRouter Chatroom Issues


Modular (Mojo 🔥) ▷ #general (120 messages🔥🔥):

Modular YouTube channel, Martin's Generic Radix-n FFT, DSLs in Mojo, Rust interoperability, Mojo's Safety Features


Modular (Mojo 🔥) ▷ #announcements (1 messages):

New Beginners Channel


Modular (Mojo 🔥) ▷ #mojo (52 messages🔥):

Compiler Intrinsic Packaging, LayoutTensor vs NDBuffer, Graph Representation Optimization, Expanding libc in Mojo


OpenAI ▷ #annnouncements (1 messages):

Interrupt long-running queries, Add new context without restarting, Refining deep research, GPT-5 Pro queries


OpenAI ▷ #ai-discussions (73 messages🔥🔥):

Conscious AI Ethics, Solving Mazes with AI, AI Spatial Reasoning, GPT-5 capabilities, Sora Code Channel


OpenAI ▷ #gpt-4-discussions (7 messages):

Selling ChatGPT Plus subscriptions, Research study on developer poaching, Anime videos made by Sora


OpenAI ▷ #prompt-engineering (11 messages🔥):

GPT Pro prompting tips, Gemini Deep Research Comparison, Sora Nerf, Behavioural Orchestration


OpenAI ▷ #api-discussions (11 messages🔥):

Prompt Engineering Tips, Sora 2 Nerf, Behavioral Orchestration, Hierarchical communication, Abstraction through open variables


Latent Space ▷ #ai-general-chat (75 messages🔥🔥):

CodeClash Benchmark, Wabi YouTube-for-Apps, Polaris Alpha, Kimi K2 Thinking Model, OpenAI's CFO pitch


Latent Space ▷ #ai-announcements (4 messages):

Zuckerberg, Priscilla Chan, Latent Space podcast, Chan Zuckerberg Initiative, Curing All Disease with AI


Latent Space ▷ #private-agents (21 messages🔥):

Local model for JSON schema conversion, Apple Private Compute Cloud, OpenPCC privacy features, Confident Security


Yannick Kilcher ▷ #general (73 messages🔥🔥):

Slow Mode in Discord, ML Paper Filtering, Devin AI vs Claude Code, LLM Protection Blogposts, Tiny Recursive Models


Yannick Kilcher ▷ #paper-discussion (22 messages🔥):

RNN resurgence, Learning from Failures, VISUAL ARCHITECTURE


Yannick Kilcher ▷ #ml-news (5 messages):

OpenAI requests Federal Backstop, Crooked Schemes


HuggingFace ▷ #general (50 messages🔥):

Correlation & Causation Stock Market LLM, Reasoning Scratchpad Models, AI security shortcomings, Hugging Face new regulations, Model Types


HuggingFace ▷ #today-im-learning (2 messages):

Image analysis, Screenshots


HuggingFace ▷ #i-made-this (5 messages):

Muther Room On-Device LLM Demo, TraceVerde Observability Tool, AI Agent Decision Making


HuggingFace ▷ #reading-group (1 messages):

beluwugachan: Now ai can be your reading buddy


HuggingFace ▷ #core-announcements (1 messages):

SANA-Video Model, Diffusers library


HuggingFace ▷ #agents-course (1 messages):

fusco0984: Hello, if joining the agents course today, can I get the certificate of completion?


Nous Research AI ▷ #general (50 messages🔥):

Tokenizer highlighting, Flash attention with Qwen3-VL, LLM dataset creator, China OS models, Procrastinating


Nous Research AI ▷ #ask-about-llms (2 messages):

Discord Channel Silencing


Nous Research AI ▷ #research-papers (2 messages):

Breakthrough moment, New paper


Nous Research AI ▷ #research-papers (2 messages):

arxiv papers, breakthrough moment


Eleuther ▷ #general (19 messages🔥):

Introductions Channel, Post Length, AI Developer Study Notes


Eleuther ▷ #research (1 messages):

synquid: https://openreview.net/forum?id=Q7mLKxQ8qk


Eleuther ▷ #interpretability-general (10 messages🔥):

Equivalent Linear Mappings, Jacobian in input embedding space, low-dimensional semantic structure, Gemma Scope SAE latents


DSPy ▷ #show-and-tell (1 messages):

Tau Bench results, fastWorkflow, GEPA workflow optimization


DSPy ▷ #general (13 messages🔥):

Conversation History in DSPy Modules, LLM Context Loss in ReAct Modules, Deserialization of Complex Pydantic OutputFields, DSPy Prompt in Java, Rate Limits for DSPy Batch Requests


tinygrad (George Hotz) ▷ #general (4 messages):

Tinybox benchmarks, Tinygrad out of band mechanism, VIZ performance over SSH


tinygrad (George Hotz) ▷ #learn-tinygrad (8 messages🔥):

Uop SPECIAL, ntid Access, UOps Errors, UOps Kernel Generation, PyTorch Tensors to Tinygrad Tensors


aider (Paul Gauthier) ▷ #general (5 messages):

Aider-ce Documentation, Claude Sonnet 4-5-20250929 support, Enable reasoning on models like Haiku-4-5, opus-4-1


aider (Paul Gauthier) ▷ #questions-and-tips (6 messages):

aider memory usage with Qwen, grep vs rg, Aider Discord Plugin, Gemini vs GPT-5, Aider scripting help


MCP Contributors (Official) ▷ #general (9 messages🔥):

Image handling as tool input, MCP tool for image conversion to URL, Reddit thread on Code Execution with MCP