Frozen AI News archive

Qwen3-Next-80B-A3B-Base: Towards Ultimate Training & Inference Efficiency

**MoE (Mixture of Experts) models** have become essential to frontier AI systems, and **Qwen3-Next** pushes sparsity further by activating only **3.7% of parameters** (3B of 80B) in a hybrid architecture combining **Gated DeltaNet** and **Gated Attention**. The new design includes **512 total experts** (10 routed + 1 shared activated per token), **Zero-Centered RMSNorm** for stability, and improved MoE router initialization, reportedly yielding **~10× cheaper training and ~10× faster inference** than previous models. **Alibaba's Qwen3-Next** reportedly outperforms **Gemini-2.5-Flash-Thinking** and approaches the flagship 235B model's performance, with deployments on **Hugging Face** and **Baseten**, and native **vLLM** support for efficient inference.

Canonical issue URL

Gated Attention is all you need?

AI News for 9/10/2025-9/11/2025. We checked 12 subreddits, 544 Twitters and 22 Discords (187 channels, and 4884 messages) for you. Estimated reading time saved (at 200wpm): 414 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Since Noam Shazeer and colleagues introduced them during his annus mirabilis, MoE models have steadily grown in importance, through GPT-4 and Mixtral (8 experts). DeepSeek (160 experts), Snowflake (128 experts) and others then pushed sparsity even further, and today it is fair to say that no frontier model is served without being an MoE (we have outright confirmation for Gemini, whereas the rest are strong rumors).

Today's Qwen3-Next release pushes model sparsity even further. The industry has shifted from counting experts to comparing the ratio of active to total parameters, and Qwen3-Next's 3.75% (3B active / 80B total) is appreciably lower than GPT-OSS's 4.3% and Qwen3's own prior ~10%.
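The ratio comparison above is simple arithmetic; a quick sketch (the GPT-OSS and prior-Qwen3 figures are approximate, as in the text):

```python
# Active-to-total parameter ratios for the MoE models compared above.
# The GPT-OSS and Qwen3-235B figures are approximate.
models = {
    "Qwen3-Next-80B-A3B": (3e9, 80e9),
    "GPT-OSS-120B": (5.1e9, 120e9),
    "Qwen3-235B-A22B": (22e9, 235e9),
}
for name, (active, total) in models.items():
    print(f"{name}: {active / total:.2%} of parameters active")
```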

According to the Qwen team:

Ultra-Sparse MoE: Activating Only 3.7% of Parameters

Qwen3-Next uses a highly sparse MoE design: 80B total parameters, but only ~3B activated per inference step. Experiments show that, with global load balancing, increasing total expert parameters while keeping activated experts fixed steadily reduces training loss. Compared to Qwen3's MoE (128 total experts, 8 routed), Qwen3-Next expands to 512 total experts, activating 10 routed experts + 1 shared expert per token — maximizing resource usage without hurting performance.
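As a rough illustration of that routing pattern — a minimal NumPy sketch, not the actual Qwen3-Next implementation; the expert networks here are random stand-in matrices and the hidden size is shrunk for readability:

```python
import numpy as np

# Sketch of top-10-of-512 routing with one always-on shared expert.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 512, 10

router_w = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)
experts = rng.standard_normal((n_experts, d_model, d_model)) / np.sqrt(d_model)
shared = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

def moe_forward(x):
    logits = x @ router_w                       # one router score per expert
    top = np.argsort(logits)[-top_k:]           # indices of the 10 routed experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                        # softmax over selected experts only
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, top))
    return out + x @ shared                     # shared expert always contributes

y = moe_forward(rng.standard_normal(d_model))
print(y.shape)
```

Only the 10 selected expert matmuls (plus the shared one) run per token, so compute scales with active parameters while model capacity scales with the full 512-expert pool.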

But for ML folks, the probably bigger win is the strict Pareto improvement seen in pretraining:

The authors credit a few architecture advancements:


AI Twitter Recap

Alibaba’s Qwen3-Next hybrid architecture and early ecosystem support

Image generation and OCR: ByteDance Seedream 4.0, Florence-2, PaddleOCRv5, Points-Reader

Developer platforms: VS Code + Copilot, Hugging Face speedups, vLLM hiring

Agent training and production agents: RL, tools, HITL, and benchmarks

Speech, audio, and streaming seq2seq

Systems and infra: MoE training, determinism trade-offs, and comms stack

Top tweets (by engagement)


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Qwen3-Next-80B A3B Launch + Tri-70B Apache-2.0 Checkpoints

2. Qwen3-Next Teasers and Coming-Soon Posts

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Seedream/Seedance 4.0 Image Model Releases and Benchmarks

2. UK Government AI Adoption and ChatGPT Ads Monetization

3. Real-world AI Impacts: Builder Traction, Medical Triage, and Consciousness Debate


AI Discord Recap

A summary of Summaries of Summaries by gpt-5

1. Generation Efficiency and Kernel-Level Wins

2. Leaderboards, MoE Moves, and New Models

3. Agentic Tools and Connectors Go Practical

4. Systems Tooling Shifts and GPU Gotchas

5. Mojo/MAX Platform: Custom Ops and Bindings


Discord: High level Discord summaries

Perplexity AI Discord


Unsloth AI (Daniel Han) Discord


LMArena Discord


HuggingFace Discord


Cursor Community Discord


OpenRouter Discord


GPU MODE Discord


LM Studio Discord


OpenAI Discord


Nous Research AI Discord


Latent Space Discord


DSPy Discord


Modular (Mojo 🔥) Discord


Yannick Kilcher Discord


Eleuther Discord


aider (Paul Gauthier) Discord


Moonshot AI (Kimi K-2) Discord


Manus.im Discord


The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (1195 messages🔥🔥🔥):

DeepSeek vs ChatGPT, Grok's persona, GPT-5 vs Perplexity, Comet Browser


Perplexity AI ▷ #sharing (5 messages):

Shareable Threads, ProductHunt Vote, Image attachments


Perplexity AI ▷ #pplx-api (3 messages):

Friend Requests, API Errors, num_search_results errors


Unsloth AI (Daniel Han) ▷ #general (562 messages🔥🔥🔥):

Unsloth hardware compatibility, Training TTS models on Colab, Multi-GPU Support Roadmap, BERT models in Unsloth, Dynamic GGUF Quantization Requests


Unsloth AI (Daniel Han) ▷ #introduce-yourself (4 messages):

AI Engineering, AI Startups, Microservices, LLMs


Unsloth AI (Daniel Han) ▷ #off-topic (109 messages🔥🔥):

48GB 4090s, Privacy-First NixOS-Based OS, Unsloth Dependency Hell, Luau Learning and LeetCode, Promptwright DAG


Unsloth AI (Daniel Han) ▷ #help (220 messages🔥🔥):

Phi3-mini quantization, Unsloth BERT models, System prompt structure, Custom Santa voice, Qwen2.5-vl perceived dimensions


Unsloth AI (Daniel Han) ▷ #showcase (11 messages🔥):

Markov Chains, MoonshotAI's checkpoint-engine, vLLM v0.10.2rc1


Unsloth AI (Daniel Han) ▷ #research (2 messages):

Spectral Edit, Audio Analysis, LLM Inference


LMArena ▷ #general (820 messages🔥🔥🔥):

O3 Model Performance, Psychological Tactics in Prompting, AI Spritesheet Animation, LM Arena Legacy Website Removal, Nano-Banana vs Seedream V4


LMArena ▷ #announcements (3 messages):

Seedream-4, Qwen3-next-80b-a3b-instruct, Qwen3-next-80b-a3b-thinking, Hunyuan-image-2.1


HuggingFace ▷ #general (218 messages🔥🔥):

PEFT QLoRA Training, ArXiv Endorsement Request, WACV Paper Submission, LLM Fine-tuning Course Study Group, Mobile App Image Search


HuggingFace ▷ #today-im-learning (1 messages):

saadkhan_188: Same situation as ☝🏻


HuggingFace ▷ #smol-course (47 messages🔥):

Multilingual Smol Course, GPU Setup for Smol Course, Study Group for Smol Course, Loss issues with fine-tuning, Certification Process


HuggingFace ▷ #agents-course (2 messages):

Ollama, local model


Cursor Community ▷ #general (178 messages🔥🔥):

Cursor Issues, Cursor Auto Mode, Cursor Pricing, Student Verification, Token Refund


Cursor Community ▷ #background-agents (1 messages):

Cursor Linear integration, Default repository settings, Linear integration issues


OpenRouter ▷ #app-showcase (2 messages):



OpenRouter ▷ #general (125 messages🔥🔥):

Query Prompting Race Condition Bug, Token Calculation, JSONDecodeError, Moonshot AI Provider Selection, LongCat Implementation


OpenRouter ▷ #new-models (3 messages):



OpenRouter ▷ #discussion (29 messages🔥):

Grok Code inference pricing, Kilocode's Free Grok usage, OpenRouter pricing model


GPU MODE ▷ #general (1 messages):

Lambda Labs, Cloud GPUs, GPU Availability, GPU Instance Shortages, Cloud Computing


GPU MODE ▷ #triton (6 messages):

CUDA, PTX, TLX authors, Triton Compiler


GPU MODE ▷ #cuda (6 messages):

Flash Attention 1 vs Flash Attention 2, Q-outer vs KV-outer loops, FA2 main difference


GPU MODE ▷ #torch (18 messages🔥):

CUDA Graph Warmup, vLLM uv pip build, Prefill Compile


GPU MODE ▷ #algorithms (1 messages):

person12341234432: whaddafak is thaat


GPU MODE ▷ #beginner (5 messages):

CUDA benchmarks, GPU Synchronization, P104-100 BIOS Flash


GPU MODE ▷ #pmpp-book (2 messages):

PMPP Book, Kernel Writing, Learning on the Fly


GPU MODE ▷ #rocm (27 messages🔥):

MI300 dual VALU issue, waves per simd control, compute throughput calculation, AMD GPU for local running, Strix Halo unified memory machine


GPU MODE ▷ #self-promotion (3 messages):

MXFP quantization in Triton, Paged Attention in vLLM


GPU MODE ▷ #submissions (24 messages🔥):

MI300x8 submissions, Leaderboard Submission Questions, amd-all2all leaderboard


GPU MODE ▷ #factorio-learning-env (3 messages):

Factorio Learning Environment, Game Modding, Resource Management, Automation Strategies


GPU MODE ▷ #amd-competition (26 messages🔥):

Wuxin hints, Submission ranking updates, Multiple file submissions, Fairness in competition results, Triton error on AMD GPU


GPU MODE ▷ #general (18 messages🔥):

Kernel Development Roadmap, GPU Mode Leaderboard, KernelBot Development, AMD Competition, Reference Kernels


GPU MODE ▷ #multi-gpu (2 messages):

Claude vs AI tools, AI debugging, AI expertise


GPU MODE ▷ #low-bit-training (16 messages🔥):

Blackwell (5090) support for cuBLAS, Low precision training codebase, Custom Zero-3 quantization for forward and backward passes, CUDA memory copies vs NCCL AllGather, NCCL CE Collectives and SM usage


LM Studio ▷ #general (66 messages🔥🔥):

NVMe speed improvement, Model for Python code generation, Markdown rendering bug with sub tags, VRAM misidentification on Vulkan, Context usage in taskbar bug


LM Studio ▷ #hardware-discussion (86 messages🔥🔥):

Western Digital Drives Failure Rate, PNY NVIDIA DGX Spark ETA Issues, Framework Product Concerns, RAM and Motherboard Issues, AMD APU VRAM Utilization


OpenAI ▷ #ai-discussions (108 messages🔥🔥):

Gemini 2.5 Pro Hallucination, GPT-5 is SICK GOOD! ❤️‍🔥, Custom MCPs in OpenAI, GPT-5 generated code, custom gpt voice chat issues


OpenAI ▷ #gpt-4-discussions (2 messages):

Account access issues, Two-factor authentication, Password reset


OpenAI ▷ #prompt-engineering (14 messages🔥):

Transparent Optimizations Proposal, GPT-5 Prompting Guide, Instruction Following Best Practices, Structured Prompting Techniques, AI Self Help Conversation Analyzer


OpenAI ▷ #api-discussions (14 messages🔥):

Transparent Optimizations, Claude 4 sonnet, Novelists vs natural dialogue, GPT-5 agents, Structured prompting techniques


Nous Research AI ▷ #general (90 messages🔥🔥):

Disable WebGL, Agent Building, LLM philosophizing, Qwen3, Tokenizer filtering for dataset quality


Nous Research AI ▷ #research-papers (2 messages):

Set Block Decoding (SBD), Masked Token Prediction (MATP), Llama-3.1 8B, Qwen-3 8B, discrete diffusion literature


Nous Research AI ▷ #interesting-links (1 messages):

promptsiren: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/




Latent Space ▷ #ai-general-chat (68 messages🔥🔥):

GPT-OSS, Sam Altman Interview, Codex Power Users, OpenAI Oracle Deal, OpenAI Evals


Latent Space ▷ #genmedia-creative-ai (4 messages):

ByteDance Seedream 4.0, Artificial Analysis leaderboards, Google's Nano-Banana


DSPy ▷ #show-and-tell (2 messages):

DSPY Blog Writing Agent, Math GPT App


DSPy ▷ #general (59 messages🔥🔥):

DSPy transpilation to other languages, RL in DSPy, DSPy with Java, Instructions mutability by optimizers, DSPy maintainers


Modular (Mojo 🔥) ▷ #general (3 messages):

Mojo Dev Environment, Docker Container Checkpoint, Existing Images as Base Image


Modular (Mojo 🔥) ▷ #mojo (34 messages🔥):

Mojo Compiler Roadmap, DPDK Bindings Generation, c_binder_mojo Tool, Fortran out Pattern, Clang AST parser


Modular (Mojo 🔥) ▷ #max (17 messages🔥):

Adding bitwise_and op, Torch Max backend wheel size, Custom Ops, Graphs Slow to Stage


Yannick Kilcher ▷ #general (5 messages):

Sparsity Ratio, Saturday Session Papers


Yannick Kilcher ▷ #paper-discussion (10 messages🔥):

Planning with Reasoning using Vision Language World Model, Prompt Templating System, POM


Yannick Kilcher ▷ #ml-news (14 messages🔥):

Spiking Neural Networks (SNNs), Vertical Integration & Specialized Hardware for AI, China's AI Hardware Ambitions


Eleuther ▷ #general (5 messages):

Crank detection questions, Introduction to the community


Eleuther ▷ #research (19 messages🔥):

Hallucination definition, Bin packing vs. truncation, RAG problem


Eleuther ▷ #multimodal-general (3 messages):

Discord Channel Link


aider (Paul Gauthier) ▷ #general (8 messages🔥):

AI Documentation Agent Tuning, Evaluation methodologies for AI agents, Defining good outputs for AI models, Vercel AI SDK Usage, Prompt Engineering tips


aider (Paul Gauthier) ▷ #questions-and-tips (14 messages🔥):

aider /load command, Aider codebase edits, Aider repo map


Moonshot AI (Kimi K-2) ▷ #general-chat (21 messages🔥):

Kimi K2 search capabilities, K2 research sending email during research, Models for creative writing