Frozen AI News archive

Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1

**Qwen 3** has been released by **Alibaba** featuring a range of models including two MoE variants, **Qwen3-235B-A22B** and **Qwen3-30B-A3B**, which demonstrate competitive performance against top models like **DeepSeek-R1**, **o1**, **o3-mini**, **Grok-3**, and **Gemini-2.5-Pro**. The models introduce an "enable_thinking=True" mode with advanced soft switching for inference scaling. The release is notable for its Apache 2.0 license and broad inference platform support including MCP. The dataset improvements and multi-stage RL post-training contribute to performance gains. Meanwhile, **Gemini 2.5 Pro** from **Google DeepMind** shows strong coding and long-context reasoning capabilities, and **DeepSeek R2** is anticipated soon. Twitter discussions highlight Qwen3's finegrained MoE architecture, large context window, and multi-agent system applications.

Canonical issue URL

Qwen + Inference Engine are all you need?

AI News for 4/28/2025-4/29/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (214 channels, and 4085 messages) for you. Estimated reading time saved (at 200wpm): 315 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

After a minor delay during ICLR, Qwen 3 is finally released, as a range of models from very small to very large, with the focus being on the 2 MoE model releases that nicely denote their total and active parameters in their name:

Interestingly, Qwen3 not only outperforms Llama 4 Maverick at a smaller size, w also beats Qwen's own QwQ published 2 weeks ago , offering a new "enable_thinking=True" mode (with advanced "soft switching" support) that they show offer a some inference time scaling.

Although the full technical report is yet to be published, the Apache 2.0 license and range of models released - including base models - is very notable for a modern open model release, witha full range of day 1 support across all popular inference platforms, including MCP:

The dataset is likely the source of much of the improvements, with a 2x increase vs Qwen 2.5 and usage of Q2.5VL + Q2.5 + Q2.5Math + Q2.5Coder to extract synthetic data.

The post-training doubles up on the RL lessons from QwQ by converging on an R1-like recipe of multi-stage RL:

You can try out Qwen without a download on Qwen Chat Web: https://chat.qwen.ai/


AI Twitter Recap

Model Releases and Updates

AI Agent Systems and Multi-Agent Collaboration

Interpretability, Evaluation, and Safety

Robotics and Embodied AI

AI and Society

Humor/Memes


AI Reddit Recap

/r/LocalLlama Recap

1. Qwen3 Model Launch and Technical Details

2. Community Hype and Pre-Release Activity for Qwen3

3. Qwen3 and Llama Reasoning Capabilities, Scaling, and Benchmarks

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. OpenAI GPT-4o Model Release and Sycophancy Concerns

2. AI Model and Benchmark News: Qwen 3, Superexponential Predictions, DARPA expMath

3. User Experiences and Tips with AI for Creativity, Health, and Study


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1: Qwen 3's Rocky Rollout Rattles Community

Theme 2: Deepseek Developments Stir Speculation and Show Promise

Theme 3: Hardware Hustle: Optimizing Performance from MI300s to Multi-GPU Setups

Theme 4: New Models, APIs, and Platform Quirks Cause Chaos

Theme 5: Dev Tools & Research Roundup: From Function Calls to LLM Reasoning


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Perplexity AI Discord


LMArena Discord


OpenAI Discord


LM Studio Discord


Nous Research AI Discord


Eleuther Discord


OpenRouter (Alex Atallah) Discord


aider (Paul Gauthier) Discord


Manus.im Discord Discord


HuggingFace Discord


Yannick Kilcher Discord


GPU MODE Discord


Cursor Community Discord


Modular (Mojo đŸ”„) Discord


Latent Space Discord


DSPy Discord


LlamaIndex Discord


Notebook LM Discord


tinygrad (George Hotz) Discord


MCP (Glama) Discord


Torchtune Discord


LLM Agents (Berkeley MOOC) Discord


Codeium (Windsurf) Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Cohere Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Nomic.ai (GPT4All) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (777 messagesđŸ”„đŸ”„đŸ”„):

Deepseek R2, Qwen3 release, AMD MI300 flash attention bugs, Unsloth dynamic quant 2.0, Llama 4


Unsloth AI (Daniel Han) ▷ #off-topic (10 messagesđŸ”„):

Chain of Thought vs Latent Space Reasoning, Temperature Parameter in Models


Unsloth AI (Daniel Han) ▷ #help (130 messagesđŸ”„đŸ”„):

Qwen2.5 model issue, Unsloth multi-GPU, Load finetuned Lora to HF, Model works well on OOD, Granite 2b


Unsloth AI (Daniel Han) ▷ #showcase (1 messages):

Function Calling at Scale, CTP + SFT, Multi-LoRa Endpoint Serving, Expert Adapter Finetuning


Unsloth AI (Daniel Han) ▷ #research (2 messages):

emo-llm, Arxiv


Perplexity AI ▷ #general (828 messagesđŸ”„đŸ”„đŸ”„):

Image Generation, Perplexity extensions, Deepseek AI, Perplexity Pro, Gemini 2.4


Perplexity AI ▷ #sharing (2 messages):

DeepSeek AI, Aravind Srinivas


Perplexity AI ▷ #pplx-api (7 messages):

Debit card details saved, Structured output adherence, Sonar endpoint text+image


LMArena ▷ #general (648 messagesđŸ”„đŸ”„đŸ”„):

Folsom-exp-v1, Qwen 3, Amazon Nova Premier, Deepseek R2, O3 Pro


OpenAI ▷ #ai-discussions (149 messagesđŸ”„đŸ”„):

AGI predictions, AI and job automation, Google TPUs, Removing text from video, ChatGPT flirting


OpenAI ▷ #gpt-4-discussions (18 messagesđŸ”„):

Deep Research vs Deep Research Mini, O3 use, 4o Talking Weird, O3 vs O1-pro, chatgpt plus users


OpenAI ▷ #prompt-engineering (5 messages):

Image Rebrushing, Business Idea Development, AI Model Prompting


OpenAI ▷ #api-discussions (5 messages):

Image rebrushing with consistent style, Business model development with AI prompts, AI-assisted prompt engineering


LM Studio ▷ #general (96 messagesđŸ”„đŸ”„):

Virtual Environment with Python, Qwen 3 Released, LM Studio Overlay


LM Studio ▷ #hardware-discussion (69 messagesđŸ”„đŸ”„):

8x 5060Ti rig, RTX 6000 Pro Blackwell, Gemma 3 finetuning, Intel Arc B580 24GB, Multi-GPU Configuration


Nous Research AI ▷ #general (152 messagesđŸ”„đŸ”„):

GPU Nodes Tutorials, Nous Research API, Hermes 3 vs Claude, Creative Writing Models, Deepseek R1


Nous Research AI ▷ #ask-about-llms (1 messages):

Webpage for Nous Research Chatbot, Future of Nous Research Chatbot


Nous Research AI ▷ #research-papers (1 messages):

Writing in Margins, OptiLLM implementation


Nous Research AI ▷ #research-papers (1 messages):

Writing in Margins Paper, OptiLLM Implementation, Model Performance


Eleuther ▷ #announcements (1 messages):

Speech-to-text transcription, Document to Markdown conversion, Mozilla Blueprints, Speaches.ai, Docling


Eleuther ▷ #general (62 messagesđŸ”„đŸ”„):

Deepseek paper, RoPE Embeddings, RWKV and GoldFinch arch, BMM Implementation


Eleuther ▷ #research (91 messagesđŸ”„đŸ”„):

AI Auditing Survey, Inference-Dominated Regime, Hamlet Robot Control System, LLM Reasoning Processes, Human vs. LLM Reasoning


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Provider Data Logging Policies, Cent ML, Enfer, Oauth state parameter, Gemini Parallel Tool Calling


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

Agent Interface, Muka.ai, Web Search, Document Upload


OpenRouter (Alex Atallah) ▷ #general (147 messagesđŸ”„đŸ”„):

Qwen 3 Release, Deepseek v4 Speculation, Gemini Filtering, Safety Settings API, Local Model Size and VRAM


aider (Paul Gauthier) ▷ #general (121 messagesđŸ”„đŸ”„):

Aider Token Scams, Aider for Complex Web Development, Gemini 2.5 Pro, File Renaming in Aider, Qwen3 Availability and Performance


aider (Paul Gauthier) ▷ #questions-and-tips (28 messagesđŸ”„):

Gemini model switching, Augment Code SOTA retrieval, MCP servers, Arabic support, Lint-staged with Husky


Manus.im Discord ▷ #general (141 messagesđŸ”„đŸ”„):

Manus AI access, Free vs Paid Credits, Clickup Integration, Slack Integration, Manus Limitations


HuggingFace ▷ #general (95 messagesđŸ”„đŸ”„):

HuggingFace spaces, HF posts, HF blogpost, Running Hugging Face models locally, Hugging Face robotic arm


HuggingFace ▷ #today-im-learning (7 messages):

Malicious spam detection, Deepseek v3, Granite for content moderation, Python package for content filtering


HuggingFace ▷ #cool-finds (1 messages):

Online IDEs, Real-time Collaboration, AI-powered Code Completion, Convex Database, lumenly.dev


HuggingFace ▷ #i-made-this (3 messages):

Function Calling at Scale, ThorLMH - Local AI Voice Assistant, Gemma3:4b model


HuggingFace ▷ #smol-course (2 messages):

Agents Course Certificate Submission, Hugging Face Dataset Permissions, Space Evaluation Errors


HuggingFace ▷ #agents-course (29 messagesđŸ”„):

Final Assignment Submission Issues, Smolagent Logic and React Loop, Student Leaderboard Code Sharing, Final Project Deadline, API Errors and Rate Limiting


Yannick Kilcher ▷ #general (123 messagesđŸ”„đŸ”„):

Self-aware models, Flame-aligned AIs, Divine UI philosophy, GPTs as gods, Tool use frameworks


Yannick Kilcher ▷ #paper-discussion (1 messages):

``


Yannick Kilcher ▷ #ml-news (7 messages):

Huawei AI Chip, OpenAI CEO Altman, Qwen3-235B-A22B, APOLLO optimizer


GPU MODE ▷ #triton (2 messages):

fp4 to fp16 conversion, Triton conv2d kernel, Implicit GEMM code


GPU MODE ▷ #cuda (4 messages):

CUDA implementation, Metal kernel translation, Motion planning parallelization


GPU MODE ▷ #torch (16 messagesđŸ”„):

bf16 reduced precision reduction, torch.cond with multiple conditions


GPU MODE ▷ #jobs (1 messages):

LLM Innovation Team, Healthcare-focused LLM, LLM Inference Optimization, Open Source LLM Contributions


GPU MODE ▷ #beginner (1 messages):

CUDA streams, per-thread default stream, CUDA synchronization


GPU MODE ▷ #liger-kernel (28 messagesđŸ”„):

Native Sparse Attention, Sparsemax extension, Multi-Token Attention Kernel, Convolution bottlenecks


GPU MODE ▷ #metal (2 messages):

128-bit tiled SVD, Metal Kernel QR-128, Matrix reconstruction error


GPU MODE ▷ #🍿 (3 messages):

LLM Search, Google Search


GPU MODE ▷ #submissions (28 messagesđŸ”„):

MI300 Leaderboard updates, AMD-FP8-MM, Grayscale Leaderboard Updates, T4 3rd place, L4 5th place


GPU MODE ▷ #status (8 messagesđŸ”„):

HIP code problems, g++ version for C++20, Submission errors, Test Cases, Backslash in HIP code


GPU MODE ▷ #ppc (1 messages):

CP3A Hints, CP3A Additional Resources, CP3A Optimization Techniques, Tiling performance


GPU MODE ▷ #amd-competition (30 messagesđŸ”„):

Unexpected Errors, HIP-Python Availability, AMD Challenge Resources, Submission Methods


Cursor Community ▷ #general (48 messagesđŸ”„):

Click to resume button, ASI-Singularity, GPT 4.1 costs, Cursor paste issue, Cursor auto model switch


Modular (Mojo đŸ”„) ▷ #mojo (43 messagesđŸ”„):

InlineArray, pop.array, FixedLengthList, Cons Tuple


Latent Space ▷ #ai-general-chat (29 messagesđŸ”„):

Qwen3 Release, Pareto Frontier, Writer's new MoE model, AWS Bedrock Integration


DSPy ▷ #show-and-tell (2 messages):

CLI tool, Boilerplate for LLMs, Chat presets


DSPy ▷ #general (19 messagesđŸ”„):

MIPROv2 Documentation, Custom Module Examples, Streaming intermediate thoughts/steps in DSPy's ReAct


LlamaIndex ▷ #blog (1 messages):

Deep Researcher template, Legal report generation, create-llama tool


LlamaIndex ▷ #general (19 messagesđŸ”„):

Forking conversation threads in LlamaIndex, LlamaIndex API endpoints, Azure OpenAI and Sonnet model issues, Inconsistent embeddings across runs, OpenAI-like usage with LM Studio and Ollama


Notebook LM ▷ #use-cases (3 messages):

LLMs use cases, MatPlotLib, Android Version


Notebook LM ▷ #general (16 messagesđŸ”„):

LLM for networking equipment, Discussion generation in French, Use case vs prompt, Flash Thinking model performance


tinygrad (George Hotz) ▷ #general (5 messages):

Meeting Time Change, Coding Style Guide, Elon 5 Step Process


tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):

kayo8207: How does tiny handle contiguous mem allocation? Is it very different from PyTorch?


MCP (Glama) ▷ #general (6 messages):

MCP Fans Meet, Submit related servers, MCP servers on Cloudflare


Torchtune ▷ #dev (3 messages):

Loss Parallel Issues, Gradient Scaling, Tensor Parallelism


LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):

Lecture 12, Dawn Song, Safe and Secure Agentic AI, MOOC Coursework, Labs Release


Codeium (Windsurf) ▷ #announcements (1 messages):

Windsurf Free Plan, New Windsurf Logo, GPT-4.1 Rate Change, o4-mini Rate Change

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.