Frozen AI News archive

NVIDIA Nemotron 3: fully open-source hybrid Mamba-Transformer models from 30B to 500B

**NVIDIA** has released **Nemotron 3 Nano**, a fully open-source hybrid Mamba-Transformer Mixture-of-Experts (MoE) model with **30B parameters** and a **1 million token context window**. The release includes open weights, training recipes, datasets, and an RL environment suite called NeMo Gym, and supports commercial use under the NVIDIA Open Model License. The model achieves state-of-the-art results on benchmarks like SWE-Bench and the Artificial Analysis Intelligence Index, outperforming **Qwen3-30B A3B**. Ecosystem support is immediate, with integrations into inference stacks like **vLLM**, **llama.cpp**, and **Baseten**. Upcoming larger models, Nemotron Super and Ultra, will feature NVFP4 pretraining and LatentMoE routing to optimize compute. This release marks a significant milestone for open-source American AI, with comprehensive open assets and an advanced hybrid architecture.
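The serving appeal of a sparse MoE at this size is that only a small slice of the 30B parameters is touched per token. A back-of-envelope sketch of that arithmetic (the expert counts and parameter split below are illustrative assumptions, not figures from the Nemotron 3 release):

```python
# Illustrative sketch: active vs. total parameters in a top-k routed MoE.
# The 27B/64-expert/top-4/3B-shared split is a hypothetical example,
# NOT the published Nemotron 3 Nano configuration.
def active_params(total_expert_params, n_experts, top_k, shared_params):
    """Parameters touched per token: shared layers plus k routed experts."""
    per_expert = total_expert_params / n_experts
    return shared_params + top_k * per_expert

act = active_params(27e9, 64, 4, 3e9)
print(f"{act / 1e9:.2f}B active of 30B total")  # ~4.69B under these assumptions
```

Under these made-up numbers a token flows through under 5B parameters, which is why 30B-class MoEs are often compared against much smaller dense models on cost.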


A good day for open-source American AI.

AI News for 12/12/2025-12/15/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (206 channels, and 15997 messages) for you. Estimated reading time saved (at 200wpm): 1294 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Nvidia's Nemotron is not often in the top tiers of open models, but it distinguishes itself by being COMPLETELY open, as in, "we will openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights" (Nemotron 3 paper), as well as by being American-origin. Nano 3 is competitive with Qwen3:

Comparison of Qwen3-30B-A3B-Base and Nemotron 3 Nano 30B

When models like these are released, they effectively serve as a checkpoint for the state of the art in LLM training, because they gather all of the table-stakes techniques known to work. Among the notable choices: hybrid architectures enabling long (1M) context.
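The reason hybrids unlock 1M-token context is the asymmetry in per-token memory: self-attention layers keep a KV cache that grows linearly with sequence length, while Mamba-style SSM layers carry a fixed-size state. A minimal sketch of the KV-cache arithmetic (layer counts and head dimensions below are hypothetical, chosen only to show the scaling, not Nemotron's actual configuration):

```python
# Back-of-envelope KV-cache memory for a long context, fp16 (2 bytes/elem).
# All dimensions here are hypothetical illustrations.
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per=2):
    # Factor of 2 for storing both K and V per token per attention layer.
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per

full_attn = kv_cache_bytes(48, 8, 128, 1_000_000)  # every layer is attention
hybrid    = kv_cache_bytes(6, 8, 128, 1_000_000)   # only a few attention layers
print(f"all-attention: {full_attn / 2**30:.1f} GiB, hybrid: {hybrid / 2**30:.1f} GiB")
```

With only 6 of 48 layers using attention, the cache shrinks by 8x; the SSM layers add only a constant-size state on top, which is what makes million-token windows practical to serve.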

Hybrid State Space Model + Transformer Architecture

Nemotron 3 model architecture visualization showing interleaved Mamba-2 and MoE layers with select self-attention

Technical architecture diagram showing details of the Nemotron 3 Nano hybrid Mamba-Transformer Mixture-of-Experts

Multi-environment RL (NeMo Gym and NeMo-RL open sourced)

A technical document page describing the post-training methodology for the Nemotron 3 Nano AI model, highlighting its hybrid architecture, multi

A technical document page describing the infrastructure for NeMo Gym, a framework for reinforcement learning environments with three core server types: agents, models

Per the Nano 3 tech report, they will be releasing all their datasets:

Diagram of Nemotron 3 Nano layer architecture showing a hybrid Mamba-Transformer Mixture of Experts (MoE)


AI Twitter Recap

NVIDIA’s Nemotron 3: open hybrid MoE models, data, and agent stack

Reasoning, retrieval, and coding agents: new techniques and results

Inference and infra: multimodal serving, quantization, schedulers

Agent/coding toolchain and evals

Vision, video, 3D worlds

Product signals: OpenAI, Google, Allen, Arena

Top tweets (by engagement, AI‑focused)


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. NVIDIA Nemotron 3 Nano Release

2. Google Model Announcement

3. Frustration with Tech Performance

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Advanced AI Model Benchmarks

2. Innovative Storage and Robotics Technologies

3. Creative AI Applications in Media and Design


AI Discord Recap

A summary of Summaries of Summaries by gpt-5.2

1. Kernel & GPU Systems: Papers, Microbenchmarks, and Real Speedups

2. LLM Product Plumbing: Observability, Routing, and Multimodal Quirks

3. Training & Finetuning Tricks: Throughput Wins and Safety Side-Effects

4. Model Releases, Benchmark Drama, and ‘Did You Just Cheat?’

5. MCP + Agent Tooling: Specs, Flags, and Ecosystem Paper Cuts


Discord: High level Discord summaries

BASI Jailbreaking Discord


LMArena Discord


Unsloth AI (Daniel Han) Discord


Cursor Community Discord


Perplexity AI Discord


LM Studio Discord


OpenRouter Discord


Yannick Kilcher Discord


HuggingFace Discord


GPU MODE Discord


Latent Space Discord


Nous Research AI Discord


Moonshot AI (Kimi K-2) Discord


Eleuther Discord


tinygrad (George Hotz) Discord


DSPy Discord


Modular (Mojo 🔥) Discord


Manus.im Discord


MCP Contributors (Official) Discord


aider (Paul Gauthier) Discord


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

BASI Jailbreaking ▷ #general (1190 messages🔥🔥🔥):

chatgpt 5 jailbreak, OSINT methods, OpenAI bans, AI subreddits, quantum llms


BASI Jailbreaking ▷ #jailbreaking (250 messages🔥🔥):

Gemini 3 Jailbreak, Claude Jailbreak, ChatGPT 5.2 Jailbreak, Tesavek Janus JB, Nano Banana Jailbreak


BASI Jailbreaking ▷ #redteaming (17 messages🔥):

Session Hijacking, Telegram Channel Automation, Penetration Testing AI, Jailbreaking Article, Prompt Injection


LMArena ▷ #general (1177 messages🔥🔥🔥):

GPT 5.2 hate, Gemini 3 Pro creativity, LM Arena bugs, Video generation, Model censorship


LMArena ▷ #announcements (1 messages):

GLM-4.6v, Text Arena, Vision Arena


Unsloth AI (Daniel Han) ▷ #general (893 messages🔥🔥🔥):

VRAM usage with packing, Padding Free Update, Data-Driven Accuracy, Layered Learning Rates, Base Model Autocomplete


Unsloth AI (Daniel Han) ▷ #introduce-yourself (6 messages):

LLM Training, Unsloth AI, DGX Spark


Unsloth AI (Daniel Han) ▷ #off-topic (1273 messages🔥🔥🔥):

iOS 26, GPU upgrade for christmas, SambaNova AI chip startup, DPO Training discussion


Unsloth AI (Daniel Han) ▷ #help (335 messages🔥🔥):

Multi GPU Training with Unsloth, FP8 Reinforcement Learning outdated, 4090 and load_in_fp8 support, Gradient Checkpointing Disable, GSPO run imploded


Unsloth AI (Daniel Han) ▷ #research (80 messages🔥🔥):

Misalignment Research, Subliminal Misalignment, DPO Experiment, Reasoning Traces, Adult Language Learning


Cursor Community ▷ #general (918 messages🔥🔥🔥):

Vercel Publishing, Cursor Revert Changes Bug, Agentic-Coding IDEs / CLIs, Cursor Usage Limits, GPT Business Plans


Perplexity AI ▷ #general (894 messages🔥🔥🔥):

Qwen iPhone app availability, Image editing models, Laptop vs iPad for typing, Outputting Perplexity answers to MD files, Perplexity billing issues


Perplexity AI ▷ #sharing (1 messages):

photon_13: https://amannirala.com/blog/mcp-over-engineering-layers-of-abstraction/


Perplexity AI ▷ #pplx-api (1 messages):

billionthug: Yeah


LM Studio ▷ #general (656 messages🔥🔥🔥):

LLM Safety Policies, Microcontroller Code Struggles, Brave API Issues, Exa.ai MCP, Qwen3 Coder


LM Studio ▷ #hardware-discussion (154 messages🔥🔥):

Unreal Engine Games Assistance, GPU power cables, DDR5 RAM price increase, MyLifeBits Project, ZFS appeal and kernel downgrading


OpenRouter ▷ #announcements (2 messages):

Broadcast, Observability, Langfuse, LangSmith, Weave


OpenRouter ▷ #general (194 messages🔥🔥):

Z.AI Video Input, Nano Banana Pro settings, DeepSeek V3, Droid Model, BYOK bypass


OpenRouter ▷ #new-models (1 messages):

Readybot.io: OpenRouter - New Models


OpenRouter ▷ #discussion (96 messages🔥🔥):

Intel Acquires SambaNova, Databricks CEO chip company seed round, Minecraft LLM server, Gemini 3 reasoning tokens, Kimi Delta Attention


Yannick Kilcher ▷ #general (204 messages🔥🔥):

Nvidia Triton Scaling, Deepseek R2, Predictive Coding, Bayesian Program Learning, Flow Matching


Yannick Kilcher ▷ #paper-discussion (11 messages🔥):

DeepSeek 3.2, Paper Presentation Reschedule


Yannick Kilcher ▷ #agents (5 messages):

Schmidhuber AI Agents, MLST interview, Exploration vs Exploitation


Yannick Kilcher ▷ #ml-news (44 messages🔥):

Samsung DDR5 vs HBM, China's Collapse, Fragile Chip Supply, ChinaXiv Research


HuggingFace ▷ #general (170 messages🔥🔥):

Spam DMs, Sparse Cores, GUI fine-tuning software, NVIDIA Triton Server Scaling, AI model size


HuggingFace ▷ #i-made-this (15 messages🔥):

Neurosymbolic AI Project, HF Wrapped 2025, Madlab GUI Finetuning Toolkit, Text-to-Speech Models in 2025, Comment Works


HuggingFace ▷ #gradio-announcements (2 messages):

MCP 1st Birthday Hackathon Winners, MCP Hackathon Certificates, Anthropic Awards, Modal Innovation Award, LlamaIndex Award


HuggingFace ▷ #agents-course (11 messages🔥):

Course question space deletion, API issues, Agent chunk relevance and LLM, Agents course assistance, Smol course future


GPU MODE ▷ #general (12 messages🔥):

CUDA server, Tiny TPU, Hip Kittens Paper, Paper Reading Group


GPU MODE ▷ #triton-gluon (3 messages):

TritonForge, MXFP4 Emulation on SM86, Data Center GPU Prioritization


GPU MODE ▷ #cuda (4 messages):

Tensor Core Optimization, LDSM instruction pipelining, Asynchronous Memory Copy, SMEM data loading strategies


GPU MODE ▷ #cool-links (8 messages🔥):

NVIDIA's Blackwell Architecture, girl.surgery bad paper, NVIDIA Acquires SchedMD, ldmatrix.x4


GPU MODE ▷ #jobs (2 messages):

Red Hat AI hiring, smallest.ai hiring


GPU MODE ▷ #beginner (8 messages🔥):

LLM inference, VLLM internals, GPU kernel engineering, CUDA experience


GPU MODE ▷ #self-promotion (3 messages):

Huggingface Nanotron, Qwen3-omni viz tool


GPU MODE ▷ #submissions (33 messages🔥):

NVIDIA Performance, GEMM Leaderboard


GPU MODE ▷ #cutlass (10 messages🔥):

smem swizzling, atom permutation on K axis, tiled MMA, Cute DSL python version


GPU MODE ▷ #teenygrad (1 messages):

LambdaLabs Research Grant, teenygrad H100 hours, j4orz.ai/sitp textbook


GPU MODE ▷ #multi-gpu (1 messages):

DMA, ML training, ML inference, FiCCO schedules, GPU DMA engines


GPU MODE ▷ #nvidia-competition (72 messages🔥🔥):

Submission failing despite successful local build, GPU performance inconsistencies, PTX code errors, Utilizing 2SMs when M<256, Cute-dsl NCU line number issues


GPU MODE ▷ #robotics-vla (1 messages):

Planner model, Subtask decomposition, Vision-Language-Action Models, LITEN paper


Latent Space ▷ #ai-general-chat (142 messages🔥🔥):

Museum of Science Decline, AI-Generated Stock Art, Open-Source Git Replication, Claude Model Removed from Cursor, OpenAI document processing infrastructure Leaked


Latent Space ▷ #genmedia-creative-ai (14 messages🔥):

New Twitter Post, Nitter Link Errors, Missing Content


Nous Research AI ▷ #general (82 messages🔥🔥):

Derrida and Baudrillard on AI, Oracle's AI Strategy, Nvidia's Open Source Support, Local LLMs gaining traction, Nvidia's CUDA bet


Nous Research AI ▷ #research-papers (32 messages🔥):

Grok Coincidence, RL Optimizers, Byte Level LLMs


Nous Research AI ▷ #interesting-links (2 messages):

Embeddings, History of embeddings, Embeddings in Modern AI




Moonshot AI (Kimi K-2) ▷ #general-chat (87 messages🔥🔥):

Kimi slides API, Local Kimi K2 on NPU, Kimi memory feature sync, Kimi K2 tokenizer endpoint, Kimi Android update with memory


Eleuther ▷ #general (30 messages🔥):

PyTorch LLM on Kaggle TPU VMs, Scaling NVIDIA Triton Server, NSF initiative, LLM assistance in ML papers, Algoverse Mentors


Eleuther ▷ #research (17 messages🔥):

Karpathy’s 2025 'What-If' Fine-Tune Experiment, Part-time PhD in CS focusing on AI Agents, Muon's effectiveness reason, Training with weights natively in 8-bit, GANs


Eleuther ▷ #interpretability-general (9 messages🔥):

Superweight Ablation, Orthogonal Repair, Neuron-Specific Features, High Dimensional Orthogonality, Extreme Weights Importance


tinygrad (George Hotz) ▷ #general (50 messages🔥):

Linux 6.19 char misc, tinygrad meeting #100, Llama 405b, Python Speed, Gradient Checkpointing


DSPy ▷ #show-and-tell (34 messages🔥):

ReasoningLayer.ai launch, Neurosymbolic AI, DSPy GEPA, uv tool install slowness, MCP mode


DSPy ▷ #general (14 messages🔥):

BAMLAdapter, DSPy Skills, Field Specific Instructions


Modular (Mojo 🔥) ▷ #mojo (32 messages🔥):

Variable declaration scope, Mimicking const, C++ lambda syntax, Julia v. Mojo, LLM modular book error


Manus.im Discord ▷ #general (22 messages🔥):

Manus Auth redirect bug, Gemini 3.0 vs Manus, Firebase, Antigravity, and Google AI Studio, Conversation Mode with Wide Research, Manus 1.6 release


MCP Contributors (Official) ▷ #general (12 messages🔥):

C++ MCP, Dangerous tool flag, MCP Server Publication Error, Response Annotations, Tool Resolution Proposal


aider (Paul Gauthier) ▷ #general (6 messages):

Aider OpenAIException, Aider active development


aider (Paul Gauthier) ▷ #questions-and-tips (4 messages):

aider gpt-5, litellm errors, aider model config