All tags
Person: "ctnzr"
NVIDIA Nemotron 3: hybrid Mamba-Transformer completely open source models from 30B to 500B
nemotron-3-nano qwen3-30b-a3b-base nvidia huggingface togethercompute baseten vllm llamaindex hybrid-architecture mixture-of-experts reinforcement-learning long-context model-release open-source-models model-training model-optimization benchmarking agent-training ctnzr andrew_n_carr awnihannun
NVIDIA has released Nemotron 3 Nano, a fully open-source hybrid Mamba-Transformer Mixture-of-Experts (MoE) model with a 30B parameter size and a 1 million token context window. It includes open weights, training recipes, datasets, and an RL environment suite called NeMo Gym, supporting commercial use under the NVIDIA Open Model License. The model achieves state-of-the-art results on benchmarks like SWE-Bench and Artificial Analysis Intelligence Index, outperforming Qwen3-30B A3B. Ecosystem support is immediate with integrations into inference stacks like vLLM, llama.cpp, and Baseten. Upcoming larger models, Nemotron Super and Ultra, will feature NVFP4 pretraining and LatentMoE routing to optimize compute. This release marks a significant milestone for open-source American AI with comprehensive open assets and advanced hybrid architecture.
not much happened today
gemma-3-270m canary-1b parakeet-tdt-0.6b nemotron-nano-v2 qwen-image-edit dino-v3 nvidia alibaba tencent meta-ai-fair ibm datology synthetic-data multilingual-asr self-supervised-learning vision model-efficiency training-data data-augmentation model-speedup domain-transfer demishassabis adrgrondin rasbt reach_vb ctnzr clementdelangue natolambert _akhaliq itspaulai mervenoyann xenovacom tomaarsen pratyushmaini code_star leavittron k_schuerholt giffmana
Gemma 3 270M, an ultra-small model optimized for edge and mobile use, was released and is gaining adoption. NVIDIA launched two open multilingual ASR models, Canary 1B and Parakeet-TDT 0.6B, trained on 1 million hours of data with CC-BY licensing, plus the efficient Nemotron-Nano v2 9B model with significant speedups. Alibaba's Qwen-Image-Edit offers bilingual text editing and semantic image transformations. Tencent Hunyuan introduced a controllable game-world video generator trained on over 1 million gameplay recordings. Meta's DINOv3 presents a scalable self-supervised vision backbone with strong domain transfer capabilities. IBM quietly released efficient English embedding models under a commercial-friendly license. The BeyondWeb synthetic data paper shows significant training speed and performance gains over prior datasets. Analysis of HRM architecture suggests performance improvements largely stem from data augmentation and scaffolding rather than novel architecture. "Models and datasets are openly licensed and available on Hugging Face."
not much happened today
deepseek-r1-0528 pali-gemma-2 gemma-3 shieldgemma-2 txgemma gemma-3-qat gemma-3n-preview medgemma dolphingemma signgemma claude-4 opus-4 claude-sonnet-4 codestral-embed bagel qwen nemotron-cortexa gemini-2.5-pro deepseek-ai huggingface gemma claude bytedance qwen nemotron sakana-ai-labs benchmarking model-releases multimodality code-generation model-performance long-context reinforcement-learning model-optimization open-source yuchenj_uw _akhaliq clementdelangue osanseviero alexalbert__ guillaumelample theturingpost lmarena_ai epochairesearch scaling01 nrehiew_ ctnzr
DeepSeek R1 v2 model released with availability on Hugging Face and inference partners. The Gemma model family continues prolific development including PaliGemma 2, Gemma 3, and others. Claude 4 and its variants like Opus 4 and Claude Sonnet 4 show top benchmark performance, including new SOTA on ARC-AGI-2 and WebDev Arena. Codestral Embed introduces a 3072-dimensional code embedder. BAGEL, an open-source multimodal model by ByteDance, supports reading, reasoning, drawing, and editing with long mixed contexts. Benchmarking highlights include Nemotron-CORTEXA topping SWEBench and Gemini 2.5 Pro performing on VideoGameBench. Discussions on random rewards effectiveness focus on Qwen models. "Opus 4 NEW SOTA ON ARC-AGI-2. It's happening - I was right" and "Claude 4 launch has dev moving at a different pace" reflect excitement in the community.
Hybrid SSM/Transformers > Pure SSMs/Pure Transformers
mamba-2-hybrid gpt-4 qwen-72b table-llava-7b nvidia lamini-ai sakana-ai luma-labs mixture-of-experts benchmarking fine-tuning multimodality text-to-video model-performance memory-optimization preference-optimization video-understanding multimodal-tables bryan-catanzaro bindureddy ylecun ctnzr corbtt realsharonzhou andrew-n-carr karpathy _akhaliq omarsar0
NVIDIA's Bryan Catanzaro highlights a new paper on Mamba models, showing that mixing Mamba and Transformer blocks outperforms either alone, with optimal attention below 20%. Mixture-of-Agents (MoA) architecture improves LLM generation quality, scoring 65.1% on AlpacaEval 2.0 versus GPT-4 Omni's 57.5%. The LiveBench AI benchmark evaluates reasoning, coding, writing, and data analysis. A hybrid Mamba-2-Hybrid model with 7% attention surpasses a Transformer on MMLU accuracy, jumping from 50% to 53.6%. GPT-4 performs better at temperature=1. Qwen 72B leads open-source models on LiveBench AI. LaminiAI Memory Tuning achieves 95% accuracy on a SQL agent task, improving over instruction fine-tuning. Sakana AI Lab uses evolutionary strategies for preference optimization. Luma Labs Dream Machine demonstrates advanced text-to-video generation. The MMWorld benchmark evaluates multimodal video understanding, and Table-LLaVa 7B competes with GPT-4V on multimodal table tasks.