All tags
Person: "ctnzr"
not much happened today
deepseek-r1-0528 pali-gemma-2 gemma-3 shieldgemma-2 txgemma gemma-3-qat gemma-3n-preview medgemma dolphingemma signgemma claude-4 opus-4 claude-sonnet-4 codestral-embed bagel qwen nemotron-cortexa gemini-2.5-pro deepseek-ai huggingface gemma claude bytedance qwen nemotron sakana-ai-labs benchmarking model-releases multimodality code-generation model-performance long-context reinforcement-learning model-optimization open-source yuchenj_uw _akhaliq clementdelangue osanseviero alexalbert__ guillaumelample theturingpost lmarena_ai epochairesearch scaling01 nrehiew_ ctnzr
DeepSeek R1 v2 model released with availability on Hugging Face and inference partners. The Gemma model family continues prolific development including PaliGemma 2, Gemma 3, and others. Claude 4 and its variants like Opus 4 and Claude Sonnet 4 show top benchmark performance, including new SOTA on ARC-AGI-2 and WebDev Arena. Codestral Embed introduces a 3072-dimensional code embedder. BAGEL, an open-source multimodal model by ByteDance, supports reading, reasoning, drawing, and editing with long mixed contexts. Benchmarking highlights include Nemotron-CORTEXA topping SWEBench and Gemini 2.5 Pro performing on VideoGameBench. Discussions on random rewards effectiveness focus on Qwen models. "Opus 4 NEW SOTA ON ARC-AGI-2. It's happening - I was right" and "Claude 4 launch has dev moving at a different pace" reflect excitement in the community.
Hybrid SSM/Transformers > Pure SSMs/Pure Transformers
mamba-2-hybrid gpt-4 qwen-72b table-llava-7b nvidia lamini-ai sakana-ai luma-labs mixture-of-experts benchmarking fine-tuning multimodality text-to-video model-performance memory-optimization preference-optimization video-understanding multimodal-tables bryan-catanzaro bindureddy ylecun ctnzr corbtt realsharonzhou andrew-n-carr karpathy _akhaliq omarsar0
NVIDIA's Bryan Catanzaro highlights a new paper on Mamba models, showing that mixing Mamba and Transformer blocks outperforms either alone, with optimal attention below 20%. Mixture-of-Agents (MoA) architecture improves LLM generation quality, scoring 65.1% on AlpacaEval 2.0 versus GPT-4 Omni's 57.5%. The LiveBench AI benchmark evaluates reasoning, coding, writing, and data analysis. A hybrid Mamba-2-Hybrid model with 7% attention surpasses a Transformer on MMLU accuracy, jumping from 50% to 53.6%. GPT-4 performs better at temperature=1. Qwen 72B leads open-source models on LiveBench AI. LaminiAI Memory Tuning achieves 95% accuracy on a SQL agent task, improving over instruction fine-tuning. Sakana AI Lab uses evolutionary strategies for preference optimization. Luma Labs Dream Machine demonstrates advanced text-to-video generation. The MMWorld benchmark evaluates multimodal video understanding, and Table-LLaVa 7B competes with GPT-4V on multimodal table tasks.