Model: "r1-1776"
not much happened today
grok-3 deepseek-r1 siglip-2 o3-mini-high r1-1776 llamba-1b llamba-3b llamba-8b llama-3 alphamaze audiobox-aesthetics xai nvidia google-deepmind anthropic openai bytedance ollama meta-ai-fair benchmarking model-releases performance reasoning multimodality semantic-understanding ocr multilinguality model-distillation recurrent-neural-networks visual-reasoning audio-processing scaling01 iscienceluvr philschmid arankomatsuzaki reach_vb mervenoyann wightmanr lmarena_ai ollama akhaliq
Grok-3, a new family of LLMs from xAI trained on 200,000 Nvidia H100 GPUs for advanced reasoning, outperforms models from Google, Anthropic, and OpenAI on math, science, and coding benchmarks. DeepSeek-R1 achieves the top accuracy on SuperGPQA, a challenging new benchmark from ByteDance Research. SigLIP 2 from Google DeepMind improves semantic understanding and OCR, supports flexible resolutions and multiple languages, and is available on Hugging Face. OpenAI's o3-mini-high ranks #1 on coding and math prompts. Perplexity's R1-1776, a post-trained version of DeepSeek-R1, is available on Ollama. The Llamba family distills Llama-3.x into efficient recurrent models with higher throughput. AlphaMaze combines DeepSeek-R1 with GRPO for visual reasoning on ARC-AGI puzzles. Audiobox Aesthetics from Meta AI offers unified quality assessment for audio. The community notes that Grok-3's large compute increase yields only modest performance gains.
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
deepseek-native-sparse-attention r1-1776 paligemma-2-mix muse baichuan-m1-14b stripedhyena-2 huggingface deepseek perplexity-ai google-deepmind microsoft baichuan stripedhyena gpu-training scaling multimodality vision model-training foundation-models medical-llm genome-modeling robotic-manipulation interactive-content eliebakouch nouamanetazi lvwerra thom-wolf proftomyeh alex-wang aravsrinivas _akhaliq _philschmid mervenoyann reach_vb arankomatsuzaki maximelabonne
Hugging Face released "The Ultra-Scale Playbook: Training LLMs on GPU Clusters," an interactive blog post based on 4,000 scaling experiments on up to 512 GPUs that gives detailed insights into modern GPU training strategies. DeepSeek introduced Native Sparse Attention (NSA), a sparse attention mechanism that drew significant community attention, while Perplexity AI launched R1-1776, a post-trained version of DeepSeek's R1 model that it describes as uncensored and unbiased. Google DeepMind unveiled PaliGemma 2 Mix, a multi-task vision-language model available in 3B, 10B, and 28B sizes. Microsoft introduced Muse, a generative AI model trained on the game Bleeding Edge, and presented Magma, a foundation model for multimodal AI agents that excels at UI navigation and robotic manipulation. Baichuan-M1-14B was announced as a state-of-the-art medical LLM trained on 20T tokens, and a fully open-source 40B genome model built on the StripedHyena 2 architecture was also released. Commenting on Muse, one observer noted that "making your own gaming experience is coming sooner than you'd think."