All tags
Company: "nemotron"
not much happened today
deepseek-r1-0528 pali-gemma-2 gemma-3 shieldgemma-2 txgemma gemma-3-qat gemma-3n-preview medgemma dolphingemma signgemma claude-4 opus-4 claude-sonnet-4 codestral-embed bagel qwen nemotron-cortexa gemini-2.5-pro deepseek-ai huggingface gemma claude bytedance qwen nemotron sakana-ai-labs benchmarking model-releases multimodality code-generation model-performance long-context reinforcement-learning model-optimization open-source yuchenj_uw _akhaliq clementdelangue osanseviero alexalbert__ guillaumelample theturingpost lmarena_ai epochairesearch scaling01 nrehiew_ ctnzr
DeepSeek R1 v2 model released with availability on Hugging Face and inference partners. The Gemma model family continues prolific development including PaliGemma 2, Gemma 3, and others. Claude 4 and its variants like Opus 4 and Claude Sonnet 4 show top benchmark performance, including new SOTA on ARC-AGI-2 and WebDev Arena. Codestral Embed introduces a 3072-dimensional code embedder. BAGEL, an open-source multimodal model by ByteDance, supports reading, reasoning, drawing, and editing with long mixed contexts. Benchmarking highlights include Nemotron-CORTEXA topping SWEBench and Gemini 2.5 Pro performing on VideoGameBench. Discussions on random rewards effectiveness focus on Qwen models. "Opus 4 NEW SOTA ON ARC-AGI-2. It's happening - I was right" and "Claude 4 launch has dev moving at a different pace" reflect excitement in the community.
Claude 3.5 Sonnet (New) gets Computer Use
claude-3.5-sonnet claude-3.5-haiku llama-3.1 nemotron anthropic zep nvidia coding benchmarks computer-use vision multimodal-memory model-updates ai-integration philschmid swyx
Anthropic announced new Claude 3.5 models: 3.5 Sonnet and 3.5 Haiku, improving coding performance significantly, with Sonnet topping several coding benchmarks like Aider and Vectara. The new Computer Use API enables controlling computers via vision, scoring notably higher than other AI systems, showcasing progress in AI-driven computer interaction. Zep launched a cloud edition for AI agents memory management, highlighting challenges in multimodal memory. The update also mentions Llama 3.1 and Nemotron models from NVIDIA.
Gemini launches context caching... or does it?
nemotron llama-3-70b chameleon-7b chameleon-34b gemini-1.5-pro deepseek-coder-v2 gpt-4-turbo claude-3-opus gemini-1.5-pro nvidia meta-ai-fair google deepseek hugging-face context-caching model-performance fine-tuning reinforcement-learning group-relative-policy-optimization large-context model-training coding model-release rohanpaul_ai _philschmid aman-sanger
Nvidia's Nemotron ranks #1 open model on LMsys and #11 overall, surpassing Llama-3-70b. Meta AI released Chameleon 7B/34B models after further post-training. Google's Gemini introduced context caching, offering a cost-efficient middle ground between RAG and finetuning, with a minimum input token count of 33k and no upper limit on cache duration. DeepSeek launched DeepSeek-Coder-V2, a 236B parameter model outperforming GPT-4 Turbo, Claude-3-Opus, and Gemini-1.5-Pro in coding tasks, supporting 338 programming languages and extending context length to 128K. It was trained on 6 trillion tokens using the Group Relative Policy Optimization (GRPO) algorithm and is available on Hugging Face with a commercial license. These developments highlight advances in model performance, context caching, and large-scale coding models.