All tags
Model: "llama-3.1"
DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level
deepseek-r1 deepseek-v3 qwen-2.5 llama-3.1 llama-3.3-70b deepseek ollama qwen llama reinforcement-learning fine-tuning model-distillation model-optimization reasoning reward-models multi-response-sampling model-training
DeepSeek released DeepSeek R1, a significant upgrade over DeepSeek V3 from just three weeks prior, featuring 8 models including full-size 671B MoE models and multiple distillations from Qwen 2.5 and Llama 3.1/3.3. The models are MIT licensed, allowing finetuning and distillation. Pricing is notably cheaper than o1 by 27x-50x. The training process used GRPO (reward for correctness and style outcomes) without relying on PRM, MCTS, or reward models, focusing on reasoning improvements through reinforcement learning. Distilled models can run on Ollama and show strong capabilities like writing Manim code. The release emphasizes advances in reinforcement-learning, fine-tuning, and model-distillation with a novel RL framework from DeepSeekMath.
Claude 3.5 Sonnet (New) gets Computer Use
claude-3.5-sonnet claude-3.5-haiku llama-3.1 nemotron anthropic zep nvidia coding benchmarks computer-use vision multimodal-memory model-updates ai-integration philschmid swyx
Anthropic announced new Claude 3.5 models: 3.5 Sonnet and 3.5 Haiku, improving coding performance significantly, with Sonnet topping several coding benchmarks like Aider and Vectara. The new Computer Use API enables controlling computers via vision, scoring notably higher than other AI systems, showcasing progress in AI-driven computer interaction. Zep launched a cloud edition for AI agents memory management, highlighting challenges in multimodal memory. The update also mentions Llama 3.1 and Nemotron models from NVIDIA.
Did Nvidia's Nemotron 70B train on test?
nemotron-70b llama-3.1-70b llama-3.1 ministral-3b ministral-8b gpt-4o claude-3.5-sonnet claude-3.5 nvidia mistral-ai hugging-face zep benchmarking reinforcement-learning reward-models temporal-knowledge-graphs memory-layers context-windows model-releases open-source reach_vb philschmid swyx
NVIDIA's Nemotron-70B model has drawn scrutiny despite strong benchmark performances on Arena Hard, AlpacaEval, and MT-Bench, with some standard benchmarks like GPQA and MMLU Pro showing no improvement over the base Llama-3.1-70B. The new HelpSteer2-Preference dataset improves some benchmarks with minimal losses elsewhere. Meanwhile, Mistral released Ministral 3B and 8B models featuring 128k context length and outperforming Llama-3.1 and GPT-4o on various benchmarks under the Mistral Commercial License. NVIDIA's Nemotron 70B also surpasses GPT-4o and Claude-3.5-Sonnet on key benchmarks using RLHF (REINFORCE) training. Additionally, Zep introduced Graphiti, an open-source temporal knowledge graph memory layer for AI agents, built on Neo4j.
Cerebras Inference: Faster, Better, AND Cheaper
llama-3.1-8b llama-3.1-70b gemini-1.5-flash gemini-1.5-pro cogvideox-5b mamba-2 rene-1.3b llama-3.1 gemini-1.5 claude groq cerebras cursor google-deepmind anthropic inference-speed wafer-scale-chips prompt-caching model-merging benchmarking open-source-models code-editing model-optimization jeremyphoward sam-altman nat-friedman daniel-gross swyx
Groq led early 2024 with superfast LLM inference speeds, achieving ~450 tokens/sec for Mixtral 8x7B and 240 tokens/sec for Llama 2 70B. Cursor introduced a specialized code edit model hitting 1000 tokens/sec. Now, Cerebras claims the fastest inference with their wafer-scale chips, running Llama3.1-8b at 1800 tokens/sec and Llama3.1-70B at 450 tokens/sec at full precision, with competitive pricing and a generous free tier. Google's Gemini 1.5 models showed significant benchmark improvements, especially Gemini-1.5-Flash and Gemini-1.5-Pro. New open-source models like CogVideoX-5B and Mamba-2 (Rene 1.3B) were released, optimized for consumer hardware. Anthropic's Claude now supports prompt caching, improving speed and cost efficiency. "Cerebras Inference runs Llama3.1 20x faster than GPU solutions at 1/5 the price."
Mistral Large 2 + RIP Mistral 7B, 8x7B, 8x22B
mistral-large-2 mistral-nemo-12b llama-3.1-8b llama-3.1-70b llama-3.1 llama-3-405b yi-34b-200k gpt-4o mistral-ai meta-ai-fair groq togethercompute code-generation math function-calling reasoning context-windows model-deprecation pretraining posttraining benchmarking
Mistral Large 2 introduces 123B parameters with Open Weights under a Research License, focusing on code generation, math performance, and a massive 128k context window, improving over Mistral Large 1's 32k context. It claims better function calling capabilities than GPT-4o and enhanced reasoning. Meanwhile, Meta officially released Llama-3.1 models including Llama-3.1-70B and Llama-3.1-8B with detailed pre-training and post-training insights. The Llama-3.1 8B model's 128k context performance was found underwhelming compared to Mistral Nemo and Yi 34B 200K. Mistral is deprecating older Apache open-source models, focusing on Large 2 and Mistral Nemo 12B. The news also highlights community discussions and benchmarking comparisons.