Person: "hardmaru"
TinyZero: Reproduce DeepSeek R1-Zero for $30
deepseek-r1 qwen o1 claude-3-sonnet claude-3 prime ppo grpo llama-stack deepseek berkeley hugging-face meta-ai-fair openai deeplearningai reinforcement-learning fine-tuning chain-of-thought multi-modal-benchmark memory-management model-training open-source agentic-workflow-automation model-performance jiayi-pan saranormous reach_vb lmarena_ai nearcyan omarsar0 philschmid hardmaru awnihannun winglian
DeepSeek Mania continues to reshape the frontier model landscape: Jiayi Pan from Berkeley reproduced the OTHER result from the DeepSeek R1 paper, R1-Zero, with a cost-effective fine-tune of a Qwen model on two math tasks. A key finding is a lower bound of 1.5B parameters for the distillation effect, with RLCoT reasoning emerging as an intrinsic property of the model. Different RL techniques, including PPO, DeepSeek's GRPO, and PRIME, yield similar outcomes, and starting from an Instruct model speeds up convergence. The Humanity's Last Exam (HLE) benchmark introduces a challenging multi-modal test of 3,000 expert-level questions across more than 100 subjects; the tested models all score below 10%, with DeepSeek-R1 reaching 9.4%. DeepSeek-R1 excels at chain-of-thought reasoning, outperforming models like o1 while being 20x cheaper and MIT-licensed. On the WebDev Arena leaderboard, DeepSeek-R1 ranks #2 in technical domains and #1 under Style Control, closing in on Claude 3.5 Sonnet. OpenAI's Operator has been rolled out to 100% of Pro users in the US; it handles tasks such as ordering meals and booking reservations, and it can also act as a research assistant that searches for and summarizes AI papers. Hugging Face announces a leadership change following a period of significant growth, while Meta AI releases the first stable version of Llama Stack, with streamlined upgrades and automated verification. DeepSeek-R1's open-source success is widely celebrated, and on the engineering side, MLX now uses residency sets to address memory-management stability issues on macOS 15+.
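GRPO, one of the RL techniques compared above, drops PPO's learned value function and instead scores each sampled completion against the other completions drawn for the same prompt. The sketch below covers only that group-relative advantage step (no policy-gradient loss, clipping, or KL term) and assumes binary rewards from a rule-based answer checker, as is common for verifiable math tasks. `group_relative_advantages` is a hypothetical helper name, and it normalizes with the population standard deviation, a detail that varies between implementations.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style baseline: normalize each reward by its group's mean and std.

    `rewards` holds one scalar per completion sampled for the SAME prompt.
    `eps` avoids division by zero when every completion gets the same reward,
    in which case all advantages are 0 and the group contributes no signal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical binary rewards for 4 completions of one prompt
# (1.0 = final answer verified correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline comes from the group itself, correct completions are pushed up and incorrect ones pushed down relative to their siblings, without training a separate critic.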
Titans: Learning to Memorize at Test Time
minimax-01 gpt-4o claude-3.5-sonnet internlm3-8b-instruct transformer2 google meta-ai-fair openai anthropic langchain long-context mixture-of-experts self-adaptive-models prompt-injection agent-authentication diffusion-models zero-trust-architecture continuous-adaptation vision agentic-systems omarsar0 hwchase17 abacaj hardmaru rez0__ bindureddy akhaliq saranormous
Google released a new paper introducing a "Neural Memory" module that gives transformer architectures a long-term memory which learns to memorize at test time, showing promising long-context utilization. MiniMax-01 from MiniMax (highlighted by @omarsar0) features a 4-million-token context window with 456B parameters and 32 experts, and is reported to outperform GPT-4o and Claude-3.5-Sonnet. InternLM3-8B-Instruct is an open-source model trained on 4 trillion tokens that claims state-of-the-art results. Transformer² introduces self-adaptive LLMs that dynamically adjust their weights for continuous adaptation. Advances in AI security highlight the need for agent authentication, prompt-injection defenses, and zero-trust architectures. Tools like Micro Diffusion enable budget-friendly diffusion model training, while LeagueGraph and Agent Recipes support open-source social media agents.
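To make "learning to memorize at test time" concrete, here is a toy sketch of the kind of update rule the Titans paper describes. For each incoming key/value pair the memory takes a gradient step on an associative loss ||M(k) - v||², accumulates past "surprise" with momentum, and decays old content with a forget term. Assumptions: the memory is a single linear map rather than the paper's deeper MLP; the learning rate, momentum, and forget coefficients are fixed constants rather than data-dependent gates; and keys/values are supplied directly instead of being projected from tokens. `LinearNeuralMemory`, `matvec`, and all parameter values are hypothetical.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


class LinearNeuralMemory:
    """Toy test-time memory: a linear map M trained online on ||M k - v||^2.

    Hypothetical simplification of the Titans-style update:
      S_t = momentum * S_{t-1} - lr * grad   (past + momentary surprise)
      M_t = (1 - forget) * M_{t-1} + S_t     (forgetting via weight decay)
    """

    def __init__(self, dim, lr=0.1, momentum=0.9, forget=0.01):
        self.M = [[0.0] * dim for _ in range(dim)]  # memory weights
        self.S = [[0.0] * dim for _ in range(dim)]  # surprise (momentum buffer)
        self.lr, self.momentum, self.forget = lr, momentum, forget

    def _error(self, k, v):
        return [p - t for p, t in zip(matvec(self.M, k), v)]

    def loss(self, k, v):
        """Associative-memory loss ||M k - v||^2 for one key/value pair."""
        return sum(e * e for e in self._error(k, v))

    def update(self, k, v):
        """One test-time write: gradient step with momentum and forgetting."""
        err = self._error(k, v)  # computed from M_{t-1} before any entry changes
        for i, row in enumerate(self.M):
            for j in range(len(row)):
                grad = 2.0 * err[i] * k[j]  # d/dM_ij of ||M k - v||^2
                self.S[i][j] = self.momentum * self.S[i][j] - self.lr * grad
                row[j] = (1.0 - self.forget) * row[j] + self.S[i][j]

    def retrieve(self, q):
        """Read from memory with a query vector (here we reuse a stored key)."""
        return matvec(self.M, q)


# Usage: stream two hypothetical key/value pairs through the memory.
k1, v1 = [1.0, 0.0], [0.0, 1.0]
k2, v2 = [0.0, 1.0], [1.0, 0.0]
mem = LinearNeuralMemory(dim=2)
print("loss before:", mem.loss(k1, v1))
for _ in range(200):
    for k, v in ((k1, v1), (k2, v2)):
        mem.update(k, v)
print("loss after:", round(mem.loss(k1, v1), 6))
print("retrieve(k1):", [round(x, 3) for x in mem.retrieve(k1)])
```

After the stream, retrieving with `k1` returns a vector close to `v1` but slightly shrunk, because the forget term keeps pulling weights toward zero; that trade-off between retaining and discarding content is what the paper's adaptive forgetting gate is meant to control.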