TinyZero: Reproduce DeepSeek R1-Zero for $30

**DeepSeek Mania** continues to reshape the frontier model landscape with Jiayi Pan from Berkeley reproducing the *OTHER* result from the DeepSeek R1 paper, R1-Zero, in a cost-effective Qwen model fine-tune for two math tasks. A key finding is a lower bound to the distillation effect at **1.5B parameters**, with RLCoT reasoning emerging as an intrinsic property. Various RL techniques like PPO, DeepSeek's GRPO, or PRIME show similar outcomes, and starting from an Instruct model speeds convergence. The **Humanity’s Last Exam (HLE) Benchmark** introduces a challenging multi-modal test with **3,000 expert-level questions** across **100+ subjects**, where models perform below **10%**, with **DeepSeek-R1** achieving **9.4%**. DeepSeek-R1 excels in chain-of-thought reasoning, outperforming models like **o1** while being **20x cheaper** and MIT licensed. The **WebDev Arena Leaderboard** ranks DeepSeek-R1 #2 in technical domains and #1 under Style Control, closing in on **Claude 3.5 Sonnet**. OpenAI's **Operator** is deployed to 100% of Pro users in the US, enabling tasks like ordering meals and booking reservations, and functions as a research assistant for AI paper searches and summaries. Hugging Face announces a leadership change after significant growth, while Meta AI releases the first stable version of **Llama Stack** with streamlined upgrades and automated verification. DeepSeek-R1's open-source success is celebrated, and technical challenges like memory management on macOS 15+ are addressed with residency sets in MLX for stability.

AI News for 1/23/2025-1/24/2025. We checked 7 subreddits, 433 Twitters and 34 Discords (225 channels, and 3926 messages) for you. Estimated reading time saved (at 200wpm): 409 minutes. You can now tag @smol_ai for AINews discussions!
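As a rough sanity check on the reading-time figure above, here is a minimal sketch that inverts it into a word count. It assumes the estimate is simply total words divided by reading speed; the function names are hypothetical and not part of the newsletter's actual pipeline.

```python
def words_from_reading_time(minutes: float, wpm: int = 200) -> float:
    """Invert a reading-time estimate: words = minutes * words per minute."""
    return minutes * wpm


def reading_time_minutes(words: float, wpm: int = 200) -> float:
    """Forward direction: minutes = words / words per minute."""
    return words / wpm


if __name__ == "__main__":
    # 409 minutes at 200 wpm, the figures quoted in this issue
    print(words_from_reading_time(409))
```

Under that assumption, 409 minutes at 200 wpm corresponds to about 81,800 words of source messages.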

DeepSeek Mania continues to realign the frontier model landscape. Jiayi Pan from Berkeley reproduced the OTHER result from the DeepSeek R1 paper, R1-Zero, in a cheap Qwen model finetune, for two math tasks (so not a general result at all, but a nice proof of concept).

Full code and WandB logs available.

The most interesting new finding is that there is a lower bound to the distillation effect we covered yesterday: 1.5B parameters is as small as you can go. RLCoT reasoning is itself an emergent property.

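More findings: the choice of RL algorithm (PPO, DeepSeek's GRPO, or PRIME) makes little difference to the outcome, and starting from an Instruct model speeds convergence.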


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Evaluations and Benchmarks

AI Agents and Applications

Company News and Updates

Technical Challenges and Solutions

Academic and Research Progress

Memes/Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek-R1 Success and Community Excitement

Theme 2. Benchmarking Sub-24GB AI Models

Theme 3. Expectations for Llama 4 as Next SOTA

Theme 4. SmolVLM 256M: A Leap in Local Multimodal Models

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Yann LeCun and the Deepseek Open Source Debate

Theme 2. OpenAI's Stargate Initiative and Political Associations

Theme 3. ChatGPT's Operator Role and Misuse Attempts

Theme 4. Rapid AI Advancements in SWE-Bench Performance


AI Discord Recap

A summary of Summaries of Summaries

Gemini 2.0 Flash Thinking (gemini-2.0-flash-thinking-exp)

Theme 1. DeepSeek R1 Dominates Discussions: Performance and Open Source Acclaim

Theme 2. Cursor and Codeium IDEs: Updates, Outages, and User Growing Pains

Theme 3. Unsloth AI: Fine-Tuning, Datasets, and Performance Trade-offs

Theme 4. Model Context Protocol (MCP): Integration and Personalization Take Center Stage

Theme 5. Hardware Horizons: RTX 5090 Disappoints, VRAM Limits Llama-3.3

DeepSeek R1 (deepseek-reasoner)

Theme 1. DeepSeek R1 Dominates Coding & Reasoning Tasks

Theme 2. Fine-Tuning & Hardware Hacks for Efficiency

Theme 3. IDE & Tooling Growing Pains

Theme 4. Regulatory Heat & Security Headaches

Theme 5. Novel Training & Inference Tricks

o1-2024-12-17

Theme 1. DeepSeek R1 Rocks the Benchmarks

Theme 2. Creative Model Fine-Tuning & Research

Theme 3. Tools & IDE Updates for AI Co-Dev

Theme 4. GPU & Policy Shakeups

Theme 5. Audio, Visual & Text Innovations


PART 1: High level Discord summaries

Cursor IDE Discord


Unsloth AI (Daniel Han) Discord


Codeium (Windsurf) Discord


OpenRouter (Alex Atallah) Discord


Latent Space Discord


Perplexity AI Discord


LM Studio Discord


aider (Paul Gauthier) Discord


Interconnects (Nathan Lambert) Discord


GPU MODE Discord


OpenAI Discord


Stackblitz (Bolt.new) Discord


Nous Research AI Discord


Yannick Kilcher Discord


Notebook LM Discord Discord


Nomic.ai (GPT4All) Discord


MCP (Glama) Discord


LlamaIndex Discord


Cohere Discord


Modular (Mojo 🔥) Discord


LAION Discord


Eleuther Discord


Stability.ai (Stable Diffusion) Discord


tinygrad (George Hotz) Discord


Torchtune Discord


LLM Agents (Berkeley MOOC) Discord


Axolotl AI Discord


DSPy Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The OpenInterpreter Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor IDE ▷ #general (604 messages🔥🔥🔥):

DeepSeek R1 vs. other models, Cursor functionality and issues, AI models and productivity, Updates on Cursor version 0.45.2, General discussion about AI in coding


Unsloth AI (Daniel Han) ▷ #general (347 messages🔥🔥):

Fine-tuning LLMs for language accuracy, Integration of Unsloth in Llama-Factory, Continued pretraining for language models, Posit computing implications, Evo model performance on nucleotide prediction


Unsloth AI (Daniel Han) ▷ #off-topic (29 messages🔥):

Dual RTX 3090 setups, Training reasoning models repositories, Dolphin-R1 dataset creation, vLLM compatibility with Open-webui


Unsloth AI (Daniel Han) ▷ #help (107 messages🔥🔥):

Fine-tuning models, Resolving errors in model training, Utilizing chat templates, Exporting models from Colab, Dataset formatting for training


Unsloth AI (Daniel Han) ▷ #research (9 messages🔥):

LoHan Framework for LLM Fine-Tuning, NVMe Offloading Techniques, Flash Memory Utilization in LLMs


Codeium (Windsurf) ▷ #announcements (1 messages):

Windsurf 1.2.2 Release, Cascade Web Search, Lag Improvements, Memory System Enhancements

Link mentioned: Windsurf Editor Changelogs | Windsurf Editor and Codeium extensions: Latest updates and changes for the Windsurf Editor.


Codeium (Windsurf) ▷ #content (1 messages):

Web Search Feature, Demo Video Launch

Link mentioned: Tweet from Windsurf (@windsurf_ai): Just surfin' the web! 🏄


Codeium (Windsurf) ▷ #discussion (78 messages🔥🔥):

Windsurf Issues, Supercomplete Functionality, Account Registration Problems, C# Extension for Windsurf, Windsurf 1.2.2 Release


Codeium (Windsurf) ▷ #windsurf (299 messages🔥🔥):

Windsurf login issues, Windsurf updates and performance, Open Graph metadata in Vite, Input lag in Windsurf, Cascade service outages


OpenRouter (Alex Atallah) ▷ #announcements (3 messages):

DeepSeek R1 updates, DeepSeek provider outage


OpenRouter (Alex Atallah) ▷ #general (290 messages🔥🔥):

DeepSeek R1 Performance, Gemini API Access, BlackboxAI Concerns, Rate Limits and Key Usage, Provider Issues on OpenRouter


Latent Space ▷ #ai-general-chat (75 messages🔥🔥):

DeepSeek R1 Model, Fireworks Streaming Transcription Service, Braintrust AI Proxy, Perplexity Assistant, New OpenAI Features


Latent Space ▷ #ai-in-action-club (193 messages🔥🔥):

Model Context Protocol (MCP), MCP integration with tools, Obsidian MCP server, MCP capabilities and connection, MCP party planning


Perplexity AI ▷ #general (239 messages🔥🔥):

Perplexity Assistant Launch on iOS, Comparison of AI Models, User Experience Issues with Perplexity, Feedback on Assistant Features, Alternatives to Perplexity


Perplexity AI ▷ #sharing (8 messages🔥):

Latest Pop Culture, Action-Adventure Movies, Upcoming Tech Conferences, AI-Developed Drugs, Laravel Framework


Perplexity AI ▷ #pplx-api (6 messages):

API updates, Sonar API calls, API pricing and searches


LM Studio ▷ #general (88 messages🔥🔥):

LM Server Access, Network Settings Confusion, Vision Models for LLM Studio, Tool Use in LM Studio, Model Compatibility Issues


LM Studio ▷ #hardware-discussion (117 messages🔥🔥):

NVIDIA RTX 5090 performance, Llama-3.3 model requirements, AI hardware comparisons, GPU memory capacity, LM Studio performance on older hardware


aider (Paul Gauthier) ▷ #general (159 messages🔥🔥):

R1 Model Performance, DeepSeek API Concerns, Aider Benchmark Results, Using Different Models in Aider, New AI Tools and Developments


aider (Paul Gauthier) ▷ #questions-and-tips (41 messages🔥):

Logging Practices in Python, Aider's Workflow and Context Management, Deepseek Model Performance, Managing Ignored Files in Git, Architect Mode in Aider


aider (Paul Gauthier) ▷ #links (4 messages):

Deleted Messages, Admin Actions


Interconnects (Nathan Lambert) ▷ #news (42 messages🔥):

Sky-T1-32B-Flash, DeepSeek-R1 performance, RLHF discussions, AI outlets and influencers


Interconnects (Nathan Lambert) ▷ #ml-drama (29 messages🔥):

DeepSeek Performance, OpenAI and Benchmark Comparisons, Chinese Work Attitude vs American Perceptions, Reasoning in Model Training, Cope Discussions


Interconnects (Nathan Lambert) ▷ #random (64 messages🔥🔥):

Discord Summarization Tools, DeepSeek Salaries, Applied NLP Reading List, AI Development, Tech Competition


Interconnects (Nathan Lambert) ▷ #memes (2 messages):

OpenAI's Job Automation Model, Claude's Mechanistic Interpretability, Test-Time Scaling Quotes


Interconnects (Nathan Lambert) ▷ #cv (4 messages):

Advanced NLP Course Update, Consequential Models in Multimodality, Audio Domain Applications of ViT and CLIP, LLaVA for Unified Embeddings


Interconnects (Nathan Lambert) ▷ #reads (28 messages🔥):

Interconnects subscription value, Operator AI agent feedback, ModernBERT insights, Challenges with Semianalysis, On-device intelligence discussions


Interconnects (Nathan Lambert) ▷ #retort-podcast (8 messages🔥):

Adobe Podcast Enhance Speech Tool, Audio Setup for Interviews, Quality of Audio vs. Magic Audio


Interconnects (Nathan Lambert) ▷ #policy (5 messages):

Presidential AI Action Order, AI Regulations Review, New AI Action Plan, Special Advisor for AI and Crypto, Free Market AI Development


GPU MODE ▷ #general (18 messages🔥):

Jailbreaking Models, MLOps Resources, Flash Infer Talk, Attention Methods, Flex Attention vs Differential Attention


GPU MODE ▷ #cuda (78 messages🔥🔥):

CUDA Toolkit 12.8 Release, Blackwell Architecture Features, TensorCore Instructions, FP8 and FP4 Support, Compatibility of sm_90a and sm_100a


GPU MODE ▷ #torch (22 messages🔥):

Async Computation with Custom CUDA Kernels, bfloat16 vs float32 Precision in PyTorch, Tensor Parallel Configurations in HF Transformers, Learning Rate Schedulers in PyTorch, Vision-Based Models Optimization


GPU MODE ▷ #announcements (1 messages):

Flash Infer, Deep Learning Techniques, Code Generation, Custom Kernels, Attention Patterns


GPU MODE ▷ #cool-links (1 messages):

cpeterson42: AI Infrastructure community on X: https://x.com/i/communities/1879760488256491834


GPU MODE ▷ #jobs (1 messages):

ComfyUI hiring, Machine Learning Engineers

Link mentioned: Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team


GPU MODE ▷ #beginner (4 messages):

DeepSpeed Integration, Hugging Face Accelerate Library, Throughput Comparison, Communication Overhead

Link mentioned: Accelerate


GPU MODE ▷ #lecture-qa (2 messages):

Parallel Prefix Sum in Distributed Systems, MPI_Scan in CUDA


GPU MODE ▷ #self-promotion (2 messages):

ComfyUI Meetup, DeepSeek R1 Model Performance


GPU MODE ▷ #thunderkittens (1 messages):

Zeroing Accumulators, ThunderKittens GitHub

Link mentioned: ThunderKittens/kernels/matmul/H100/matmul.cu at main · HazyResearch/ThunderKittens: Tile primitives for speedy kernels.


GPU MODE ▷ #arc-agi-2 (20 messages🔥):

Polynomial Equations Addition, Maze Task Implementation, Refactoring Reasoning-Gym Structure, Dynamic Reward System, External Contributor Recognition


OpenAI ▷ #annnouncements (1 messages):

Canvas updates, HTML & React code rendering, ChatGPT desktop app rollout, Access tiers for new features


OpenAI ▷ #ai-discussions (131 messages🔥🔥):

Deepseek and R1, API Usage for Chatbots, AI Integrated IDEs, Token Costs for AI Models, Operator's Browser Interaction

Link mentioned: DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1


OpenAI ▷ #gpt-4-discussions (2 messages):

Release of O3


OpenAI ▷ #prompt-engineering (7 messages):

Public vs Private Information, NDA Discussions, Misinformation Concerns


OpenAI ▷ #api-discussions (7 messages):

2023 Trends, NDA Compliance, Misinformation Concerns


Stackblitz (Bolt.new) ▷ #prompting (1 messages):

React + TypeScript + Tailwind web app, App architecture, Data management strategies, Development workflows, Versioning standards

Link mentioned: start prompt strategy: ## Start with Initial Prompt: Create a React + TypeScript + Tailwind web app with: Layout: Persistent header and sidebar navigation with menu items (e.g., Menu 1, Menu 2, Menu 3) and a submenu unde...


Stackblitz (Bolt.new) ▷ #discussions (144 messages🔥🔥):

Stripe Webhook Implementation, Issues with Bolt Functions, Messaging System with Supabase, Chat Loading Problems, OpenAI API Errors


Nous Research AI ▷ #general (134 messages🔥🔥):

DiStRo and GPU Training, Tiny Stories Model Training, OpenAI and Market Reactions, New AI Models and Reasoning Capabilities, Importance of Self-Attention in Transformers


Yannick Kilcher ▷ #general (98 messages🔥🔥):

Memory bus width, Math questions for LLMs, Reasoning and visual problems in LLMs, Open weight Differential Transformer models, Anime avatars in OSS ML community


Yannick Kilcher ▷ #paper-discussion (18 messages🔥):

Myopic Optimization with Non-myopic Approval (MONA), AI Insurance, GRPO and PPO Comparison, GAE and Advantage Estimation, LMArena Rankings

Link mentioned: MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking: Future advanced AI systems may learn sophisticated strategies through reinforcement learning (RL) that humans cannot understand well enough to safely evaluate. We propose a training method which avoid...


Yannick Kilcher ▷ #ml-news (9 messages🔥):

AG2 Announcement, R1 Server Performance, AI Community Dynamics, Stargate Influence


Notebook LM Discord ▷ #use-cases (7 messages):

Podcast Editing with NotebookLM, Reverse Turing Test with Generative Output, AI Host Animation Tools

Link mentioned: UNREAL MYSTERIES 7: The Callisto Mining Incident: David and Hannah follow the adventures of the legendary Malcolm Steele in his role as SPACE MINER on Jupiter's moon Callisto. Learn what he, Ted and Jessica ...


Notebook LM Discord ▷ #general (53 messages🔥):

Uploading PDFs, Gemini Advanced capabilities, NotebookLM use cases, Quiz preparation, Interactive mode loading issue


Nomic.ai (GPT4All) ▷ #announcements (1 messages):

GPT4All v3.7.0 release, Windows ARM Support, macOS update fixes, Code Interpreter Improvements, Chat Templating Enhancements


Nomic.ai (GPT4All) ▷ #general (50 messages🔥):

Prompt Engineering Challenges, Model Compatibility for GPT4All, Image Analysis Tools, Translation Model Recommendations


MCP (Glama) ▷ #general (22 messages🔥):

MCP Server Timeout Fix, MCP Server Installation Issues, MySQL and SQLite Usage, Claude Google Search Feature, MCP-Alchemy Repository


MCP (Glama) ▷ #showcase (16 messages🔥):

Orange Flair Request, MCP Agentic Tool Confusion, Integration of Glama into Clients, Long Term Memory Personalization Tool


LlamaIndex ▷ #blog (1 messages):

AI agents, Joint webinar, Task management in AI, Redis, Webinar recording


LlamaIndex ▷ #general (36 messages🔥):

LlamaIndex agent tutorials, Parallel streaming issues, Using LlamaParse for PDFs, Real-time event streaming, Export controls on AI models


Cohere ▷ #discussions (22 messages🔥):

US Export Controls on AI Model, Cohere's Compliance Concerns, Oracle Japan's Use of Cohere's Model, Market Impact of GPU Restrictions, Blackwell Operations Costs

Link mentioned: New U.S. Export Controls on Advanced Computing Items and Artificial Intelligence Model Weights: Seven Key Takeaways


Cohere ▷ #api-discussions (1 messages):

sssandra: pls post the issue in <#1324436975436038184> , we'll help troubleshoot there


Cohere ▷ #cmd-r-bot (5 messages):

9-letter words, Word lists, Interesting words


Modular (Mojo 🔥) ▷ #general (8 messages🔥):

Forum Post Creation, Async Code in Mojo


Modular (Mojo 🔥) ▷ #announcements (1 messages):

MAX Builds Page Launch, Community-built Packages, Project Submission Process


Modular (Mojo 🔥) ▷ #mojo (17 messages🔥):

__iadd__ method, Accidental typing, New Mojo CLI flags


LAION ▷ #general (25 messages🔥):

Audio Dataset Project, Speaker Recognition Model, Labeling Noise and Music Levels

Link mentioned: Google Colab


Eleuther ▷ #general (7 messages):

Teacher-Student Model Divergence, AI4Science Open Source Projects, Layer Convergence Bias, Vanishing Gradients Discussion

Link mentioned: Which Layer is Learning Faster? A Systematic Exploration of...: We empirically show that the shallower layers converge faster than the deeper layers in neural networks, and provide the theoretical justification and practical value of this finding.


Eleuther ▷ #research (14 messages🔥):

ModernBERT and ModernBART, Hybrid Language Models, Chain-of-Thought Reasoning, Agent-R Self-Critique Framework, Latro RL vs Pause Tokens


Eleuther ▷ #interpretability-general (2 messages):

Mech Interp, Bilinear MLPs, Weights and Activations Duality, ICLR 2025 Paper


Stability.ai (Stable Diffusion) ▷ #general-chat (20 messages🔥):

Generating ice text with specific fonts, ControlNet for image generation, Adobe Firefly for custom text, Understanding image poisoning, Using sketch to image


tinygrad (George Hotz) ▷ #general (14 messages🔥):

ILP for View Simplification, Mask Representation Challenges, Multi Merge Search Patterns, Three View Simplification Approaches, Stride Alignment in Merges

Link mentioned: Complete view pair add using ILP (draft) by eliotgolding · Pull Request #8736 · tinygrad/tinygrad: Proof of concept, uses scipy's solver. `$ python test/external/fuzz_view_add_completeness.py` Missed adds: 5/1109 0.45%. `$ NEWADD=1 python test/external/fuzz_view_add_completeness.py` Missed adds: ...


Torchtune ▷ #general (10 messages🔥):

Running on Windows, Windows Subsystem for Linux (WSL), Regex for Data Cleaning, Non-Supported Features in Windows, Issues with Triton and Xformers


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (5 messages):

First Lecture Announcement, LLM Agents Acceptance


Axolotl AI ▷ #general (5 messages):

Scams reported in Discord, Nebius AI multi-node training, SLURM and Torch Elastic challenges


DSPy ▷ #general (2 messages):

Signature Definition Issues, MATH Dataset Alternatives

Link mentioned: hendrycks/competition_math · Datasets at Hugging Face







{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}