Frozen AI News archive

s1: Simple test-time scaling (and Kyutai Hibiki)

**"Wait" is all you need**: s1 is a new reasoning model finetuned from **Qwen2.5-32B-Instruct** on just **1,000 questions with reasoning traces** distilled from **Gemini 2.0 Flash Thinking**, enabling controllable test-time compute by appending "Wait" to extend reasoning. Lead author **Niklas Muennighoff**, known for work on **BLOOM**, **StarCoder**, and **BIG-bench**, highlights the method's efficiency and its reproduction of the famous o1 scaling chart. Additionally, **Kyutai**'s Hibiki project demonstrates impressive offline French-English live translation on an iPhone. Recent model releases include **DeepSeek's open-source V3 and R1**, potentially a major open-source milestone; **Hugging Face's SmolLM2**, emphasizing data-centric training for small LMs; and **IBM's Granite-Vision-3.1-2B**, a small vision-language model with strong performance. Key research papers spotlight **LIMO**, which achieves high accuracy on the AIME and MATH benchmarks from minimal demonstrations, and **Token Assorted**, which mixes latent and text tokens to improve language model reasoning.

Canonical issue URL

AI News for 2/5/2025-2/6/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (210 channels, and 4396 messages) for you. Estimated reading time saved (at 200wpm): 490 minutes. You can now tag @smol_ai for AINews discussions!

We're regrettably late to covering this paper, but better late than never. s1: Simple test-time scaling documents a new reasoning model with 2 novel contributions:

- **s1K**: a sample-efficient finetune of Qwen2.5-32B-Instruct on just 1,000 carefully filtered questions with reasoning traces distilled from Gemini 2.0 Flash Thinking.
- **Budget forcing**: controllable test-time compute, achieved by suppressing the end-of-thinking delimiter and appending "Wait" to extend the model's reasoning.

Lead author Niklas Muennighoff, who notably worked on Bloom, StarCoder, MTEB, and contributed to BIG-bench, notes that this second trick reproduces the famous o1 scaling chart:
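The budget-forcing trick can be sketched in a few lines. This is a hedged illustration only: `model_generate` and `toy_model` are hypothetical stand-ins for a real decoding call, not the authors' code.

```python
# Sketch of s1-style "budget forcing": when the model emits its
# end-of-thinking delimiter, strip it and append "Wait" so decoding
# continues, extending test-time compute. All names here are
# illustrative assumptions, not the paper's implementation.

def budget_force(model_generate, prompt, extensions=2, marker="</think>"):
    reasoning = ""
    for _ in range(extensions):
        chunk = model_generate(prompt + reasoning)
        # Suppress the end-of-thinking marker and force more reasoning.
        reasoning += chunk.split(marker)[0] + " Wait,"
    # Finally, let the model end its thought and answer.
    return reasoning + model_generate(prompt + reasoning)

# Toy model that always tries to stop thinking immediately.
def toy_model(text):
    return " the answer seems to be 42.</think>"

out = budget_force(toy_model, "Q: what is 6 x 7?")
```

With `extensions=2`, the end-of-thinking marker is intercepted twice before the model is allowed to finish, which is the knob that produces the test-time scaling curve.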

[image: s1's reproduction of the o1 test-time scaling chart]

Compared to Bespoke-Stratos (our coverage here), the filtering is also remarkably sample efficient.

[image: sample-efficiency comparison vs. Bespoke-Stratos]

We would also recommend Simon Willison's and Tim Kellogg's explainers.

Honorable mention today:

Kyutai Moshi made a splash last year (our coverage here) for its realtime voice with inner monologue, and now Hibiki shows very impressive French-English live translation offline on an iPhone. Not bad for an intern project.

[image: Hibiki on-device French-English translation demo]


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

AI Models and Releases

AI Research Papers and Findings

AI Tools and Platforms

AI Industry News and Events

Personal Achievements and Updates

Memes/Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Hibiki Speech-to-Speech Translation - FR to EN Capability

Theme 2. Challenges with Gemini 2.0 Pro Experimental Model

Theme 3. Open WebUI Releases Code Interpreter and Exa Search Features

Theme 4. Over-Tokenized Transformer Enhances LLM Performance

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Altman admits reduced competitive edge for OpenAI

Theme 2. Deep Reconstruction using AI tools for complex analysis

Theme 3. Open Source AI for Trackable Health Diagnostics


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. Breakthroughs in Model Capabilities and Performance

Theme 2. Tooling and Framework Enhancements for AI Engineers

Theme 3. Navigating Challenges in Model Performance and Infrastructure

Theme 4. Community Driven Innovations and Open Source Contributions

Theme 5. Ethical Debates and Business Model Scrutiny in AI


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Stability.ai (Stable Diffusion) Discord


Codeium (Windsurf) Discord


aider (Paul Gauthier) Discord


OpenAI Discord


Cursor IDE Discord


Perplexity AI Discord


OpenRouter (Alex Atallah) Discord


LM Studio Discord


MCP (Glama) Discord


Yannick Kilcher Discord


Eleuther Discord


Nous Research AI Discord


Interconnects (Nathan Lambert) Discord


Notebook LM Discord


LLM Agents (Berkeley MOOC) Discord


GPU MODE Discord


Nomic.ai (GPT4All) Discord


Torchtune Discord


Modular (Mojo 🔥) Discord


Latent Space Discord


LlamaIndex Discord


MLOps @Chipro Discord


Cohere Discord


Gorilla LLM (Berkeley Function Calling) Discord


DSPy Discord


The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (516 messages🔥🔥🔥):

GRPO and vLLM integration, DeepSeek models and fine-tuning, Quantization techniques, Multi-turn conversational datasets, AI ethics and data privacy

Links mentioned:


Unsloth AI (Daniel Han) ▷ #announcements (1 messages):

Reasoning in Unsloth, DeepSeek-R1, Model Fine-Tuning, New Model Support

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (38 messages🔥):

Model Merging, DeepSeek V3, User Benefit vs. Corporate Profit, OpenAI developments, Societal Value in AI

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (188 messages🔥🔥):

Unsloth model training, GRPO and reward functions, Model merging issues, Continued pretraining with LoRA, Adapter performance comparison

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (1 messages):

yaska0971: Name strings tooooooooooo long. Please shorten it


Unsloth AI (Daniel Han) ▷ #research (6 messages):

Realistic AI Research Domains, TPU Research Cloud with JAX, OpenMoE Project, Pretraining Small Transformers

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #announcements (1 messages):

Maxfield Introduction, Community Engagement Initiatives, Feature Request Board, Showcasing Researcher Progress


Stability.ai (Stable Diffusion) ▷ #general-chat (459 messages🔥🔥🔥):

Stability AI Updates, Model Compatibility, AI Prompting Techniques, Community Dynamics, AI Subscriptions and Costs

Links mentioned:


Codeium (Windsurf) ▷ #announcements (3 messages):

Gemini 2.0 Flash, Windsurf Next Beta, Windsurf 1.2.6 Patch Fixes, Cascade Web Search

Links mentioned:


Codeium (Windsurf) ▷ #discussion (31 messages🔥):

Codeium Jetbrains Plugin Issues, DeepSeek Feature Request, Function Length Display in CodeLens, Educational Email Discounts, Version Updates and Bug Reports


Codeium (Windsurf) ▷ #windsurf (345 messages🔥🔥):

Issues with Windsurf Performance, Gemini Flash vs Sonnet, Usage of Multiple AI Models, Windsurf Installation and Login Problems, User Experience with Cascading Files

Links mentioned:


aider (Paul Gauthier) ▷ #general (337 messages🔥🔥):

Hiring Update, Aider Error Handling, DeepSeek and Gemini Models, LLM Editing Formats, Pen Testing with LLMs

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (23 messages🔥):

Aider Support for Agents, Staging Changes in Aider, Commit Messages with R1, Model Configuration Issues, Architect Mode Functionality

Link mentioned: Reasoning models: How to configure reasoning model settings from secondary providers.


aider (Paul Gauthier) ▷ #links (2 messages):

Gemini 2.0, Open Deep Research, HuggingFace, Agent frameworks

Link mentioned: Open-source DeepResearch – Freeing our search agents: no description found


OpenAI ▷ #ai-discussions (276 messages🔥🔥):

Gemini 2.0 Pro, OpenAI vs DeepSeek, AI for Coding, Chatbot Aggregators, AI Model Comparisons

Link mentioned: Fire Writing GIF - Fire writing - Discover & Share GIFs: Click to view the GIF


OpenAI ▷ #gpt-4-discussions (5 messages):

Deep Research chat for Plus users


OpenAI ▷ #prompt-engineering (6 messages):

Response Length Control, Undesired Behavior in AI Models, Input Influencing Output


OpenAI ▷ #api-discussions (6 messages):

Controlling AI Response Length, Managing Undesired AI Behavior


Cursor IDE ▷ #general (282 messages🔥🔥):

Cursor IDE Updates, Gemini 2.0 Performance, Clipboard Comparison Tools, MCP Server Configurations, Context Limitations in AI Models

Links mentioned:


Perplexity AI ▷ #general (247 messages🔥🔥):

Perplexity AI Focus Mode, Query Handling in Perplexity Pro, R1 vs. Other Models, Performance Issues with Deepseek, User Concerns regarding Model Specifications

Links mentioned:


Perplexity AI ▷ #sharing (22 messages🔥):

Tesla Robotaxi Launch, AI Skills Development Opportunities, USA vs China AI Race, Deepfake Technology from ByteDance, Trans Athlete Executive Order

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (7 messages):

Perplexity API usage, Sonar Pro Reasoning devs, Image uploading limitations, Monthly cost limits and invoicing


OpenRouter (Alex Atallah) ▷ #announcements (14 messages🔥):

DeepSeek Insurance, Kluster integration issues, Qwen model deprecation, Website downtime update

Links mentioned:


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

Y CLI Development, Terminal Enthusiasm, Chat Data Management, MCP Client Support, Deepseek-r1 Integration

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (242 messages🔥🔥):

DeepInfra issues, Gemini 2.0 Flash readiness, OpenRouter authentication service, Error handling with models, Provider performance discrepancies

Links mentioned:


LM Studio ▷ #general (215 messages🔥🔥):

LM Studio API error handling, Model performance inquiries, Obsidian Smart Connections integration, Updating AI models and features, Safety of downloading AI models from TheBloke

Links mentioned:


LM Studio ▷ #hardware-discussion (23 messages🔥):

DDR5 6000 EXPO Performance, Hardware Configuration for LMS, Memory Testing Tools, Multi-GPU Setup on PCIe 3.0

Link mentioned: GitHub - CoolCmd/TestMem5: TestMem5 - PC RAM stress test: TestMem5 - PC RAM stress test. Contribute to CoolCmd/TestMem5 development by creating an account on GitHub.


MCP (Glama) ▷ #general (97 messages🔥🔥):

Home Assistant MCP Client/Server, MCP Server Usage, Goose MCP Client, Image Display in Claude, MCP Server Configurations

Links mentioned:


MCP (Glama) ▷ #showcase (54 messages🔥):

PulseMCP Use Cases, MCP Servers, Claude for Research, Web Research Tools, Markdown in Discord

Links mentioned:


Yannick Kilcher ▷ #general (112 messages🔥🔥):

Gemini 2.0 Performance, DeepSpeed with Hugging Face, AI Legislation Impact, Australia's Internet Infrastructure, Open Source AI Models

Links mentioned:


Yannick Kilcher ▷ #paper-discussion (10 messages🔥):

Harmonic Loss Paper, VideoJAM Discussion, EU Discussion Hours, DeepSeek Hosting

Link mentioned: VideoJAM: VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Model


Yannick Kilcher ▷ #ml-news (10 messages🔥):

Gemini 2.0 Flash, Flash-lite issues, S1 reasoning model, Inference scaling insights, OpenAI scaling laws

Links mentioned:


Eleuther ▷ #general (22 messages🔥):

Collaboration on LLM Research, Deepspeed and Hugging Face, Benchmarking LLMs, Weight Decay in Fine Tuning, RWKV Architecture Development

Link mentioned: GitHub - lechmazur/generalization: Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.: Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then d...


Eleuther ▷ #research (92 messages🔥🔥):

Multi Token Prediction Inference, Independent Research in AI/ML, A/B Testing and Reward Modeling, Quadratic Fitting for Parameter Estimation, DeepSeek MTP Implementation

Links mentioned:


Eleuther ▷ #interpretability-general (1 messages):

MATS cohort applications, Mechanistic Interpretability Research, Mentoring in AI research

Links mentioned:


Eleuther ▷ #lm-thunderdome (7 messages):

cons@64, majority voting, eval configuration in YAML


Eleuther ▷ #gpt-neox-dev (1 messages):

Sequence parallelism implementation, Model parallelism size issues, AttributeError in Megatron library, Training crash log

Link mentioned: aflah: Weights & Biases, developer tools for machine learning


Nous Research AI ▷ #general (100 messages🔥🔥):

Deep Research Feedback, AI Backlash and Crypto, Purpose AI Agent in Trusts, New AI Models and Training Techniques, Fine-tuning Approaches

Links mentioned:


Nous Research AI ▷ #ask-about-llms (1 messages):

DeepSeek-R1 training loop, Reward loss vs KL loss sensitivity, Pitfalls of small instruct models, Model size considerations, Hyperparameter importance


Nous Research AI ▷ #research-papers (1 messages):

Synthetic data generation, Seed-based approaches, Magpie output issues, Self-instruct alternatives, Awesome-LLM-Synthetic-Data resource

Link mentioned: GitHub - wasiahmad/Awesome-LLM-Synthetic-Data: A reading list on LLM based Synthetic Data Generation 🔥: A reading list on LLM based Synthetic Data Generation 🔥 - wasiahmad/Awesome-LLM-Synthetic-Data


Nous Research AI ▷ #interesting-links (3 messages):

Deep Dive into LLMs, Mina's zkML Library

Link mentioned: Deep Dive into LLMs like ChatGPT: This is a general audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It is covers the full traini...


Nous Research AI ▷ #research-papers (1 messages):

Synthetic Data Generation, Self-instruct, Magpie, WizardLM, Awesome LLM Synthetic Data

Link mentioned: GitHub - wasiahmad/Awesome-LLM-Synthetic-Data: A reading list on LLM based Synthetic Data Generation 🔥: A reading list on LLM based Synthetic Data Generation 🔥 - wasiahmad/Awesome-LLM-Synthetic-Data


Interconnects (Nathan Lambert) ▷ #news (39 messages🔥):

John Schulman leaves Anthropic, Hibiki speech-to-speech translation model, Le Chat AI sidekick, GitHub Copilot agent mode, OpenAI updated chain of thought

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (3 messages):

LRMs test-time scaling, Model decision-making, Training phase scaling


Interconnects (Nathan Lambert) ▷ #ml-drama (9 messages🔥):

Crowd-sourced prompts, Jailbreaking models, Open Source Community, Incentives in AI

Link mentioned: Tweet from Pliny the Liberator 🐉 (@elder_plinius): I don't want to provide my world-class expertise just for you to hoard crowd-sourced prompts and construct elaborate security theater performances to appease investors who are foolish enough to believ...


Interconnects (Nathan Lambert) ▷ #random (23 messages🔥):

ChatGPT Fishing Techniques, Long Chain of Thought in LLMs, Qwen Model Discoveries, Deep Research Applications

Links mentioned:


Interconnects (Nathan Lambert) ▷ #memes (2 messages):

Duality of Man, Discussion on X, Post by mcmillen.dev

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rl (11 messages🔥):

RL dataset skepticism, Unsloth GRPO support, Unified memory usage, Training on same GPUs, DM paper on rollouts

Link mentioned: Train your own R1 reasoning model locally: You can now reproduce your own DeepSeek-R1 reasoning model with Unsloth 100% locally. Using GRPO. Open-source, free and beginner friendly.


Interconnects (Nathan Lambert) ▷ #reads (8 messages🔥):

Open source AI, DeepSeek's impact on Scale AI, AI's evolving definitions, The importance of human oversight, Dario on Chinatalk

Link mentioned: 🌁#86: Four Freedoms of Open AI: – what are they? Defining the future


Interconnects (Nathan Lambert) ▷ #policy (1 messages):

xeophon.: https://x.com/AndrewCurran_/status/1887505463211925557


Notebook LM ▷ #use-cases (17 messages🔥):

Collaboration and Similarities, Corruption in Religious Leadership, AI Features for Creativity, Uses of NotebookLM in Law, Max Headroom's Comeback

Links mentioned:


Notebook LM ▷ #general (78 messages🔥🔥):

NotebookLM Model Limitations, Audio Overview Customization, Spreadsheet Data Analysis, Sharing Notebooks Issues, Interactive Mode Problems

Links mentioned:


LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):

Fall 2024 MOOC Certificates, Coursework Submission Challenges, Future MOOC Opportunities


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (57 messages🔥🔥):

Certificate issuance timeline, Quiz availability and results, Certificate tier breakdown, Communication and support, Feedback on course experience


GPU MODE ▷ #general (13 messages🔥):

Output vs Input Token Pricing, Independent Research in AI/ML, Niche Fields for Research, Economizing AI Research


GPU MODE ▷ #triton (4 messages):

Triton warp specialization, Triton compiler on NVIDIA Blackwell, Installing Triton on RTX 5080, Deepseek fused MLA implementation in Triton

Links mentioned:


GPU MODE ▷ #cuda (3 messages):

CUDA GEMM Implementation, Double Buffering Performance Issues, Register Usage Optimization, Memory Sector Utilization


GPU MODE ▷ #algorithms (8 messages🔥):

FP8 Attention, Hadamard Transform, CUDA Elementwise Kernel for Mixed Integer Linear Programming, Grouped GEMM Implementation, Torch Nested Tensor

Link mentioned: fast-hadamard-transform/csrc at master · Dao-AILab/fast-hadamard-transform: Fast Hadamard transform in CUDA, with a PyTorch interface - Dao-AILab/fast-hadamard-transform


GPU MODE ▷ #cool-links (1 messages):

iron_bound: https://www.youtube.com/watch?v=7xTGNNLPyMI


GPU MODE ▷ #torchao (2 messages):

PyTorch Team Visibility, User Concerns


GPU MODE ▷ #off-topic (2 messages):

Japanese government discussions, Text-generation-inference n-gram decoding


GPU MODE ▷ #thunderkittens (1 messages):

Linear Attention Model, Distillation Process, Training Challenges


GPU MODE ▷ #reasoning-gym (18 messages🔥):

Sokoban Puzzles, Rush Hour Puzzle, Reasoning-Gym Integration

Links mentioned:


Nomic.ai (GPT4All) ▷ #general (48 messages🔥):

Model Performance Comparison, Language Model Constraints, DeepSeek Model Insights

Links mentioned:


Torchtune ▷ #general (30 messages🔥):

GRPO implementation success, Kolo support for Torchtune, Config issues with Llama 3.1 and Qwen 2.5, Hugging Face fast tokenizer support

Links mentioned:


Torchtune ▷ #dev (16 messages🔥):

GitHub Checks on Full DPO Distributed PR, GPU Testing Issues, Recipe Test Failures, VRAM Usage Optimization

Links mentioned:


Modular (Mojo 🔥) ▷ #general (2 messages):

Mojo language development, 12/18 community meeting insights

Link mentioned: Modular milestones: GPUs, 2024 reflections, and the road ahead 🚀: In this extra special community meeting, we reflected on 2024's progress and shared updates on: 🧑‍🚀 MAX 24.6, featuring MAX GPU! 🔥 Our overall approach to M...


Modular (Mojo 🔥) ▷ #mojo (24 messages🔥):

Parser Rewriting, Script Functionality, Mojo Open-Source Aspirations, UpdateDOM Function, Production Readiness of Mojo

Link mentioned: Link to class method in Python docstring: I want to add a link to a method in my class from within the docstring of another method of the same class. I want the link to work in Sphinx and preferentially also in Spyder and other Python IDEs...


Modular (Mojo 🔥) ▷ #max (16 messages🔥):

MAX Serve CLI, OpenAI Completion API Issues, OpenAI Model Compatibility, Msty client for local models

Links mentioned:


Latent Space ▷ #ai-general-chat (36 messages🔥):

Hibiki translation model, Melanie Mitchell's AI perspectives, Mistral AI's Le Chat, OpenAI's o3-mini updates, PDF parsing advancements

Links mentioned:


LlamaIndex ▷ #blog (2 messages):

Gemini 2.0 availability, LlamaParse for financial documents

Links mentioned:


LlamaIndex ▷ #general (4 messages):

Embedding Print Removal, Pull Request Suggestion, Documentation Clarity

Links mentioned:


MLOps @Chipro ▷ #general-ml (6 messages):

LLMs in Classification, Latency Requirements in ML, Composite Pipeline for Noisy Data


Cohere ▷ #api-discussions (6 messages):

Fine-tuning Error, System Design Interview Questions


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (4 messages):

Tool-using model system prompts, Hugging Face dataset transformation issues, Dataset file format mismatch


DSPy ▷ #papers (1 messages):

batmanosama: https://arxiv.org/abs/2502.02508


DSPy ▷ #examples (2 messages):

Git Repository, Colab Notebook

Link mentioned: Google Colab: no description found



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}