Frozen AI News archive

How To Scale Your Model, by DeepMind

**Researchers at Google DeepMind (GDM)** released a comprehensive "little textbook" titled **"How To Scale Your Model"**. It covers modern Transformer architectures, how inference differs from the standard O(N^2) view of attention, and high-performance computing concepts such as rooflines. It includes practice problems, and comments on it are being read in real time. Key updates from AI Twitter include:

- the open-sourced humanoid robotics model **ASAP**, inspired by athletes such as **Cristiano Ronaldo**, **LeBron James**, and **Kobe Bryant**;
- a new **Mixture-of-Agents** paper proposing **Self-MoA**, a method for improved LLM output aggregation;
- reasoning-LLM training with **DeepSeek**'s **GRPO algorithm**, demonstrated on **Qwen 0.5**;
- findings on bias in LLMs used as judges, which highlight the need for multiple independent evaluations;
- and the release of **mlx-rs**, a Rust library for machine learning whose examples include **Mistral** text generation.

Additionally, **Hugging Face** launched an AI app store with over **400,000 apps**, 2,000 new additions daily, and 2.5 million weekly visits. It also offers AI-powered app search and categorization.
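The summary only names GRPO. The sketch below shows its central idea, assuming the group-relative advantage from DeepSeek's published description of the method. You sample several completions for the same prompt and score each one. Each reward is then normalized against its group's mean and standard deviation, which replaces a learned value (critic) model as the baseline.

Several details here are illustrative assumptions rather than DeepSeek's code: the function name `group_relative_advantages`, the `eps` guard, and the choice of population standard deviation. The policy-gradient update and KL penalty that consume these advantages are omitted.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: normalize one prompt's group of rewards, A_i = (r_i - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    # Population std over the group; implementations differ on population vs. sample std.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for 4 sampled answers to one prompt (1 = correct, 0 = wrong).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```

Completions that score above their group's average get positive advantages, and the rest get negative ones. If every answer in a group receives the same reward, all advantages are zero. Under this formulation, that group contributes no learning signal.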

AI News for 2/3/2025-2/4/2025. We checked 7 subreddits, 433 Twitters and 34 Discords (225 channels, and 3842 messages) for you. Estimated reading time saved (at 200wpm): 425 minutes. You can now tag @smol_ai for AINews discussions!

In a surprise drop, some researchers at GDM released a "little textbook" on how they scale models.

A commenter confirmed that this was GDM internal documentation, with the Gemini references redacted.

How To Scale Your Model comes in 12 parts. It starts with a helpful refresh of what standard Transformers look like today. It then explains how inference differs from the usual O(N^2) understanding of attention.

It also introduces standard high-performance computing concepts such as rooflines. It even comes with worked problems so that motivated readers can test their understanding, and comments are being read in real time.
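Here is a back-of-the-envelope sketch of the inference and roofline points. It is not code from the textbook. It models a single attention head during autoregressive decoding with a KV cache.

Each new token attends to the N cached keys and values. One decode step therefore costs O(N) FLOPs rather than O(N^2), but it must also read the whole cache from memory. The roofline model compares the resulting arithmetic intensity (FLOPs per byte moved) with the hardware's ridge point (peak FLOP/s divided by memory bandwidth).

The following are illustrative assumptions: the helper names, the head dimension, bf16 storage (2 bytes per element), and the accelerator numbers. Softmax FLOPs and query/output memory traffic are ignored.

```python
def decode_attention_step(n_cached: int, head_dim: int, bytes_per_elem: int = 2) -> tuple[int, int]:
    """Hypothetical helper: approximate FLOPs and bytes read for one new token over a KV cache (one head)."""
    # q @ K^T: n_cached dot products of length head_dim, ~2 FLOPs (multiply + add) per element.
    # softmax(scores) @ V: another n_cached * head_dim multiply-adds.
    flops = 2 * (2 * n_cached * head_dim)
    # K and V are each n_cached x head_dim and must be read from memory on every step.
    bytes_read = 2 * n_cached * head_dim * bytes_per_elem
    return flops, bytes_read


def roofline_time(flops: float, bytes_moved: float, peak_flops: float, mem_bw: float) -> tuple[float, str]:
    """Roofline lower bound on runtime, plus which resource limits it."""
    t_compute = flops / peak_flops  # seconds if compute were the only limit
    t_memory = bytes_moved / mem_bw  # seconds if memory traffic were the only limit
    bound = "memory" if t_memory > t_compute else "compute"
    return max(t_compute, t_memory), bound


# Illustrative accelerator (assumed numbers): 1e15 FLOP/s peak, 2e12 bytes/s HBM bandwidth.
PEAK_FLOPS, HBM_BW = 1e15, 2e12
flops, nbytes = decode_attention_step(n_cached=8192, head_dim=128)
print("arithmetic intensity:", flops / nbytes, "FLOPs/byte")  # 1.0
print("ridge point:", PEAK_FLOPS / HBM_BW, "FLOPs/byte")  # 500.0
print("bound by:", roofline_time(flops, nbytes, PEAK_FLOPS, HBM_BW)[1])  # memory
```

Under these assumptions, each decode step performs about one FLOP per byte read. That is far below the assumed ridge point of 500, so the step is memory-bound. In standard multi-head attention, every sequence has its own cache. Batching more sequences therefore does not raise this ratio the way it does for the weight matmuls.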


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

AI Model Releases and Research Papers

AI Tools and Platforms Announcements

AI Events, Conferences, and Hiring

AI Ethics, Safety, and Policy

General AI Industry Commentary


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek R1 & R1-Zero: Rapid Model Training Achievements

Theme 2. DeepSeek-R1 Model: Implications of Shorter Correct Answers

Theme 3. OpenAI Research: Embracing Open-Source via Hugging Face

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. OmniHuman-1: China's Multimodal Marvel

Theme 2. Huawei's Ascend 910C Challenges Nvidia H100

Theme 3. O3 Mini: OpenAI's Usability Leap

Theme 4. OpenAI Unveils OpenAI Sans Font


AI Discord Recap

A summary of Summaries of Summaries by o1-mini-2024-09-12

Theme 1. Model Optimization Mania

Theme 2. AI Tool Wars

Theme 3. Ethics and Safety Shenanigans

Theme 4. Hackathons and Collaborative Sparks

Theme 5. AI in Legal and Customer Service Realms


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Codeium (Windsurf) Discord


aider (Paul Gauthier) Discord


Cursor IDE Discord


Yannick Kilcher Discord


LM Studio Discord


Perplexity AI Discord


OpenAI Discord


Interconnects (Nathan Lambert) Discord


GPU MODE Discord


Eleuther Discord


Nous Research AI Discord


Stability.ai (Stable Diffusion) Discord


Notebook LM Discord


Latent Space Discord


Nomic.ai (GPT4All) Discord


Stackblitz (Bolt.new) Discord


Torchtune Discord


Modular (Mojo 🔥) Discord


LLM Agents (Berkeley MOOC) Discord


MCP (Glama) Discord


Cohere Discord


DSPy Discord


LAION Discord


LlamaIndex Discord


tinygrad (George Hotz) Discord


MLOps @Chipro Discord


OpenInterpreter Discord


OpenRouter (Alex Atallah) Discord


The Axolotl AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (692 messages🔥🔥🔥):

DeepSeek R1 Model, Model Quantization, Fine-Tuning Techniques, Data Generation for Training, Transformer Architectures


Unsloth AI (Daniel Han) ▷ #off-topic (1 message):

shiyaozhidewa: I want deepseek r1 abliterated 671B


Unsloth AI (Daniel Han) ▷ #help (96 messages🔥🔥):

CUDA Out of Memory Errors, Finetuning Strategies for Models, Installation Instructions for Unsloth, Using Experts in MoE Frameworks, Logging with Weights & Biases


Unsloth AI (Daniel Han) ▷ #showcase (11 messages🔥):

DeepSeek R1 model, Klarity library, YouTube video releases, Bulgarian language model performance, Math versions of models


Unsloth AI (Daniel Han) ▷ #research (1 message):

not_qty: This video is great... https://youtu.be/_1f-o0nqpEI?si=s2B-o5y2d5ztsV0U


Codeium (Windsurf) ▷ #content (1 message):

Windsurf Docs Shortcuts, Mintlify Auto-hosting, Community Contributions on Twitter

Link mentioned: Tweet from Kevin Hou (@kevinhou22): we love docs! 📖 I'm working on improving / adding more @ docs shortcuts to @windsurf_ai lmk what you want and I'll add as many as I can... 🧵 also shoutout @mintlify for auto-hosting all docs w...


Codeium (Windsurf) ▷ #discussion (68 messages🔥🔥):

Windsurf functionalities, Codeium performance and errors, Account activation issues, Hackathon opportunities, Qodo skepticism


Codeium (Windsurf) ▷ #windsurf (401 messages🔥🔥):

Windsurf O3 Mini Pricing, Issues with File Editing, User Experience with Models, Context Window Limitations, Integration of New Features


aider (Paul Gauthier) ▷ #general (339 messages🔥🔥):

O1 Pro Performance, Weak Models in Aider, OpenRouter vs Direct API, Using Shell for LLM Tools, Challenges with Testing and Refactoring


aider (Paul Gauthier) ▷ #questions-and-tips (17 messages🔥):

Aider file management, Aider CLI formatting issues, Auto-adding files in Aider, Understanding Aider chat modes, C# file-scoped namespaces


Cursor IDE ▷ #general (320 messages🔥🔥):

Cursor IDE updates, Comparison with other AI tools, User experiences with different models, Cost of using AI tools, Impressions of Supermaven


Yannick Kilcher ▷ #general (234 messages🔥🔥):

Ethics of AI Training Data, Health Care Discourse, Political Ideologies, Human Cooperation, Financial Inequality


Yannick Kilcher ▷ #paper-discussion (18 messages🔥):

Anthropic classifiers, Universal jailbreaks, Paper discussion attendance, Alignment techniques, Hallucination in LLMs

Link mentioned: Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming: Large language models (LLMs) are vulnerable to universal jailbreaks: prompting strategies that systematically bypass model safeguards and enable users to carry out harmful processes that require many m...


Yannick Kilcher ▷ #agents (9 messages🔥):

Deepseek R1 600B, Tokenization Concerns, Image Analysis Capability, Memory and Puzzle Connection


Yannick Kilcher ▷ #ml-news (6 messages):

AI copyright and patent laws, Training AI and societal benefits, New AI model release requirements, Polyapprox project


LM Studio ▷ #general (141 messages🔥🔥):

Model Performance Issues, API Model Specification, Compatibility with Intel Macs, RAG and Inference in LM Studio, Tool & Function Callbacks


LM Studio ▷ #hardware-discussion (120 messages🔥🔥):

RAM differences in server setups, Running models on different hardware, M4 Ultra performance expectations, GPU configurations and capabilities, Inference speeds of various models


Perplexity AI ▷ #announcements (1 message):

New Chief Security Officer, Security updates


Perplexity AI ▷ #general (165 messages🔥🔥):

Perplexity Pro Features, API Usage and Limits, User Experience Feedback, Academic Writing Assistance, Model Comparison and Performance


Perplexity AI ▷ #sharing (13 messages🔥):

Phobia information sources, MSI A520M motherboard, Pigs' appearance, Trump executive order, Linux desktop usage


Perplexity AI ▷ #pplx-api (9 messages🔥):

Llama 3.1 Sonar Model Deprecation, Sonar-Reasoning Errors, API Access for Image Retrieval, LiteLLM Model Updates, API Model Name Identification


OpenAI ▷ #ai-discussions (171 messages🔥🔥):

DeepSeek, O1 Pro Model Performance, AI Sentience and Future Impact, Orchestration Services in AI, User Experiences with OpenAI and Other Models


OpenAI ▷ #gpt-4-discussions (11 messages🔥):

User feedback on AI performance, AI Storybooks for Kids, Device login limits for Pro users, Recent updates and emoji usage, Accuracy of deep research info


OpenAI ▷ #prompt-engineering (1 message):

Structured generation, JSON Schema enhancements, UIForm utility


OpenAI ▷ #api-discussions (1 message):

Structured Generation Techniques, UIForm Open Source Utility, JSON Schema Enhancements


Interconnects (Nathan Lambert) ▷ #news (43 messages🔥):

SoftBank OpenAI Partnership, Google Gemini Updates, Harmonic Loss for Neural Networks, MultiChallenge Benchmark for Conversations, OpenAI Website Redesign


Interconnects (Nathan Lambert) ▷ #ml-drama (40 messages🔥):

Noam Brown's Views, Zetta vs. Llama-1 Dynamics, OpenAI's Hardware Ambitions, Internal Cultural Issues, Collaboration with OpenAI


Interconnects (Nathan Lambert) ▷ #random (84 messages🔥🔥):

GRPO effectiveness on Llama 2, DeepSeek training on Huawei Ascend, NVIDIA Digits interest, RLHF and training costs, Website certificate issues


Interconnects (Nathan Lambert) ▷ #memes (3 messages):

Qwen/Llama models, Old RL + NLP papers

Link mentioned: Tweet from Teortaxes▶️ (DeepSeek🐳 Cheerleader since 2023) (@teortaxesTex): never once was wrong to bet on the guy in the clown hat. Quoting anton (@abacaj): wait... there's no way it's this easy right


Interconnects (Nathan Lambert) ▷ #reads (12 messages🔥):

Prime paper release, Scaling deep learning models, JAX usage in industry


Interconnects (Nathan Lambert) ▷ #policy (2 messages):

Mandatory Licensing for AI, Copyright Issues in AI, Fair Use for AI Use


GPU MODE ▷ #general (17 messages🔥):

1D Block Tiling vs cuBLAS, LlamaGen Image Models, Gen AI Hackathon, MAGVIT Video Tokenization, Generation Time Comparisons


GPU MODE ▷ #triton (13 messages🔥):

Triton kernel optimization, k_cross dimensions, Error in tutorial example, TMA performance, Warp specialization

Link mentioned: import torch import triton import triton.language as tl def benchmark(f, jo - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.


GPU MODE ▷ #cuda (17 messages🔥):

Cache Efficiency with Large Inputs, Tensor Cores Impact on Performance, Microarchitecture Confusions, SASS Observations on Blackwell, CUDA Stream Dependencies


GPU MODE ▷ #torch (27 messages🔥):

Redis Connection Management, Custom Triton/CUDA Kernels, Compiler Subprocesses, Redis Python Client


GPU MODE ▷ #algorithms (4 messages):

FP8 Attention Performance, Sensitivity of Attention to Quantization, Long Sequence Length Inference, Flash Attention 3, Quantization in DeepSeek V3


GPU MODE ▷ #cool-links (1 message):

iron_bound: https://www.youtube.com/watch?v=rCwgAGG2sZQ


GPU MODE ▷ #jobs (1 message):

Staff Software Engineer, Model performance, Inference engine, Performance monitoring, Generative media models

Link mentioned: Staff Software Engineer, ML Performance & Systems: Staff Software Engineer, ML Performance & Systems


GPU MODE ▷ #beginner (6 messages):

GPU mode lecture 16, Nvidia CUB resources, Kernel Tuner on GitHub, CUDA kernel optimization project, Fused attention tutorial issues


GPU MODE ▷ #off-topic (13 messages🔥):

Cursor vs GitHub Copilot, Sapiens Model Conversion, Efficiency of Codebase Tools

Link mentioned: facebook/sapiens-depth-0.3b · Hugging Face: no description found


GPU MODE ▷ #self-promotion (3 messages):

DreamCoder Optimization, NVIDIA GTC Sessions

Link mentioned: NVIDIA #GTC2025 Conference Session Catalog: Experience GTC 2025 In-Person and Online March 17-21, San Jose.


GPU MODE ▷ #edge (1 message):

E2E AutoML Model Compression, Edge Deployment Optimization, GitHub Project: sconce

Link mentioned: GitHub - satabios/sconce: E2E AutoML Model Compression Package: E2E AutoML Model Compression Package. Contribute to satabios/sconce development by creating an account on GitHub.


GPU MODE ▷ #reasoning-gym (68 messages🔥🔥):

Python RNG Stability, Kimi Multimodal Model, Sokoban Puzzle Solver, Deterministic Clue Generation, Reasoning Model Performance


Eleuther ▷ #announcements (2 messages):

Probability of Sampling Neural Networks, Interpretability of MLPs and GLUs, SVD and Adversarial Examples, Phase Transition in Training, Closed-Form Polynomial Approximations


Eleuther ▷ #general (25 messages🔥):

Mixture of Experts, Custom LLM Tool, LLM Evaluation Harness Issues, Inducing Reasoning in Post-training, NLP Novice Contributions

Link mentioned: A Visual Guide to Mixture of Experts (MoE): Demystifying the role of MoE in Large Language Models


Eleuther ▷ #research (59 messages🔥🔥):

Harmonic Loss, Self-play Theorem Prover, Polynomial Transformers, Feature Addition in Models, Physical Intelligence Open Source


Eleuther ▷ #interpretability-general (5 messages):

Tuned Lens library, Affine translator for Llama 3.2 1B, Training data for GPT-2


Eleuther ▷ #gpt-neox-dev (6 messages):

Bucket size for zero, Training models on different A100 configurations, Activation Checkpointing vs GAS, Model Performance Metrics, Optimization strategies

Link mentioned: gpt-neox/configs/hubble/Speed_Exps/1_1B_Baseline_BS_48_Both_Fusion_GQA_KV_Heads_4.yml at olmo-support · aflah02/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries - aflah02/gpt-neox


Nous Research AI ▷ #general (74 messages🔥🔥):

DeepSeek and AI advancements, Open source recommendation systems, Reinforcement Learning in AI, Mistral funding, Peano axioms in reasoning


Nous Research AI ▷ #ask-about-llms (11 messages🔥):

Git Repositories and Modular Systems, Model Performance Assessment, Retrieval-Augmented Generation (RAG), Web Scraping Policies


Nous Research AI ▷ #interesting-links (4 messages):

Society Library Mission, Political AI Agent, SWE Arena Vibe Coding


Nous Research AI ▷ #reasoning-tasks (5 messages):

Community Contributions, Project Documentation, GitHub Assistance


Stability.ai (Stable Diffusion) ▷ #general-chat (79 messages🔥🔥):

Image to Video Tools, Stable Diffusion Issues, Non-NSFW Image Requests, Model Performance Concerns, Editing Specific Characters in Images


Notebook LM Discord ▷ #use-cases (16 messages🔥):

Podcast Announcement, Using NotebookLM in Legal Practice, Customer Service Use Cases, Lip Sync Technology, Overcoming Glossophobia


Notebook LM Discord ▷ #general (38 messages🔥):

Customization of Audio Overviews, Podcast Feature Enhancements, Google Account Issues, NotebookLM in Google WorkSpace, Limits in Free Version


Latent Space ▷ #ai-general-chat (51 messages🔥):

Claude Constitutional Classifiers, FAIR internal dynamics, AI Admaker Icon, How to Scale Your Model, Pi0 Vision Language Action Model


Nomic.ai (GPT4All) ▷ #general (49 messages🔥):

MathJax for LaTeX support, DeepSeek with localdocs errors, EU AI Act implications, Discussion on the EU, AI-driven communication


Stackblitz (Bolt.new) ▷ #prompting (4 messages):

User Story Management, User Tiers Updates, Bolt's Markdown Functionality, Guidelines Consistency


Stackblitz (Bolt.new) ▷ #discussions (36 messages🔥):

Supabase vs Firebase preferences, Bolt performance issues, Data persistence in Bolt, GDPR-compliant hosting alternatives, Edge functions and API key authentication


Torchtune ▷ #general (11 messages🔥):

SFT Dataset Customization, Office Hour Details, Discord Channel for Event


Torchtune ▷ #dev (24 messages🔥):

Seed handling in DPO recipes, Debugging DataLoader issues, Dataset influence on sampling, Gradient accumulation in DPO, Ladder-residual architecture modification


Torchtune ▷ #papers (2 messages):

Data Augmentation in LLMs, R1-V Project Introduction, Verifiable Rewards, General Counting Abilities in Models


Modular (Mojo 🔥) ▷ #general (1 message):

Community Showcase, Forum Updates

Link mentioned: Community Showcase: Community projects that use MAX and Mojo


Modular (Mojo 🔥) ▷ #mojo (35 messages🔥):

Hot Reloading in Rust, Mojo ABI Alternatives, Python's Asyncio APIs, Thread Safety in Asynchronous APIs, Memory Management in Futures

Link mentioned: GitHub - MagicStack/uvloop: Ultra fast asyncio event loop.: Ultra fast asyncio event loop. Contribute to MagicStack/uvloop development by creating an account on GitHub.


LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 message):

Lecture 2 with Jason Weston, Self-Improvement in LLMs, MOOC Curriculum Updates

Link mentioned: CS 194/294-280 (Advanced LLM Agents) - Lecture 2, Jason Weston: no description found


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (28 messages🔥):

Course Enrollment Confirmation, Hackathon Results Update, Certificate Release Inquiry, Quiz Deadlines, Research Project Participation


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (2 messages):

Attendance form for Berkeley students


MCP (Glama) ▷ #general (21 messages🔥):

Legacy ERP Integration with VBS Scripts, Running MCP Server in Cursor, Enterprise MCP Protocol Progress, CORS Issues with Localhost on Windows, Using ngrok for Server Access


Cohere ▷ #discussions (7 messages):

Command-R+ model, Cohere support for Canadian AI, Bug with open-source copilot tool, Financial Semantic Search Webinar

Link mentioned: Cohere AI - 400 Bad Request: provided raw prompt is invalid · Issue #3881 · continuedev/continue: Before submitting your bug report I believe this is a bug. I'll try to join the Continue Discord for questions I'm not able to find an open issue that reports the same bug I've seen the tr...


Cohere ▷ #cmd-r-bot (5 messages):

Cmd R Bot Interaction


Cohere ▷ #projects (1 message):

Tech Content Consumption Preferences, Survey on Tech Enthusiast Engagement

Link mentioned: User Survey: We’re two recent graduates working on a personal project to understand how people prefer to consume tech content online. Your insights will help us create better, more engaging experiences for tech en...


Cohere ▷ #cohere-toolkit (1 message):

arctic_angel: ^^


DSPy ▷ #general (12 messages🔥):

dspy.py file issue, Image pipeline error in dspy2.6.2, Assertions availability in the latest version, LLM observability in Databricks


LAION ▷ #general (8 messages🔥):

OpenEuroLLM, EU AI Regulation, Meme Coins, Community Involvement in AI, AI Language Models


LlamaIndex ▷ #blog (1 message):

DocumentContextExtractor, Contextual Retrieval, RAG accuracy improvements


LlamaIndex ▷ #general (4 messages):

Implementing Timeout in LlamaIndex, User Interfaces with LlamaIndex

Link mentioned: llama_index/llama-index-integrations/llms/llama-index-llms-azure-inference/llama_index/llms/azure_inference/base.py at 7391f302e18542c68b9cf5025afb510af4a52324 · run-llama/llama_index: LlamaIndex is the leading framework for building LLM-powered agents over your data. - run-llama/llama_index


tinygrad (George Hotz) ▷ #general (4 messages):

Tinybox shipping, Service alternatives for shipping


MLOps @Chipro ▷ #events (1 message):

Hosted Iceberg challenges, Panel discussion on Iceberg, Role-Based Access Control (RBAC), Expert solutions for data teams, Open-source table formats

Link mentioned: Pain in the Ice: What's Going Wrong with My Hosted Iceberg?!, Thu, Feb 6, 2025, 9:00 AM | Meetup: Iceberg, which has recently emerged as a leading open-source table format, has received widespread acclaim across the data engineering space. It’s no surprise th


MLOps @Chipro ▷ #general-ml (2 messages):

LLMs vs. Traditional ML, TF-IDF + Logistic Regression Success


OpenInterpreter ▷ #general (3 messages):

Open Interpreter Development Status, Missing Documentation for 1.0, Implementing DeepSeek r1


OpenRouter (Alex Atallah) ▷ #announcements (1 message):

Cloudflare joins OpenRouter, Gemma 7B-IT release, Llama models availability


OpenRouter (Alex Atallah) ▷ #beta-feedback (1 message):

Model Error Display






{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}