Frozen AI News archive

DeepSeek's Open Source Stack

**DeepSeek's Open Source Week** was summarized by PySpur, alongside several other notable releases:

- **START**, a fine-tune of the **Qwen QwQ-32B model**, excels at PhD-level science QA and math benchmarks.
- **Character-3**, an omnimodal AI video generation model from Hedra Labs and Together AI, enables realistic animated content creation.
- **Google DeepMind** introduced the **Gemini embedding model** with an 8k context window, ranking #1 on MMTEB, alongside the **Gemini 2.0 Code Executor**, which supports Python libraries and auto-fix features.
- **Inception Labs' Mercury Coder** is a diffusion-based code generation model offering faster token processing.
- **OpenAI** released **GPT-4.5**, their largest model yet, though with weaker reasoning than some competitors.
- **AI21 Labs** launched **Jamba Mini 1.6**, noted for higher output speed than Gemini 2.0 Flash, GPT-4o mini, and Mistral Small 3.
- A new dataset of 1.9M scanned pages was released for OCR benchmarking, with **Mistral OCR** showing competitive but not top-tier document parsing performance compared to LLM/LVM-powered methods.

*"Cracked engineers are all you need."*
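For context on what an embedding model like Gemini's does: it maps text to a vector, and retrieval benchmarks like MMTEB largely score how well vector similarity tracks semantic similarity. A minimal cosine-similarity sketch with made-up toy vectors (not the actual Gemini API or its outputs):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d embeddings; real models emit hundreds of dimensions.
query = [0.2, 0.9, 0.1]
doc_close = [0.25, 0.85, 0.15]  # semantically similar document
doc_far = [0.9, 0.1, 0.4]       # unrelated document

# A good embedding model places related texts closer together:
assert cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far)
```

Rankings like MMTEB's aggregate this kind of similarity scoring across many retrieval, classification, and clustering tasks.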

Canonical issue URL

AI News for 3/7/2025-3/8/2025. We checked 7 subreddits, 433 Twitters and 28 Discords (224 channels, and 4696 messages) for you. Estimated reading time saved (at 200wpm): 406 minutes. You can now tag @smol_ai for AINews discussions!
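The reading-time estimate is simple arithmetic: total summarized words divided by a 200wpm reading speed. A minimal sketch (the ~81,200-word total is back-derived from the 406-minute figure, not a published number):

```python
def minutes_saved(total_words: int, wpm: int = 200) -> int:
    """Estimate reading time saved, rounded to the nearest minute."""
    return round(total_words / wpm)

# 406 minutes at 200wpm implies roughly 81,200 summarized words.
print(minutes_saved(81_200))  # 406
```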

We didn't quite know how to cover DeepSeek's "Open Source Week" from two weeks ago: each release was individually interesting, but none quite hit the bar of "generally useful," and we try to cover the top news of the day. Fortunately, the kind folks at PySpur have done us the favor of collating and summarizing all the releases:


It even comes with little flash quizzes to test your understanding and retention!


We think that, collectively, this is worth internalizing.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

Models & Releases

Tools & Applications

Research & Datasets

Industry & Business

Opinions & Discussions

Humor & Memes


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. FT: Llama 4 w/ Voice Expected Soon, Enhancing Voice AI

Theme 2. QwQ-32B Performance Settings and Improvements

Theme 3. QwQ vs. Qwen 2.5 Coder Instruct: Battle of the 32Bs

Theme 4. Meta's Latent Tokens: Pushing AI Reasoning Forward

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

There was an error in our pipeline that we are debugging... sorry!


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. IDE Showdown: Cursor, Windsurf, and the Code Editor Arena

Theme 2. Model Benchmarks and Optimization Breakthroughs

Theme 3. Diffusion Models Disrupt Language Generation

Theme 4. MCP and Agent Security Threats Loom Large

Theme 5. Hardware Hustle: 9070XT vs 7900XTX and Native FP4 Support


PART 1: High-level Discord summaries

Cursor IDE Discord


Unsloth AI (Daniel Han) Discord


Nomic.ai (GPT4All) Discord


Codeium (Windsurf) Discord


Perplexity AI Discord


LM Studio Discord


HuggingFace Discord


OpenRouter (Alex Atallah) Discord


Latent Space Discord


OpenAI Discord


aider (Paul Gauthier) Discord


Yannick Kilcher Discord


MCP (Glama) Discord


Modular (Mojo 🔥) Discord


GPU MODE Discord


Eleuther Discord


LlamaIndex Discord


Cohere Discord


Notebook LM Discord


DSPy Discord


AI21 Labs (Jamba) Discord


tinygrad (George Hotz) Discord


Torchtune Discord


LLM Agents (Berkeley MOOC) Discord


Gorilla LLM (Berkeley Function Calling) Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor IDE ▷ #general (1144 messages🔥🔥🔥):

Cursor vs Lmarena, Cursor 0.47, Claude 3.7, Grok struggles, vibe coding

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (254 messages🔥🔥):

GPU memory management for training large models, ktransformers IQ1 benchmarks, QwQ-32B optimizations and best practices, GRPO algorithm optimizations

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (82 messages🔥🔥):

RLHF with Unsloth GRPO on Qwen7b, Qualitative vs Quantitative Improvement, Reward Model Bias, KL Divergence Issues, Qwen for Sudoku


Unsloth AI (Daniel Han) ▷ #help (114 messages🔥🔥):

RAM Configuration for Mac Studio, ktransformers Performance, RoPE Scaling, Custom Datasets, Multi-GPU Parallelism with Unsloth

Link mentioned: unslothai/unsloth: Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥 - unslothai/unsloth


Unsloth AI (Daniel Han) ▷ #research (6 messages):

Diffusion Effect, Rust Code, Deepseek Coder v2, Unsloth and MoE


Nomic.ai (GPT4All) ▷ #general (264 messages🔥🔥):

Registry Editing Risks, Quantization impact on RAM and VRAM, File path limitations on windows, Trustless authentication, Diffusion-based language models

Links mentioned:


Codeium (Windsurf) ▷ #discussion (7 messages):

IDE Telemetry Settings, Codeium Website Payment Updates


Codeium (Windsurf) ▷ #windsurf (238 messages🔥🔥):

Windsurf stability issues, Credit consumption, Cascade problems, Model performance comparison (Cursor vs. Windsurf), MCP server issues

Links mentioned:


Perplexity AI ▷ #general (184 messages🔥🔥):

Perplexity Pro Account Issues, GPT-4.5 Usage, Commercial Use of Perplexity and Copyright, Sonnet 3.7 Extended Performance, Perplexity Mobile App and Claude

Links mentioned:


Perplexity AI ▷ #sharing (5 messages):

Apple Foldable iPhone, OpenAI AI Agent, Amazon Prime AI Dubbing, DuckDuckGo AI Search


LM Studio ▷ #announcements (1 message):

LM Studio 0.3.12, QwQ template bug fixes, RAG chunking speed improvement

Link mentioned: LM Studio 0.3.12: Bug fixes and document chunking speed improvements for RAG


LM Studio ▷ #general (104 messages🔥🔥):

Open Source LLM for Coding on M2 Macbook Pro, DeepSeek v2.5 1210, Qwen Coder, Finetuning Large Language Models, Context Length and Memory Management

Links mentioned:


LM Studio ▷ #hardware-discussion (80 messages🔥🔥):

9070XT vs 7900XTX, ROCm and Vulkan performance, Native FP4 support, CodeGPT extension issues on WSL, Quantization impact on model quality


HuggingFace ▷ #general (118 messages🔥🔥):

Open Source Alternatives to Replit/Bolt, Gradio Dexie Wrapper Proposal, Obsidian user is from Obsidian, Suspecting Dataset Misuse in Research Papers, Hugging Face Datasets and DOI Generation

Links mentioned:


HuggingFace ▷ #cool-finds (3 messages):

HF Docker Repository, fxtwitter

Link mentioned: OpenStreetMap AI Helper - a Hugging Face Space by mozilla-ai: no description found


HuggingFace ▷ #i-made-this (1 message):

Downloads, Community Appreciation


HuggingFace ▷ #computer-vision (1 message):

OCR-2.0 Guidance


HuggingFace ▷ #smol-course (5 messages):

Smol Agents Course, Pokemon LLM Agent Benchmark, HuggingFace Token issues

Links mentioned:


HuggingFace ▷ #agents-course (37 messages🔥):

Course Start Dates, LLM as Agent Component, RAG as Environment, Course Completion Status, Image Generation Troubles


OpenRouter (Alex Atallah) ▷ #general (144 messages🔥🔥):

Perplexity API copyright issues, OpenRouter latency with Anthropic API, Groq provider in OpenRouter, Gemini embedding model, Testing reasoning parameter in OpenRouter models

Links mentioned:


Latent Space ▷ #ai-general-chat (8 messages🔥):

Minion.ai, Gemini Embedding Model, Claude code vs cursor.sh vs VSCode+Cline

Links mentioned:


Latent Space ▷ #ai-in-action-club (132 messages🔥🔥):

Web3 Agents, ElizaOS framework, AI Personas, Agent-as-a-Service, CryptoKitties

Links mentioned:


OpenAI ▷ #ai-discussions (72 messages🔥🔥):

ChatGPT token limits, Share GPS with AI, Local LLMs, AI copilots for skilled trades, Temporary chat box

Link mentioned: AI Copilot Technical Manuals: no description found


OpenAI ▷ #gpt-4-discussions (32 messages🔥):

Manus AI Agent, OpenAI Plus O1 Limits, SimTheory O1 Message Cap, ChatGPT Memory and Folders


OpenAI ▷ #prompt-engineering (3 messages):

Model Following Request Patterns, Steerability Implications, Pre-Project Evaluation


OpenAI ▷ #api-discussions (3 messages):

Model's Presumptions, Steerability Impact, Pre-Project Evaluation, Method Optimization


aider (Paul Gauthier) ▷ #general (65 messages🔥🔥):

Aider showing reasoning, Jamba model release, AI-written code, Copilot account suspension, Claude token consumption

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (38 messages🔥):

API Key for Aider, MCP Agents Integration, Playwright Certificate Errors, QwQ-32B Local Model Benchmark, Aider Scripting and Web Content

Links mentioned:


Yannick Kilcher ▷ #general (61 messages🔥🔥):

LinkedIn premium referral codes, Entropy as a Penalty, DeepSeek Ban, Discrete Diffusion Modeling


Yannick Kilcher ▷ #paper-discussion (10 messages🔥):

Latent Reasoning, Chain-of-Thought Data, Context Compression, VQ-VAE

Link mentioned: Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning: Large Language Models (LLMs) excel at reasoning and planning when trained on chainof-thought (CoT) data, where the step-by-step thought process is explicitly outlined by text tokens. However, this res...


Yannick Kilcher ▷ #ml-news (12 messages🔥):

Diffusion Models Hallucinations, Multi-step Agentic Workflows, LLADA Limitations, OpenAI's AGI shift, Chinese AI Agent Manus

Links mentioned:


MCP (Glama) ▷ #general (73 messages🔥🔥):

MCP security concerns, MCP adoption in commercial products, Malicious prompt injections, MCP and GitHub Copilot, Open Source vs Closed Source MCPs

Link mentioned: For Client Developers - Model Context Protocol: no description found


MCP (Glama) ▷ #showcase (9 messages🔥):

Mastra Agent, Searxng MCP Server, Typescript port of the python fetch server

Links mentioned:


Modular (Mojo 🔥) ▷ #general (68 messages🔥🔥):

Mojo's Dynamism, Python Interop, Monkey Patching Alternatives, Protocol Polymorphism

Links mentioned:


GPU MODE ▷ #general (2 messages):

SOTA agentic methods, Arxiv papers, algorithm complexity, state machines, framework abstractions


GPU MODE ▷ #triton (5 messages):

Triton Autotune use_cuda_graph argument, Triton Kernel SVD Quant Performance, Nunchaku SVD Quant Implementation

Links mentioned:


GPU MODE ▷ #cuda (1 message):

PTX, CUDA C++


GPU MODE ▷ #torch (6 messages):

Distributed barrier, cuda synchronize, register_comm_hook, FSDP communication hook

Link mentioned: Added communication hook for sharded cases by aovladi · Pull Request #83254 · pytorch/pytorch: Fixes #79114An implementation of a FSDP communication hook interface for a sharded strategies:Added reduce_scatter_hook to default hooks. Note the difference of reduce_scatter from all_reduce, i...


GPU MODE ▷ #algorithms (2 messages):

NCCL AllReduce, Double Binary Trees, Ring Topology, Communication Latency

Link mentioned: Massively Scale Your Deep Learning Training with NCCL 2.4 | NVIDIA Technical Blog: Imagine using tens of thousands of GPUs to train your neural network. Using multiple GPUs to train neural networks has become quite common with all deep learning frameworks, providing optimized…


GPU MODE ▷ #cool-links (7 messages):

WoolyAI, CUDA abstraction layer, GPU resource utilization, PyTorch support

Link mentioned: Introduction | WoolyAI Documentation: What is Wooly?


GPU MODE ▷ #beginner (5 messages):

GPU Memory Buffers on Apple, cuda_graph in Triton Autotune, Resources for GPU/TPU Programming


GPU MODE ▷ #rocm (3 messages):

AMD GPU Rental, Compile HIP code, Runpod MI300 Access


GPU MODE ▷ #tilelang (1 message):

Kernel Compilation, Matrix Shapes, TileLang


GPU MODE ▷ #self-promotion (7 messages):

Cute Kernels for Training, Triton vs CUDA, Custom Autotune Implementation, LLVM Compiler Efficiency

Links mentioned:


GPU MODE ▷ #thunderkittens (1 message):

LCF concurrency, DDP+nccl, Deadlocks


GPU MODE ▷ #reasoning-gym (15 messages🔥):

Curriculum Creation, Reasoning Gym, Sonnet Context Experiment, Reasoning GAN Self-Play, LLMs Speed Up Developers

Link mentioned: Experiment: how much do LLMs speed up developers: METR is seeking software engineers who regularly work on large open-source projects to test the effectiveness of AI software engineering tools. Apply here (bit.ly/ai-speedup-apply) Questions? Contact...


GPU MODE ▷ #ppc (2 messages):

AVX-256 performance on 3a, Hybrid AVX-256/AVX-512 approach, Tiling and OpenMP


Eleuther ▷ #general (6 messages):

Open Source AI Projects, GPT-NeoX, Tooling Setup for Claude Code

Link mentioned: GitHub - KellerJordan/modded-nanogpt: NanoGPT (124M) in 3 minutes: NanoGPT (124M) in 3 minutes. Contribute to KellerJordan/modded-nanogpt development by creating an account on GitHub.


Eleuther ▷ #research (16 messages🔥):

Token Assorted Paper, TorchTitan Embedding Sharding, Embedding Layer Implementation

Link mentioned: Why use RowwiseParallel for nn.Embedding instead of ColwiseParallel? · Issue #785 · pytorch/torchtitan: Colwise makes the logic a bit more clear. Rowwise splits on the token dimension, leading to confusion on how the different shards handle tokens that are not present within their shard. From a bit o...


Eleuther ▷ #interpretability-general (3 messages):

Logit Lens, Multilingual language models, Llama-2 family

Link mentioned: Do Llamas Work in English? On the Latent Language of Multilingual Transformers: We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language -- a question of key importance for understanding how language mo...


LlamaIndex ▷ #blog (2 messages):

Knowledge Graph Visualization, Anthropic Cookbook Updates


LlamaIndex ▷ #general (22 messages🔥):

SQLTableRetrieverQueryEngine, Jina AI package issues, LlamaExtract beta request, Tool Calling with Reasoning Models

Links mentioned:


Cohere ▷ #「💬」general (15 messages🔥):

Command R7B inference time, Tool invocation with r7b model, Open source AI contributions


Cohere ▷ #「🔌」api-discussions (2 messages):

504 Gateway Error, Server Error


Cohere ▷ #「💡」projects (1 messages):

Knowledge Graphs, TogetherAI LLM, Topic Modelling


Notebook LM ▷ #use-cases (11 messages🔥):

Wondercraft AI Podcast, NotebookLM and Wondershare Integration, Drive Encryption, Podcast Audio Language

Link mentioned: Insane AI Podcast Results - Edit NotebookLM on Wondercraft: 🔥 LIMITED TIME: 50% OFF Wondercraft!Use this link and coupon code "MRC" https://mrc.fm/wondercraftIn this video, I walk you through a simple process to crea...


Notebook LM ▷ #general (3 messages):

NotebookLM, Chrome extensions, Web importers, YouTube URLs

Link mentioned: Chrome Web Store: Add new features to your browser and personalize your browsing experience.


DSPy ▷ #general (13 messages🔥):

DSPy batch function, vllm backend with 2 instances, LM subclass in vllm, pipeline parallel size in vllm


AI21 Labs (Jamba) ▷ #general-chat (6 messages):

string replacement, laptop break


tinygrad (George Hotz) ▷ #general (1 message):

china_xi: is it normal for tinygrad jit spend more than 30 min on a 2 layer model?


tinygrad (George Hotz) ▷ #learn-tinygrad (1 message):

china_xi: what might be the cause of all loss being nan except the first one (step 0) ?


Torchtune ▷ #dev (2 messages):

Audio modality in torchtune


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (1 message):

claire_csy: Can you resend the link? It expired, thank you!


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 message):

Diffusion LLMs, LLaDA Model, Transformer vs Diffusion

Link mentioned: Diffusion LLMs - Revolutionary Language Model Architecture | LLaDA Research Hub: Discover how Diffusion LLMs are revolutionizing AI with parallel processing and advanced error correction. Learn about LLaDA architecture and stay updated with cutting-edge research.


{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}