Frozen AI News archive

Claude 3.7 Sonnet

**Anthropic** launched **Claude 3.7 Sonnet**, their most intelligent model to date featuring hybrid reasoning with two thinking modes: near-instant and extended step-by-step thinking. The release includes **Claude Code**, an agentic coding tool in limited preview, and supports a **128k output token capability** in beta. Claude 3.7 Sonnet performs well on coding benchmarks like **SWE-Bench Verified** and **Cognition's junior-dev eval**, and introduces advanced features such as streaming thinking, prompt caching, and tool use. The model is also benchmarked on **Pokebench**, reflecting agentic capabilities similar to the Voyager paper. The launch is accompanied by extensive documentation, cookbooks, and prompting guides for extended thinking. *"The first generally available hybrid reasoning model"* and *"first coding tool from Anthropic"* were highlighted in social media announcements.

Canonical issue URL

AI News for 2/24/2025-2/25/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (220 channels, and 5949 messages) for you. Estimated reading time saved (at 200wpm): 503 minutes. You can now tag @smol_ai for AINews discussions!

Taking a little leapfrog from the GPT5 roadmap, Claude 3.7 Sonnet launched today (don't ask about the name - note that there are TWO blogposts and documentation and cookbooks and prompting guides to read, alongside Claude Code which is in limited preview), after numerous leaks from private previews, as one model with an optional thinking mode, with an explicit token budget.

image.png

3.7 Sonnet does well on many coding benchmarks like SWE-Bench Verified and aider and Cognition's junior-dev eval, both with and without (MOSTLY uncensored!) thinking.

image.png

However the most popular new benchmark, covered in the second blogpost on extended thinking, is Pokebench which mirrors the Voyager paper as an agentic benchmark:

image.png

The feature set and documentation at launch is pretty impressive. Among the notable things likely to get buried by the headlines:


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

New Model Releases and Updates (Claude 3.7 Sonnet, Grok 3)

Research and Papers

Coding and Development Tools

AI Model Performance and Benchmarks

AI Industry and Business

Memes and Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. FlashMLA's Hopper GPU Optimization: A Game Changer

Theme 2. Claude 3.7 Sonnet Released: Exploring Hybrid AI Reasoning Model

Theme 3. Qwen Series: Advancing Open-Source AI with QwQ-Max

Theme 4. Critique on AI Benchmarking: Reliability and Misinterpretations

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. Claude Sonnet 3.7 Detailed Leak via AWS Bedrock


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Here's a unified summary of key discussion themes across the Discords, tailored for a technical engineer audience:

Theme 1. Claude 3.7 Sonnet: The Thinking Coder Arrives

Theme 2. Open Source Model Race Heats Up: Qwen and DeepSeek Battle for Reasoning Crown

Theme 3. IDE Showdown: Cursor vs. Windsurf (and Vim struggles)

Theme 4. Hardware Hacks and Kernel Deep Dives: GPU Mode Community Gets Granular

Theme 5. Community Contributions and Coursework: Learning and Building Together


PART 1: High level Discord summaries

Cursor IDE Discord


aider (Paul Gauthier) Discord


Codeium (Windsurf) Discord


OpenAI Discord


Unsloth AI (Daniel Han) Discord


OpenRouter (Alex Atallah) Discord


Interconnects (Nathan Lambert) Discord


GPU MODE Discord


Yannick Kilcher Discord


Eleuther Discord


Latent Space Discord


Nous Research AI Discord


MCP (Glama) Discord


HuggingFace Discord


LM Studio Discord


Stability.ai (Stable Diffusion) Discord


Modular (Mojo 🔥) Discord


LLM Agents (Berkeley MOOC) Discord


Notebook LM Discord


LlamaIndex Discord


Torchtune Discord


Cohere Discord


Nomic.ai (GPT4All) Discord


DSPy Discord


The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor IDE ▷ #general (1056 messages🔥🔥🔥):

Claude 3.7, MCP Tools, Cursor Updates, Thinking Model, User Experiences

Links mentioned:


aider (Paul Gauthier) ▷ #general (935 messages🔥🔥🔥):

Claude 3.7 Performance, Comparisons with Other Models, Aider Features and Performance, Claude Code Evaluation, Rate Limits and API Issues

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (63 messages🔥🔥):

Architect Mode Configuration, Managing AI Models in Aider, Using OpenRouter for AI Access, Token Management in AI, Integration with Git in Aider

Links mentioned:


aider (Paul Gauthier) ▷ #links (2 messages):

Hacker News Wrapped, Kagi LLM Benchmarking Project

Links mentioned:


Codeium (Windsurf) ▷ #discussion (15 messages🔥):

Codeium Chat in Vim, Account Creation Issues, Codeium Extensions and Support, Version Update Queries, Forum Visibility for Issues


Codeium (Windsurf) ▷ #windsurf (675 messages🔥🔥🔥):

Claude 3.7 release, Windsurf vs Cursor, MCP issues, Ecommerce experiences, Laravel vs JavaScript frameworks

Links mentioned:


OpenAI ▷ #ai-discussions (611 messages🔥🔥🔥):

Claude 3.7 vs ChatGPT, Grok 3 Performance, AI in Coding, Integration of AI in Projects, AI for Document Processing

Links mentioned:


OpenAI ▷ #gpt-4-discussions (9 messages🔥):

O3 reasoning issues, Model feedback discrepancies, Screenshot posting limitations, Bug reporting process


Unsloth AI (Daniel Han) ▷ #general (345 messages🔥🔥):

Unsloth AI Challenge, DeepSeek Release, Training Configuration Issues, Hyperfitting and Resource Management, Custom Loss Function Implementation

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (1 messages):

deoxykev: New qwq https://qwenlm.github.io/blog/qwq-max-preview/


Unsloth AI (Daniel Han) ▷ #help (121 messages🔥🔥):

Unsloth on Mac, CUDA Memory Issues, Using Custom Datasets, Checkpointing and VRAM Management, Custom Loss Functions

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Claude 3.7 Sonnet, AI capabilities improvement, Pricing model for Claude, Extended Thinking feature

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (346 messages🔥🔥):

Claude 3.7 Sonnet Features, API Key Management, Chat Continuation Across Devices, Model Pricing and Performance, Image Handling in Claude 3.7

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (304 messages🔥🔥):

Claude 3.7 Sonnet Release, Qwen AI Developments, Agentic Coding Tools, Open Source Initiatives, RLHF and Code Quality

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (15 messages🔥):

Berkeley Advanced Agents MOOC, Hiring Announcements, RLHF Explanations, Stickers Discussion, AI Startups Customer Base

Links mentioned:


Interconnects (Nathan Lambert) ▷ #memes (1 messages):

Image Analysis, Discord Bot Engagement


Interconnects (Nathan Lambert) ▷ #nlp (1 messages):

0x_paws: https://x.com/srush_nlp/status/1894039989526155341?s=46&t=Y6KMaD0vAihdhw7S8bL5WQ


Interconnects (Nathan Lambert) ▷ #posts (3 messages):

New Post GIF, SnailBot

Link mentioned: New New Post GIF - New New post Post - Discover & Share GIFs: Click to view the GIF


GPU MODE ▷ #general (29 messages🔥):

SoTA gemv kernels, Using Tensor Cores, Leaderboards and Submissions, Memory Bound Operations, Data Types in Kernel Submissions

Links mentioned:


GPU MODE ▷ #torch (2 messages):

E2E example in TorchAO


GPU MODE ▷ #algorithms (1 messages):

Orthogonal matrices in Q and K, Hadamard matrices efficiency, Quantization of vectors


GPU MODE ▷ #cool-links (4 messages):

Linear Attention, DeepEP Communication Library, Out-of-doc PTX instructions

Links mentioned:


GPU MODE ▷ #jobs (3 messages):

PyTorch Partner Engineer positions, Collaboration in AI, Equal Employment Opportunity, Community and systems work

Link mentioned: Partner Engineer, PyTorch: Meta's mission is to build the future of human connection and the technology that makes it possible.


GPU MODE ▷ #beginner (3 messages):

Getting Started with GPU Programming, Learning PyTorch, Exploring Triton


GPU MODE ▷ #pmpp-book (1 messages):

Image Blurring, C++ vs CUDA Performance


GPU MODE ▷ #rocm (1 messages):

MIOpen Compilation Issues, RX 9850M XT Wavefront Size, PyTorch Memory Access Faults, Workaround for MIOpen Crashes


GPU MODE ▷ #liger-kernel (4 messages):

Liger Kernel Issue #537, Native Sparse Attention Triton Repo, Efficient Triton Implementations

Links mentioned:


GPU MODE ▷ #reasoning-gym (5 messages):

Branch Management, Auto Delete on Merge, Benchmark Task Idea, Python Dependency Issues

Link mentioned: Tweet from naklecha (@naklecha): i'm evaluating claude's ability to distil the vllm library into a simpler codebase containing only the parts it needs for llama8b inference. i think measuring an llm's codebase distillatio...


GPU MODE ▷ #general (62 messages🔥🔥):

Inline CUDA Submission Issues, Autotuning Concerns, UI Updates for Leaderboard, CUDA and Python Integration, Problem Constraints for Benchmarking

Links mentioned:


GPU MODE ▷ #submissions (62 messages🔥🔥):

Leaderboard submissions, Test submissions, Benchmark submissions, API submissions, Modal runners performance


GPU MODE ▷ #status (1 messages):

CUDA Inline, Compilation Caching, Benchmarking Efficiency


GPU MODE ▷ #ppc (6 messages):

Bend programming language, AMD mi300A and Nvidia Grace Hopper, Programming Parallel Computers course

Link mentioned: GitHub - HigherOrderCO/Bend: A massively parallel, high-level programming language: A massively parallel, high-level programming language - HigherOrderCO/Bend


GPU MODE ▷ #feature-requests-and-bugs (2 messages):

NCU Support, Leaderboard Updates, Standalone CUDA Examples, Generated PTX Retrieval Command


Yannick Kilcher ▷ #general (127 messages🔥🔥):

Grok 3 vs O1-Pro, xAI GPU count, Synthetic Data Generation, AI Hardware Solutions, Large Concept Models

Links mentioned:


Yannick Kilcher ▷ #paper-discussion (4 messages):

Native Sparse Attention, SigLIP 2 Improvements, Multilingual Retrieval Enhancements

Links mentioned:


Yannick Kilcher ▷ #ml-news (9 messages🔥):

Claude 3.7 Sonnet, QwQ-Max Preview, Linear Attention Tutorial, DeepEP Communication Library, Y Combinator Computer Vision Startup

Links mentioned:


Eleuther ▷ #general (37 messages🔥):

Brain vs. AI Parallelism, Language Processing Differences, AI Model Training Efficiency, Logit Bayesian Metacognition Dataset, Decentralized AI Model Evaluation

Link mentioned: The Proxy Structuring Engine: High Quality Structured Outputs at Inference Time


Eleuther ▷ #research (32 messages🔥):

MLA architecture, DeepSeek models, Looped models for reasoning, Fourier series and smoothness, KV cache optimization

Links mentioned:


Eleuther ▷ #interpretability-general (9 messages🔥):

Attention Maps, Neuron-based Methods, Intervention on Attention Maps, Emerging Syntax from Attention Maps


Eleuther ▷ #gpt-neox-dev (10 messages🔥):

Mixed Precision Training, Optimizer States in Mixed Precision, ZeRO Offload, BF16 Precision

Link mentioned: Megatron-LM/megatron/core/optimizer/optimizer_config.py at main · NVIDIA/Megatron-LM: Ongoing research training transformer models at scale - NVIDIA/Megatron-LM


Latent Space ▷ #ai-general-chat (79 messages🔥🔥):

Claude 3.7 Sonnet, Claude Code, Datacenter leasing concerns, Qwen enhancements, FlashMLA GitHub repository

Links mentioned:


Nous Research AI ▷ #general (68 messages🔥🔥):

Grok3 Tool Invocation, Claude 3.7 Sonnet Release, QwQ-Max-Preview Announcement, Structured Outputs Project, AI Alignment Discussion

Links mentioned:


Nous Research AI ▷ #ask-about-llms (6 messages):

Sonnet-3.7 Benchmarking, Misguided Attention Eval, Reasoning Mode Activation

Link mentioned: GitHub - cpldcpu/MisguidedAttention: A collection of prompts to challenge the reasoning abilities of large language models in presence of misguiding information: A collection of prompts to challenge the reasoning abilities of large language models in presence of misguiding information - cpldcpu/MisguidedAttention


Nous Research AI ▷ #interesting-links (4 messages):

Azure chat interface update, Video generation integration, Artifacts usability concerns

Link mentioned: Qwen Chat: no description found


MCP (Glama) ▷ #general (62 messages🔥🔥):

Anthropic MCP Registry API, LLM Version Updates, Haiku 3.5 Tool Support, Claude Code vs Aider Performance, MCP Server Recommendations

Links mentioned:


MCP (Glama) ▷ #showcase (11 messages🔥):

MetaMCP Licensing Changes, Enact Protocol MCP Server, Claude 3.7 Sonnet Reasoning Features

Links mentioned:


HuggingFace ▷ #general (20 messages🔥):

Gemma-9B with custom LoRA, Qwen Max Open Weight Release, Claude-3.7 model discussion, Python skills for course, Getting started in AI careers

Link mentioned: Tweet from Qwen (@Alibaba_Qwen): <think>...</think> QwQ-Max-PreviewQwen Chat: https://chat.qwen.ai/Blog: https://qwenlm.github.io/blog/qwq-max-preview/🤔 Today we release "Thinking (QwQ)" in Qwen Chat, backed by o...


HuggingFace ▷ #today-im-learning (2 messages):

Fine-tuning in agents course, Pytorch on Apple Silicon


HuggingFace ▷ #i-made-this (1 messages):

Ukrainian TTS dataset, speech-uk/opentts-mykyta

Link mentioned: Ukrainian Text-to-Speech - a speech-uk Collection: no description found


HuggingFace ▷ #computer-vision (5 messages):

Computer Vision Hangout, CV Course Experience, Marine Ecosystem Project, Basketball Team Object Detection


HuggingFace ▷ #agents-course (38 messages🔥):

AI for Business series, NBA Parlay Picks, Course Recommendations, NLP Course Enrollment, Unit 1 SFT Training


LM Studio ▷ #general (41 messages🔥):

Integration of LMS Server with WordPress, Qwen 2.5 VL Model Performance, Speculative Decoding, Deepseek R1 671B Requirements, Local vs. Hosted Model Usage

Links mentioned:


LM Studio ▷ #hardware-discussion (20 messages🔥):

A770 performance, PC build frustrations, M2 Max specifications, Comparison of Apple chips


Stability.ai (Stable Diffusion) ▷ #announcements (1 messages):

Feature Request Board, User Feedback Mechanism


Stability.ai (Stable Diffusion) ▷ #general-chat (52 messages🔥):

Image Generation Speed, SD3 Ultra Features, Dog Breed Image Datasets, Model Performance Comparisons, Resolution Settings


Modular (Mojo 🔥) ▷ #mojo (11 messages🔥):

Mojo FFI with GLFW/GLEW, Graphics Programming in Mojo, Dynamic linking and library loading, Error with lightbug_http dependency, Issue with small_time dependency version

Links mentioned:


Modular (Mojo 🔥) ▷ #max (20 messages🔥):

Hardware Accelerated Conway's Game of Life, GPU Utilization in MAX, SIMD Implementation by Daniel Lemire, Game of Life Computer Concepts, Space Patterns in Conway's Game

Link mentioned: Nicolas Loizeau - GOL computer: A new (and better) version of the GOL computer is available here : https://github.com/nicolasloizeau/scalable-gol-computer


LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):

Lecture 4 with Hanna Hajishirzi, Tulu 3 advancements, Open training recipes, Reinforcement learning with verifiable rewards, Applications of language models in science

Link mentioned: CS 194/294-280 (Advanced LLM Agents) - Lecture 4, Hanna Hajishirzi: no description found


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (9 messages🔥):

Quiz Submissions for Latecomers, Research Track Application Status, Application Track for MOOC Students


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (11 messages🔥):

Teaching Style Feedback, Research Group Proposals, Research Track Eligibility, Application Stack Inquiry, MOOC Curriculum Launch


Notebook LM ▷ #use-cases (2 messages):

Ease of Use, Instruction Prompts


Notebook LM ▷ #general (14 messages🔥):

Google Deep Research & Gemini, NotebookLM language settings, Scanning physical books, Multi-language prompts in NotebookLM, Claude 3.7 hype


LlamaIndex ▷ #blog (3 messages):

AI assistant availability, ComposioHQ updates, Claude Sonnet 3.7 release, Integration with Anthropic, Installation instructions


LlamaIndex ▷ #general (5 messages):

BM25 retriever with elastic search, MultiModalVectorStoreIndex issues


Torchtune ▷ #dev (6 messages):

Truncation Methods for Fine-tuning, PR Review of StatefulDataLoader

Link mentioned: Add support for StatefulDataLoader by joecummings · Pull Request #2410 · pytorch/torchtune: ContextWhat is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here)This PR adds support for the StatefulDataLoader class fr...


Torchtune ▷ #papers (2 messages):

DeepScaleR Model, DeepEP Communication Library

Links mentioned:


Cohere ▷ #cmd-r-bot (5 messages):

POS Validators profitability, Pool validator nodes, Asset value assessment


Nomic.ai (GPT4All) ▷ #announcements (1 messages):

GPT4All v3.10.0 Release, Remote Model Configuration, CUDA Compatibility Updates, Translation Improvements, Chat Template Enhancements


Nomic.ai (GPT4All) ▷ #general (4 messages):

Multi-Agent Framework in GPT4All, AI and Coding Understanding, Versioning Concerns, Nomic Embed Updates


DSPy ▷ #general (2 messages):

phi4 response format, migration from 2.5-style Assertions, dspy.BestOfN, dspy.Refine





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}