Frozen AI News archive

Every 7 Months: The Moore's Law for Agent Autonomy

**METR** published a paper measuring the progress of AI agent autonomy, finding that it has doubled every 7 months since **2019 (GPT-2)**. The paper introduces a new metric, the **50%-task-completion time horizon**: the length of tasks, measured by how long skilled humans take to do them, that a model completes with 50% success. For **Claude 3.7 Sonnet** this horizon is about 50 minutes. Projections estimate **1-day autonomy by 2028** and **1-month autonomy by late 2029**. Meanwhile, **Nvidia** released **Cosmos-Transfer1** for conditional world generation and **GR00T-N1-2B**, a 2B-parameter open foundation model for humanoid robot reasoning. **Canopy Labs** introduced **Orpheus 3B**, a high-quality, low-latency text-to-speech model with zero-shot voice cloning. **Meta** reportedly delayed the **Llama-4** release due to performance issues. **Microsoft** launched **Phi-4-multimodal**.

AI News for 3/18/2025-3/19/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (227 channels, and 4117 messages) for you. Estimated reading time saved (at 200wpm): 426 minutes. You can now tag @smol_ai for AINews discussions!

Llama 4 rumors and the $600-per-million-output-token o1-pro API aside, we rarely get to feature a paper as the title story on AINews, so we are really happy when it happens. METR has long been known for quality analysis of AI progress, and in "Measuring AI Ability to Complete Long Tasks" it answers a valuable question that has so far been extremely difficult to answer: agent autonomy is increasing, but how quickly?

Since 2019 (GPT-2), it has doubled roughly every 7 months.

Agents take widely varying amounts of time to complete tasks, which is what has made this question so hard to answer, so the methodology is worth highlighting as well:

"To quantify the capabilities of AI systems in terms of human capabilities, we propose a new metric: 50%-task-completion time horizon. This is the time humans typically take to complete tasks that AI models can complete with 50% success rate. We first timed humans with relevant domain expertise on a combination of RE-Bench, HCAST, and 66 novel shorter tasks. On these tasks, current frontier AI models such as Claude 3.7 Sonnet have a 50% time horizon of around 50 minutes."
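As we understand the paper, the horizon is read off a logistic curve fitted to a model's success probability against the logarithm of human completion time: the 50% horizon is where the curve crosses 0.5. Below is a minimal pure-Python sketch under that assumption. The helper names `fit_time_horizon` and `time_horizon` are hypothetical, the task lengths and success rates are made-up illustrative numbers rather than METR data, and each task's success rate is used directly as a soft label (equivalent to fitting individual pass/fail runs when every task gets the same number of attempts).

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_time_horizon(minutes, success_rates, steps=5000, lr=0.5):
    """Fit P(success) = sigmoid(a + b * (log2(t) - m)) with plain gradient
    descent on the cross-entropy loss. m is the mean of log2(t); centering
    the input keeps the two parameters well conditioned."""
    xs = [math.log2(t) for t in minutes]
    m = sum(xs) / len(xs)
    xc = [x - m for x in xs]
    n = len(xc)
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xc, success_rates):
            err = sigmoid(a + b * x) - y  # d(cross-entropy)/d(logit)
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b, m


def time_horizon(a, b, m, p=0.5):
    """Human task length (minutes) at which the fitted success probability
    equals p, i.e. solve a + b * (log2(t) - m) = logit(p) for t."""
    logit_p = math.log(p / (1.0 - p))
    return 2.0 ** (m + (logit_p - a) / b)


# Made-up example data, NOT from the paper: human task lengths in minutes
# and the fraction of attempts a hypothetical model completed successfully.
minutes = [15, 30, 60, 120, 240]
success = [0.9, 0.75, 0.5, 0.25, 0.1]

a, b, m = fit_time_horizon(minutes, success)
print(f"50% horizon: {time_horizon(a, b, m):.1f} min")
print(f"80% horizon: {time_horizon(a, b, m, p=0.8):.1f} min")
```

The toy data are symmetric around 60 minutes on a log scale, so the 50% horizon comes out at about 60 minutes. Passing `p=0.8` gives the 80% horizon, which is shorter (roughly 25 minutes here) because demanding higher reliability restricts the model to shorter tasks.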

The authors find a notable discontinuity at the 1-minute horizon and at the 80% success cutoff, but the exponential trend remains robust.

At current rates, the projections point to 1-day autonomy by 2028 and 1-month autonomy by late 2029.
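The projection itself is simple exponential arithmetic: with a doubling time of 7 months, going from a current horizon H0 to a target H takes 7 · log2(H / H0) months. A minimal sketch, assuming a 50-minute starting horizon (the Claude 3.7 Sonnet figure quoted above) and a pure exponential with no slowdown; the lengths chosen for a "day" and a "month" are our assumptions, and `months_until` is a hypothetical helper.

```python
import math

DOUBLING_MONTHS = 7.0       # doubling time reported in the paper
CURRENT_HORIZON_MIN = 50.0  # ~50-minute horizon quoted for Claude 3.7 Sonnet


def months_until(target_minutes, current=CURRENT_HORIZON_MIN, doubling=DOUBLING_MONTHS):
    """Months for the 50% horizon to grow from `current` to `target_minutes`,
    assuming the doubling time stays constant (pure exponential growth)."""
    return doubling * math.log2(target_minutes / current)


# Assumed definitions of "day" and "month"; the paper's exact choice may differ.
targets = {
    "8-hour work day": 8 * 60,
    "24-hour day": 24 * 60,
    "167-hour work month": 167 * 60,
}
for label, minutes in targets.items():
    months = months_until(minutes)
    print(f"{label}: {months:.1f} months (~{months / 12:.1f} years)")
```

Under these assumptions a 24-hour horizon arrives about 34 months out and a 167-hour work month about 54 months out, which from early 2025 lands roughly in early 2028 and late 2029; an 8-hour day would arrive in under two years, so the definition of "day" matters a lot.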


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

AI Advancements and Model Releases

Research and Evaluation

Agent Development and Tooling

Frameworks and Libraries

Industry Partnerships and Events

Humor and Miscellaneous


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Llama 4 Rumor: Launch Next Month, Multimodal, 1M Context

Theme 2. Microsoft's KBLaM and RAG Replacement Potential

Theme 3. Gemma 3 Uncensored Model Release

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. Gemini Plugins and AI Studio Usage

Theme 2. MailSnitch Uses Email Tagging for Spam Identification

Theme 3. Reverse Engineering ChatGPT: Strategies for Better Responses

Theme 4. Successfully Running Wan2.1 Locally


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. NVIDIA's Blackwell Blitzkrieg: New GPUs and Marketing Hype

Theme 2. Open Source AI Ecosystem: Tools, Datasets, and Community

Theme 3. Model Performance and Limitations: Gemini, Claude, and Open Source Alternatives

Theme 4. AI Agents and Tooling: Agents Course, MCP, and Workflow Innovations

Theme 5. Hardware and Software Challenges: Performance, Compatibility, and Costs


PART 1: High level Discord summaries

Cursor Community Discord


Unsloth AI (Daniel Han) Discord


LM Studio Discord


HuggingFace Discord


OpenAI Discord


OpenRouter (Alex Atallah) Discord


aider (Paul Gauthier) Discord


Interconnects (Nathan Lambert) Discord


Yannick Kilcher Discord


Perplexity AI Discord


Notebook LM Discord


MCP (Glama) Discord


Nous Research AI Discord


LMArena Discord


GPU MODE Discord


Modular (Mojo 🔥) Discord


Latent Space Discord


Eleuther Discord


LlamaIndex Discord


Cohere Discord


LLM Agents (Berkeley MOOC) Discord


Torchtune Discord


tinygrad (George Hotz) Discord


AI21 Labs (Jamba) Discord


DSPy Discord


MLOps @Chipro Discord


Nomic.ai (GPT4All) Discord


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor Community ▷ #general (1174 messages🔥🔥🔥):

Sonnet MAX model analysis, Cursor Plan&Build agent enhancements, Claude Max pricing and limitations, Open Empathic Project collaboration, Windsurf vs. Cursor pricing


Unsloth AI (Daniel Han) ▷ #general (419 messages🔥🔥🔥):

Vision finetuning issues with Gemma 3, Unsloth and vLLM event in SF, Evaluate model performance with Wandb, Fine-tuning base vs instruct models, Unsloth with multinode and multigpu finetuning


Unsloth AI (Daniel Han) ▷ #off-topic (3 messages):


Unsloth AI (Daniel Han) ▷ #help (121 messages🔥🔥):

AMD Support for BnB and Triton, Gemma Base Model Changes, Training GPTs Agent, OpenAI's sidebars, Multi-turn conversation datasets + LLM fine-tuning


Unsloth AI (Daniel Han) ▷ #showcase (1 messages):

Unsloth mention


Unsloth AI (Daniel Han) ▷ #research (3 messages):

ZO2 Framework, Zeroth Order Optimization, RWKV-7 Model


LM Studio ▷ #general (115 messages🔥🔥):

Leaderboards, OpenVoice, LM Studio User Guide, Oblix Project, 4090 and PCIe


LM Studio ▷ #hardware-discussion (271 messages🔥🔥):

NVIDIA Digit Pricing, 5090 bandwidth vs M4 Max, NPU vs iGPU for small models, Multi GPU Performance Issues, NVIDIA RTX PRO 6000 Blackwell


HuggingFace ▷ #general (204 messages🔥🔥):

Local AI Home Server Setup, Books for learning LLMs/Agents, Mistral Models, Age Verification App, Text Correction Tool


HuggingFace ▷ #today-im-learning (2 messages):

Stochastic Variational Inference, Inference Algorithm, Reparameterization

Link mentioned: Auto-Encoding Variational Bayes: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We in...


HuggingFace ▷ #i-made-this (13 messages🔥):

Gemma-3 Spaces, Gemini Image Editing, Oblix AI Orchestration, Road Rash Remake, Age Verification App


HuggingFace ▷ #reading-group (2 messages):

Women in AI & Robotics, Language Models Debate, AI Reading Group Sessions


HuggingFace ▷ #NLP (1 messages):

OpenAssistant Dataset Release 2 (OASST2), LLM post-training

Link mentioned: OpenAssistant/oasst2 · Datasets at Hugging Face


HuggingFace ▷ #gradio-announcements (2 messages):

Gradio Sketch AI-powered code generation, Gradio Dataframe Overhaul, Multi-cell selection & copy, Column freezing & row numbers, Search & filter functions


HuggingFace ▷ #smol-course (4 messages):

Pushing Tools to Hugging Face Hub, Issues with Hugging Face Course, VS Code integration


HuggingFace ▷ #agents-course (39 messages🔥):

Ollama Integration, Unit 2.3 LangGraph Materials, First Agent Template Fails


HuggingFace ▷ #open-r1 (3 messages):

R1 Distills, Foundation Model Training with R1


OpenAI ▷ #ai-discussions (188 messages🔥🔥):

Gemini vs ChatGPT, o1 and o3-mini-high model comparison, GPT-4.5 creative writing, DeepSeek Banned

Link mentioned: Gemini - Correction of a geometry assessment: Created with Gemini Advanced


OpenAI ▷ #gpt-4-discussions (4 messages):

Model Access, Error Deleting Conversations, Emoji Insertion in Code


OpenAI ▷ #prompt-engineering (18 messages🔥):

ChatGPT Personalizations, Unhelpful Assistant, GPT-4o Sandbox Testing, Mixing Helpful Unhelpfulness


OpenAI ▷ #api-discussions (18 messages🔥):

ChatGPT Personalization, Unhelpful Assistant, GPT-4o Behavior, API Cost, Addiction


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Anthropic Downtime, Claude 3.7 Sonnet Issues


OpenRouter (Alex Atallah) ▷ #app-showcase (3 messages):

Claude 3.5 Sonnet, OpenRouterGo SDK, Gemini 2.0 Pro EXP 02-05


OpenRouter (Alex Atallah) ▷ #general (208 messages🔥🔥):

Gemini model RP stability, EXAONE-Deep-32B License Issues, max_completion_tokens vs max_tokens, ChatGPT-4o speed differences, Prompt Caching issues


aider (Paul Gauthier) ▷ #general (131 messages🔥🔥):

Ignoring files/dirs in repo map, Multi-line edit mode and vim-mode, Aider screen recording on model settings, Voice model for audio commentary, DeeperSearch and Grok 3


aider (Paul Gauthier) ▷ #questions-and-tips (58 messages🔥🔥):

Aider v0.77.1, Termux installation issues, Local LLM PDF handling, Model cost-value ratio, Multimodal LLMs with Aider


aider (Paul Gauthier) ▷ #links (2 messages):

Gemini Collaboration, Gemini Canvas, ChatWithJFKFiles


Interconnects (Nathan Lambert) ▷ #news (2 messages):

T1 model, Alphabet not taken

Link mentioned: Tweet from Hunyuan (@TXhunyuan): Please set aside your valuable time. Let's step into T1 together.


Interconnects (Nathan Lambert) ▷ #ml-questions (2 messages):

Multi-turn Fine Tuning, SFT Codebase Masking


Interconnects (Nathan Lambert) ▷ #random (96 messages🔥🔥):

AI Review AI vs Human, Intology Paper Ban, Reasoning Model Temperature, NVIDIA Home Droid, NVIDIA DGX Spark and Station


Interconnects (Nathan Lambert) ▷ #memes (26 messages🔥):

VTA strike, Semianalysis platform change, AI2 funding model, NVIDIA marketing tactics, Blackwell GPU DeepSeek-R1 inference

Link mentioned: Tweet from Lucas Nestler (@_clashluke): "H200 performance [measured on H100 node]" "1.67x speedup of B200 vs H200* [after going from fp8 to fp4]" *"H100" https://x.com/NVIDIAAIDev/status/1902068372608852304 Quoting NVIDI...


Interconnects (Nathan Lambert) ▷ #rl (1 messages):

natolambert: another one https://arxiv.org/abs/2503.14286 to consider, havent read yet


Interconnects (Nathan Lambert) ▷ #reads (12 messages🔥):

RWKV Evaluation, RNN Infinite Context, xLSTM vs. Llama, Automated Theorem-Proving, RLVR Dataset


Interconnects (Nathan Lambert) ▷ #posts (24 messages🔥):

Substack A/B testing, Post Training Effort, High effort vs viral posts


Interconnects (Nathan Lambert) ▷ #policy (4 messages):

California AB-412 Bill, AI Startups, Miles Brundage new role, AI2 Recommendation to OSTP, Open Source AI


Yannick Kilcher ▷ #general (121 messages🔥🔥):

OpenAI triple threat, AI addiction and its implications, Practical AI Development exercises, Smart Glasses and Data Harvesting, Simulated vs. Real World Data for AI Training


Yannick Kilcher ▷ #paper-discussion (15 messages🔥):

Karatsuba matrix multiplication, Predictive coding, G-retriever presentation, Daily paper discussion scheduling


Yannick Kilcher ▷ #ml-news (14 messages🔥):

AI Copyright, OpenAI Revolving Door, Nvidia GTC Fraud, Deepseek, Llama 4


Perplexity AI ▷ #general (103 messages🔥🔥):

Claude 3.5 vs Perplexity, Perplexity Incognito Mode, AI Data Retention, O1 hype, R1 Useless


Perplexity AI ▷ #sharing (8 messages🔥):

Perplexity AI, Quantum Leap, Copilot, Language emergence, Electronic Warfare


Perplexity AI ▷ #pplx-api (3 messages):

Perplexity API, Rapid Web Searches, API Usage Troubleshooting


Notebook LM ▷ #use-cases (7 messages):

GDocs Mangling Layouts, Image-Only PDF Grounding, Customized Functionality in NotebookLM, Profanity in Podcast Casual Mode, Crawling Links within the Same Domain


Notebook LM ▷ #general (93 messages🔥🔥):

Line Breaks in NotebookLM, Audio Overviews Script, Gemini 2.0, Mind Map Feature, Source Limit


MCP (Glama) ▷ #general (96 messages🔥🔥):

Smithery registry, Glama API, Open-webui integration, Spring app with spring-ai-mcp-core, Claude Code MCP


MCP (Glama) ▷ #showcase (1 messages):

Duckduckgo MCP, Cursor on Windows, Python framework

Link mentioned: GitHub - ericaxelrod-1/model-context-protocol: Model Context Protocols for Cursor


Nous Research AI ▷ #general (85 messages🔥🔥):

Phi-4 Model, Claude's Response, Vibe Coding, Nvidia Open Sources Coding Dataset, Small Scale LLM Experiments


LMArena ▷ #general (75 messages🔥🔥):

Lmarena testers, Perplexity vs OpenAI/Google, Gemini Deep Research vs GPT, LeCun on AGI, Grok 3 deepsearch


GPU MODE ▷ #general (9 messages🔥):

Digit Comparison, Memory Bandwidth, Gemma 3 on Macbook


GPU MODE ▷ #cuda (4 messages):

Blackwell ULTRA's attention instruction, Shared Memory Carveout


GPU MODE ▷ #torch (7 messages):

torch.distributed.tensor.parallel.style.ColwiseParallel, Autograd hook guarantee, Building PyTorch from source for RTX 5080

Link mentioned: pytorch/torch/_tensor.py at v2.6.0 · pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch


GPU MODE ▷ #algorithms (20 messages🔥):

matmul output fusions, nvfuser stalls, Tensor Cores vs CUDA Cores, cooperative or ping-pong warp specialization, fusing activation in GEMM

Link mentioned: rdspring1 - Overview: I contribute to PyTorch, Lightning-AI Thunder, and Nvidia/Fuser.


GPU MODE ▷ #cool-links (1 messages):

mobicham: https://www.youtube.com/watch?v=1bRmskFCnqY


GPU MODE ▷ #beginner (3 messages):

FSDP1, FSDP2, accelerate, trl

Link mentioned: WIP: Initial FSDP2 support by S1ro1 · Pull Request #3394 · huggingface/accelerate: Draft PR, feel free to discuss changes to the user-facing API.


GPU MODE ▷ #irl-meetup (1 messages):

IRL Meetup, GTC Meetup, Saturday Evening Event


GPU MODE ▷ #liger-kernel (2 messages):

Liger Kernel, Fused Linear Cross Entropy


GPU MODE ▷ #self-promotion (1 messages):

viking0nfire: We've released Triton support on https://LeetGPU.com/challenges 🚀

Check it out!


GPU MODE ▷ #🍿 (3 messages):

Automatic Kernel Optimization, Single GPU context, Distributed GEMM


GPU MODE ▷ #thunderkittens (1 messages):

ThunderKittens, Kernels, Batch Compilation, GPU Programming

Link mentioned: ThunderKittens/kernels/example_bind/example_bind.cu at main · HazyResearch/ThunderKittens: Tile primitives for speedy kernels.


GPU MODE ▷ #reasoning-gym (4 messages):

DAPO Algorithm, RL Training, PPO, GRPO

Link mentioned: Tweet from Haibin (@eric_haibin_lin): @qiying_yu and team just dropped the DAPO algorithm (decoupled clip and dynamic sampling policy optimization)! DAPO-Zero-32B, a fully open-source RL reasoning model, surpasses DeepSeek-R1-Zero-Qwen-32...


GPU MODE ▷ #submissions (11 messages🔥):

Modal Runner Successes, conv2d Leaderboard, vectoradd Leaderboard, vectorsum Leaderboard, grayscale Leaderboard


GPU MODE ▷ #ppc (1 messages):

OpenMP Performance, printf Performance Impacts, std::cout Side Effects


Modular (Mojo 🔥) ▷ #general (11 messages🔥):

LeetGPU challenges, GTC talks, Nvidia Blackwell Ultra, Nvidia Rubin, Silicon Photonics

Link mentioned: LeetGPU


Modular (Mojo 🔥) ▷ #mojo (28 messages🔥):

CompactDict advantages, HashMap in stdlib, List.fill behavior, StringDict module, List index out of range

Link mentioned: GitHub - mzaks/compact-dict: A fast and compact Dict implementation in Mojo 🔥


Latent Space ▷ #ai-general-chat (31 messages🔥):

Patronus AI, Etsy, Nvidia GTC Keynote, Manus access trading bot, vLLM


Latent Space ▷ #ai-announcements (2 messages):

Cloudflare Agents Podcast, Evo2 Paper Discussion


Eleuther ▷ #general (2 messages):

Reconnecting with the community, Decentralized AI discussions


Eleuther ▷ #research (21 messages🔥):

Fine-tuning Gemini/OLMo, Distillation for Fine-tuning, Passkey/Fuzzy Rate Improvement, GoldFinch/GoldenGoose Hybrid Setup, Memory Expert Activation


Eleuther ▷ #interpretability-general (1 messages):

Latent Activations, Model Behavior, Sequence Processing


Eleuther ▷ #lm-thunderdome (5 messages):

lm_eval, HFLM, trust_remote_code, API keys


LlamaIndex ▷ #blog (2 messages):

LlamaIndex course on HuggingFace, Gen AI Toolbox for Databases

Link mentioned: Introduction to LlamaIndex - Hugging Face Agents Course


LlamaIndex ▷ #general (24 messages🔥):

Langchain's long-term memory vs LlamaIndex, Azure OpenAI and LlamaIndex, Resume parsing AI services, Agent tool calling


LlamaIndex ▷ #ai-discussion (1 messages):

Nebius AI Computing Platform, GPU clusters, Inference Services, AWS vs Lambda Labs vs CoreWeave


Cohere ▷ #「💬」general (4 messages):

Chase's full name, Cohere Expanse 32B knowledge cutoff


Cohere ▷ #「🔌」api-discussions (19 messages🔥):

Trial Key Usage Tracking, Websearch Connector Issues, command-r-plus-02-2024 vs command-a-03-2025

Link mentioned: Different Types of API Keys and Rate Limits — Cohere: This page describes Cohere API rate limits for production and evaluation keys.


Cohere ▷ #「💡」projects (1 messages):

MCP Server, Cohere Command A, Github Repo, Positive News

Link mentioned: GitHub - VectorInstitute/mcp-goodnews: A simple MCP application that delivers curated positive and uplifting news stories.


Cohere ▷ #「🤝」introductions (3 messages):

AI alignment, Open-source models, Federating RAG, Agentic apps, Python and Rust


LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):

MOOC Coursework, AgentX Competition, LLM Agents Discord, AgentX Research Track

Link mentioned: Advanced Large Language Model Agents MOOC: MOOC, Spring 2025


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (23 messages🔥):

Quiz Deadlines, AgentX Research Track details, Certificate for December MOOC

Link mentioned: MOOC Curriculum: MOOC Curriculum & Certificate Instructions Thank you for joining us for our Advanced LLM Agents MOOC! We hope you've been enjoying the lectures so far! Below is a detailed description of our M...


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (1 messages):

alexkim0889: Thank for the response <@1335446795765022794>! Yep, this helps


Torchtune ▷ #general (11 messages🔥):

FL Setup, Nvidia Delays


Torchtune ▷ #dev (4 messages):

recvVector failed, sendBytes failed, DPO recipes, cudaGetDeviceCount, NumCudaDevices


tinygrad (George Hotz) ▷ #general (8 messages🔥):

M1 Mac Training Issues, DeepSeek-R1 Home Deployment, Clang Dependency Validation, Training on CPU without Clang


tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):

REDUCE_LOAD pattern clarification, Index Selection for reduce_input


AI21 Labs (Jamba) ▷ #jamba (3 messages):

AI21 Labs, keepitirie


AI21 Labs (Jamba) ▷ #general-chat (1 messages):

Welcome to New Members


DSPy ▷ #general (2 messages):

dspy.ChainOfThought, Chain of draft technique, Reduce output tokens

Link mentioned: Implementing Chain Of Draft Prompt Technique with DSPy: Cut your output tokens by more than half


MLOps @Chipro ▷ #events (2 messages):

MLOps on AWS Workshop, AI4Legislation Seminar, Featureform, SVCAF's AI4Legislation competition


Nomic.ai (GPT4All) ▷ #general (1 messages):

GPT4All FAQ, Default Directories

Link mentioned: Frequently Asked Questions: GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use. - nomic-ai/gpt4all



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}