Frozen AI News archive

Llama 4's Controversial Weekend Release

**Meta** released **Llama 4**, featuring two new medium-size MoE open models and a promised 2-trillion-parameter "Behemoth" model that would be the largest open model ever. The release showcased advanced training techniques: Chameleon-like early fusion with MetaCLIP, interleaved chunked attention without RoPE, native FP8 training, and training on up to 40 trillion tokens. Despite the hype, the release drew criticism for its lack of transparency compared to Llama 3, for implementation issues, and for poor performance on some benchmarks. Meta leadership, including **Ahmad Al Dahle**, denied allegations of training on test sets. The smallest model, Scout, at 109B parameters, is too large for consumer GPUs, and the claimed 10-million-token context is disputed. Community response has been mixed, with some praising the openness and others pointing out discrepancies and quality concerns.

Canonical issue URL

AI News for 4/4/2025-4/7/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (229 channels, and 18760 messages) for you. Estimated reading time saved (at 200wpm): 1662 minutes. You can now tag @smol_ai for AINews discussions!

The headlines of Llama 4 are glowing: 2 new medium-size MoE open models that score well, and a promised third, 2-trillion-parameter "behemoth" that should be the largest open model ever released, restoring Meta's place at the top of the charts:

image.png

SOTA training updates are always welcome: we note the adoption of Chameleon-like early fusion with MetaCLIP, interleaved chunked attention without RoPE (commented on by many), native FP8 training, and training on up to 40T tokens.
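The chunked-attention point can be sketched with a toy mask. This is a minimal illustration of the idea, not Meta's actual configuration: in the layers without RoPE, tokens attend causally only within fixed-size chunks, so relative positions stay bounded by the chunk length (chunk size and interleaving pattern here are made up for the example).

```python
import numpy as np

def chunked_causal_mask(seq_len: int, chunk: int) -> np.ndarray:
    # Queries may attend to keys that are (a) in the same fixed-size chunk
    # and (b) at or before the query position (causal).
    q = np.arange(seq_len)[:, None]  # query positions, column vector
    k = np.arange(seq_len)[None, :]  # key positions, row vector
    return ((q // chunk) == (k // chunk)) & (k <= q)

mask = chunked_causal_mask(8, 4)
# Token 3 (end of chunk 0) sees 4 tokens; token 4 starts a new chunk
# and sees only itself.
print(mask[3].sum(), mask[4].sum())
```

Because no query ever attends across a chunk boundary, the attention pattern is position-bounded, which is one motivation for dropping positional embeddings in those layers.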

While the closed model labs tend to set the frontier, Llama usually sets the bar for what open models should be. Llama 3 was released almost a year ago, and subsequent updates like Llama 3.2 were just as well received.

Usual license handwringing aside, the tone of Llama 4's reception has been remarkably different.

  1. Llama 4 was released on a Saturday, seemingly earlier than even Meta expected; the release date was changed at the last minute from the planned Monday. Zuck's official line is simply that it was "ready".
  2. Only a blog post was published, nowhere near the level of the Llama 3 paper in transparency.
  3. The smallest "Scout" model is 109B params, which cannot be run on consumer-grade GPUs.
  4. The claimed 10M-token context is almost certainly far above the "real" context, given the model was trained with 256k-token sequences (still impressive! but not 10M!)
  5. A special "experimental" version was used on LMArena, which produced the good score; that is not the version that was released. The discrepancy forced LMArena to respond by releasing the full eval dataset.
  6. It does very poorly on independent benchmarks like Aider
  7. Unsubstantiated posts on Chinese social media claim company leadership pushed for training on test sets to meet Zuck's goals.
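To make point 3 concrete, here is a back-of-envelope estimate of weight memory alone for a 109B-parameter model at common precisions. This ignores KV cache and activations, so real requirements are higher; the precisions listed are illustrative.

```python
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    # 1e9 params * bytes/param / 1e9 bytes-per-GB cancels out
    return params_billion * bytes_per_param

# Even aggressive 4-bit quantization (~54.5 GB) exceeds a 24 GB consumer GPU.
for label, bpp in [("fp16", 2.0), ("fp8", 1.0), ("int4", 0.5)]:
    print(f"{label}: ~{weight_gb(109, bpp):g} GB")
```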

The last point has been categorically denied by Meta leadership: image.png

but the whiff that something is wrong with the release has undoubtedly tarnished what would otherwise be a happy day in open AI land.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

Large Language Models (LLMs) and Model Releases

AI Applications and Tools

Company Announcements and Strategy

Economic and Geopolitical Implications of AI

AI Safety, Ethics, and Societal Impact

Humor/Memes


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. "Transforming Time Series Forecasting with Neuroplasticity"

Theme 2. "Disappointment in Meta's Llama 4 Performance"

Theme 3. "Meta's AI Struggles: Controversies and Innovations"

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. "Llama 4 Scout and Maverick Launch Insights"

Theme 2. "AI Innovations in 3D Visualization and Image Generation"

Theme 3. "Evaluating AI Models with Long Context Windows"


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Exp

Theme 1: Llama 4's Context Window: Hype or Reality?

Theme 2: Open Models Make Moves: Qwen 2.5 and DeepSeek V3 Shine

Theme 3: Tool Calling Takes Center Stage: MCP and Aider

Theme 4: Code Editing Workflows: Gemini 2.5 Pro, Cursor, and Aider Compete

Theme 5: Quantization and Performance: Tinygrad, Gemma 3, and CUDA


PART 1: High level Discord summaries

LMArena Discord


Unsloth AI (Daniel Han) Discord


Manus.im Discord


OpenRouter (Alex Atallah) Discord


aider (Paul Gauthier) Discord


Cursor Community Discord


Perplexity AI Discord


OpenAI Discord


LM Studio Discord


Latent Space Discord


Nous Research AI Discord


MCP (Glama) Discord


Eleuther Discord


HuggingFace Discord


Yannick Kilcher Discord


GPU MODE Discord


Notebook LM Discord


Modular (Mojo 🔥) Discord


Nomic.ai (GPT4All) Discord


LlamaIndex Discord


tinygrad (George Hotz) Discord


Cohere Discord


Torchtune Discord


LLM Agents (Berkeley MOOC) Discord


DSPy Discord


Gorilla LLM (Berkeley Function Calling) Discord


MLOps @Chipro Discord


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

LMArena ▷ #general (1150 messages🔥🔥🔥):

Making ai sound human, Riveroaks eval, NightWhisper model, GPT-4.5 vs quasar

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (1294 messages🔥🔥🔥):

Qwen 2.5, FSDP isn't working, multi-GPU, Llama 4

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (11 messages🔥):

ChatGPT DDoS program, LLM Guideline Triggers, Dataset Substitution


Unsloth AI (Daniel Han) ▷ #help (770 messages🔥🔥🔥):

Lora merging script usage, Dataset sample size, Quantization, Inference speed

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (9 messages🔥):

Naming Conventions for Unsloth Models, Dynamic vs Unconditional Base Name (BNB)


Unsloth AI (Daniel Han) ▷ #research (37 messages🔥):

SFT finetuning Qwen2.5, Reward Modeling, eMOE viability, Llama 4 Models, LLMs and Knowledge Storage

Links mentioned:


Manus.im Discord ▷ #general (777 messages🔥🔥🔥):

Manus Credit System, Llama 4 and Meta, AI Image Generation, Website building AIs

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (82 messages🔥🔥):

Fallback Logic Removal, Quasar Alpha Model, Llama 4 Scout & Maverick Models, Rate Limits Update

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (755 messages🔥🔥🔥):

Llama 4 models, DeepSeek models, Gemini 2.5 Pro, OpenRouter Features, AI Image Generation

Links mentioned:


aider (Paul Gauthier) ▷ #general (932 messages🔥🔥🔥):

Gemini 2.5, Llama 4, Grok 3, MCP Tools, Nvidia NIM

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (58 messages🔥🔥):

Internal Libraries, Batch Editing, i18n Implementation, Shell Scripting, MCP Servers

Links mentioned:


Cursor Community ▷ #general (1056 messages🔥🔥🔥):

Sonnet Max Pricing, MCP Server Setup, Llama 4 Models, Agent Mode Issues

Links mentioned:


Perplexity AI ▷ #announcements (3 messages):

Comet Browser, Server Updates


Perplexity AI ▷ #general (941 messages🔥🔥🔥):

Focus Mode Removed, Comet Browser, Gemini 2.5 Pro API Availability, Llama 4, Deep Research Nerfed

Links mentioned:


Perplexity AI ▷ #sharing (18 messages🔥):

Gemini 2.5 Pro, Meta Llama, US Tariffs, Perplexity AI Support, AI in Cars


Perplexity AI ▷ #pplx-api (53 messages🔥):

Sonar API, Perplexity API support in ComfyUI, API Parameter Tier Restrictions, Sonar Deep Research Improvements, API Cookbook Revamp

Links mentioned:


OpenAI ▷ #ai-discussions (501 messages🔥🔥🔥):

Copilot 4o image maker, Free vs Paid ChatGpt version, renaissance style images, Mistral struggles, Model Merging

Links mentioned:


OpenAI ▷ #gpt-4-discussions (12 messages🔥):

Custom GPT 'Content failed to load' Error, Automod flagged 'Monday' message, Loving Monday's Personality


OpenAI ▷ #prompt-engineering (167 messages🔥🔥):

Moderation endpoint, Policy References, Universal Policies, AI as a critical part of society, Prompt engineering


OpenAI ▷ #api-discussions (167 messages🔥🔥):

Moderation Endpoint, Universal Policies, Creative TTRPG World Building, Prompt Engineering


LM Studio ▷ #general (511 messages🔥🔥🔥):

ComfyUI integration, LM Studio Terminal, REST API Load/Unload Models, Llama 4 analysis, Gemma 3 capabilities

Links mentioned:


LM Studio ▷ #hardware-discussion (132 messages🔥🔥):

Reka Flash 21B, Gemma 3 27B, Model Performance on M1 Ultra vs M4 Max, Nvidia DGX base cost increase, Ryzen AI Max+ 395 mini PCs

Links mentioned:


Latent Space ▷ #ai-general-chat (199 messages🔥🔥):

Tenstorrent Dev Day, Llama 4 launch, LLM Non-Determinism, MCP security, AI powered phishing

Links mentioned:


Latent Space ▷ #ai-announcements (1 message):

Claude Plays Pokemon Hackathon


Latent Space ▷ #ai-in-action-club (255 messages🔥🔥):

LLM Codegen Workflow, AI Code Editors, Cursor vs Windsurf, Context Management in AI Editors, Model Hot-Swapping

Links mentioned:


Nous Research AI ▷ #general (308 messages🔥🔥):

Open Source Cursor Alternatives, Prompt Injection / Jailbreaking Tactics, Llama 4 launch and performance, Neural Plasticity via Neural Graffiti

Links mentioned:


Nous Research AI ▷ #ask-about-llms (27 messages🔥):

Claude Think Tool, Local LLM for 300 Pages of Text, Nous Capybara 34B Model, DeepHermes, BatchNorm and LayerNorm Implementations

Link mentioned: NousResearch/Nous-Capybara-34B · Hugging Face: no description found


Nous Research AI ▷ #research-papers (2 messages):

Reinforcement Learning for LLMs, Reward Modeling Improvements, Self-Principled Critique Tuning

Link mentioned: Inference-Time Scaling for Generalist Reward Modeling: Reinforcement learning (RL) has been widely adopted in post-training for large language models (LLMs) at scale. Recently, the incentivization of reasoning capabilities in LLMs from RL indicates that $...


Nous Research AI ▷ #interesting-links (9 messages🔥):

Claude Squad, Heterogeneous Recursive Planning, Panthalia Decentralized Compute, TextPulse Library

Links mentioned:


Nous Research AI ▷ #research-papers (2 messages):

Deepseek, Reinforcement Learning, Large Language Models, Reward Modeling, Self-Principled Critique Tuning

Link mentioned: Inference-Time Scaling for Generalist Reward Modeling: Reinforcement learning (RL) has been widely adopted in post-training for large language models (LLMs) at scale. Recently, the incentivization of reasoning capabilities in LLMs from RL indicates that $...


Nous Research AI ▷ #reasoning-tasks (6 messages):

Reasoning Benchmarking, Open Reasoning Tasks

Link mentioned: GitHub - NousResearch/Open-Reasoning-Tasks: A comprehensive repository of reasoning tasks for LLMs (and beyond): A comprehensive repository of reasoning tasks for LLMs (and beyond) - NousResearch/Open-Reasoning-Tasks


MCP (Glama) ▷ #general (293 messages🔥🔥):

MCP Governance SDK, MCP Protocol Revision 2025, MCP Desktop Workflow Integrations, Pinging MCP Servers Before Initialization, MCP Server for Microsoft Loop

Links mentioned:


MCP (Glama) ▷ #showcase (23 messages🔥):

MCP-k8s Docker Images, chat.md with MCP support, Cloudflare for Remote MCP Servers, WhatsMCP Oauth Support, Semgrep MCP Rewrite

Links mentioned:


Eleuther ▷ #general (39 messages🔥):

RAG evaluation with lm-evaluation-harness, RoR-Bench paper by the_alt_man, Llama 4 release, Aligning AGI using Bayesian Updating

Links mentioned:


Eleuther ▷ #research (204 messages🔥🔥):

Mixture of Experts, Large Language Models, Gradient-Free Learning Methods, Hyper-connections as alternative to residual connections, Attention Sinks in LLMs

Links mentioned:


Eleuther ▷ #interpretability-general (17 messages🔥):

Polytope lens for NNs, ReLU networks geometry, Machine Unlearning Workshop, Origami view of NNs, Expressivity of Deep Networks

Links mentioned:


Eleuther ▷ #lm-thunderdome (19 messages🔥):

lm-eval-harness EOS token, Llama 2 vs Llama 3 IFEval Score, Huggingface tokenization

Links mentioned:


HuggingFace ▷ #announcements (1 message):

huggingface_hub v0.30.0, monoELECTRA reranker models, YourBench Custom Evals, Jetson Robot, Accelerate v1.6.0

Links mentioned:


HuggingFace ▷ #general (169 messages🔥🔥):

Llama-4-Scout vs Mistral Small 3.1, AI Engineer Interview, Deepmind created AGI Internally?, Fine Tuning Quantized Models, Huggingchat 500 error

Links mentioned:


HuggingFace ▷ #today-im-learning (16 messages🔥):

LLM Development, Sebastian Raschka Book, Andrej Karpathy Video, NLP course chapter 3

Link mentioned: Let's reproduce GPT-2 (124M): We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really...


HuggingFace ▷ #cool-finds (2 messages):

Windows CLI, Virtual Environment Reset, LocalAI, Dify

Link mentioned: The Complete Roadmap to Mastering Agentic AI in 2025 | Girish Kotte: Discover a comprehensive 12-step roadmap to mastering agentic AI in 2025. Learn everything from basic concepts to advanced deployment techniques with resource links for each stage. Perfect for develop...


HuggingFace ▷ #i-made-this (8 messages🔥):

MCP Server and RAG Application, Osyllabi AI Curriculum, DocQuery AI Documentation Search, Municipal Law Dataset, LlamaResearcher with Llama-4

Links mentioned:


HuggingFace ▷ #computer-vision (5 messages):

Data Annotation for OCR, VLM Fine-Tuning for Handwritten Text, Combining OCR Techniques with VLMs, Roboflow for managing images and labels, MS-Swift and PEFT/Unsloth Approaches


HuggingFace ▷ #NLP (5 messages):

Text Extraction from PDFs, Docling, SmolDocling, RolmOCR, Sci-BERT

Links mentioned:


HuggingFace ▷ #smol-course (24 messages🔥):

OpenWeatherMap API, ISO 3166-1 alpha-2 code, Qwen/Qwen2.5-Coder-32B-Instruct Alternatives, Hugging Face Token for Agent Creation, llm-course Channel

Links mentioned:


HuggingFace ▷ #agents-course (36 messages🔥):

MCP in Agent Course, Inference Usage Costs, Gemini Models, Course Feedback, Hallucination in Agents

Links mentioned:


Yannick Kilcher ▷ #general (177 messages🔥🔥):

Grok 3, Turing Machines, Raw Binary AI training, LLama 4, Quantization Techniques

Links mentioned:


Yannick Kilcher ▷ #paper-discussion (28 messages🔥):

Llama 4, DeepSeek Paper, PaperBench, Text Diffusion

Links mentioned:


Yannick Kilcher ▷ #ml-news (17 messages🔥):

GPT-6 release, Llama 4, Mindcraft Update, Adapting pre-training text, diffusion modeling to control LLMs

Links mentioned:


GPU MODE ▷ #general (17 messages🔥):

CUDA Python Package, Vectorized Memory Access, Llama-4 Router Normalization, High RAM/VRAM SSH Access

Link mentioned: CUDA Python: CUDA Python provides uniform APIs and bindings to our partners for inclusion into their Numba-optimized toolkits and libraries to simplify GPU-based parallel processing for HPC, data science, and AI.


GPU MODE ▷ #triton (18 messages🔥):

Triton Kernel Debugging, GPU Assembly Debugging, Grayscale Kernel Writing, Block Index Creation, Data Transposing


GPU MODE ▷ #cuda (18 messages🔥):

CUDA debugger, nvshmem + mpi, nvbench and ubuntu 24.04, Shared memory access in CUDA, cute::copy and tiled_copy behavior

Links mentioned:


GPU MODE ▷ #torch (10 messages🔥):

torch compile backend, libtorch, mojo, torchscript, gelu+mul fusion

Link mentioned: torchtitan/torchtitan/models/llama3/parallelize_llama.py at main · pytorch/torchtitan: A PyTorch native library for large model training. Contribute to pytorch/torchtitan development by creating an account on GitHub.


GPU MODE ▷ #announcements (1 message):

GPU Mode Website, Active Leaderboards, Website Feedback

Link mentioned: Leaderboards – GPU MODE: no description found


GPU MODE ▷ #cool-links (6 messages):

Llama 4, Triton Distributed, Tensara Triton Support, AMD Instinct MI325X Performance

Links mentioned:


GPU MODE ▷ #jobs (6 messages):

Qualcomm AI Engineer Hiring, Suno ML roles and H100 resources, Zero latency music creation


GPU MODE ▷ #beginner (19 messages🔥):

Centralized GPU programming language, OpenCL and SYCL, ROCm and HIP, 4-bit operations in CUDA for LLMs, Performance roofline models and arithmetic intensity


GPU MODE ▷ #torchao (2 messages):

Int4WeightOnlyConfig, torch.compile for speedup, Compiling individual submodules

import torch

# named_modules() yields dotted paths (e.g. "layers.0.proj"), so resolve the
# parent module first; setattr(model, "layers.0.proj", ...) would not replace
# the nested child. Assumes an existing nn.Module named `model`.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        parent, _, child = name.rpartition(".")
        setattr(model.get_submodule(parent), child, torch.compile(module))

GPU MODE ▷ #irl-meetup (3 messages):

Silicon Valley Meetups, SF Meetups, Summer Intern Meetups


GPU MODE ▷ #self-promotion (37 messages🔥):

RL fine-tuning with sandboxed code interpreter, Gemma 3 QAT vs HQQ, Wavespeed AI inference API, Vector Sum CUDA Kernel optimization, Tom and Jerry video generation with transformers

Links mentioned:


GPU MODE ▷ #reasoning-gym (18 messages🔥):

Curriculum Learning for Reasoning, Llama 3 vs Qwen 2.5, Dream 7B Diffusion Model, Llama 4 Maverick coding, Claude Think Tool

Links mentioned:


GPU MODE ▷ #gpu模式 (3 messages):

Deepseek communication library, NVSHMEM and UVA, Peer-to-peer GPU communication


GPU MODE ▷ #general (1 message):

leikowo: any way to have a ptx torch extension (not cuda with inline ptx) ?


GPU MODE ▷ #submissions (24 messages🔥):

matmul Leaderboard submissions, vectoradd Benchmark Submissions, Modal Runners success, grayscale Leaderboard submissions


GPU MODE ▷ #ppc (5 messages):

libsanitizer-collection.so, compute-sanitizer, LD_LIBRARY_PATH


GPU MODE ▷ #feature-requests-and-bugs (2 messages):

Leaderboard Units, Nanos vs Millis, Discord Cluster Manager


GPU MODE ▷ #hardware (2 messages):

Local LLM inference, Fine-tuning, GPU selection, L40 vs A100, Quantization


Notebook LM ▷ #use-cases (14 messages🔥):

Interactive voice mode, Mind maps rollout, Website URL use cases, Commercial scale version of NotebookLM


Notebook LM ▷ #general (154 messages🔥🔥):

NotebookLM's Discover feature rollout, Gemini 2.5 family, Mind Map evolution with generative AI, YouTube audio EQ Chrome extension, Google Cloud Next and Google I/O events

Links mentioned:


Modular (Mojo 🔥) ▷ #general (28 messages🔥):

Nvidia CUDA Python Support, Mojo GenAI, CuTile Programming Model, SIMD vs SIMT, Tenstorrent and Modular

Links mentioned:


Modular (Mojo 🔥) ▷ #mojo (85 messages🔥🔥):

Auto Lowering, MLIR Interpreter stress test, Implicit ctor hack, Mojo language spec, Mojo implicit copies

Link mentioned: ChronoFlare/chronoflare/init.mojo at main · bgreni/ChronoFlare: A time interval library written in mojo. Contribute to bgreni/ChronoFlare development by creating an account on GitHub.


Nomic.ai (GPT4All) ▷ #general (54 messages🔥):

Nomic Embed Text V2, GPT4All release cadence, Llama 4 release, ComfyUI for multimodal tasks, Semantic chunking

Links mentioned:


LlamaIndex ▷ #blog (3 messages):

MCP Servers, Full-Stack Agent Application, LlamaParse Layout Agent


LlamaIndex ▷ #general (46 messages🔥):

Workflow as a Tool, Multi-Agent System with Supervisor Pattern, RAG System with LlamaParse, Scalability Issue with DocumentSummaryIndex, Tools retry when exception occurred

# Wrapping a Workflow as an agent tool (assumes an existing `workflow`):
from llama_index.core.tools import FunctionTool

async def tool_fn(...):
  """Some helpful description"""
  result = await workflow.run(...)
  return str(result)

tool = FunctionTool.from_defaults(tool_fn)

Link mentioned: GitHub - run-llama/multi-agent-concierge: An example of multi-agent orchestration with llama-index: An example of multi-agent orchestration with llama-index - run-llama/multi-agent-concierge


tinygrad (George Hotz) ▷ #general (16 messages🔥):

torch-geometric for tinygrad, Llama 4 10M context limitations, fast pattern matcher bounty, UOps generation, tinygrad YouTube video

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (24 messages🔥):

Tensor and SimpleMathTrait inheritance, Mesozoic tinygrad tutorials issues, METAL sync issue, AMD and BEAM issues


Cohere ▷ #「💬」general (19 messages🔥):

MCP with Command-A model, Cohere Tool Use, Cohere Scholars Program, Events Recording

Links mentioned:


Cohere ▷ #【📣】announcements (1 message):

Aya Vision, Multilingual Multimodal Models, Open Weights Model


Cohere ▷ #「🔌」api-discussions (5 messages):

Notion Connector, Vector DB for Notion


Cohere ▷ #「🤖」bot-cmd (3 messages):

greetings


Torchtune ▷ #dev (22 messages🔥):

Fix for Timeout Crash, NeMo Resilient Training, RL Workflow, DeepSpeed Integration

Links mentioned:


Torchtune ▷ #papers (1 message):

pjbontrager: You think they used AI to write that scrolling live updated chart?


LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 message):

AI4Math, Theorem Proving, Autoformalization, Formal Mathematical Reasoning, Language Models


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (4 messages):

LLM Agents MOOC, AgentX Competition, Course Quiz

Link mentioned: Advanced Large Language Model Agents MOOC: MOOC, Spring 2025


DSPy ▷ #general (4 messages):

asyncio support, full-async fork of dspy, reasons to migrate


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (3 messages):

GitHub PR Review, Phi-4 Support


MLOps @Chipro ▷ #events (1 message):

Manifold Research, Multimodal AI, Self-assembling space robotics, Robotic metacognition, Community Research Call

Link mentioned: Community Research Call #4 · Zoom · Luma: Interested in generalist AI models, self-assembling space robots or machine self-awareness? Join us for Community Research Call #4!Community Research Calls…



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}