Frozen AI News archive

ChatGPT responds to GlazeGate + LMArena responds to Cohere

**OpenAI** faced backlash after a controversial ChatGPT update, leading to an official retraction admitting they "focused too much on short-term feedback." Researchers from **Cohere** published a paper criticizing **LMArena** for unfair practices favoring incumbents like **OpenAI**, **DeepMind**, **X.ai**, and **Meta AI (FAIR)**. The **Qwen3 family** by **Alibaba** was released, featuring models up to **235B MoE**, supporting **119 languages** and trained on **36 trillion tokens**, with integration into **vLLM** and support in tools like **llama.cpp**. Meta announced the second round of **Llama Impact Grants** to promote open-source AI innovation. Discussions on AI Twitter highlighted concerns about leaderboard overfitting and fairness in model benchmarking, with notable commentary from **karpathy** and others.


AI Drama is all we need.

AI News for 4/29/2025-4/30/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (214 channels, and 5096 messages) for you. Estimated reading time saved (at 200wpm): 442 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

It is perhaps too coincidental that, the week after Dario Amodei stressed the Urgency of Interpretability, ChatGPT shipped an update that was so roundly hated that OpenAI had to offer an official retraction overnight, saying "we focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time". Joanne Jang of the Model Spec even did a rare Reddit AMA sharing a little detail on their learnings.

Elsewhere on AI Twitter, the growing dissatisfaction with LMArena (after a rough Llama 4 weekend) came to a head as a group of researchers who primarily work at Cohere published a paper documenting unfair practices favoring big incumbents like OpenAI, DeepMind, X.ai and Meta.

The authors gave LMArena a heads-up and LMArena has responded, but the damage is done and there is officially an appetite for alternatives. Fortunately, the paper comes with actionable recommendations that LMArena can consider to restore confidence.


AI Twitter Recap

Model Releases and Updates (Qwen3, Llama, DeepSeek, MiMo)

Performance Benchmarking and Evaluation

Tools and Frameworks

AI Sycophancy, Safety, and Testing

Coding and Software Development

Hardware and Infrastructure

Theoretical and Philosophical Musings

Humor and Miscellaneous


AI Reddit Recap

/r/LocalLlama Recap

1. Qwen3 Series Model Performance and Mobile Usability

2. DeepSeek-Prover-V2-671B and JetBrains Mellum Model Releases

3. Model Benchmarking, UI-Capable Models, and Emerging LLM Leaders

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. OpenAI GPT-4o Sycophancy and Glazing Controversy

2. AI Code Generation and Workforce Transformation Predictions

3. Latest Innovations in AI-Driven Visual Content Creation


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1: Qwen 3 Models Stir Buzz and Bugs Across Platforms

Theme 2: Model Mania: Gemini Stumbles, Llama 4 Arrives, Sonnet Sputters

Theme 3: Fine-Tuning & Optimization Frontiers Push Efficiency

Theme 4: Tools & Platforms Navigate Glitches and Gains

Theme 5: Hardware Heats Up with Mac Speed, GPU Competitions, and New Tools



Discord: High level Discord summaries

Perplexity AI Discord


Unsloth AI (Daniel Han) Discord


LMArena Discord


LM Studio Discord


OpenRouter (Alex Atallah) Discord


aider (Paul Gauthier) Discord


GPU MODE Discord


OpenAI Discord


Yannick Kilcher Discord


Nous Research AI Discord


Cursor Community Discord


HuggingFace Discord


Notebook LM Discord


Manus.im Discord


Latent Space Discord


Modular (Mojo 🔥) Discord


LlamaIndex Discord


Eleuther Discord


MCP (Glama) Discord


Torchtune Discord


DSPy Discord


Cohere Discord


Nomic.ai (GPT4All) Discord


Gorilla LLM (Berkeley Function Calling) Discord


The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (2 messages):

Perplexity AI on WhatsApp, Sonnet Model Behavior Update, Anthropic Status Incident


Perplexity AI ▷ #general (1112 messages🔥🔥🔥):

Free AI billing, Grok android app, Model Fallbacks, The Boys fanboy


Perplexity AI ▷ #sharing (1 messages):

_paradroid: https://www.perplexity.ai/search/d7bb905e-27e3-43e9-8b68-76bea1905457


Perplexity AI ▷ #pplx-api (14 messages🔥):

Sonar API Debit Card Issues, Hackathon Credits, Structured Output Issues, Async Deep Research API, API vs Web Results


Unsloth AI (Daniel Han) ▷ #general (899 messages🔥🔥🔥):

Qwen3, LM Studio issues, GGUF fixes, Training configuration, Multi-GPU support


Unsloth AI (Daniel Han) ▷ #off-topic (10 messages🔥):

ChatGPT ComfyUI Opinion, California AI Group, ComfyUI Demos


Unsloth AI (Daniel Han) ▷ #help (186 messages🔥🔥):

Unsloth installation issues, Qwen notebook issues, GRPO performance, Lora efficiency, Unsloth & Ollama/vLLM


Unsloth AI (Daniel Han) ▷ #showcase (4 messages):

Pi-Scorer, LLM-as-a-Judge, encoder model


Unsloth AI (Daniel Han) ▷ #research (47 messages🔥):

Dynamic BNB Quantization, LLMs in Medical Advice, Mixture of Experts with Gemma, Attention Head Routing, GRPO Fine-tuning


LMArena ▷ #general (544 messages🔥🔥🔥):

O3 Pro, Qwen 3, Gemini 2.5 Pro, Grok 3.5, Model Benchmarking and Evaluation


LM Studio ▷ #general (271 messages🔥🔥):

Qwen3 thinking, LM Studio on Android, Qwen3 experts number, Qwen3 bug fixes, Qwen3 with RAG


LM Studio ▷ #hardware-discussion (61 messages🔥🔥):

Framework Desktop vs. Flow Z13, AMD GPU 7900 XTX Value, Qwen3-30B-A3B Issues, MLX vs. llama.cpp Speed, Xeon Workstation for $1k


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Rate Limit, 2.5 Flash, Capacity


OpenRouter (Alex Atallah) ▷ #general (321 messages🔥🔥):

Qwen3 coding abilities, Gemini 2.5 flash issues and rate limits, OpenRouter Caching Issues, LLama 4 benchmark, Vertex issue with token counting


aider (Paul Gauthier) ▷ #general (186 messages🔥🔥):

Qwen3 models, Aider and Qwen3 integration, ktransformers VRAM optimization, Deepseek R2 release


aider (Paul Gauthier) ▷ #questions-and-tips (21 messages🔥):

AiderDesk Agent Mode, Repo Map Control, OpenRouter Model Support, Gemini 2.5 + Deepseek combo


aider (Paul Gauthier) ▷ #links (1 messages):

p0lyg0n: Great documentary on Deepseek: https://www.youtube.com/watch?v=Lo0FDmSbTp4


GPU MODE ▷ #general (10 messages🔥):

Apple Silicon, Cloud GPUs, CUDA, Metal, ROCm


GPU MODE ▷ #triton (2 messages):

fp8 quantization, fp32 accumulation, Triton matmul, Custom CUDA kernels, AMD


GPU MODE ▷ #torch (5 messages):

Torch Logger Methods Compilation, AOT Inductor Multithreading


GPU MODE ▷ #announcements (1 messages):

AMD MI300 competition, MoE kernels, FP8 submissions


GPU MODE ▷ #beginner (1 messages):

raymondz4gewu_60651: /get-api-url


GPU MODE ▷ #torchao (22 messages🔥):

Quantized Models and torch.bfloat16, vllm Compile Integration Debugging, gemlite Kernel Selection, torch.compile Debugging Challenges, torch.dtype Extensibility


GPU MODE ▷ #rocm (3 messages):

ROCm memory, CDNA3 ISA


GPU MODE ▷ #metal (3 messages):

QR decomposition, SIMD, Thread barriers, Single-threaded SVD


GPU MODE ▷ #self-promotion (3 messages):

GPU Price Tracker, AI/ML Engineer for Hire, Open Source IDE for AI/ML


GPU MODE ▷ #thunderkittens (1 messages):

Use Cases, Performance


GPU MODE ▷ #general (15 messages🔥):

FP8 quantization material, FP8 matmul, Deepseek-v3 tech report, prefixsum ranked timeout


GPU MODE ▷ #submissions (60 messages🔥🔥):

vectoradd benchmark on H100, amd-fp8-mm benchmark on MI300, amd-mixture-of-experts benchmark on MI300, prefixsum benchmark on H100, A100, matmul benchmark on L4


GPU MODE ▷ #status (2 messages):

Single GPU MoE Kernel, FP8 and MoE Kernels, Leaderboard Submissions


GPU MODE ▷ #amd-competition (23 messages🔥):

Aithe reference code, FP8 correctness verification, Submission ID, official problem writeup for this kernel


GPU MODE ▷ #cutlass (1 messages):

vkaul11: Are there kernels available to do fp8 multiplication with fp32 accumulation ?


OpenAI ▷ #ai-discussions (88 messages🔥🔥):

ChatGPT persistent memory, AI Agent Company, IAM360 Framework, AI-generated thumbnails


OpenAI ▷ #prompt-engineering (29 messages🔥):

Identity Systems in ChatGPT, Dynamic Game Master Role in RP, ChatGPT Internal Tools, Prompt Engineering Tips, LLM TTRPG game development


OpenAI ▷ #api-discussions (29 messages🔥):

Identity system in ChatGPT, RP prompt issues, Dynamic Game Master role, ChatGPT internal memory (bio tool), LLM TTRPG game development


Yannick Kilcher ▷ #general (79 messages🔥🔥):

PyQt5 Chat App, OR vs ML history, Gemini 2.5 Pro vs GPT-4o, Qwen 3 performance, FFN in Transformers


Yannick Kilcher ▷ #paper-discussion (8 messages🔥):

DeepSeek VL, Construction


Yannick Kilcher ▷ #ml-news (34 messages🔥):

Anonymous LLM on Reddit, ChatGPT's Convincing Skills, Meta's LlamaCon 2025, Llama 4 aka Little Llama, SAM 3 Development


Nous Research AI ▷ #announcements (1 messages):

Atropos RL framework, RLAIF models, GRPO tool calling, corporate fundamentals prediction, Psyche decentralized training network


Nous Research AI ▷ #general (110 messages🔥🔥):

Qwen 3 Overfitting, DeepSeek R2 Release, Huawei Ascend 910B, Atropos Release, Minos Model Refusals


Nous Research AI ▷ #ask-about-llms (2 messages):

Image loading issues


Cursor Community ▷ #general (101 messages🔥🔥):

VS Code Extension for Filtering .cs files in Git Changes View, Cursor Spending Limit Issues, Model Selection Purpose, Anthropic 3.7 Incident, Gemini 2.5 Pro Issues


HuggingFace ▷ #general (43 messages🔥):

Cloudflare Turnstile, whisper-large-v3-turbo issues, GGUF models and CPU offloading, Model Context Protocol (MCP), Fastest inference for running models


HuggingFace ▷ #cool-finds (1 messages):

cakiki: <@1298649243719958612> please don't cross-post


HuggingFace ▷ #i-made-this (9 messages🔥):

3D Animation Arena, Pi-Scorer alternative to LLM-as-a-Judge, HMR Models


HuggingFace ▷ #computer-vision (2 messages):

Defect annotation, Image masking, Filter usage


HuggingFace ▷ #agents-course (40 messages🔥):

Hugging Face Agents certification, Agents.json vs Prompts.yaml, Llama-3 access request, Models temporarily unavailable, Solving the final project with free resources


Notebook LM ▷ #announcements (1 messages):

Audio Overviews, Multilingual Support


Notebook LM ▷ #use-cases (28 messages🔥):

NotebookLM language support, Audio Overview limitations, Concise explanations, Smarter Models


Notebook LM ▷ #general (65 messages🔥🔥):

NotebookLM Updates, Multi-Language Support, Audio Overview Issues, Interactive Mode Bugs, Podcast Feature Requests


Manus.im Discord ▷ #general (75 messages🔥🔥):

Add on Credits, Manus Fellow Program, Manus Referral Program, Manus Credit System, Beta Testing


Latent Space ▷ #ai-general-chat (51 messages🔥):

X-Ware Red, Llama Prompt Ops, LLM Benchmarks Survey


Modular (Mojo 🔥) ▷ #general (13 messages🔥):

Bending Origins in Mojo, Origin-related headaches, Multiple Licenses in Modular Repository, Pointer usage to avoid origin issues


Modular (Mojo 🔥) ▷ #mojo (11 messages🔥):

importing Python packages, profiling blocks of code, SIMD width, vector strip-mining, flamegraph


LlamaIndex ▷ #blog (2 messages):

GPT-4o generates Tetris, PapersChat indexes papers


LlamaIndex ▷ #general (17 messages🔥):

Azure OpenAI timeouts, MessageRole.FUNCTION vs MessageRole.TOOL, Function agent and context issues


Eleuther ▷ #general (9 messages🔥):

RAG Chatbot challenges, GraphRAG for multiple sources, Local inference and small model training, Collaborating on AI research


Eleuther ▷ #research (10 messages🔥):

Recursive Symbolic Prompts, LLM Honesty Compliance, HHH Objectives in LLMs


MCP (Glama) ▷ #general (12 messages🔥):

Credential Passing, RAG type server for client file ingestion, Streamable HTTP Implementation and Authentication, Multi-Tenant Server Hosting, Open Source Models for Agentic Applications


MCP (Glama) ▷ #showcase (1 messages):

MCP Server, Real Time Push Notifications


Torchtune ▷ #dev (9 messages🔥):

foreach optimization, gradient scaling, DoRA + QAT


DSPy ▷ #general (6 messages):

MCP Usage, Displaying thoughts component in React


Cohere ▷ #💬-general (1 messages):

Markdown-based vs Image-based multimodal RAG on PDFs, Docling, EmbedV4


Cohere ▷ #🔌-api-discussions (2 messages):

Cohere rate limits for embed-v4, Embed V4 on Bedrock


Cohere ▷ #🤝-introductions (2 messages):

Cohere's Embed V4 model, Data Scientists introductions


Nomic.ai (GPT4All) ▷ #general (5 messages):

Embeddings, GPT4All, Manus AI, Embedding grouping


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (3 messages):

Loose vs Strict Evaluation, Model Training Inconsistencies