Frozen AI News archive

not much happened today

**Microsoft** released **Phi-4-reasoning**, a 14B reasoning finetune that is slightly behind QwQ and held back by limited data transparency and token-efficiency concerns. **Anthropic** introduced remote MCP server support and a Research mode that can run for up to 45 minutes in **Claude**. **Cursor** published a model popularity list. **Alibaba** launched **Qwen3-235B** and other Qwen3 variants, highlighting budget-friendly coding and reasoning capabilities, with availability on the **Together AI** API. **Microsoft** also released **Phi-4-Mini-Reasoning**, with reported benchmark results on AIME 2025 and OmniMath. **DeepSeek** announced **DeepSeek-Prover V2**, which scales to 671B parameters and claims state-of-the-art math problem solving. **Meta AI**'s **Llama** models hit 1.2 billion downloads, alongside new **Llama Guard 4** and **Prompt Guard 2** releases for input/output filtering and jailbreak prevention. **Xiaomi** released the open-source reasoning model **MiMo-7B**, trained on 25 trillion tokens. Discussions on AI model evaluation highlighted issues with the **LMArena leaderboard**, data access biases favoring proprietary models, and the difficulty of maintaining fair benchmarks, with **OpenRouterAI** rankings suggested as an alternative. *"LMArena slop and biased"* and *"61.3% of all data going to proprietary model providers"* were noted concerns.

a quiet day.

AI News for 4/30/2025-5/1/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (214 channels, and 4767 messages) for you. Estimated reading time saved (at 200wpm): 453 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Microsoft released Phi-4-reasoning, a reasoning finetune of the 14B Phi-4 that is slightly behind QwQ in performance, but the lack of transparency around its data and complaints of inference-token hungriness limit the excitement around it.

Anthropic launched remote MCP server support in Claude and a Research mode that can run for up to 45 minutes.

Cursor released their model popularity list, with few surprises.


AI Twitter Recap

Language Models and Releases

AI Model Evaluation and Leaderboards

Applications of AI Agents and Tools

AI Safety, Ethics, and Responsible Development

Education and Learning in AI

Humor and Miscellaneous


AI Reddit Recap

/r/LocalLlama Recap

1. Phi 4 Reasoning Model Release and Discussion

2. Qwen 3 Models: Impressions and Capabilities

3. Novel Model and Training Method Announcements (TTS/ASR, KL Optimization)

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. CivitAI Adult Content Purge and Community Alternatives

2. Recent AI Model and Tech Launches & Benchmarks

3. Instructional Image Editing and UI Integration Releases


AI Discord Recap

A summary of Summaries of Summaries by chatgpt-4o-latest

1. Phi-4 Reasoning Model Release

2. Diffusion Language Models & Architecture Innovations

3. Qwen3 Model: Breakthroughs and Bugs

4. Multi-Model Ecosystem Showdowns & Scaling Benchmarks

5. Claude's Expanding Capabilities and Integrations


Discord: High level Discord summaries

LMArena Discord


Perplexity AI Discord


Unsloth AI (Daniel Han) Discord


GPU MODE Discord


OpenRouter (Alex Atallah) Discord


aider (Paul Gauthier) Discord


HuggingFace Discord


OpenAI Discord


Cursor Community Discord


Nous Research AI Discord


LM Studio Discord


Notebook LM Discord


Manus.im Discord Discord


Yannick Kilcher Discord


MCP (Glama) Discord


Latent Space Discord


Eleuther Discord


DSPy Discord


LlamaIndex Discord


LLM Agents (Berkeley MOOC) Discord


Nomic.ai (GPT4All) Discord


Cohere Discord


Gorilla LLM (Berkeley Function Calling) Discord


MLOps @Chipro Discord


The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (1067 messages🔥🔥🔥):

Gemini vs. O3, Qwen3 Benchmarking, Context Arena Updates, Deep Research Tool Comparisons, Qwen3 4B


Perplexity AI ▷ #general (796 messages🔥🔥🔥):

UI change, Context length, Zen browser, Grok limitations, 3.7 Thinking


Perplexity AI ▷ #sharing (4 messages):

Tesla CEO Search, Android Bluetooth Bug, Arctic P8 Fan Curve, o4-mini AI Poetry


Perplexity AI ▷ #pplx-api (1 message):

Sonar API, LlamaIndex, RAG Project


Unsloth AI (Daniel Han) ▷ #general (373 messages🔥🔥):

GLM models blog post, Unsloth Llama-4-Scout-17B-16E-Instruct-GGUF image support, Long context model recommendations, Unsloth fine-tuning Qwen3 agent with final reward only, Microsoft Phi-4-reasoning


Unsloth AI (Daniel Han) ▷ #off-topic (5 messages):

Unsloth Discord, Game Scripting API, AI for Game Development


Unsloth AI (Daniel Han) ▷ #help (114 messages🔥🔥):

LoRA Fine Tuning, Qwen3 UD 2.0 quants, Gemma-3-27b-it-Q4_K_M.gguf, Qwen3 LoRA merging issue, Qwen 2.5 VL 7B finetuning issue


Unsloth AI (Daniel Han) ▷ #research (8 messages🔥):

Test-Time RL, GRPO for Problem Description, Softmax with Softpick


GPU MODE ▷ #general (126 messages🔥🔥):

HF Tariffs, VSCode vs NeoVim, GPU databases, Lora finetuning FSDP Error


GPU MODE ▷ #triton (9 messages🔥):

Triton Kernel uses in vLLM/SGLang, CUDA/Cutlass/HIP kernels compared to Triton, Hardware vendors developing kernels, cuTile and IR open source, Mojo for GPU programming


GPU MODE ▷ #torch (9 messages🔥):

std::vector<std::vector<double>> schema, PyTorch SDPA performance on AMD, Torch Dynamo recompiles


GPU MODE ▷ #jobs (1 message):

SemiAnalysis, System Modeling, Benchmarks


GPU MODE ▷ #beginner (3 messages):

NVCC installation, Cloud GPUs, Google Colab


GPU MODE ▷ #torchao (2 messages):

Fake Quantization, Linear Layers, Quantization Modes


GPU MODE ▷ #off-topic (8 messages🔥):

TLV SF Coffee, NxN Mat Mul, Outdoor New England


GPU MODE ▷ #rocm (3 messages):

ROCm MI300 benchmarks, ScalarLM, MI300 memory, AMD experiments


GPU MODE ▷ #lecture-qa (1 message):

wecu: wow! that server is sick!


GPU MODE ▷ #liger-kernel (5 messages):

Multi Token Attention, Native Sparsity, Sparsemax Implementation


GPU MODE ▷ #self-promotion (1 message):

PDF to LaTeX conversion, OCR Text Extraction, Asynchronous Processing, GPU Acceleration


GPU MODE ▷ #submissions (67 messages🔥🔥):

MI300 Leaderboard Updates, amd-fp8-mm Performance, vectorsum Benchmarks, amd-mixture-of-experts Results, Personal Best on AMD


GPU MODE ▷ #amd-competition (41 messages🔥):

Ranked vs Benchmark Performance Discrepancies, Leaderboard Reliability Concerns, Submission Timeouts and GH Action Limits, Problem Constraints


OpenRouter (Alex Atallah) ▷ #announcements (3 messages):

Inception's Mercury Coder, Gemini 2.5 Pro Vertex Token Counting Issue


OpenRouter (Alex Atallah) ▷ #general (252 messages🔥🔥):

Vanna.ai for SQLite DBs, Phala Confidential AI Endpoints on OpenRouter, Amazon Nova Premier Model, Claude API Issues, Aider for Code Refactoring


aider (Paul Gauthier) ▷ #general (143 messages🔥🔥):

Gemini vs Claude Code, Claude Code Proxy, Groq speed limitations, MCPaaS business


aider (Paul Gauthier) ▷ #questions-and-tips (85 messages🔥🔥):

LLM Selection Criteria (economics, coding scores, experience), Aider with Local LLMs (ollama, qwen3) performance issues, Gemini 2.5 Pro for Careful Codebase Edits, Managing Large Codebases with Aider, UI Prototyping with v0.dev and Aider


HuggingFace ▷ #general (37 messages🔥):

voice conversion models like 11labs, seed-vc for voice conversion, Spatial reasoning in open source vision LLMs, Liquid foundational models, Microsoft's new Phi-4 reasoning models


HuggingFace ▷ #today-im-learning (50 messages🔥):

RL Agent with PPO, LSTM for NER, Transformers Recommendations, E2B Secured Environment for Agent, HF Agents Course


HuggingFace ▷ #i-made-this (2 messages):

PDF to LaTeX Conversion, LLM Pokemon Battle, Grok Wins Pokemon Competition


HuggingFace ▷ #computer-vision (2 messages):

Google Document AI, Collaboration Opportunities


HuggingFace ▷ #NLP (1 message):

Help Request


HuggingFace ▷ #smol-course (2 messages):

Managed Agents, Final_Answer tool, Kwarg Errors, Version compatibility


HuggingFace ▷ #agents-course (125 messages🔥🔥):

Unit 4 Deadline Extended, Unit 4 Submission Errors, Smolagents Issues, Gemini Free Tier Errors, Running Phoenix for Telemetry


OpenAI ▷ #ai-discussions (107 messages🔥🔥):

VAE vs U-Net, GPT-4 Retirement, Awakening Happening Now, Gemini 2.5 Pro vs GPT 4o, Content Filters and 'Granny-crusty-nun'


OpenAI ▷ #gpt-4-discussions (49 messages🔥):

Connected Apps in Settings, GPT-4o Personality Rollback, Token Consumption in GPT, GPT Coding Inefficiencies, GPTs and Follow-Up Questions


OpenAI ▷ #prompt-engineering (28 messages🔥):

ChatGPT Prompting, Task Functionality, Room Temperature Superconductors, Mental Health Support


OpenAI ▷ #api-discussions (28 messages🔥):

ChatGPT Prompt Engineering, Free vs Paid ChatGPT, Material Science Research, Room Temperature Superconductors, Mental Health Support


Cursor Community ▷ #general (182 messages🔥🔥):

Gemini 2.5 Pro, Benchmark Saturation Models, NixOS + Cursor, GitHub MCP with Cursor on Windows, Cursor as AWS of AI Editors


Nous Research AI ▷ #general (120 messages🔥🔥):

RP data impact, Small Model reasoning, Evaluating model performance, 405b FFT club


Nous Research AI ▷ #research-papers (3 messages):

Diagrams for the DoD, AI Applications in Defense


Nous Research AI ▷ #interesting-links (2 messages):

Cooperative AI, Multi-Agent Systems (MAS), Decentralized AI


Nous Research AI ▷ #research-papers (3 messages):

DoD Diagrams


LM Studio ▷ #general (53 messages🔥):

Qwen 3, Image models, Flash Attention, Gemma 3, LM Studio storage


LM Studio ▷ #hardware-discussion (73 messages🔥🔥):

Llama4 2T Model, DDR5 Offloading, Deepseek Model, Mac Studio M3 Ultra, Multi-GPU setups


Notebook LM ▷ #use-cases (8 messages🔥):

Embed Podcast Audio, LaTeX Symbols Troubleshooting, Caution about Unpublished Research


Notebook LM ▷ #general (80 messages🔥🔥):

Bulgarian mispronunciations, Audio overview host customization, PDF loading errors, Sharing notebookLM issues, Interactive mode issues


Manus.im Discord ▷ #general (86 messages🔥🔥):

LLM syntax errors causes, Call center robot persona, Tabnine AI agent, Manus credits expiration, building a fullstack app with manus


Yannick Kilcher ▷ #general (21 messages🔥):

Lean with Cursor setup, Autoformalization approaches with VSCode, PyTorch contribution process, Geometric Deep Learning anniversary, GPT-4 disappearance


Yannick Kilcher ▷ #paper-discussion (7 messages):

Perception Encoder Paper Discussion, ViT Image Resolution Handling, DeepSeek Prover Paper


Yannick Kilcher ▷ #agents (1 message):

felix456: https://github.com/u2084511felix/vibescraper


Yannick Kilcher ▷ #ml-news (11 messages🔥):

Phi-4-reasoning, LLM Boredom, LLM Croatian Glitch


MCP (Glama) ▷ #general (28 messages🔥):

MCP Playground, Remote Serverless MCP Hosting Platform, C# SDK issues with streamable HTTP, LLM Tool Selection with Multiple MCP Servers, MCP Tool Type Adaptation


Latent Space ▷ #ai-general-chat (23 messages🔥):

Hallucinations on X, American Positivity on X, Radiance Fields, Claude Integrations, AI Assisted Coding


Eleuther ▷ #general (3 messages):

Downstream Capabilities of Frontier AI Models, ICML Acceptance, Othniel Introduction


Eleuther ▷ #research (17 messages🔥):

Linear Attention Models, Data Leakage, SFTTrainer issues, LLM Augmentation


DSPy ▷ #general (17 messages🔥):

LlamaCon Meta DSPy, Amazon AWS DSPy Migration, Journal Chemical Optimized LLM Prompts Reduce Chemical Hallucinations, DSPy 3.0 roadmap, VLM use with DSPy


LlamaIndex ▷ #blog (3 messages):

Multilingual Multimodal RAG, LlamaIndex Investments, Invoice Reconciliation Agent


LlamaIndex ▷ #general (7 messages):

HuggingFace Tokenizer with LlamaIndex, Qwen3 Models, LLMs producing non-deterministic results


LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (2 messages):

Auth0 Workshop, AgentX Prizes, Submission Guidelines


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):

Assignments, Course Website, Labs Release


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (6 messages):

AgentX, MOOC lectures


Nomic.ai (GPT4All) ▷ #general (10 messages🔥):

Mac RAM, GPU vs CPU offloading, VRAM requirements for LLMs, Qwen model performance


Cohere ▷ #💬-general (4 messages):

Chatlog Access, InterfaceUI Changes, Diffusion Models


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (2 messages):

Model Output Quirks, Token Parsing, Model Specs Update