Frozen AI News archive

LlamaCon: Meta AI gets into the Llama API platform business

**Meta** celebrated progress in the **Llama** ecosystem at LlamaCon, launching an AI Developer platform with finetuning and fast inference powered by **Cerebras** and **Groq** hardware, though it remains waitlisted. Meanwhile, **Alibaba** released the **Qwen3** family of large language models, including **two MoE models** and **six dense models** ranging from **0.6B to 235B parameters**, with the flagship **Qwen3-235B-A22B** achieving competitive benchmark results and supporting **119 languages and dialects**. The Qwen3 models are optimized for coding and agentic capabilities, are Apache 2.0 licensed, and have broad deployment support including local usage with tools like **vLLM**, **Ollama**, and **llama.cpp**. Community feedback highlights Qwen3's scalable performance and superiority over models like OpenAI's **o3-mini**.


Llama API is all you need?

AI News for 4/29/2025-4/30/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (214 channels, and 5096 messages) for you. Estimated reading time saved (at 200wpm): 442 minutes. Our new website is now up with full metadata search and beautiful vibe-coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

/r/LocalLlama fell in love with Qwen 3 yesterday, but today belonged to Llama.

Though there were some rumors of a new Llama 4 reasoning model, LlamaCon ended up being a relatively no-big-surprises celebration of the undeniable progress in Llama-land. Zuck went back on Dwarkesh to discuss the controversial Llama 4 launch (our coverage):

https://www.youtube.com/watch?v=rYXeQbTuVl0

For AI Engineers, the other notable update from the event was Meta launching its first AI Developer platform, arguably its equivalent of Google's AI Studio, with finetuning capability and fast inference on Cerebras and Groq hardware, although for now it remains waitlisted.


AI Twitter Recap

Qwen3 Model Release and Performance

Evaluation, Benchmarking, and Analysis of Qwen3

Google's Gemini Updates and Capabilities

ChatGPT Updates and Shopping Features

Runway References and Gen-4 Image Generation

AI Safety and Ethics

Multi-Agent Systems and LangGraph

Cursor and AI-Assisted Coding

Llama API, Tools, and Ecosystem

Other Models and Tools

Business, Investment, and Economic Impact

Humor and Miscellaneous


AI Reddit Recap

/r/LocalLlama Recap

1. Qwen3 Model Launches and Performance Benchmarks

2. Qwen3-30B-A3B MoE: Community Adoption and Use Cases

3. Qwen3 Small Models and Reasoning Capabilities (600M/4B)

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. New AI Model and Feature Launches (Qwen, LYNX, GPT-4o, Chroma, Hunyuan 3D)

2. AI-Driven Social, Ethical, and Psychological Impacts

3. Iterative Image Replication and Prompting Experiments with AI


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1: Qwen 3 Models Stir Buzz and Bugs Across Platforms

Theme 2: Model Mania: Gemini Stumbles, Llama 4 Arrives, Sonnet Sputters

Theme 3: Fine-Tuning & Optimization Frontiers Push Efficiency

Theme 4: Tools & Platforms Navigate Glitches and Gains

Theme 5: Hardware Heats Up with Mac Speed, GPU Competitions, and New Tools



Discord: High level Discord summaries

Perplexity AI Discord


Unsloth AI (Daniel Han) Discord


LMArena Discord


LM Studio Discord


OpenRouter (Alex Atallah) Discord


aider (Paul Gauthier) Discord


GPU MODE Discord


OpenAI Discord


Yannick Kilcher Discord


Nous Research AI Discord


Cursor Community Discord


HuggingFace Discord


Notebook LM Discord


Manus.im Discord


Latent Space Discord


Modular (Mojo 🔥) Discord


LlamaIndex Discord


Eleuther Discord


MCP (Glama) Discord


Torchtune Discord


DSPy Discord


Cohere Discord


Nomic.ai (GPT4All) Discord


Gorilla LLM (Berkeley Function Calling) Discord


The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (2 messages):

Perplexity AI on WhatsApp, Sonnet Model Behavior Update, Anthropic Status Incident


Perplexity AI ▷ #general (1112 messages🔥🔥🔥):

Free AI billing, Grok android app, Model Fallbacks, The Boys fanboy


Perplexity AI ▷ #sharing (1 message):

_paradroid: https://www.perplexity.ai/search/d7bb905e-27e3-43e9-8b68-76bea1905457


Perplexity AI ▷ #pplx-api (14 messages🔥):

Sonar API Debit Card Issues, Hackathon Credits, Structured Output Issues, Async Deep Research API, API vs Web Results


Unsloth AI (Daniel Han) ▷ #general (899 messages🔥🔥🔥):

Qwen3, LM Studio issues, GGUF fixes, Training configuration, Multi-GPU support


Unsloth AI (Daniel Han) ▷ #off-topic (10 messages🔥):

ChatGPT ComfyUI Opinion, California AI Group, ComfyUI Demos


Unsloth AI (Daniel Han) ▷ #help (186 messages🔥🔥):

Unsloth installation issues, Qwen notebook issues, GRPO performance, LoRA efficiency, Unsloth & Ollama/vLLM


Unsloth AI (Daniel Han) ▷ #showcase (4 messages):

Pi-Scorer, LLM-as-a-Judge, encoder model


Unsloth AI (Daniel Han) ▷ #research (47 messages🔥):

Dynamic BNB Quantization, LLMs in Medical Advice, Mixture of Experts with Gemma, Attention Head Routing, GRPO Fine-tuning


LMArena ▷ #general (544 messages🔥🔥🔥):

O3 Pro, Qwen 3, Gemini 2.5 Pro, Grok 3.5, Model Benchmarking and Evaluation


LM Studio ▷ #general (271 messages🔥🔥):

Qwen3 thinking, LM Studio on Android, Qwen3 experts number, Qwen3 bug fixes, Qwen3 with RAG


LM Studio ▷ #hardware-discussion (61 messages🔥🔥):

Framework Desktop vs. Flow Z13, AMD GPU 7900 XTX Value, Qwen3-30B-A3B Issues, MLX vs. llama.cpp Speed, Xeon Workstation for $1k


OpenRouter (Alex Atallah) ▷ #announcements (1 message):

Rate Limit, 2.5 Flash, Capacity


OpenRouter (Alex Atallah) ▷ #general (321 messages🔥🔥):

Qwen3 coding abilities, Gemini 2.5 Flash issues and rate limits, OpenRouter Caching Issues, Llama 4 benchmark, Vertex issue with token counting


aider (Paul Gauthier) ▷ #general (186 messages🔥🔥):

Qwen3 models, Aider and Qwen3 integration, ktransformers VRAM optimization, Deepseek R2 release


aider (Paul Gauthier) ▷ #questions-and-tips (21 messages🔥):

AiderDesk Agent Mode, Repo Map Control, OpenRouter Model Support, Gemini 2.5 + Deepseek combo


aider (Paul Gauthier) ▷ #links (1 message):

p0lyg0n: Great documentary on Deepseek: https://www.youtube.com/watch?v=Lo0FDmSbTp4


GPU MODE ▷ #general (10 messages🔥):

Apple Silicon, Cloud GPUs, CUDA, Metal, ROCm


GPU MODE ▷ #triton (2 messages):

fp8 quantization, fp32 accumulation, Triton matmul, Custom CUDA kernels, AMD


GPU MODE ▷ #torch (5 messages):

Torch Logger Methods Compilation, AOT Inductor Multithreading


GPU MODE ▷ #announcements (1 message):

AMD MI300 competition, MoE kernels, FP8 submissions


GPU MODE ▷ #beginner (1 message):

raymondz4gewu_60651: /get-api-url


GPU MODE ▷ #torchao (22 messages🔥):

Quantized Models and torch.bfloat16, vLLM Compile Integration Debugging, gemlite Kernel Selection, torch.compile Debugging Challenges, torch.dtype Extensibility


GPU MODE ▷ #rocm (3 messages):

ROCm memory, CDNA3 ISA


GPU MODE ▷ #metal (3 messages):

QR decomposition, SIMD, Thread barriers, Single-threaded SVD


GPU MODE ▷ #self-promotion (3 messages):

GPU Price Tracker, AI/ML Engineer for Hire, Open Source IDE for AI/ML


GPU MODE ▷ #thunderkittens (1 message):

Use Cases, Performance


GPU MODE ▷ #general (15 messages🔥):

FP8 quantization material, FP8 matmul, Deepseek-v3 tech report, prefixsum ranked timeout


GPU MODE ▷ #submissions (60 messages🔥🔥):

vectoradd benchmark on H100, amd-fp8-mm benchmark on MI300, amd-mixture-of-experts benchmark on MI300, prefixsum benchmark on H100, A100, matmul benchmark on L4


GPU MODE ▷ #status (2 messages):

Single GPU MoE Kernel, FP8 and MoE Kernels, Leaderboard Submissions


GPU MODE ▷ #amd-competition (23 messages🔥):

Aithe reference code, FP8 correctness verification, Submission ID, official problem writeup for this kernel


GPU MODE ▷ #cutlass (1 message):

vkaul11: Are there kernels available to do fp8 multiplication with fp32 accumulation?


OpenAI ▷ #ai-discussions (88 messages🔥🔥):

ChatGPT persistent memory, AI Agent Company, IAM360 Framework, AI-generated thumbnails


OpenAI ▷ #prompt-engineering (29 messages🔥):

Identity Systems in ChatGPT, Dynamic Game Master Role in RP, ChatGPT Internal Tools, Prompt Engineering Tips, LLM TTRPG game development


OpenAI ▷ #api-discussions (29 messages🔥):

Identity system in ChatGPT, RP prompt issues, Dynamic Game Master role, ChatGPT internal memory (bio tool), LLM TTRPG game development


Yannick Kilcher ▷ #general (79 messages🔥🔥):

PyQt5 Chat App, OR vs ML history, Gemini 2.5 Pro vs GPT-4o, Qwen 3 performance, FFN in Transformers


Yannick Kilcher ▷ #paper-discussion (8 messages🔥):

DeepSeek VL, Construction


Yannick Kilcher ▷ #ml-news (34 messages🔥):

Anonymous LLM on Reddit, ChatGPT's Convincing Skills, Meta's LlamaCon 2025, Llama 4 aka Little Llama, SAM 3 Development


Nous Research AI ▷ #announcements (1 message):

Atropos RL framework, RLAIF models, GRPO tool calling, corporate fundamentals prediction, Psyche decentralized training network


Nous Research AI ▷ #general (110 messages🔥🔥):

Qwen 3 Overfitting, DeepSeek R2 Release, Huawei Ascend 910B, Atropos Release, Minos Model Refusals


Nous Research AI ▷ #ask-about-llms (2 messages):

Image loading issues


Cursor Community ▷ #general (101 messages🔥🔥):

VS Code Extension for Filtering .cs files in Git Changes View, Cursor Spending Limit Issues, Model Selection Purpose, Anthropic 3.7 Incident, Gemini 2.5 Pro Issues


HuggingFace ▷ #general (43 messages🔥):

Cloudflare Turnstile, whisper-large-v3-turbo issues, GGUF models and CPU offloading, Model Context Protocol (MCP), Fastest inference for running models


HuggingFace ▷ #cool-finds (1 message):

cakiki: <@1298649243719958612> please don't cross-post


HuggingFace ▷ #i-made-this (9 messages🔥):

3D Animation Arena, Pi-Scorer alternative to LLM-as-a-Judge, HMR Models


HuggingFace ▷ #computer-vision (2 messages):

Defect annotation, Image masking, Filter usage


HuggingFace ▷ #agents-course (40 messages🔥):

Hugging Face Agents certification, Agents.json vs Prompts.yaml, Llama-3 access request, Models temporarily unavailable, Solving the final project with free resources


Notebook LM ▷ #announcements (1 message):

Audio Overviews, Multilingual Support


Notebook LM ▷ #use-cases (28 messages🔥):

NotebookLM language support, Audio Overview limitations, Concise explanations, Smarter Models


Notebook LM ▷ #general (65 messages🔥🔥):

NotebookLM Updates, Multi-Language Support, Audio Overview Issues, Interactive Mode Bugs, Podcast Feature Requests


Manus.im Discord ▷ #general (75 messages🔥🔥):

Add on Credits, Manus Fellow Program, Manus Referral Program, Manus Credit System, Beta Testing


Latent Space ▷ #ai-general-chat (51 messages🔥):

X-Ware Red, Llama Prompt Ops, LLM Benchmarks Survey


Modular (Mojo 🔥) ▷ #general (13 messages🔥):

Bending Origins in Mojo, Origin-related headaches, Multiple Licenses in Modular Repository, Pointer usage to avoid origin issues


Modular (Mojo 🔥) ▷ #mojo (11 messages🔥):

importing Python packages, profiling blocks of code, SIMD width, vector strip-mining, flamegraph


LlamaIndex ▷ #blog (2 messages):

GPT-4o generates Tetris, PapersChat indexes papers


LlamaIndex ▷ #general (17 messages🔥):

Azure OpenAI timeouts, MessageRole.FUNCTION vs MessageRole.TOOL, Function agent and context issues


Eleuther ▷ #general (9 messages🔥):

RAG Chatbot challenges, GraphRAG for multiple sources, Local inference and small model training, Collaborating on AI research


Eleuther ▷ #research (10 messages🔥):

Recursive Symbolic Prompts, LLM Honesty Compliance, HHH Objectives in LLMs


MCP (Glama) ▷ #general (12 messages🔥):

Credential Passing, RAG type server for client file ingestion, Streamable HTTP Implementation and Authentication, Multi-Tenant Server Hosting, Open Source Models for Agentic Applications


MCP (Glama) ▷ #showcase (1 message):

MCP Server, Real Time Push Notifications


Torchtune ▷ #dev (9 messages🔥):

foreach optimization, gradient scaling, DoRA + QAT


DSPy ▷ #general (6 messages):

MCP Usage, Displaying thoughts component in React


Cohere ▷ #💬-general (1 message):

Markdown-based vs Image-based multimodal RAG on PDFs, Docling, EmbedV4


Cohere ▷ #🔌-api-discussions (2 messages):

Cohere rate limits for embed-v4, Embed V4 on Bedrock


Cohere ▷ #🤝-introductions (2 messages):

Cohere's Embed V4 model, Data Scientists introductions


Nomic.ai (GPT4All) ▷ #general (5 messages):

Embeddings, GPT4All, Manus AI, Embedding grouping


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (3 messages):

Loose vs Strict Evaluation, Model Training Inconsistencies