Frozen AI News archive

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

**Together AI and Agentica** released **DeepCoder-14B**, an open-source 14B parameter coding model rivaling OpenAI's **o3-mini** and **o1** on coding benchmarks, trained with an open-source RL framework from ByteDance and costing about **$26,880**. **Google DeepMind** launched **Gemini 2.5 Pro** with experimental "Flash" versions available to subscribers. **Moonshot AI** introduced **Kimi-VL-A3B**, a multimodal model with **128K context** outperforming **gpt-4o** on vision and math benchmarks. **Meta AI** released **Llama 4 Scout** and **Maverick**, with a larger **Behemoth** model in training, featuring mixture-of-experts and L2 norm techniques. **Runway** launched **Gen-4 Turbo** with 10x better results than Gen-3 at the same cost. **Google** announced **Imagen 3**, a high-quality text-to-image model now in Vertex AI, enabling easier object removal. The report highlights open-source contributions, reinforcement learning training optimizations, and significant model performance improvements across coding, multimodal, and image generation domains.


AI News for 4/7/2025-4/8/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (229 channels, and 7279 messages) for you. Estimated reading time saved (at 200wpm): 692 minutes. You can now tag @smol_ai for AINews discussions!

After the DeepSeek R1 launch (our coverage here), a raft of "R1 but more open" clone attempts emerged, of which it seems only HuggingFace's OpenR1 is still posting active updates, if you discount the distillation work. However, today Together and the Agentica Project (previously of the DeepScaleR work) have come out with a 14B code-focused reasoning model that scores at O3-mini level:

image.png

Usually these projects are easy to game and therefore unremarkable, but this one distinguishes itself by being fully open source - dataset, code, recipe and all - so the educational value is high, particularly given the prior work of its collaborators.

Specifically for RL training, they note the sampler bottleneck:

image.png

so they have very good thoughts on pipelining:

image.png

and they also propose an update to DeepSeek's GRPO:

image.png
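
For readers who have not seen GRPO before, here is a minimal sketch of the baseline idea being updated: each sampled completion is scored relative to the other samples for the same prompt, so no separate value model is needed. This shows only the standard group-normalized advantage, not the project's modification. The function name, the `eps` constant, the use of population standard deviation, and the pass/fail rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per sampled completion of a single prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical pass/fail rewards for four sampled solutions to one coding prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Passing samples get a positive advantage and failing ones a negative advantage. A group where every sample gets the same reward yields all-zero advantages, so it contributes no learning signal.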


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

Model Releases and Updates

Evaluations and Benchmarks

Agentic Systems and Tooling

Industry Analysis

Humor/Memes


AI Reddit Recap

Our pipelines had an outage yesterday. Sorry!


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1: Model Mania: Gemini Reigns, Llama 4 Stumbles, New Contenders Emerge

Theme 2: Training & Fine-Tuning Frontiers

Theme 3: Tools & Platforms: Updates, Bugs, and Battles

Theme 4: The AI Ecosystem: Research, Rumors, and Real-World Use

Theme 5: GPU & Hardware Hustle


PART 1: High-level Discord summaries

LMArena Discord


Unsloth AI (Daniel Han) Discord


OpenRouter (Alex Atallah) Discord


Cursor Community Discord


LM Studio Discord


Perplexity AI Discord


Manus.im Discord Discord


aider (Paul Gauthier) Discord


Notebook LM Discord


Interconnects (Nathan Lambert) Discord


Eleuther Discord


Nous Research AI Discord


GPU MODE Discord


HuggingFace Discord


MCP (Glama) Discord


Latent Space Discord


Yannick Kilcher Discord


Nomic.ai (GPT4All) Discord


Modular (Mojo 🔥) Discord


tinygrad (George Hotz) Discord


LlamaIndex Discord


Cohere Discord


Torchtune Discord


DSPy Discord


LLM Agents (Berkeley MOOC) Discord


MLOps @Chipro Discord


Codeium (Windsurf) Discord


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

LMArena ▷ #general (1134 messages🔥🔥🔥):

Gemini 2.5 Pro, OpenAI's Deep Research, Google's AI Strategy, DeepCoder-14B Preview Model, NightWhisper Model


LMArena ▷ #announcements (1 message):

Alpha UI, Desktop & Mobile, Bugs, Leaderboard


Unsloth AI (Daniel Han) ▷ #general (586 messages🔥🔥🔥):

Unsloth DDP Support, GGUF vs bnb LoRA training, Llama 4 Analysis, cogito-v1 preview LLMs


Unsloth AI (Daniel Han) ▷ #off-topic (21 messages🔥):

iMatrix Dynamic Uploads, Apple BFloat, Model Pruning, Online DPO


Unsloth AI (Daniel Han) ▷ #help (175 messages🔥🔥):

GraniteModel bug, Unsloth on MacOS, Multi-GPU Support, Gemma 3 12b issues, GRPO training


Unsloth AI (Daniel Han) ▷ #showcase (2 messages):

Location clarification


Unsloth AI (Daniel Han) ▷ #research (36 messages🔥):

LLMs knowledge storage alternatives, RAG for memory offloading, Vector DBs and privacy, Retrieval augmented training, DeepSeek-V3


OpenRouter (Alex Atallah) ▷ #announcements (5 messages):

Rate Limits, Credits, Quasar Rate Limit, Feedback on Rate Limiting


OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

Olympia.chat, Shopify, SaaS Marketing, Turnkey Operation


OpenRouter (Alex Atallah) ▷ #general (758 messages🔥🔥🔥):

OpenRouter Frontend, Quasar Open Sourced, Free Model Rate Limits, API Keys Please, Gemini


Cursor Community ▷ #general (762 messages🔥🔥🔥):

Augment, Vector DB vs graph DB, Manus.im, Cursor C/C++ extension error, Model selection


LM Studio ▷ #general (158 messages🔥🔥):

Llama 4 Disappointment, GPU requirements and model sizes, LM Studio and Ollama, Jinja templates


LM Studio ▷ #hardware-discussion (398 messages🔥🔥):

Docker Bad, AMD ROCm WSL Woes, Memory Limits and Motherboards, Umbrella Rack SuperComputer, Fast Reading Skills


Perplexity AI ▷ #announcements (1 message):

Perplexity for Startups program, API Credits, Enterprise Pro


Perplexity AI ▷ #general (453 messages🔥🔥🔥):

Gemini 2.5 Pro performance, Deep Research High rollout, Perplexity Discover tab, Manus Invites are still needed, AI image generation on Android


Perplexity AI ▷ #sharing (1 message):

Llama 4, Benchmark Faking


Perplexity AI ▷ #pplx-api (29 messages🔥):

Perplexity API News Fetching, Perplexity API Sonar Prompting, Perplexity API Search Discrepancies, Perplexity API Citations, Perplexity API Sandbox


Manus.im Discord ▷ #general (463 messages🔥🔥🔥):

High Effort Mode, Manus Local Version, Genspark vs Manus, Llama 4 hype, Manus Credit Usage


aider (Paul Gauthier) ▷ #general (237 messages🔥🔥):

Gemini 2.5 vs Sonnet Thinking, Aider's auto-testing, Gemini 2.5 Pro vs exp, OpenRouter citation links, AI resume builder


aider (Paul Gauthier) ▷ #questions-and-tips (8 messages🔥):

Architect mode interruptions, Aider Response Time, Aider Cursor Rules


aider (Paul Gauthier) ▷ #links (8 messages🔥):

Software Engineer Gap Year, LLMs as AI Coworkers, Programming LLMs for Successful Outcomes


Notebook LM ▷ #use-cases (10 messages🔥):

NotebookLM Commercial Options, NotebookLM privacy assurances, NotebookLM Misreading Scholarly Articles


Notebook LM ▷ #general (204 messages🔥🔥):

Discovery Mode rollout, Google Cloud Next and Google I/O, NotebookLM Legal Use cases, New Gemini features with deep research, Podcast Audio Overviews


Interconnects (Nathan Lambert) ▷ #news (92 messages🔥🔥):

DeepSeek R2 Release, LlamaCon, Llama-4-Maverick, Style Control Ranking, HF version of Llama-4-Maverick


Interconnects (Nathan Lambert) ▷ #ml-questions (30 messages🔥):

OpenAI Image Gen Capabilities, Logprob Reward, Arxiv Publishing, Arxiv Moderation, Phi-CTNL


Interconnects (Nathan Lambert) ▷ #ml-drama (15 messages🔥):

Google AI Staff, AI Sabbatical, NVDA Tariffs, ASI, Google's management vibes


Interconnects (Nathan Lambert) ▷ #random (12 messages🔥):

Google Cloud Next, Qwen 3 Launch, GPT 4.5 preferences, Claude Code Credits, Tim Apple


Interconnects (Nathan Lambert) ▷ #memes (5 messages):

Jiankui He's X ad revenue


Interconnects (Nathan Lambert) ▷ #rl (24 messages🔥):

DAPO papers, OLMo, Tulu 3, BoN Sampling


Interconnects (Nathan Lambert) ▷ #reads (3 messages):

Karan Dalal Post, Yuxi Liu Essay


Interconnects (Nathan Lambert) ▷ #posts (1 message):

natolambert: My post looks generous next to Marcus’s, oh my


Eleuther ▷ #general (106 messages🔥🔥):

Adam second-moment estimate buffers, Google DeepMind Patents, Hierarchical Perceiver, AI Auditing Survey, GFlowNets


Eleuther ▷ #research (35 messages🔥):

QKNorm, Soft RL, Llama 4 Memorization, Critical Batch Size, Reward, Value, Q-value letters


Eleuther ▷ #interpretability-general (9 messages🔥):

Baraniuk and Balestriero's works, ReLU networks, Boris Hanin's ReLU networks paper, ICML machine unlearning workshop


Eleuther ▷ #lm-thunderdome (1 message):

LM Harness, HotpotQA, Llama Eval, GPT Models


Nous Research AI ▷ #general (127 messages🔥🔥):

Llama-4-Scout-17B, Gemini 2.5 Pro Code Generation, aider-chat & Gemini 2.5 Pro, HiDream-I1 Image Model, DeepCogito LLMs & IDA


Nous Research AI ▷ #ask-about-llms (4 messages):

LayerNorm Implementation, Llama4 Context Window, H100 Usage


Nous Research AI ▷ #interesting-links (18 messages🔥):

Distributed data parallel training, Untrusted low-cost compute, Nous DeMo paper, Gradient compression algorithm, P2P interruptible compute


GPU MODE ▷ #general (10 messages🔥):

GPUMODE triton dataset, PyTorch version for triton kernels, GPUMODE website improvements, GPUMODE Job Portal


GPU MODE ▷ #triton (14 messages🔥):

block_ptr usage, tl.load and boundary_check, Boundary checks and performance


GPU MODE ▷ #cuda (4 messages):

DeepSeek communication library, NVSHMEM and Unified Virtual Addressing (UVA), LDSM (Local Data Share Memory), Optimized smem load


GPU MODE ▷ #torch (9 messages🔥):

TorchTitan's Compile Strategy, FSDP Numerical Issues, FSDP2 Model Extraction


GPU MODE ▷ #cool-links (5 messages):

CUDA physics simulation kernels go open source, Triton-Distributed, SMERF 3D


GPU MODE ▷ #jobs (2 messages):

Krea hiring, ML engineers, GPU cluster, diffusion models, interns


GPU MODE ▷ #beginner (15 messages🔥):

Graph Neural Networks (GNNs), Graph Attention Networks (GATs), CUDA compilation of C code, NVIDIA Streaming Multiprocessors, Thread cooperation in CUDA


GPU MODE ▷ #torchao (1 message):

torchao 0.10.0 release, MXFP8 training, PARQ, Module Swap Quantization API, Low Bit Kernels


GPU MODE ▷ #off-topic (1 message):

twzy: met yann lecun today and he seemed pissed


GPU MODE ▷ #self-promotion (9 messages🔥):

Tom and Jerry Diffusion Transformers, Nvidia Hopper Distributed Shared Memory, Verifying Untrusted Low-Cost Compute, LiveDocs Code Documentation


GPU MODE ▷ #🍿 (1 message):

AlphaGeometry, KernelBench, GPU kernel generation


GPU MODE ▷ #reasoning-gym (6 messages):

Quasar Alpha, Reasoning Gym Levels, Curricula Tasks


GPU MODE ▷ #gpu模式 (3 messages):

DeepSeek Communication Library, NVSHMEM and UVA, Intra-node communication


GPU MODE ▷ #general (11 messages🔥):

Submitting .py files with inline CUDA, CUDA Kernels, Grayscale CUDA Example, torch::extension


GPU MODE ▷ #submissions (17 messages🔥):

vectoradd benchmarks, grayscale benchmarks, Modal runners


GPU MODE ▷ #feature-requests-and-bugs (5 messages):

Leaderboard discrepancy, CUDA submission failure


GPU MODE ▷ #hardware (3 messages):

A100 vs L40, FP8 support, 4bit weights, Open source w4a8 kernels, GPU Fryer tool


HuggingFace ▷ #general (52 messages🔥):

FP4 Fine-tuning, Parasail Inference Provider, Llama.cpp Llama 4 Support, Mobile SQL Generation Models, Multi-Agent AI Deployment


HuggingFace ▷ #today-im-learning (3 messages):

Ollama local deployment, NLP in HuggingFace


HuggingFace ▷ #cool-finds (1 message):

Daily Papers Podcast, Takara TLDR


HuggingFace ▷ #i-made-this (3 messages):

AI Runner, GAPRS


HuggingFace ▷ #computer-vision (3 messages):

Monocular Depth Models, Segmentation Problem, Tools Recognition Task


HuggingFace ▷ #smol-course (4 messages):

Dataset forms, Unit 1 Quiz failing to load, Agents Build Errors, Chat templating exercises


HuggingFace ▷ #agents-course (26 messages🔥):

Code Agents Ch. 2 Notebook Issues, Gemini Models as Alternatives, Course FAQ Request, any-agent library release, RAG with smart glasses challenge


HuggingFace ▷ #open-r1 (13 messages🔥):

DeepSeek R1, Active AI Discord Chats


MCP (Glama) ▷ #general (75 messages🔥🔥):

Semgrep MCP server, MCP HTTP Streaming, MCP and CORS errors, MCP GitHub server issues, MCP for Graph API application


MCP (Glama) ▷ #showcase (15 messages🔥):

Semgrep rewrites MCP, C# MCP SDK, ASGI style in process fastmcp sessions


Latent Space ▷ #ai-general-chat (62 messages🔥🔥):

Shopify AI Mandate, Anthropic API Credits, API Latency Benchmarking, Cybercriminals and AI, LLM Automated Exploitation


Yannick Kilcher ▷ #general (14 messages🔥):

Llama 4 flops on benchmarks, Bayesian Structural EM, Procedural model representation DNA, Meta should have clarified, Disrupt Science Hackathon Details


Yannick Kilcher ▷ #paper-discussion (12 messages🔥):

Fast.ai Diffusion Methods, F_A_E_S_I_k=2 Discussion, Open Source beautiful.ai Alternatives


Yannick Kilcher ▷ #agents (1 message):

Efficient Tool Calling Templates, Cogito 14b


Yannick Kilcher ▷ #ml-news (9 messages🔥):

Adapting Pre-training Text, Diffusion Modeling to Control LLMs, Llama 4 Release Issues, Iterative Improvement Strategy


Nomic.ai (GPT4All) ▷ #general (28 messages🔥):

IBM Granite 8B, RAG references, docling OCR, semantic chunking server, ComfyUI image generation


Modular (Mojo 🔥) ▷ #general (4 messages):

MLX vs MAX, Apple Silicon GPU limitations, MAX capabilities


Modular (Mojo 🔥) ▷ #mojo (16 messages🔥):

Mojo vs Rust, __moveinit__ and __copyinit__ in Mojo, Returning values in Mojo, Span lifetime in Mojo


tinygrad (George Hotz) ▷ #general (5 messages):

Tensor Naming, GPU Programming, Compiler Development, Tinygrad Contribution Resources, PMPP 4th ed


tinygrad (George Hotz) ▷ #learn-tinygrad (12 messages🔥):

METAL sync issue, AMD performance with BEAM=2, ContextVar type, LLaMA sharding issue, Device info loss after sampling


LlamaIndex ▷ #blog (2 messages):

RAG workflow tutorial, Auth0 Auth for GenAI with LlamaIndex


LlamaIndex ▷ #general (13 messages🔥):

Gemini 2.5 Pro, Google's latest unified SDK, StructuredPlannerAgent Docs, Agent Planning Tool


Cohere ▷ #「💬」general (8 messages🔥):

Events Recording Availability, Structured Output Examples, Pydantic Schema Integration, API Requests without Cohere Package, Model Recommendation for Company List Generation


Cohere ▷ #「🔌」api-discussions (1 message):

Vector Databases, Model Compatibility, Explicit Recommendations


Cohere ▷ #「🤖」bot-cmd (1 message):

competent: Currently not working!


Cohere ▷ #「🤝」introductions (2 messages):

Introduction to Aditya, Machine vision and control, Innovation accelerator, Openchain.earth project, Tools used by Aditya


Cohere ▷ #【🟢】status-updates (1 message):

competent: Should work!


Torchtune ▷ #general (1 message):

Contributor Tag Request, Discord Roles


Torchtune ▷ #dev (6 messages):

DeepSpeed Integration, FSDP vs DeepSpeed, FSDP Sharding, zero1-3 training


DSPy ▷ #show-and-tell (1 message):

MIPRO, Automated Prompt Engineering, Task Complexity Scaling


LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 message):

Kaiyu Yang, AI4Math, Theorem Proving, Autoformalization


MLOps @Chipro ▷ #events (1 message):

Manifold Research Group, Multimodal AI, self-assembling space robotics, robotic metacognition, Community Research Call #4


Codeium (Windsurf) ▷ #announcements (1 message):

Codeium rename, Windsurf Reddit, Windsurf Plugins



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}