Frozen AI News archive

xAI Grok 3 and Mira Murati's Thinking Machines

**Grok 3** has launched to mixed opinions but strong benchmark performance, notably outperforming models like **Gemini 2 Pro** and **GPT-4o**. The **Grok 3 mini** variant shows competitive and sometimes superior capabilities, especially in reasoning and coding, with reinforcement learning playing a key role. **Mira Murati** has publicly shared her post-OpenAI plan, founding the frontier lab **Thinking Machines**, which focuses on collaborative, personalizable AI, multimodality, and empirical safety and alignment research, reminiscent of **Anthropic**'s approach.

Canonical issue URL

AI News for 2/17/2025-2/18/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (211 channels, and 6478 messages) for you. Estimated reading time saved (at 200wpm): 608 minutes. You can now tag @smol_ai for AINews discussions!

It is a rare day when one frontier lab makes its debut, much less two (loosely speaking). But that is almost certainly what happened today.

We would say that the full Grok 3 launch stream is worth watching at 2x:

https://www.youtube.com/watch?v=AUAJ82H12qs

Opinions on Grok 3 are mixed: lmarena, karpathy, and information-recycler threadbois are mostly very positive, while /r/OpenAI and other independent evals are more skeptical. Not everything is released either; Grok 3 isn't available via API, and as of time of writing, the demoed "Think" and "Big Brain" modes aren't live yet. On the whole, the evidence points to Grok 3 laying credible claim to sitting somewhere between o1 and o3, and this undeniable trajectory is why we award it the title story.


There is less "news you can use" in the second item, but Mira Murati's post-OpenAI plan is now finally public: she has assembled what is almost certainly going to be a serious frontier lab in Thinking Machines, recruiting notables from across the frontier labs, and specifically ChatGPT alumni.


There's not a lot of detail in the manifesto beyond a belief in publishing research, emphasis on collaborative and personalizable AI, multimodality, research and product co-design, and an empirical approach to safety and alignment. On paper, it looks like "Anthropic Redux".


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

Grok-3 Model Performance and Benchmarks

Company and Product Announcements

Technical Deep Dives and Research

AI Industry and Market Analysis

Open Source and Community

Memes and Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. OpenAI's o3-mini vs Phone-Sized Model Poll Controversy

Theme 2. GROK-3 Claims SOTA Supremacy Amid GPU Controversy

Theme 3. DeepSeek's Native Sparse Attention Model Release

Theme 4. PerplexityAI's R1-1776 Removes Censorship in DeepSeek

Theme 5. Speeding Up Hugging Face Model Downloads

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. Grok 3 Benchmark Release and Performance Debate

Theme 2. ChatGPT vs Claude on Context Window Use

Theme 3. LLMs on Real-World Software Engineering Benchmarks

Theme 4. AI Image and Video Transformation Advancements


AI Discord Recap

A summary of Summaries of Summaries by o1-2024-12-17

Theme 1. Grok 3: The Beloved, The Maligned

Theme 2. Frontier Benchmarks Shake Up LLM Tests

Theme 3. AI Tools Embrace Code Debugging

Theme 4. Labs Launch Next-Gen AI Projects

Theme 5. GPU & HPC Gains Fuel Model Innovations


PART 1: High-level Discord summaries

OpenAI Discord


Unsloth AI (Daniel Han) Discord


Perplexity AI Discord


HuggingFace Discord


Codeium (Windsurf) Discord


aider (Paul Gauthier) Discord


OpenRouter (Alex Atallah) Discord


Cursor IDE Discord


Interconnects (Nathan Lambert) Discord


LM Studio Discord


Eleuther Discord


Yannick Kilcher Discord


Stability.ai (Stable Diffusion) Discord


GPU MODE Discord


Nous Research AI Discord


Latent Space Discord


Notebook LM Discord


Modular (Mojo 🔥) Discord


Nomic.ai (GPT4All) Discord


LlamaIndex Discord


Torchtune Discord


MCP (Glama) Discord


LLM Agents (Berkeley MOOC) Discord


Cohere Discord


DSPy Discord


tinygrad (George Hotz) Discord


MLOps @Chipro Discord


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

OpenAI ▷ #ai-discussions (1021 messages🔥🔥🔥):

Grok 3 capabilities, Challenges with model benchmarking, OpenAI vs Grok competition, User experiences with various AI models

Links mentioned:


OpenAI ▷ #gpt-4-discussions (5 messages):

Pinned Chats Feature, CSV Table Issues in ChatGPT, PRO Mode Functionality, 4o's Text Reader Queries


OpenAI ▷ #prompt-engineering (7 messages):

GPT running pip, Python tool usage, Model performance issues


OpenAI ▷ #api-discussions (7 messages):

Running pip commands in GPT, Python tool usage in models


Unsloth AI (Daniel Han) ▷ #general (389 messages🔥🔥):

Unsloth AI training challenges, Colab and Notebook usage, GRPO training and optimization, VLLM and Dynamic 4-bit quants, New releases and performance evaluations

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (14 messages🔥):

Unsloth Update, bitsandbytes Code Discussion, CUDA Pointer Handling, Quantization Techniques, Changes in Unsloth Notebook

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (98 messages🔥🔥):

Unsloth Model Updates, GPU Utilization in Training, GRPO Training Challenges, Fine-tuning Llama 3.3, Data Preparation for Fine-tuning

Links mentioned:


Unsloth AI (Daniel Han) ▷ #research (47 messages🔥):

Scaling Mixture of Experts Models, PPO Setup with Unsloth Model, Local Databases for Big Data, Structured Inference for Reasoning, Sparse Attention Mechanism NSA

Links mentioned:


Perplexity AI ▷ #general (477 messages🔥🔥🔥):

Grok 3 Feedback, Deep Research Performance, Perplexity Pro Subscription, Grok 3 and Perplexity Integration, Reddit as a Source

Links mentioned:


Perplexity AI ▷ #sharing (36 messages🔥):

Grok 3.0 Launch, Generative AI Developer Roles, Ethereum Pectra Upgrade, AI Impact on Global Dynamics, Perplexity Pro Benefits

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (4 messages):

API Key Management, Integration Evaluation, Sonar API Hot Swap



HuggingFace ▷ #general (59 messages🔥🔥):

Model Performance Metrics, Video Generation Models, Uploading Code Issues, Internal Server Errors, AI Course Inquiries

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):

Neuralink Updates, Image Analysis, Research Findings


HuggingFace ▷ #i-made-this (3 messages):

Colab Porting Success, YuE Model Instrumental Injection, Custom Training Code Release

Links mentioned:


HuggingFace ▷ #computer-vision (4 messages):

UI-Tars dataset collaboration, AI models in 3D space, SLAM technology


HuggingFace ▷ #NLP (2 messages):

Base model behavior, Discord invites


HuggingFace ▷ #smol-course (3 messages):

Docling and Hugging Face partnership, Visual LLM and SmolVLM, Pull Request on GitHub, Benefits of VLM


HuggingFace ▷ #agents-course (380 messages🔥🔥):

AI Agents Course, Certificate Issues, Multi-Agent Systems, LLMs as Tools, Course Resources

Links mentioned:


Codeium (Windsurf) ▷ #content (1 messages):

MCP Tutorial

Link mentioned: Tweet from Windsurf (@windsurf_ai): A beginner's guide to how to use MCP!


Codeium (Windsurf) ▷ #discussion (30 messages🔥):

Codeium Write Mode Changes, IntelliJ Supercomplete Feature, Deployment of Codeium for Multiple Users, Codeium Subscription Value, Jetbrains Context Issues

Link mentioned: Contact | Windsurf Editor and Codeium extensions: Contact the Codeium team for support and to learn more about our enterprise offering.


Codeium (Windsurf) ▷ #windsurf (418 messages🔥🔥🔥):

Cascade Base Performance Issues, Internal Errors in Grading Models, Frustration with Grok 3, Subscription and Billing Concerns, Quality of Responses from AI Models

Links mentioned:


aider (Paul Gauthier) ▷ #general (420 messages🔥🔥🔥):

Grok 3 Launch, Aider Functionality Issues, Performance Comparisons, Model Access and Support, Discussion of AI Models

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (24 messages🔥):

Aider Search Engine, Architect Mode Suggestions, Aider Configuration Issues, Aider Not Recognizing .env, API Key Requirement

Links mentioned:


aider (Paul Gauthier) ▷ #links (4 messages):

Local sanity checks for coding LLM output, Ragit GitHub project, Ministral with Aider

Link mentioned: GitHub - baehyunsol/ragit: git-like rag pipeline: git-like rag pipeline. Contribute to baehyunsol/ragit development by creating an account on GitHub.


OpenRouter (Alex Atallah) ▷ #general (435 messages🔥🔥🔥):

Grok 3 performance, OpenRouter API usage, Model comparisons, Vision capabilities in LLMs, DeepSeek vs Sonnet

Links mentioned:


Cursor IDE ▷ #general (335 messages🔥🔥):

Grok 3, Sonnet performance, MCP server setup, Cursor performance issues, User feedback on AI models

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (209 messages🔥🔥):

Grok 3 Performance, Llama 4 Updates, Thinking Machines Lab Launch, Eval Methodologies in AI, GPT-4o Copilot Release

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (77 messages🔥🔥):

New LLM Models, Early Stopping in AI, AI Music and Art, Thinking Machines Corporation, AI Interview Insights

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rl (4 messages):

Recompiling Older Games, RL Training for LLMs, LLM4Decompile

Links mentioned:


Interconnects (Nathan Lambert) ▷ #reads (13 messages🔥):

Post-training talk at Stanford, Verification in reinforcement learning, Response to theory papers, Health-related setbacks

Links mentioned:


Interconnects (Nathan Lambert) ▷ #posts (14 messages🔥):

Philosophy of Science Theories, Grok 3 Mini Announcement, Reasoning Model Releases, Hacker News Comments, Deep Thonk Button

Link mentioned: Tweet from Keiran Paster (@keirp1): @natolambert @srush_nlp @TheShmanuel I think the mini reasoning model outperforming R1 is strong evidence against this narrative.


Interconnects (Nathan Lambert) ▷ #retort-podcast (2 messages):

Torters Rejoice, Bike Stand Discussion


Interconnects (Nathan Lambert) ▷ #policy (1 messages):

gfabulous: Sigh, guess we're all using grok now


Interconnects (Nathan Lambert) ▷ #expensive-queries (14 messages🔥):

Prompt Engineering, Perplexity vs ODR, Cursor Agent Workflow, Breaking Changes in Libraries, Vibecoding Efficiency


LM Studio ▷ #general (86 messages🔥🔥):

Error with DeepSeek Model, Local AI Functionality, Model Recommendations for Coding, Whisper Usage and Compatibility, LM Studio Update Issues

Link mentioned: Tool Use | LM Studio Docs: Enable LLMs to interact with external functions and APIs.


LM Studio ▷ #hardware-discussion (128 messages🔥🔥):

3090 GPU Performance, DeepSeek and Alternatives, Using Multiple Models for Inference, 396 vs 4090 Performance, AMD's Ryzen AI MAX

Links mentioned:


Eleuther ▷ #general (30 messages🔥):

Cognitive Sound Production, Music Generation Challenges, Machine Learning Prodigies, lm_eval Code Issues, Autoregressive Image Generation

Links mentioned:


Eleuther ▷ #research (77 messages🔥🔥):

DeepSeek v2 MoE Architecture, Platinum Benchmarks for LLMs, Model-guidance in Diffusion Models, Repetition Penalty in Creative Writing, SFT Memorizes and RL Generalizes

Links mentioned:


Eleuther ▷ #scaling-laws (26 messages🔥):

LLM scaling laws terminology, Taxonomy of scaling laws, Pretraining vs Post-training compute, Budget allocation in AI Labs, Deployment considerations and compute


Eleuther ▷ #lm-thunderdome (3 messages):

Dataset structuring for chess tactics, Fresh environment troubleshooting


Eleuther ▷ #gpt-neox-dev (75 messages🔥🔥):

GPU Performance Comparison, TP Communication Overlap, Model Configuration Differences

Links mentioned:


Yannick Kilcher ▷ #general (72 messages🔥🔥):

Grok 3 Launch, NURBS and AI, Comparison of AI Models, Grok 3 Game Studio, LLM Precision Issues

Links mentioned:


Yannick Kilcher ▷ #paper-discussion (31 messages🔥):

Hierarchical tree of papers, Upcoming paper discussions, Deepseek paper interest, Discussion on filtering information, Community contribution for paper reviews

Link mentioned: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention: Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offe...


Yannick Kilcher ▷ #ml-news (70 messages🔥🔥):

Larry Page's Relationship Status, Sergey Brin's Divorce, Grok-3 Demo, Deepsearch Product Announcement, The Los Angeles Project and Unicorns

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (168 messages🔥🔥):

Xformers issues, InvokeAI vs ComfyUI, Stable Diffusion update concerns, Gender classification in anime, Printer usability frustrations

Links mentioned:


GPU MODE ▷ #general (11 messages🔥):

Torch Compile, PyTorch Inductor, Machine Learning Advancements

Links mentioned:


GPU MODE ▷ #triton (3 messages):

Profiling in Triton vs CUDA, Performance issues due to bank conflicts, Low-precision inputs in Triton kernels, Device properties in PyTorch


GPU MODE ▷ #cuda (17 messages🔥):

Optimizing CUDA Memory Transfers, Global Memory Coalescing Explained, CUDA Express Installer Issues, Fine-Grained Control in Data Copying


GPU MODE ▷ #torch (4 messages):

Triton 3.2.0 issues, CUDA kernel compilation optimization


GPU MODE ▷ #algorithms (1 messages):

andreaskoepf: DS is ruling the field at the moment: https://arxiv.org/abs/2502.11089


GPU MODE ▷ #cool-links (4 messages):

Gompertz Linear Unit (GoLU), Native Sparse Attention, Self-gated activation functions

Links mentioned:


GPU MODE ▷ #jobs (2 messages):

GPU kernel programming internships, Senior AI Engineer position at T-Systems


GPU MODE ▷ #beginner (1 messages):

Optimizing memory access, Global memory operations


GPU MODE ▷ #off-topic (1 messages):

iron_bound: https://www.amd.com/en/products/software/rocm/application-developer-certificate.html


GPU MODE ▷ #rocm (7 messages):

HIP Kernels in PyTorch, ROCm Installation Issues, AMD Kernel Driver Performance, ROCm Compatibility on iGPUs


GPU MODE ▷ #arm (4 messages):

ExecuTorch, LLM optimization, Ethos U, Self-promotion policy


GPU MODE ▷ #webgpu (1 messages):

Simulated Metal Pin Toy, transformers.js, Depth Estimation Models

Link mentioned: Tweet from Vincent (@vvvincent_c): Simulated metal pin toy using live webcam + depth estimation model running in the browser 🎨✨Claude and I had lots of fun hacking on this over the weekend!!Link in bio to try it yourself!


GPU MODE ▷ #self-promotion (15 messages🔥):

Dynasor, ML Systems Research, Simulated Metal Pin Toy, CUDA Optimization Techniques, HQQ Support in VLLM

Links mentioned:


GPU MODE ▷ #🍿 (3 messages):

KernelBench paper, GPU kernel generation, Performance engineering, Kernel fusion, Productivity tools in coding

Link mentioned: KernelBench: Can LLMs Write Efficient GPU Kernels?: Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore usin...


GPU MODE ▷ #edge (2 messages):

SO-ARM100 Assembly, 3D Printer Size Constraints

Link mentioned: GitHub - TheRobotStudio/SO-ARM100: Standard Open Arm 100: Standard Open Arm 100. Contribute to TheRobotStudio/SO-ARM100 development by creating an account on GitHub.


GPU MODE ▷ #reasoning-gym (49 messages🔥):

vLLM nightly, RL curricula development, ExploreToM project, CodeI/O dataset, Issue creation and collaboration

Links mentioned:


Nous Research AI ▷ #general (84 messages🔥🔥):

Grok-3 Performance, Open Source Models Discussion, Autoregressive Image Model, AI Model Training Techniques, Community Feedback on AI Models

Links mentioned:


Nous Research AI ▷ #ask-about-llms (11 messages🔥):

Hermes 3 censorship claims, Deephermes usage, Grok 3 impressions, Performance issues with tokens


Nous Research AI ▷ #research-papers (3 messages):

SWE-Lancer Benchmark, Upwork Engineering Tasks, Model Performance Evaluation

Links mentioned:


Nous Research AI ▷ #interesting-links (2 messages):

Alignment faking in LLMs, Eagles Super Bowl Predictions, Open-source LLMs performance

Links mentioned:




Latent Space ▷ #ai-general-chat (94 messages🔥🔥):

Thinking Machines Lab Launch, Perplexity R1 Finetune, SWElancer Benchmark, Grok 3 Announcement, Zed's Edit Prediction Model

Links mentioned:


Latent Space ▷ #ai-announcements (1 messages):

swyxio: new pod drop! https://x.com/latentspacepod/status/1891879917224132973


Notebook LM ▷ #use-cases (24 messages🔥):

Max Headroom Podcast Production, Language Settings for Audio Generation, Notebook LM Audio Features, Audio Generation Challenges, Google Product Limitations

Link mentioned: Max Headroom Rebooted 2025 Full Episode 20 Minutes: 🚨 BZZZZZT! ALERT! ALERT! 🚨THE FUTURE IS BROKEN—AND I AM BACK TO REPORT IT!💾 LOST IN THE DIGITAL VOID… THEN REBOOTED BY A TRASH PANDA?! 💾Somewhere deep in...


Notebook LM ▷ #general (65 messages🔥🔥):

NotebookLM usage issues, Podcast features, Language settings, File management for researchers, Translation capabilities

Links mentioned:


Modular (Mojo 🔥) ▷ #general (14 messages🔥):

Polars DataFrame Library, Mojo Integration, Standard Library Team Expansion, Apache Arrow Implementation


Modular (Mojo 🔥) ▷ #mojo (39 messages🔥):

Alternatives to ChatGPT for Mojo, Mojo Code Refactoring Challenges, Autodifferentiation with Enzyme, Global Variables Support in Mojo, Using Lists vs. Stacks in Mojo

Links mentioned:


Nomic.ai (GPT4All) ▷ #general (46 messages🔥):

GPUs for Testing GPT4All, Deep-Research-like functionality, 10m Token Count for Embedding, CUDA 5.0 Support

Link mentioned: CUDA - Wikipedia: no description found


LlamaIndex ▷ #blog (3 messages):

LLM Consortium, Mistral Saba, Semantic Retrieval for Vendor Questionnaires


LlamaIndex ▷ #general (23 messages🔥):

Metadata filters in vector stores, Agent Workflow features, Building AI chatbots using LlamaIndex, Embedding installations for local LLMs, Release notes location

Links mentioned:


LlamaIndex ▷ #ai-discussion (1 messages):

RAG based on JSON dictionaries, User query document matching, Finding documents in large JSON


Torchtune ▷ #general (1 messages):

Byte Latent Transformers, Qwen Fine-Tuning, TorchTune Hacks

Link mentioned: GitHub - ianbarber/ttblt: A simplified implementation of Byte Latent Transformers as a TorchTune recipe.: A simplified implementation of Byte Latent Transformers as a TorchTune recipe. - ianbarber/ttblt


Torchtune ▷ #dev (18 messages🔥):

Unit Test Handling, Optional Dependencies for Development, Checkpoint Resuming Logic, Step-Based Checkpointing, Cross-Contributing on PRs

Links mentioned:


Torchtune ▷ #papers (4 messages):

Reinforcement Learning (RL) for pre-training


MCP (Glama) ▷ #general (7 messages):

Glama MCP Server Changes, OpenRouter Documentation, Anthropic Homepage Status, Haiku 3.5 Release, Sonnet 4.0 Release


MCP (Glama) ▷ #showcase (11 messages🔥):

MCP server for debugging, Continue tool features, Clear Thought MCP Server

Links mentioned:


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):

Certificates issued, Fall24 student participation


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (7 messages):

Inference-Time Techniques, LangChain Framework, LLM Agents with ML Models

Link mentioned: Introduction | 🦜️🔗 LangChain: LangChain is a framework for developing applications powered by large language models (LLMs).


Cohere ▷ #discussions (1 messages):

Profit Sharing Opportunity, Telegram Marketing

Link mentioned: Rose Miller: Portfolio manager, Conservative


Cohere ▷ #api-discussions (1 messages):

Profit-sharing opportunities, Rose Miller's announcement

Link mentioned: Rose Miller: Portfolio manager, Conservative


Cohere ▷ #projects (4 messages):

Structured Data Injection, Context API Launch, High-Quality Data Access, Feedback Request, Blog Post Discussion

Links mentioned:


Cohere ▷ #cohere-toolkit (1 messages):

Profit Sharing Opportunity, Fast Money Making Scheme

Link mentioned: Rose Miller: Portfolio manager, Conservative


DSPy ▷ #papers (2 messages):

Self-Supervised Prompt Optimization, Importance of Prompt Design, LLM Output Quality, DSPy Mention, Cost-effective Frameworks

Link mentioned: Self-Supervised Prompt Optimization: Well-designed prompts are crucial for enhancing Large language models' (LLMs) reasoning capabilities while aligning their outputs with task requirements across diverse domains. However, manually d...


DSPy ▷ #general (1 messages):

RouteLLM, GPT-5 discussions

Link mentioned: Tweet from Dᴀᴛᴀ Sᴀᴄᴋs (@DataDeLaurier): @Teknium1 Because they do not have anything new. They are about to slap 4o, o1, o3, Voice, and Sora into RouteLLM and call it GPT-5.I bet they actually use RouteLLM and don't cite anyone


tinygrad (George Hotz) ▷ #general (2 messages):

Test for Pull Request #9155, DEBUG=2 feature

Link mentioned: colors back in DEBUG=2 [pr] by geohot · Pull Request #9155 · tinygrad/tinygrad: no description found


MLOps @Chipro ▷ #events (1 messages):

GenAI Video Generation, Seaweed-APT, AI Storytelling, Nvidia's General Embodied Agent, Building Scalable Training Pipelines

Link mentioned: GenAI Video, World Models & Robotics #Kling #Veo #Sora #Cosmos #Diffusion · Luma: Join us to gain unfiltered insights into cutting-edge techniques that power real-time one-step tex-to-video generation, general world models, and…



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}