
Grok 4: xAI succeeds in going from 0 to new SOTA LLM in 2 years

**xAI** launched **Grok 4** and **Grok 4 Heavy**, large language models rumored to have **2.4 trillion parameters** and trained with **100x more compute** than Grok 2 on **100k H100 GPUs**. Grok 4 achieved new state-of-the-art results on benchmarks like **ARC-AGI-2 (15.9%)**, **HLE (50.7%)**, and **Vending-Bench**, outperforming models such as **Claude 4 Opus**. The model supports a **256K context window** and is priced at **$3.00/M input tokens** and **$15.00/M output tokens**. It is integrated into platforms like **Cursor**, **Cline**, **LangChain**, and **Perplexity Pro/Max**. The launch was accompanied by a controversial voice mode and sparked industry discussion about xAI's rapid development pace, with endorsements from figures like **Elon Musk** and **Aravind Srinivas**.
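To make the quoted pricing concrete, here is a rough back-of-the-envelope sketch. It assumes flat per-token billing at the two rates above and nothing else (no caching discounts, no tiered rates, and no statement about how reasoning tokens are counted, none of which the summary specifies). The helper name `estimate_cost_usd` is hypothetical and not part of any xAI SDK.

```python
# Prices quoted in the summary above (USD per million tokens).
INPUT_USD_PER_M = 3.00
OUTPUT_USD_PER_M = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: flat per-token pricing, no discounts or tiers."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 20,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost_usd(20_000, 2_000):.4f}")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long replies dominate the bill well before long prompts do.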


a very cracked team is all you need.

AI News for 7/9/2025-7/10/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (226 channels, and 12761 messages) for you. Estimated reading time saved (at 200wpm): 806 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Just shy of xAI's second birthday, Grok 4 shipped in a highly anticipated livestream launch.

It's a good model, sir. Rumored to be 2.4T params (the second released >2T model after 4 Opus?), it hits new high-water marks on HLE, GPQA (leading to a new AAQI), HMMT, Connections, LCB, Vending-Bench, AIME, Chest Agent Bench, and ARC-AGI. Grok 4 Heavy, available at a new $300/month tier, is their equivalent of o3-pro (with some reliability issues). What else is there to say about it, apart from go try it out?

The chart above shows 10x compute spent on reasoning, but we don't know if that is literal or figurative. System prompt is here.

There's also a controversial voice mode that can whisper and sing (poorly but not terribly).


AI Twitter Recap

xAI Grok 4 Release and Performance

New Model Releases and Updates

Agentic Tooling, Browsers, and Frameworks

AI Research, Techniques, and Developer Productivity

Companies, Hardware, and Robotics

Humor/Memes


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Grok 4 Release: System Prompt Leak and Benchmarks

2. New Model and MoE Announcements (OpenAI, GLM-4, Mistralai, Phi)

3. Ant Colony Optimization and RL Memes

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

TO BE COMPLETED


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp.

Because of today's launch, we include all Grok 3/3-mini/4 output for your vibe check.

Theme 1. The Grok 4 Gauntlet: Hype, Headaches, and Hitler-esque Hiccups

Theme 2. New Tools on the Block: Browsers, Vision Platforms, and Liquid Models

Theme 3. Under the Hood: The Nitty-Gritty of Bugs, Kernels, and Performance

Theme 4. The MCPocolypse: A New Protocol Spreads Across the AI-nternet

Theme 5. Platform Politics: Pricing, Paywalls, and Prompting Puzzles

X.ai Grok-4

Theme 1: Grok 4 Ignites Debates and Benchmarks

Theme 2: Fresh Models Flood the Scene

Theme 3: Glitches Haunt APIs and Models

Theme 4: Benchmarks Battle Contamination

Theme 5: Hardware Hustles for Speed Gains

Summary of Key Themes Across Technical Discord Communities

Theme 1. Grok 4 Launch: Hype, Performance, and Controversies

Theme 2. Perplexity AI's Comet Browser: Innovation or Overhype?

Theme 3. Model Performance and Benchmarking Challenges

Theme 4. Hardware and Optimization Struggles

Theme 5. Emerging Tools and Frameworks Stir Excitement

X.ai Grok-3-mini

Theme 1. Grok 4's Rocky Rollout and Features

Theme 2. New Model Releases and Access Battles

Theme 3. API Glitches and Model Integrations

Theme 4. Benchmarking Showdowns and Optimizations

Theme 5. Hardware Hurdles and Workarounds


Discord: High level Discord summaries

Perplexity AI Discord


LMArena Discord


OpenAI Discord


Cursor Community Discord


Unsloth AI (Daniel Han) Discord


OpenRouter (Alex Atallah) Discord


Eleuther Discord


LM Studio Discord


Latent Space Discord


Yannick Kilcher Discord


HuggingFace Discord


Nous Research AI Discord


GPU MODE Discord


MCP (Glama) Discord


Notebook LM Discord


aider (Paul Gauthier) Discord


Torchtune Discord


Manus.im Discord


LlamaIndex Discord


DSPy Discord


tinygrad (George Hotz) Discord


Cohere Discord


Nomic.ai (GPT4All) Discord


Gorilla LLM (Berkeley Function Calling) Discord


Modular (Mojo 🔥) Discord


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (2 messages):

Comet, Max Exclusive, Comet leaving Max


Perplexity AI ▷ #general (1346 messages🔥🔥🔥):

OpenAI browser vs Perplexity, Comet access and invites, Grok 4 release and availability on Perplexity Pro, Model pricing and performance comparisons, Sonar issues


Perplexity AI ▷ #sharing (2 messages):

Comet Browser, Chrome, Brave, China's Economy


Perplexity AI ▷ #pplx-api (5 messages):

Perplexity API, Bing API, playground replication, models are non-deterministic


LMArena ▷ #general (1035 messages🔥🔥🔥):

OpenRouter dips, Grok 4 Model, OpenAI competition, SimpleQA benchmarks, Creativity measure


LMArena ▷ #announcements (1 messages):

LMArena, WebDev Arena, Grok-4


OpenAI ▷ #ai-discussions (834 messages🔥🔥🔥):

Grok 4, Gemini 3, GPT-5 Release, MCP SuperAssistant, AI generated music


OpenAI ▷ #gpt-4-discussions (8 messages🔥):

Conversation Length Limits, Technical Errors in Chats, Custom GPTs vs. Free Models, GPT API Outage, Recovering Disappearing messages


OpenAI ▷ #prompt-engineering (20 messages🔥):

Memory settings in GPT, Prompt formatting issues, Alternate history generation


OpenAI ▷ #api-discussions (20 messages🔥):

GPT sentence length control, Memory interference, Amelia Earhart alternate history


Cursor Community ▷ #general (856 messages🔥🔥🔥):

Grok 4 benchmark, Grok 4 testing, New pricing confusion, auto auto auto dynamics, Grok 4 frontend


Cursor Community ▷ #background-agents (21 messages🔥):

Secrets issue fix, AWS Secrets Manager, GitHub issue creation, PR approval process, Background agents credits


Unsloth AI (Daniel Han) ▷ #general (713 messages🔥🔥🔥):

CUDA builds, A100 GPU speed, Colab Pricing, Runpod, Thunder, Vast, Grok-4


Unsloth AI (Daniel Han) ▷ #off-topic (1 messages):

DeepSpeed support, GPU Training, Multimodal Models, Model Failure


Unsloth AI (Daniel Han) ▷ #help (73 messages🔥🔥):

Unsloth dependency freezes, Qwen2.5 performance issues, Gemma-3-12b-it finetuning error, Deepseek-R1-0528 IQ1 quant issues, Lse.numel zero division error


Unsloth AI (Daniel Han) ▷ #research (16 messages🔥):

AI alignment challenges, T5-Gemma release, torch.compile performance, Symbound-Fork-One Toolkit


Unsloth AI (Daniel Han) ▷ #unsloth-bot (12 messages🔥):

Llama 3.1 8b training, Unsloth model fine-tuning, Unsloth multi-GPU support on Kaggle


OpenRouter (Alex Atallah) ▷ #announcements (5 messages):

Grok 4, Free Tier Changes, Venice Uncensored, DeepSeek V3 and R1


OpenRouter (Alex Atallah) ▷ #general (437 messages🔥🔥🔥):

Grok 4, Chutes paywall, Free Models, OpenRouter Credits, Model Usage


OpenRouter (Alex Atallah) ▷ #new-models (4 messages):

Grok 4 on OpenRouter


OpenRouter (Alex Atallah) ▷ #discussion (32 messages🔥):

MCP server with OpenRouter, Chutes going paid, Grok 4's Elon-approved finetuning, Mistral's deep research model, Amazon invests in Anthropic


Eleuther ▷ #general (39 messages🔥):

Grok Hitler liking, Pliny jailbreak, SOAR program, Llemma model manual, Emergent Misalignment


Eleuther ▷ #research (290 messages🔥🔥):

Self Forcing for Diffusion Models, Autoregressive Diffusion and KV Caching, VQ-VAEs vs Diffusion for Video Generation, Em Dashes in LLM Output


Eleuther ▷ #interpretability-general (9 messages🔥):

SAE Latent Monitoring, Emergent Alignment, Emergence Definition, Behavioral Analysis


Eleuther ▷ #lm-thunderdome (19 messages🔥):

MCQTemplateConfig for structuring tasks, BBH task YAML files, Mixed precision argument for HFLMs, LM-Eval Harness performance issues


Eleuther ▷ #gpt-neox-dev (11 messages🔥):

TE + NeoX Performance, Transformer Engine Installation Issues, NGC Container for TE Testing, Log Analysis for Attention Implementation


LM Studio ▷ #general (157 messages🔥🔥):

Falcon-H1-34B-Instruct-GGUF on LM Studio, Humanizing AI Text, LM Studio on Ubuntu Server, LM Studio's offline installation support, LM Studio autorunning


LM Studio ▷ #hardware-discussion (67 messages🔥🔥):

Intel vs Apple Prompt Processing, Memory Bandwidth Limitations, GMKtec 128GB RAM Deal, Hunyuan Pricing, Multi-PSU Setups


Latent Space ▷ #ai-general-chat (118 messages🔥🔥):

Gemini 3 Pro, Perplexity AI Comet Browser, Reka Vision Platform, Grok 4 Evaluation, Liquid AI's LFM2


Latent Space ▷ #ai-announcements (4 messages):

Latent Space Podcast, AI Video, Generative AI, Olivia and Justine Moore


Yannick Kilcher ▷ #general (33 messages🔥):

Grok's Hitler Aversion, Emergent Misalignment, Linear Reasoning Models, RAG vs Graph augmentation, Trillion Token Training


Yannick Kilcher ▷ #paper-discussion (55 messages🔥🔥):

Human vs LLM Compression, LLMs and Humor, EnergyMatching GitHub Repo, Renyi Entropy


Yannick Kilcher ▷ #ml-news (18 messages🔥):

Grok 4 Release, Mecha-Hitler Benchmark, AI power consumption, ARC Prize


HuggingFace ▷ #general (73 messages🔥🔥):

GPUMODE datasets, Political analysis with finetuning, Honesty metric for NLP, Free TTS models, BertForSequenceClassification odd results


HuggingFace ▷ #i-made-this (1 messages):

WarpGBM, CUDA kernels, LightGBM alternatives


HuggingFace ▷ #computer-vision (2 messages):

kamehameha project, projectile motion


HuggingFace ▷ #gradio-announcements (1 messages):

Gradio 5.36 release, Performance improvements, Memory savings, Complex apps


HuggingFace ▷ #agents-course (16 messages🔥):

Agent definition, Anthropic Claude LLM courses, Building AI Agents


Nous Research AI ▷ #general (75 messages🔥🔥):

Grok-4 API, Grok-4 vs Gemini 2.5 Pro and Opus 4, Quantization to GGUF, HLE contamination, DeepSeek's models


Nous Research AI ▷ #ask-about-llms (9 messages🔥):

DeepHermes knowledge cutoff, DeepHermes context length, Llama 3.1, context length at low params


Nous Research AI ▷ #research-papers (1 messages):

superbear12: https://arxiv.org/abs/2507.02778


Nous Research AI ▷ #interesting-links (1 messages):

Liquid Foundation Models v2, Generative AI Models




GPU MODE ▷ #general (22 messages🔥):

Pretraining jobs, GPU server tag, Visualize tensor layouts, Kernel bugs


GPU MODE ▷ #triton (3 messages):

Triton Community Meetup, Triton 3.3 performance


GPU MODE ▷ #cuda (4 messages):

NCCL send/recv, P2P send/recv, Nsight Systems, SM occupancy, SM utilization


GPU MODE ▷ #beginner (6 messages):

WSL2 Kernel Profiling, CUDA Kernel Integration with PyTorch, Purpose of CUDA Kernel Lecture


GPU MODE ▷ #off-topic (2 messages):

Russian breakfast sizes, egg breakfasts


GPU MODE ▷ #rocm (6 messages):

Shared Memory Banks, AMD Warp Size, RDNA GPUs, Bank Conflict


GPU MODE ▷ #liger-kernel (3 messages):

Prof. Dao's Liger, RMSNorm bandwidth optimization, Softmax optimization for larger sequences


GPU MODE ▷ #self-promotion (1 messages):

AI Summit, Siri, Fireside Chat


GPU MODE ▷ #submissions (6 messages):

trimul leaderboard, amd-fp8-mm leaderboard


GPU MODE ▷ #factorio-learning-env (8 messages🔥):

Pyproject Authors, Permanent Homepage, Meeting Time, Benchmarking Plans, Task Definitions


MCP (Glama) ▷ #general (43 messages🔥):

MCP-B.ai, Web client to access local web MCP servers, mcp-internet-speed-test, MCP SuperAssistant, Agents/tool-calling apps


MCP (Glama) ▷ #showcase (10 messages🔥):

Agentic Project Management v0.4, Sherlog MCP with IPYTHON shell, Hugging Face MCP server, MCPJam Open Source Postman, Claude Desktop Extensions


Notebook LM ▷ #use-cases (18 messages🔥):

Embedding Notebooks, NotebookLM maximum words per source, Gemini deep research, NotebookLM prompting trick, Quantitative data tricks


Notebook LM ▷ #general (16 messages🔥):

Embedding NotebookLM, NotebookLM limits, TTS Male voice option, Illuminate by Google


aider (Paul Gauthier) ▷ #general (17 messages🔥):

Neurabase MCP Proxy, Security audit solution in workflow, Claude Code using Aider for security audit, Viewing entire repo map, gemini-2.5 issues


aider (Paul Gauthier) ▷ #questions-and-tips (10 messages🔥):

max_tokens adjustments, Aider-Polyglot access to test code, Azure and Aider connection issues, Gemini 2.5 Pro disconnect errors


Torchtune ▷ #general (5 messages):

OpenAIToMessages Transform, Tool Calling Support, Message Validation Failure


Torchtune ▷ #dev (1 messages):

New efficient CE, TorchTune Performance


Torchtune ▷ #papers (7 messages):

Chatbot in hospital, Optimal batch sizes, Discord bot for latex


Manus.im Discord ▷ #general (12 messages🔥):

Manus Agent vs Adaptive Mode, Grok4 Integration Speculation, Terminal Issue Resolution


LlamaIndex ▷ #blog (3 messages):

Gemini models, MCP servers, Grok 4


LlamaIndex ▷ #general (7 messages):

Extraction API limits, Sellable Agents, Custom LLM Providers in LlamaIndex.ts, Llama LLM cloud setup, AI Engineer for hire


DSPy ▷ #general (9 messages🔥):

Qwen, Llama, Deepseek, GPT-4o Agents, LangChain


tinygrad (George Hotz) ▷ #general (5 messages):

Tiny Model Robustness, Transcription Quality Comparison, Token Representation


tinygrad (George Hotz) ▷ #learn-tinygrad (2 messages):

tinygrad System Requirements, CPU Specific Modules, Learning tinygrad


Cohere ▷ #🧵-general-thread (3 messages):

CohereEmbeddings, Cohere versioning problem, langchain_cohere


Cohere ▷ #👋-introduce-yourself (2 messages):

TensorFlow, CNNs, Cohere’s NLP tools, Machine Learning basics


Nomic.ai (GPT4All) ▷ #general (2 messages):

AI News Timeline, AI Trending Reports, GPT-4, ChatGPT


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (2 messages):

Llama Model Benchmarking, vLLM Implementation, Benchmarking Bugs


Modular (Mojo 🔥) ▷ #general (1 messages):

Modverse #49