Frozen AI News archive

OAI and GDM announce IMO Gold-level results with natural language reasoning, no specialized training or tools, under human time limits

**OpenAI** and **Google DeepMind** achieved a major milestone by solving 5 out of 6 problems at the **International Mathematical Olympiad (IMO) 2025** within the human time limit of 4.5 hours, earning the IMO Gold medal. This breakthrough was accomplished using general-purpose reinforcement learning and pure in-weights reasoning without specialized tools or internet access, surpassing previous systems like AlphaProof and AlphaGeometry2. The success resolved a 3-year-old bet on whether AI could solve IMO problems at this level and sparked discussions among mathematicians including **Terence Tao**. Despite this, 26 human competitors still outscored the AI systems, thanks to the hardest combinatorics problem (P6). The achievement highlights advances in **reinforcement-learning**, **reasoning**, and **model-scaling** in AI research.


General-purpose RL is all you need.

AI News for 7/18/2025-7/21/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (227 channels, and 21117 messages) for you. Estimated reading time saved (at 200wpm): 1729 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

This time last year, GDM announced that AlphaProof and AlphaGeometry2 (the latest in a long series of Alpha* work) had perfectly solved 4 out of the 6 IMO 2024 problems, falling 1 point short of the Gold medal cutoff. However, that system needed over 60 hours for some problems, much longer than the 4.5 hours allowed for humans.
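The "1 point short" figure can be checked with a little arithmetic. The sketch below assumes the standard IMO scoring of 7 points per problem and a 2024 gold cutoff of 29 points; neither number is stated in this issue, and the function name is purely illustrative.

```python
# Assumption: each IMO problem is graded out of 7 points (standard IMO scoring).
POINTS_PER_PROBLEM = 7
# Assumption: the IMO 2024 gold-medal cutoff was 29 points out of 42.
GOLD_CUTOFF_2024 = 29


def score_from_full_solves(full_solves: int) -> int:
    """Score when `full_solves` problems earn full marks and the rest earn zero."""
    return full_solves * POINTS_PER_PROBLEM


# Four perfect solves, as reported for AlphaProof and AlphaGeometry2.
score_2024 = score_from_full_solves(4)
shortfall = GOLD_CUTOFF_2024 - score_2024
print(f"score={score_2024}, points short of gold={shortfall}")
```

Under the same assumption, five full solves correspond to `score_from_full_solves(5)`, i.e. 35 of the 42 available points.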

This year, both OpenAI ("an experimental research model, not released in GPT5" - their solutions here) and GDM ("Advanced version of Gemini Deep Think" - their solutions here) announced full solves of 5 out of the 6 problems (P6 is typically the hardest), all within 4.5 hours, achieving IMO Gold and resolving a 3-year-old AI bet between Paul Christiano and Eliezer Yudkowsky, where Paul had put the probability at <4% in Feb 2022. Interestingly, the market-estimated probability of this success trended DOWN even through the release of o1 and new reasoner models, and only shot up to 50-80% after the GDM announcement last year:

The even more surprising element of this Gold prize not documented by that bet is that it was done WITHOUT use of specialized tools like Lean or even access to the Internet; just pure in-weights reasoning (aka "purely via search in token space"):

Mathematicians seem mostly unthreatened and are welcoming the result, although Terence Tao had some strong doubts about the methodology and the medal claim (which were answered).

Thanks to the combinatorics problem P6, which requires creativity, 26 humans remain better than AI at the IMO in 2025. Try it if you wish.

In case you were wondering, here is how SOTA released models did on the same IMO: "not even bronze".


AI Twitter Recap

AI Achieves IMO Gold: The Race, Results, and Reaction

New Models, Architectures, and Performance

Agentic Systems, Tooling, and Developer Experience

AI Research, Infrastructure, and Technical Concepts

AI Industry, Companies, and Geopolitics

Humor/Memes


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Qwen3-235B-A22B-2507 Launch and Anticipation

2. Custom LLM Projects and System Prompt Extraction

3. LLM Hardware Innovations and Local Model Preferences

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Gemini Deep Think and AI Performance at IMO Controversy

2. AI Industry Talent Wars and Big AI Hiring Moves

3. Large-Scale Diffusion Model Training and Finetuning Experiments


AI Discord Recap

A summary of Summaries of Summaries by X.ai Grok-4

Theme 1: AI Agents Storm the Scene with Multimodal Might

Theme 2: Quantization Tricks Squeeze Models into Tiny Bits

Theme 3: Sky-High Valuations Fuel AI Bubble Fears

Theme 4: Hardware Hurdles Haunt GPU Warriors

Theme 5: Tools and APIs Tackle Tricky Tasks


Discord: High level Discord summaries

Perplexity AI Discord


OpenAI Discord


Unsloth AI (Daniel Han) Discord


Cursor Community Discord


LMArena Discord


Latent Space Discord


OpenRouter (Alex Atallah) Discord


Eleuther Discord


LM Studio Discord


HuggingFace Discord


GPU MODE Discord


Modular (Mojo 🔥) Discord


Yannick Kilcher Discord


Manus.im Discord


MCP (Glama) Discord


tinygrad (George Hotz) Discord


Nous Research AI Discord


Notebook LM Discord


LLM Agents (Berkeley MOOC) Discord


Cohere Discord


LlamaIndex Discord


DSPy Discord


Codeium (Windsurf) Discord


MLOps @Chipro Discord


Nomic.ai (GPT4All) Discord


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (1283 messages🔥🔥🔥):

Airtel Free Perplexity Pro, Perplexity Pro India, Comet Browser invite, New perplexity page, Ai waifus


Perplexity AI ▷ #sharing (2 messages):

CachyOS, Iron Rails and Ideals: Mao Zedong


Perplexity AI ▷ #pplx-api (5 messages):

Perplexity Pro, API access, Sonar models, Prompting, JSON output


OpenAI ▷ #annnouncements (3 messages):

ChatGPT Agent, Deep Research, Operator


OpenAI ▷ #ai-discussions (1172 messages🔥🔥🔥):

Grok app, Chat GPT for desktop, AI overlords, OpenAI's Agent/Operator, Mensa IQ Test


OpenAI ▷ #gpt-4-discussions (4 messages):

GPT Agents, ChatGPT website, LLM models


OpenAI ▷ #prompt-engineering (3 messages):

Reproducibility Elements, Prompt Templates, Model Interfaces and Calls, Tasks and Inputs, Evaluation Metrics


OpenAI ▷ #api-discussions (3 messages):

Reproducibility, Missing Reproducibility Elements, Prompt Templates, Model Interfaces and Calls, Tasks and Inputs


Unsloth AI (Daniel Han) ▷ #general (549 messages🔥🔥🔥):

Model performance within same family vs different families, Kimi model 1.8 bit usability, Swapping model architectures, Fine-tuning LLMs for educational purposes, ERNIE 4.5 MoE models support in llama.cpp


Unsloth AI (Daniel Han) ▷ #off-topic (2 messages):

Small Language Models, Low Compute Power Systems, Data Collection and Processing Jobs, Low Power Distributed Computing


Unsloth AI (Daniel Han) ▷ #help (228 messages🔥🔥):

Blackwell RTX 50 series and xformers, Qwen3-4B-Base training, Smartest model for 15GB VRAM, Unsloth optimizations on big VRAM GPUs, GGUF conversion logic rework


Unsloth AI (Daniel Han) ▷ #showcase (2 messages):

Unsloth fine-tuning, Osmosis-AI models, Model Accuracy on Benchmarks


Unsloth AI (Daniel Han) ▷ #research (6 messages):

LLM Hallucinations, Apple Intelligence, Sycophancy Impact


Unsloth AI (Daniel Han) ▷ #unsloth-bot (20 messages🔥):

Logprobs for tokens, Dataset preparation for Qwen3, Automatic early stopping in Unsloth


Cursor Community ▷ #general (568 messages🔥🔥🔥):

Cursor Pricing, MCP & Claude integration, Agent stuck, KIRO, Auto Model details


Cursor Community ▷ #background-agents (8 messages🔥):

Dockerfile NVM_DIR Issue, Agent stuck in Opening Remote state, Environment not rebuilding


LMArena ▷ #general (559 messages🔥🔥🔥):

DeepSeek Margin, OpenAI Browser Speculation, Kimi K2 coding, OpenAI Image editor API, GPT-5 Hype


Latent Space ▷ #ai-general-chat (195 messages🔥🔥):

ChatGPT Agent, Perplexity's Valuation, Mistral Le Chat, FAL Series C, Real-Time Diffusion Video


Latent Space ▷ #ai-announcements (1 message):

YouTube Video Announcement


Latent Space ▷ #ai-in-action-club (96 messages🔥🔥):

ChatGPT Agent Launch, Benchmarks, Safety Concerns - Biohazards, Bespoke Operator-Mode Training, BBQ Evaluation


OpenRouter (Alex Atallah) ▷ #app-showcase (7 messages):

Kimi K2, GROQ, OpenRouter, Email Builder, FlowDown


OpenRouter (Alex Atallah) ▷ #general (258 messages🔥🔥):

Claude 4 Opus pricing and usage, GPTs Agents Learning, Free Models, Janitor AI and 401 errors, Chutes Free Tier Limits


OpenRouter (Alex Atallah) ▷ #discussion (11 messages🔥):

OpenRouter models in Cursor, Kluster.ai shuts down, AI inference services shutting down


Eleuther ▷ #general (47 messages🔥):

Research Management, ML Paper Writing Advice, Finding Research Mentors, Smallest Benchmark Datasets for LLMs, SOAR Program


Eleuther ▷ #research (79 messages🔥🔥):

latent space initialization for experts, ETHOS model updates, PEER paper discussion, Weight decay perturbation, MLA but for MOE


Eleuther ▷ #interpretability-general (3 messages):

SAE model data discrepancies, nnterp package beta release, Transformer models unified interface, Robust testing system for models, Model validation tests for hooks


Eleuther ▷ #lm-thunderdome (4 messages):

Harness Reproducibility, Dynamic IFEval Suite, bfloat16


Eleuther ▷ #gpt-neox-dev (20 messages🔥):

Transformer Engine setup issues, RoPE_Pct in gpt-neox, Slurm runner in DeeperSpeed, Containerized setup for gpt-neox


LM Studio ▷ #general (78 messages🔥🔥):

Speculative Decoding speed boost, Local Gemma threatening users, LM Studio Open Network Server setup, EOS token definition, MoE Model analysis


LM Studio ▷ #hardware-discussion (68 messages🔥🔥):

LM Studio multi CPU support, AMD Ryzen 9 8945H, 3090 vs 3080Ti Price, NPU use case


HuggingFace ▷ #general (66 messages🔥🔥):

HF repo PR watching, SmolVLM2 blogpost scam, Dataset-viewer API modality, Gender swapping AI, CAD-Editor model released


HuggingFace ▷ #today-im-learning (1 message):

Model Training, 1.5 bit research


HuggingFace ▷ #cool-finds (2 messages):

GPUHammer exploit, LLM Hallucination


HuggingFace ▷ #i-made-this (4 messages):

LunarisCodex LLM, GitChameleon eval benchmark for LLMs, SuccubusBot Text Coherence Model, Flame Audio AI toolkit


HuggingFace ▷ #computer-vision (2 messages):

SmolDocLing finetuning issues, Symmetry-agnostic image similarity models


HuggingFace ▷ #agents-course (2 messages):

HuggingFace Inference API, LLMs Deployed via HF Inference


GPU MODE ▷ #general (12 messages🔥):

shfl_down_sync, reduction intrinsics, warp reduce functions, kernel optimization


GPU MODE ▷ #triton (9 messages🔥):

Triton Autodiff, sm120 GPUs for fp4 ops, tl.constexpr_function decorator, einops package for triton


GPU MODE ▷ #torch (2 messages):

Inductor problems, Blackwell GPU issues


GPU MODE ▷ #algorithms (1 message):

kszysiu2137: Quad tree maybe


GPU MODE ▷ #cool-links (3 messages):

NVIDIA CUDA Kernel Fusion in Python, AMD's response to CUDA, Triton as an alternative to CUDA


GPU MODE ▷ #jobs (1 message):

Storage Engineer, Remote Job


GPU MODE ▷ #beginner (3 messages):

vast.ai, GPU programming opportunities, CUDA speedup, Bioinformatics


GPU MODE ▷ #rocm (1 message):

Compiler behavior, Builtins, asm volatile, llvm.amdgcn.raw.buffer.store.i128


GPU MODE ▷ #submissions (1 message):

A100 Speed


GPU MODE ▷ #hardware (6 messages):

Coreweave GB300 NVL72 Availability, Nvidia Hardware Prioritization, DGX vs HGX, B200 Availability & Liquid Cooling, Voltage Park Solutions Engineer


GPU MODE ▷ #factorio-learning-env (3 messages):

MCTS gym_env integration, Factory rollouts, Visual encoder


GPU MODE ▷ #cutlass (7 messages):

Jetson Orin, Jetson Thor, CuteDSL, tv_layout swaps


GPU MODE ▷ #singularity-systems (2 messages):

Scheduling


Modular (Mojo 🔥) ▷ #general (2 messages):

Greetings


Modular (Mojo 🔥) ▷ #mojo (21 messages🔥):

parameter functions and closures, Q3 Roadmap: Unified @parameter and runtime closures, __copyinit__ for escaping values, DynStringable, merge various known origins


Modular (Mojo 🔥) ▷ #max (18 messages🔥):

PyTorch Custom Ops with MAX Graph, Benchmarking Issues with Max-24.6, CUDA OOM Errors, LTS Release Support


Yannick Kilcher ▷ #general (29 messages🔥):

Zuckerberg AI Talent Acquisition, Chicken Tender Inflation, OpenAI benchmark comparisons, Grok 4 HLE score


Yannick Kilcher ▷ #paper-discussion (2 messages):



Yannick Kilcher ▷ #ml-news (5 messages):

Gaussian Splatting, General Analysis iMessage Stripe Exploit


Manus.im Discord ▷ #general (22 messages🔥):

Manus Alternatives, Manus chat down?, File Zipping Advice, Custom Data Sources in Manus


MCP (Glama) ▷ #general (18 messages🔥):

Anthropic Payment Issues, Domain Name Checking MCP Server, Needle MCP Server Introduction, OAuth vs API Keys for MCPs, Brave's Official MCP Server


MCP (Glama) ▷ #showcase (3 messages):

Vibe Coding Survey, Adaptive RAG MCP Server, Generator Checkpoint, Microsoft NextCoder


tinygrad (George Hotz) ▷ #general (2 messages):

ShapeTracker parameter to ASSIGN UOp


tinygrad (George Hotz) ▷ #learn-tinygrad (18 messages🔥):

tinygrad documentation for beginners, NVIDIA GPU driver issues with tinygrad and WSL2, Muon optimizer in tinygrad, Switching from WSL2 to native Ubuntu


Nous Research AI ▷ #announcements (1 message):

Atropos, RL Environments Framework


Nous Research AI ▷ #general (18 messages🔥):

Proto-agentic XML tag adherence, Hermes Documentation, Open Source Models vs US Models, Ethical Considerations in AI, Learning ML


Nous Research AI ▷ #ask-about-llms (1 message):

Model Context Size, Letta Personas, Model Evaluation


Notebook LM ▷ #use-cases (4 messages):

uBlock browser extension, notepad.exe, NotebookLM folders/subfolders


Notebook LM ▷ #general (14 messages🔥):

Service Unavailable Error, NotebookLM Use Cases, Textbook Integration with NotebookLM, NotebookLM Enterprise & GCP Integration


LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (1 message):

Agentic AI Summit 2025, LLM Agents MOOC, UC Berkeley, Khosla Ventures, Nvidia


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (8 messages🔥):

Fall Semester Updates, Certificate Declaration Form, Berkeley RDI Newsletter


Cohere ▷ #🧵-general-thread (1 message):

sma.bari.shafin: btw, how will we get the certificates of the Community Summer School?


Cohere ▷ #👋-introduce-yourself (4 messages):

DNNs for Time Series, ML in Data Science Education, ML for Real-World Problems, Interests in ML Domains


LlamaIndex ▷ #blog (2 messages):

Human-in-the-loop agents, LlamaParse one-click table extraction


LlamaIndex ▷ #general (1 message):

beastx2: <@334536717648265216> heyy


DSPy ▷ #general (3 messages):

DSPy creative applications, Lean 4 verification, Story generation, Roleplay prompt optimization


Codeium (Windsurf) ▷ #announcements (2 messages):

Claude Sonnet 4, Discounted Credit Rate, Windsurf Wave 11, Acquisition by Cognition, Voice Mode


MLOps @Chipro ▷ #events (1 message):

AI-Native Data Infrastructure, Task-Specific Data Discovery, Secure Autonomous Access, Production-Scale Performance


Nomic.ai (GPT4All) ▷ #general (1 message):

Web3 and AI, AI agents and multi-agent systems, Automation workflows, NLP apps and chatbots, Voice & speech integration