Frozen AI News archive

o3 solves AIME, GPQA, Codeforces, makes 11 years of progress in ARC-AGI and 25% in FrontierMath

**OpenAI** announced the **o3** and **o3-mini** models with groundbreaking benchmark results, including a jump from **2% to 25%** on the **FrontierMath** benchmark and **87.5%** on the **ARC-AGI** reasoning benchmark, representing about **11 years of progress** on the GPT3 to GPT4o scaling curve. The **o3-mini** model shows superior inference efficiency compared to o1-full, promising significant cost reductions on coding tasks. The announcement was accompanied by community discussions, safety testing applications, and detailed analyses. *Sama* highlighted the unusual cost-performance tradeoff, and **Eric Wallace** shared insights on the o-series deliberative alignment strategy.


AI News for 12/19/2024-12/20/2024. We checked 7 subreddits, 433 Twitters and 32 Discords (215 channels, and 6058 messages) for you. Estimated reading time saved (at 200wpm): 607 minutes. You can now tag @smol_ai for AINews discussions!

With the departure of key researchers, Veo 2 beating Sora Turbo in head-to-head comparisons, and Noam Shazeer debuting a new Gemini 2.0 Flash Thinking model, the mood around OpenAI has been tense, to say the least.

But patience has been rewarded.

As teased by sama and with clues uncovered by internet sleuths and journalists, the last day of OpenAI's Shipmas brought the biggest announcement: o3 and o3-mini, with breathtaking early benchmark results.

o3-mini is not to be overlooked: the distillation team proudly showed off how it has an overwhelmingly superior inference-intelligence curve compared to o1-full.

As sama says: "on many coding tasks, o3-mini will outperform o1 at a massive cost reduction! i expect this trend to continue, but also that the ability to get marginally more performance for exponentially more money will be really strange."

Eric Wallace also published a post on the o-series deliberative alignment strategy, and applications are open for safety researchers to test the new models.

Community recap videos, writeups, liveblogs, and architecture speculations are also worth checking out.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

OpenAI Model Releases (o3 and o3-mini)

Other AI Model Releases (Qwen2.5, Google Gemini, Anthropic Claude)

Benchmarking and Performance Metrics

AI Safety, Alignment, and Ethics

AI Tools, Applications, and Research

Memes and Humor

AI Research and Technical Insights


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. OpenAI's O3 Mini Outperforms Predecessors

Theme 2. Qwen QVQ-72B: New Frontiers in AI Modeling

Theme 3. RWKV-7's Advances in Multilingual and Long Context Processing

Theme 4. Open-Source AI: The Necessary Evolution

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. OpenAI's O3: High ARC-AGI Performance But High Cost

Theme 2. Google's Gemini 2.0 Eclipses Competitors amid O3 Buzz

Theme 3. TinyBox GPU Manipulations and Networking Deception

Theme 4. ChatGPT Pro Pricing and Market Impact Discussion


AI Discord Recap

A summary of Summaries of Summaries by o1-2024-12-17

Theme 1. The O3 Frenzy and New Benchmarks

Theme 2. AI Editor Madness: Codeium, Cursor, Aider, and More

Theme 3. Fine-Tuning Feuds: LoRA, QLoRA, and Pruning

Theme 4. Agents, RL Methods, and Rival Model Showdowns

Theme 5. Creative & Multimedia AI: Notebook LM, SDXL, And Friends


PART 1: High level Discord summaries

Codeium (Windsurf) Discord


Cursor IDE Discord


aider (Paul Gauthier) Discord


Interconnects (Nathan Lambert) Discord


OpenAI Discord


Unsloth AI (Daniel Han) Discord


Nous Research AI Discord


Stackblitz (Bolt.new) Discord


LM Studio Discord


OpenRouter (Alex Atallah) Discord


Eleuther Discord


Modular (Mojo 🔥) Discord


Latent Space Discord


Notebook LM Discord


Perplexity AI Discord


Nomic.ai (GPT4All) Discord


Stability.ai (Stable Diffusion) Discord


Cohere Discord


LAION Discord


GPU MODE Discord


LlamaIndex Discord


LLM Agents (Berkeley MOOC) Discord


Torchtune Discord


DSPy Discord


OpenInterpreter Discord


Axolotl AI Discord


tinygrad (George Hotz) Discord


Gorilla LLM (Berkeley Function Calling) Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Codeium (Windsurf) ▷ #announcements (1 messages):

Windsurf 1.1.1 Release, Usage Transparency and Pricing, Cascade Image Uploads, Language Support Improvements

Link mentioned: Windsurf Editor Changelogs | Windsurf Editor and Codeium extensions: Latest updates and changes for the Windsurf Editor.


Codeium (Windsurf) ▷ #content (1 messages):

Send to Cascade Button

Link mentioned: Tweet from Windsurf (@windsurf_ai): Send your problems straight to Cascade!


Codeium (Windsurf) ▷ #discussion (64 messages🔥🔥):

Cascade Performance, Windsurf Subscription Plans, Codeium Extension Features, Usage of AI in Code Reviews, AI Prompting Guidelines

Link mentioned: Plan Settings: Tomorrow's editor, today. Windsurf Editor is the first AI agent-powered IDE that keeps developers in the flow. Available today on Mac, Windows, and Linux.


Codeium (Windsurf) ▷ #windsurf (603 messages🔥🔥🔥):

Windsurf Performance Issues, Codeium Features and Updates, Using Cascade Effectively, User Experiences with AI Models, Integration of New Tools

Links mentioned:


Cursor IDE ▷ #general (819 messages🔥🔥🔥):

Cursor IDE updates, AI-driven development tools, Comparison of Sonnet models, Freelancing with AI assistance, Limitations of AI in styling

Links mentioned:


aider (Paul Gauthier) ▷ #general (628 messages🔥🔥🔥):

OpenAI O3 Release, Use of Aider and Cline, Impact of AI on Software Development, Job Security in Coding, Comparison of Tools for Developers

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (33 messages🔥):

Aider hardware recommendations, OpenRouter API key setup, Using /read command with PDF files, Gemini model updates, Aider tutorial resources

Links mentioned:


aider (Paul Gauthier) ▷ #links (5 messages):

AniDoc animation tool, Depth AI evaluation, Integrating external libraries

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (454 messages🔥🔥🔥):

OpenAI O3 release, AI and Software Engineering, Market impacts of AI advancements, Challenges in AI reasoning, AI's influence on job diversity

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (2 messages):

LoRA Finetuning, Finetuning Closed-source Models, Open-source vs Closed-source Models


Interconnects (Nathan Lambert) ▷ #ml-drama (34 messages🔥):

François Chollet's statements, O1 model characteristics, Subbarao/Miles Brundage incident, AI community reactions, Recent incidents involving GDM director

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (6 messages):

Discord stalking, o3 discussion, Timing comparison


Interconnects (Nathan Lambert) ▷ #memes (32 messages🔥):

OpenAI O3 model naming, Meme vs Reality in AI, OpenAI's latest model developments, Riemann Question in AI

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rl (6 messages):

Reinforcement Learning Challenges, Reward Models in RL, Verification in RL, Specialized Reward Criteria, Future of RL Research


Interconnects (Nathan Lambert) ▷ #rlhf (1 messages):

natolambert: https://x.com/natolambert/status/1870150741593129045


Interconnects (Nathan Lambert) ▷ #reads (4 messages):

Building Anthropic, YouTube Video Discussion

Link mentioned: - YouTube: no description found


Interconnects (Nathan Lambert) ▷ #lectures-and-projects (3 messages):

RLHF Ignorance, GitHub Availability, Interest in Free Resources


Interconnects (Nathan Lambert) ▷ #posts (7 messages):

OpenAI's o3 model preview, Anthropic's potential release, User vacation plans

Link mentioned: o3: The grand finale of AI in 2024: A step change as influential as the release of GPT-4. Reasoning language models are the current big thing.


OpenAI ▷ #annnouncements (1 messages):

12 Days of OpenAI, Final Day Event, Sam Altman, Mark Chen, Hongyu Ren

Link mentioned: - YouTube: no description found


OpenAI ▷ #ai-discussions (401 messages🔥🔥):

OpenAI o3 release expectations, Comparison of AI models, AI capabilities in development, Market impact of AI pricing, Future of AI technology updates

Link mentioned: Tweet from Deedy (@deedydas): OpenAI o3 is 2727 on Codeforces which is equivalent to the #175 best human competitive coder on the planet. This is an absolutely superhuman result for AI and technology at large.


OpenAI ▷ #gpt-4-discussions (6 messages):

Custom GPT usage, Obsolescence of discussion channels, O3 release timeline, Chatbot development advice


Unsloth AI (Daniel Han) ▷ #general (168 messages🔥🔥):

O3 Release Discussion, Fine-tuning LLMs, Consciousness Benchmarks, TGI and Deployment Options, FrontierMath Performance

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (28 messages🔥):

League addiction, SDXL model strength, LoRA models for anime, Flux model challenges, Unsloth support plans

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (131 messages🔥🔥):

RAG Implementation, Training and Fine-Tuning Models, Using Google Colab and Kaggle, JSON Formatting Issues, Installation Problems on Windows

Links mentioned:


Nous Research AI ▷ #general (298 messages🔥🔥):

O1 and O3 Models, Agentic Systems, Economic Impact of AI, ARC-AGI Benchmark, Open Source AI Development

Links mentioned:


Nous Research AI ▷ #ask-about-llms (15 messages🔥):

Subconscious programming in prompts, Tokenization methods, Random activation functions, Function calling behavior in LLMs, Instruction tuning LLMs on raw data


Nous Research AI ▷ #interesting-links (1 messages):

jellyberg: https://theaidigest.org/self-awareness


Nous Research AI ▷ #reasoning-tasks (1 messages):

Reasoning dataset, Collaborative project, Using <think> tag, Modeling strategies


Stackblitz (Bolt.new) ▷ #announcements (1 messages):

Mistletokens, Holiday Gifts, Free Tokens Distribution

Link mentioned: Tweet from StackBlitz (@stackblitz): Happy Holidays! Yet again our team put together a special gift for y'all: 🎄 We call them, Mistletokens! 🎄 Till EOY: 🔔 All Pro users get 2M free tokens! 🔔 All Free users get 200K daily & 2M monthly...


Stackblitz (Bolt.new) ▷ #prompting (3 messages):

Bolt application review, Redundancy cleanup, Targeted review requests


Stackblitz (Bolt.new) ▷ #discussions (295 messages🔥🔥):

Bolt integration issues, WebRTC implementation, Subscription and token management, Ecommerce platform development using Bolt, Community support and collaboration

Links mentioned:


LM Studio ▷ #general (103 messages🔥🔥):

Adrenaline Driver Issues, LM Studio Installation, TPM and Windows 11 Compatibility, Defamation Lawsuit Against OpenAI, LM Studio Chat Naming Mechanism

Links mentioned:


LM Studio ▷ #hardware-discussion (103 messages🔥🔥):

3090 Performance for AI and Coding, External GPU Setups, LLM Parameter Compression, Mac vs. PC for AI Development, Local Market vs. eBay for Hardware Purchase

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (5 messages):

Gemini 2.0 Flash Thinking Experimental, Timeout Logic Change and Reversion, BYOK (Bring Your Own API Keys), o1 Model Changes, Crypto Payments API

Links mentioned:


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

AI To-Do List, Open Router integration, 5-Minute Rule

Link mentioned: Todo Lists: no description found


OpenRouter (Alex Atallah) ▷ #general (170 messages🔥🔥):

OpenRouter Payment Policies, AGI Discussions, Model Releases and Features, Cloud Service Utilization, User Experience with APIs

Links mentioned:


Eleuther ▷ #general (55 messages🔥🔥):

Natural Attention and Scaling Laws, Causal Masking in Attention Models, Optimizer Improvements in Training, Quality vs. Quantity in Pretraining, Patterns of Attention Mechanisms

Links mentioned:


Eleuther ▷ #research (68 messages🔥🔥):

MSR Research Ethics, Plagiarism Issues at MSR, Optimizer Research Challenges, Sparks of AGI Paper Problems, OpenAI's Research Environment

Links mentioned:


Eleuther ▷ #interpretability-general (14 messages🔥):

Mahalanobis distance, Model activation norms, BOS token issues, SAE training strategies, Normalization techniques


Eleuther ▷ #lm-thunderdome (18 messages🔥):

Benchmark Directory Issues, Model Checkpoint Naming, Harness Setup for Multiple Models

Links mentioned:


Eleuther ▷ #gpt-neox-dev (3 messages):

Pull Request #1331, WandB Testing

Links mentioned:


Modular (Mojo 🔥) ▷ #general (4 messages):

Machine setup, Level progression


Modular (Mojo 🔥) ▷ #announcements (1 messages):

Modular community appreciation, Holiday shutdown notice, Feedback and bug reporting for 24.6 release, Looking forward to 2025


Modular (Mojo 🔥) ▷ #mojo (142 messages🔥🔥):

FFI Compatibility Issues, Libc Bindings Development, Performance of Float Parsing, Mojo As an Extension to Python, Properties in Mojo

Links mentioned:


Modular (Mojo 🔥) ▷ #max (3 messages):

Tensor implementation, Feature Request, MAX APIs

Link mentioned: [Feature Request] Make tensor.Tensor implement tensor_utils.TensorLike · Issue #274 · modularml/max: What is your request? Please make tensor.Tensor implement the tensor_utils.TensorLike trait. As far as I can tell it already implements the required functions, but it does not implement this trait ...


Latent Space ▷ #ai-general-chat (127 messages🔥🔥):

OpenAI o3 model, Alec Radford departure, AI benchmark improvements, Economic implications of AI models, Safety testing for AI models

Links mentioned:


Latent Space ▷ #ai-in-action-club (20 messages🔥):

API Keys Usage, Character AI Audience Insights, User Experience Signals, Interest in Role-play, Swyx's Reporting


Notebook LM Discord ▷ #use-cases (38 messages🔥):

AI in Podcasting, Notebook LM for Education, Job Application Assistance, AI-Generated Video Projects, Improving Audio Production

Links mentioned:


Notebook LM Discord ▷ #general (106 messages🔥🔥):

NotebookLM Interactive Mode, Citation Feature Issues, Audio Overview Retrieval, Language Processing in NLM, Timeline Feature Usage

Links mentioned:


Perplexity AI ▷ #general (102 messages🔥🔥):

Superman movie teaser, Perplexity Pro with .edu emails, OpenAI's new GPT models, Lepton AI project similarities, Perplexity API support issues

Links mentioned:


Perplexity AI ▷ #sharing (5 messages):

Rio Da Yung OG released, Samsung's Project Moohan, Apple's Congo Conflict Minerals, Oregon’s Psilocybin Program, AI use at work

Link mentioned: YouTube: no description found


Nomic.ai (GPT4All) ▷ #announcements (3 messages):

GPT4All v3.6.0 Release, GPT4All v3.6.1 Release, Reasoner v1, Chat Template Fixes


Nomic.ai (GPT4All) ▷ #general (90 messages🔥🔥):

Llama 3.3 and Qwen2 models, GPT4ALL custom templates and reasoning, Local API server integration, Phi-4 model comparison, Stop generating button issue in v3.6.0

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (81 messages🔥🔥):

Best local AI image generators, Creating style models in AI, Tech support and scams in Discord, Asset generation tools for game devs, Training models from existing images

Link mentioned: stabilityai/stable-fast-3d · Hugging Face: no description found


Cohere ▷ #discussions (58 messages🔥🔥):

Cohere's c4ai model, MLX integration, VLLM support, Latest model performance review, Upcoming releases

Links mentioned:


Cohere ▷ #questions (4 messages):

Credit Card Rejections, 3D Secure Issues, VPN Usage, Support Contact


Cohere ▷ #api-discussions (16 messages🔥):

Payment Method Issues in India, Upgrading API Keys for Higher Limits, Context Errors with Trial Keys


Cohere ▷ #projects (1 messages):

Cohere tech in Findr, Findr launch excitement


LAION ▷ #general (10 messages🔥):

DCT Encoding Exploration, VAEs and Human Perception, Color Spaces and Detail Perception


LAION ▷ #research (57 messages🔥🔥):

OpenAI o3 announcement, AGI discussion, Elo ratings and performance comparison, Test time compute implications, Future AI predictions

Links mentioned:


GPU MODE ▷ #general (11 messages🔥):

GPU recommendations, Chip design resources, Hardware description languages

Link mentioned: Reddit - Dive into anything: no description found


GPU MODE ▷ #triton (2 messages):

Triton Documentation Issues, Debugging Kernel Shared Memory, Proton Memory Instrumentation, Triton Language Types

Link mentioned: Welcome to Triton’s documentation! — Triton documentation: no description found


GPU MODE ▷ #cuda (9 messages🔥):

TensorRT Namespace Issue, Race Condition in Memory Copy, Memory Fencing after Kernel Execution, Understanding cute::composite


GPU MODE ▷ #torch (3 messages):

Flex Attention, Context Parallel Implementation, Attn-Gym Examples


GPU MODE ▷ #algorithms (1 messages):

Diffusion Models Conditioning, NeurIPS 2024 Papers

Link mentioned: Tweet from The Variational Book (@TheVariational): Curious about how diffusion models are influenced? @jaakkolehtinen @unixpickle @prafdhar @TimSalimans @hojonathanho Check out the review of the Autoguidance #NeurIPS2024 runner-up best paper in the ...


GPU MODE ▷ #off-topic (3 messages):

Multi Node Inference, Distributed Topics, Channel Management


GPU MODE ▷ #sparsity-pruning (1 messages):

Sparse API Usage, PyTorch Quantization, Sparsity Design Overview

Link mentioned: ao/torchao/sparsity at main · pytorch/ao: PyTorch native quantization and sparsity for training and inference - pytorch/ao


GPU MODE ▷ #arc-agi-2 (6 messages):

ARC CoT dataset, LLaMA 8B fine-tuning, OpenAI evaluation results, o3-high evaluation costs


LlamaIndex ▷ #blog (4 messages):

LlamaParse Audio Capabilities, Year-End Review of LlamaIndex, Stock Analysis Bot Creation, Document Processing Automation

Link mentioned: The Year in LlamaIndex: 2024 — LlamaIndex - Build Knowledge Assistants over your Enterprise Data: LlamaIndex is a simple, flexible framework for building knowledge assistants using LLMs connected to your enterprise data.


LlamaIndex ▷ #general (17 messages🔥):

Azure OpenAI embedding models, GraphDBs for larger projects, Fine-tuning LLM with sentiment analysis, Creating synthetic datasets, Issue with TextNode attributes

Links mentioned:


LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (1 messages):

Hackathon Submission Reopened, Technical Difficulties, Submission Deadline, Manual Submission Check



LLM Agents (Berkeley MOOC) ▷ #mooc-questions (11 messages🔥):

Hackathon Extension Requests, Hackathon Participation Forms, Submission Registration Confirmation, YouTube Video Format Issues, Agent Framework Recommendations

Link mentioned: Building effective agents: A post for developers with advice and workflows for building effective AI agents


Torchtune ▷ #announcements (1 messages):

Torchtune v0.5.0, Kaggle Integration, QAT + LoRA Training Recipe, Early Exit Training Recipe, NPU Support

Links mentioned:


Torchtune ▷ #general (7 messages):

QwQ-preview-32B finetuning, State dict loading for fsdp2, Parallelism support improvements, Gradient accumulation and clipping, Vocab pruning in finetuning

Links mentioned:


DSPy ▷ #general (7 messages):

Litellm Proxy Server, Synthetic Data Impact on LLMs, Optimization Parameters, MIPRO Light Mode

Link mentioned: On Synthetic Data: How It’s Improving & Shaping LLMs: Synthetic data is helping LLMs scale the data wall, but it’s doing so while creating a growing perception gap between those who use LLMs for quantitative tasks and those who use it for anything else, ...


OpenInterpreter ▷ #general (7 messages):

OpenInterpreter Server Mode, Google Gemini 2.0 Multimodal, Local LLM Integration, SSH Usage with OpenInterpreter


Axolotl AI ▷ #general (4 messages):

Liger and KTO integration, Liger DPO, Loss parity issues


tinygrad (George Hotz) ▷ #general (1 messages):

chenyuy: i will close (or find a bot to close) prs that are inactive > 30 days next week


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (1 messages):

Watt-tool models, GitHub Pull Requests, Christmas timeframe

Link mentioned: [BFCL] Add New Model watt-tool-8B and watt-tool-70B by zhanghanduo · Pull Request #847 · ShishirPatil/gorilla: This PR adds the model watt-ai/watt-tool-8B and watt-ai/watt-tool-70B to the leaderboard.





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share it with a friend! Thanks in advance!

{% endif %}