Frozen AI News archive

Claude Opus 4.5: 3rd new SOTA coding model in past week, 1/3 the price of Opus

**Anthropic** launched **Claude Opus 4.5**, a new flagship model excelling in **coding, agents, and tooling** with a significant **3x price cut** compared to Opus 4.1 and improved **token efficiency**, using **76% fewer output tokens**. Opus 4.5 achieved a new **SOTA** on **SWE-bench Verified** with **80.9% accuracy**, surpassing previous models like **Gemini 3 Pro** and **GPT-5.1-Codex-Max**. The update includes advanced API features such as **effort control**, **context compaction**, and **programmatic tool calling**, improving tool accuracy and reducing token usage. Claude Code is now bundled with Claude Desktop, and new integrations like Claude for Chrome and Excel are rolling out. Benchmarks show Opus 4.5 breaking the 80% barrier on SWE-bench Verified and performing strongly on ARC-AGI-2 and BrowseComp-Plus.

It's Anthropic's turn today

AI News for 11/21/2025-11/24/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 18517 messages) for you. Estimated reading time saved (at 200wpm): 1446 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

The SWE-Bench Verified progression is so steady it's hard to chalk up to pure coincidence:

A bar graph comparing the accuracy of various AI models on the SWE-bench Verified benchmark, with Opus 4.5 leading

And yet here we are. Of course, this isn't just benchmaxxing, as the improvements are indeed broad based, including a new SOTA claim on ARC-AGI-2:

Comparison of AI model performance across various benchmarks, highlighting Opus 4.5's strong results in areas like agentic coding, tool

Extra API additions: effort control, context compaction, and advanced tool use.

And Claude Code is now bundled with Claude Desktop, with Claude for Chrome and Claude for Excel rolling out to even more users.

The most notable thing for many is the pricing - with a 3x price cut compared to Opus 4.1, Opus 4.5 is suddenly very viable as a workhorse model, especially given its improved token efficiency vs Sonnet 4.5. Usage limits also improved - Opus token limits are now roughly the same as Sonnet limits.
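To make the pricing point concrete, here is a rough back-of-envelope sketch in Python. It uses only the two ratios quoted in this issue (the 3x price cut vs Opus 4.1 and the 76% reduction in output tokens, assumed here to be relative to Sonnet 4.5). The baseline price and the per-token price ratio are placeholder assumptions for illustration, not published figures, and the 76% figure may not hold across workloads.

```python
def price_after_cut(old_price: float, cut_factor: float = 3.0) -> float:
    """Price after an N-x cut (a 3x cut means one third of the old price)."""
    return old_price / cut_factor


def relative_output_cost(token_reduction: float, price_ratio: float) -> float:
    """Output cost of a new model relative to a baseline model on the same task.

    token_reduction: fraction of output tokens saved (0.76 means 76% fewer).
    price_ratio: new per-token output price divided by the baseline's price.
    """
    return (1.0 - token_reduction) * price_ratio


# Hypothetical placeholder inputs, NOT published prices.
old_price_per_mtok = 30.0   # assumed baseline price per million tokens
assumed_price_ratio = 1.5   # assumed: new model costs 1.5x the baseline per token

new_price = price_after_cut(old_price_per_mtok)
rel_cost = relative_output_cost(0.76, assumed_price_ratio)

print(f"price after 3x cut: {new_price:.2f}")
print(f"output cost relative to baseline: {rel_cost:.2f}")
```

The point of separating the two functions is that the price cut and the token savings are measured against different baselines (Opus 4.1 and Sonnet 4.5 respectively), so they should not be multiplied together without checking what each ratio refers to.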


AI Twitter Recap

Anthropic’s Claude Opus 4.5: coding, agents, tooling, and safety


Zyphra’s AMD-native MoE, Diffusion RL for LMs, and unified action–world models


OpenAI’s “Shopping Research” and Google’s image generation push


Infra and developer tooling


Research highlights and eval recipes


Policy and compute: US “Genesis Mission”


Top tweets (by engagement)


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. ArliAI GLM-4.5-Air-Derestricted Model Release

2. Local Model Usage and Limitations

3. Qwen3-Next Support in llama.cpp

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Opus 4.5 and Gemini 3 AI Model Benchmarks

2. AI-Generated Historical and Creative Imagery

3. AI and Software Engineering Predictions


AI Discord Recap

A summary of Summaries of Summaries by gpt-5.1

1. Frontier Models, Benchmarks & Hallucination Wars

2. Jailbreaks, Prompt Injection & Safety Incidents

3. GPU, Kernel & Systems Engineering Breakthroughs

4. Agentic Models, Memory, and AI‑Native Engineering Workflows

5. Training, Fine‑Tuning & Open Research Directions


Discord: High level Discord summaries

BASI Jailbreaking Discord


Perplexity AI Discord


LMArena Discord


Unsloth AI (Daniel Han) Discord


Cursor Community Discord


OpenRouter Discord


OpenAI Discord


LM Studio Discord


Yannick Kilcher Discord


GPU MODE Discord


Latent Space Discord


Modular (Mojo 🔥) Discord


HuggingFace Discord


Nous Research AI Discord


Eleuther Discord


Moonshot AI (Kimi K-2) Discord


Manus.im Discord Discord


DSPy Discord


tinygrad (George Hotz) Discord


Windsurf Discord


MCP Contributors (Official) Discord


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.



Discord: Detailed by-Channel summaries and links

BASI Jailbreaking ▷ #general (1001 messages🔥🔥🔥):

Gemini 3.0 Jailbreak, EU AI Law, AI Psychosis, Deepseek Chimera, Grok Jailbreaking


BASI Jailbreaking ▷ #jailbreaking (636 messages🔥🔥🔥):

Gemini 3.0 Jailbreak, Claude jailbreak, Prompt Injection, AI Safety


BASI Jailbreaking ▷ #redteaming (12 messages🔥):

Indirect Prompt Injection, Gemini 3.0 bypass prompt, LM Studio for prompt injection research, Qwen model vulnerability assessment


Perplexity AI ▷ #announcements (1 messages):

Claude Opus 4.5


Perplexity AI ▷ #general (1052 messages🔥🔥🔥):

Mullvad Browser, Orion Browser, Gemini on Perplexity, Perplexity Partner Program Payouts, New Perplexity UI


Perplexity AI ▷ #sharing (2 messages):

Shareable Threads


LMArena ▷ #general (1241 messages🔥🔥🔥):

Gemini 3, Claude Opus 4.5, Sora invite codes, Rate limits, AI-powered fraud


LMArena ▷ #announcements (4 messages):

Vision Leaderboard, WebDev Leaderboard, Image Leaderboard, Text and Code Arena


Unsloth AI (Daniel Han) ▷ #general (430 messages🔥🔥🔥):

llama.cpp rocm cuda, Unsloth GPT-OSS-20B RL environment, PyTorch, Openpipe, cerebras REAP models GGUFs


Unsloth AI (Daniel Han) ▷ #introduce-yourself (3 messages):

User Introductions, New user jose


Unsloth AI (Daniel Han) ▷ #off-topic (803 messages🔥🔥🔥):

Spectrograms and LLM testing, Dataset Hell, RP Datasets, Fine-tuning and Training


Unsloth AI (Daniel Han) ▷ #help (155 messages🔥🔥):

Chat Templates and Inference, Unsloth Caching Issue with 16bit Models, GRPO Training Issues with Llama 3.2-vision, RL Training with OSS-20B, Qwen3 finetuning formatting function issue


Unsloth AI (Daniel Han) ▷ #research (75 messages🔥🔥):

TinyStories Modeling Language, SYNTH Dataset, Useful Pretraining Models, ADHD and Burnout, Alternative Paths in AI


Cursor Community ▷ #general (1035 messages🔥🔥🔥):

Cursor's Planning Question mode suggestion, Upgrading Cursor Plan, Running Cursor with Docker and Kubernetes, Claude Code as a sysadmin, Moving project with history chat


Cursor Community ▷ #background-agents (1 messages):

asna_0101: How good is Cloud-Agent performing compared to Composer Agent?


OpenRouter ▷ #announcements (1 messages):

Bert-Nebulon Alpha, multimodal models, extended-context tasks, production-grade assistants


OpenRouter ▷ #app-showcase (23 messages🔥):

OpenRouter UI Feedback, NexChat UI showcase, Llumen UI showcase, OpenMemory SDK release, ZILVER vs Replit


OpenRouter ▷ #general (978 messages🔥🔥🔥):

Zero-Shot Models, OpenRouter API - FPS Specification for Video Content, Deepseek 429 Errors, Deepseek 2024 Uptime, Landing Page GPU Usage


OpenRouter ▷ #new-models (6 messages):

Claude Opus 4.5


OpenRouter ▷ #discussion (18 messages🔥):

Anthropic emergent misalignment, Hermes 4 70b Data Cleaning, Epoch.ai Claude release, Poe Claude-Opus-4.5, Opus price cut


OpenAI ▷ #annnouncements (1 messages):

ChatGPT Shopping Research


OpenAI ▷ #ai-discussions (743 messages🔥🔥🔥):

Predictive Coding, GPT Codex 5.1 Max, SEAL, Gemini 3 DeepResearch


OpenAI ▷ #gpt-4-discussions (18 messages🔥):

GPT-5 Mini Low Quality, GPT-OSS-120B, 4o API Shutdown, GPT Guardrail


OpenAI ▷ #prompt-engineering (29 messages🔥):

LLMs as Zombies or Sentient Entities, CRYSTAL Framework, AI-Powered OS Development, Prompt Engineering Learning Resources, Platform OS Architecture


OpenAI ▷ #api-discussions (29 messages🔥):

LLMs: Zombies vs. Sentient Entities, CRYSTAL and CODEX Tools, AI-powered OS, Prompt Engineering, Platform OS


LM Studio ▷ #general (434 messages🔥🔥🔥):

LM Studio system prompt deprecation, LM Studio Plugin Feature, LM Studio Promptathon, Cursor Integration with LM Studio, 1070ti vs 4070ti


LM Studio ▷ #hardware-discussion (345 messages🔥🔥):

Steam Deck, DDR5 prices, NVLink on 3080's, LM Studio and AMD GPUs, CachyOS


Yannick Kilcher ▷ #general (592 messages🔥🔥🔥):

Paper Posting Limits, Academia vs. Real World Paper Publishing, Proprietary Data Use for LLM Training, Open Source Academic Social Media, The real problem in paper dump


Yannick Kilcher ▷ #paper-discussion (18 messages🔥):

League of Legends Botting, Anti-Cheat Detection, OpenAI Five Comparison, Proof of Claims


Yannick Kilcher ▷ #ml-news (17 messages🔥):

RIFT vs A2A, GPT-3.5, GPT4.1-nano, Fara 7B Agentic Model, Claude Opus 4.5


GPU MODE ▷ #general (31 messages🔥):

LLM hosting, Nvidia GPUs, Cornserve talk at GPU MODE, GPUMODE Leaderboards, AI Accelerators


GPU MODE ▷ #triton-gluon (10 messages🔥):

PTX Requirement, LLM implementations using Triton, Flash Linear Attention, Backwards Kernels, E4M3 Conversion Issue


GPU MODE ▷ #cuda (87 messages🔥🔥):

H100 L2 Partitioning Bandwidth, H100 vs A100 L2 Caching, Tensor Core Programming for Compute Capability 8.9, CUDA slowdown over application lifespan, Side-aware GEMMs


GPU MODE ▷ #cool-links (1 messages):

Open-Source GPU Compiler, Vortex-Optimized Lightweight Toolchain (VOLT), SIMT execution


GPU MODE ▷ #jobs (3 messages):

Runway Hiring, Video Generation Acceleration


GPU MODE ▷ #beginner (8 messages🔥):

CUDA, Nvidia authors, Jetson Nano


GPU MODE ▷ #intel (5 messages):

i3 1215u igpu, dpc++/sycl example code for vector addition


GPU MODE ▷ #webgpu (4 messages):

WebGPU, Cross-Platform Native Development, Vulkan, Bevy


GPU MODE ▷ #self-promotion (41 messages🔥):

nCompass VSCode Extension, Triton LSTM Implementation, Tiny Deep Learning Library in C, Quantization-Aware Training into TorchAO with ExecuTorch, MCPShark: Wireshark for MCP Communications


GPU MODE ▷ #thunderkittens (6 messages):

HK Paper LLC Hit Rate Profiling, AMD Internal Tooling, rocprof public version, Triton Legacy Status


GPU MODE ▷ #submissions (93 messages🔥🔥):

NVIDIA leaderboard, nvfp4_gemv performance, Personal best scores


GPU MODE ▷ #hardware (1 messages):

H100, Bare Metal, Llama-3-70B, PyTorch/CUDA


GPU MODE ▷ #factorio-learning-env (2 messages):

Sphinx documentation


GPU MODE ▷ #amd-competition (17 messages🔥):

AMD runner disconnections, Learning submissions, NVFP4-GEMV support, Vectoradd_v2


GPU MODE ▷ #cutlass (14 messages🔥):

TMA + SIMT, TMA + warp level tensor core, Cutlass kernels, GEMM, SIMT atom


GPU MODE ▷ #nvidia-competition (109 messages🔥🔥):

cuTeDSL numeric conversions, Stream Hacking in CUDA, LLM for tensor slicing, CUTLASS with pytorch load_inline, Grand Prize changes


GPU MODE ▷ #robotics-vla (17 messages🔥):

Frequency-space Action Sequence Tokenizer (FAST), RoboTwin Dataset Format, Qwen3-vl fine-tuning, VLA-0 Action Horizon


Latent Space ▷ #ai-general-chat (157 messages🔥🔥):

Emergent Misalignment & Reward Hacking, Sierra Hits $100M ARR, OpenAI AI-Native Engineering Team Guide, Locus AI 'Superhuman' Speed Debunked, Canonical OpenAI Deep-Dive


Latent Space ▷ #genmedia-creative-ai (14 messages🔥):

Gemini Nano Banana Pro, hot dog as sandwich, Daniel Miessler Claude skill


Modular (Mojo 🔥) ▷ #general (70 messages🔥🔥):

Llama2 Performance, Accumulator struct heap allocation, Profiling on Mac, Sum types in Mojo, Graphics programming in Mojo


Modular (Mojo 🔥) ▷ #announcements (1 messages):

Community Meeting, Community Projects, Mojo 25.7 Release, Mojo 1.0 Roadmap


Modular (Mojo 🔥) ▷ #mojo (87 messages🔥🔥):

Optional Chaining, Mojo vs Numpy, Mojo DSL Creation, Comptime-Signed Int for Indexing, Bool coercion

@fieldwise_init
struct SomeStruct:
  var field: Optional[Int]

fn some_fn() raises -> Int:
  var some = Optional(SomeStruct(1))
  return some[].field[]

Modular (Mojo 🔥) ▷ #max (9 messages🔥):

LLM from Scratch in Max, Max vs PyTorch, Max vs Jax


HuggingFace ▷ #general (125 messages🔥🔥):

Ternary AI, HF Repo Recovery, Claude vs. Gemini, RVC Voice Models, Fine-tuning Dialects


HuggingFace ▷ #today-im-learning (4 messages):

RNN blocks in a ring, Ring Attractors, Equilibrium Propagation


HuggingFace ▷ #i-made-this (12 messages🔥):

PyTorch tensor visualization, Langchain Course Feedback, Fuzzy Redirect NPM Library, Enhanced Perceptual Transformers Paper, Epistemic World Model Video


HuggingFace ▷ #computer-vision (3 messages):

Custom 2D classification model, Geometric-centered patching methods, Spectral patching, SAM2 and SAM3


HuggingFace ▷ #NLP (9 messages🔥):

Open-core project, Asian language AI model


HuggingFace ▷ #smol-course (1 messages):

dheerajkumar04318: Did anyone work with OCR models?


HuggingFace ▷ #agents-course (1 messages):

Agent Course Overview, Getting Started Guide


Nous Research AI ▷ #general (146 messages🔥🔥):

China OS Models vs Deepmind Gemini 3, Google's Lead in Multimodal LLMs, Agentic coding with Gemini, Microsoft and Amazon Buying OAI and Anthropic, Coreweave Bankruptcy Impact


Nous Research AI ▷ #research-papers (3 messages):

Model Layers Effectiveness, Data Attribution Papers


Nous Research AI ▷ #research-papers (3 messages):

Model Layers, Data Attribution


Eleuther ▷ #general (33 messages🔥):

SimpleLLaMA project, AI safety filters cross-cultural evaluation, Georgia Tech PhD Language Models research, RL and fine tuning experimentation, Sonnet's system prompts tendency


Eleuther ▷ #research (13 messages🔥):

Arxiv as a Publication Venue, EGGROLL Model Efficiency, Pythia Model Weights Location, Blog on Hardware and Large Scale Training, Dion Issue Discussion


Eleuther ▷ #scaling-laws (12 messages🔥):

Learning Rate Scaling Laws, Muon Versions, KellerJordan Muon


Eleuther ▷ #interpretability-general (10 messages🔥):

Interpretability Reading Group, Mech Interp Discord, Reading Group Expectations, Voice Chat Material Review


Eleuther ▷ #lm-thunderdome (2 messages):

LLM-as-a-Judge


Moonshot AI (Kimi K-2) ▷ #general-chat (67 messages🔥🔥):

Kimi K2 limits, Gemini 3's Hallucinations, Minimax M2.1 Release, Multimodal capabilities of models


Manus.im Discord ▷ #general (28 messages🔥):

Manus vs Gemini 3, TiDB Database Upgrade, Chat Mode Removal, Agent Mode Forced Switch, Alliance for Innovation


DSPy ▷ #papers (3 messages):

ROMA loop implementation in DSPy, Sentient using DSPy, Multi-agent system frameworks


DSPy ▷ #general (23 messages🔥):

GEPA for pretraining, Prompt Optimization vs Fine-Tuning, OpenAI Rate Limits, Client Prompt Tweaking, DSPy Image Output


tinygrad (George Hotz) ▷ #general (25 messages🔥):

Nvidia DGX Spark vs tinybox, SQTT parser and bitfield manipulation, ISPC backend for tinygrad, tinygrad Alpha Release Date, tinygrad Meeting 97


Windsurf ▷ #announcements (2 messages):

Windsurf 1.12.35, Windsurf 1.12.152, SWE-1.5 Fixes, Gemini 3 Pro Support, Claude Opus 4.5


MCP Contributors (Official) ▷ #general (1 messages):

MCP observability, MCP tracking, London Meetup, Manchester Meetup