Frozen AI News archive

xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing

**xAI** launched **Grok 4.1**, achieving a #1 rank on the LM Arena Text Leaderboard with an Elo score of **1483**, showing improvements in creative writing and anti-hallucination. **OpenAI's GPT-5.1 "Thinking"** demonstrates efficiency gains with ~60% less "thinking" on easy queries and strong ARC-AGI performance. **Google DeepMind** released **WeatherNext 2**, an ensemble generative model that is **8× faster** and more accurate for global weather forecasts, integrated into multiple Google products. **Sakana AI** raised **¥20B ($135M)** in Series B funding at a **$2.63B** valuation to focus on efficient AI for resource-constrained enterprise applications in Japan. New evaluations highlight tradeoffs between hallucination and knowledge accuracy across models including **Claude 4.1 Opus** and **Anthropic** models.

a nice incremental improvement.

AI News for 11/14/2025-11/17/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 17770 messages) for you. Estimated reading time saved (at 200wpm): 1367 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Ahead of a very heavily rumored Gemini 3 launch this week, xAI launched their (presumably weaker, but still significantly stronger than Gemini 2.5) update to Grok 4 in a blogpost with some decent evals - a 65% win rate in A/B tests vs Grok 4, a new SOTA on the Text LMArena with Style Control, top EQBench scores, and improvements in anti-hallucination.

Grok 4.1 tops the LM Arena Text Leaderboard with an Elo score of 1483.

Just as people are wondering why AI writing is still so mid, it seems GPT 5.1 and Grok 4.1 are both showing real improvements in creative writing:

A screenshot of Grok 4.1's creative writing demonstration, showing two different responses to a prompt about an AI discovering its consciousness.


AI Twitter Recap

xAI’s Grok 4.1 hits #1 on LM Arena; GPT‑5.1 “Thinking” tightens the race

Google/DeepMind WeatherNext 2: 8× faster global forecasts, production rollout

Sakana AI raises ¥20B ($135M) Series B at ~$2.63B valuation; doubles down on efficient AI for Japan

Systems, inference, and RL/post‑training: kernels, fleets, and new workflows

Open‑source multimodal and diffusion updates

Agents in practice: reliability, scope, and longer‑running sessions

Top tweets (by engagement)


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. AI Model Comparisons and Accessibility

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Google DeepMind WeatherNext 2 Launch

2. Public Reactions to AI Censorship and Freedom


AI Discord Recap

A summary of Summaries of Summaries by gpt-5

1. OpenRouter's Sherlock Models & Ecosystem Moves

2. GPU Kernels, Blackwell Metrics & AMD Ecosystem

3. Quantization on Blackwell & Unsloth/vLLM Pragmatics

4. Agents in the Wild: Production KPIs & New Eval Stacks

5. Developer Tooling & Protocols Ship


Discord: High level Discord summaries

LMArena Discord


BASI Jailbreaking Discord


Perplexity AI Discord


Unsloth AI (Daniel Han) Discord


OpenRouter Discord


GPU MODE Discord


LM Studio Discord


Cursor Community Discord


OpenAI Discord


Yannick Kilcher Discord


Moonshot AI (Kimi K-2) Discord


HuggingFace Discord


Modular (Mojo 🔥) Discord


Latent Space Discord


Eleuther Discord


Nous Research AI Discord


DSPy Discord


tinygrad (George Hotz) Discord


Manus.im Discord Discord


aider (Paul Gauthier) Discord


MCP Contributors (Official) Discord


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (1204 messages🔥🔥🔥):

Gemini 3, Riftrunner performance, upscaling tools, ps1/ps2 error/startup recreation, Grok 4.1


LMArena ▷ #announcements (2 messages):

LMArena Ranking Method, New Models, Rank Spread, Raw Rank


BASI Jailbreaking ▷ #announcements (1 message):

Vibe Coding Contest, Web App Challenge, Crypto Theme, Google AIStudio, Discord Role Transition


BASI Jailbreaking ▷ #general (1255 messages🔥🔥🔥):

ChatGPT Payment issue, Gemini 3 beta, SMM Panels, 48k .gov machines, Thinkpad with a 4090


BASI Jailbreaking ▷ #jailbreaking (421 messages🔥🔥🔥):

AI company monitoring Discord, gandalf.lakera jailbreak, GPTs agent cannot learn, Sora unjailbreakable?, Cracking Grok


BASI Jailbreaking ▷ #redteaming (31 messages🔥):

Claude Code AI Hacking, AI model choices, Purple Teaming concerns, GPT-Realtime API testing, GenAI PT recommendations


Perplexity AI ▷ #announcements (1 message):

Comet Assistant Upgrade, Privacy Snapshot Feature, Open Links in Comet, New OpenAI Models, Faster Library Search


Perplexity AI ▷ #general (1166 messages🔥🔥🔥):

Comet Mobile iOS port, 5.1 thinking, Perplexity Discord bot integration, Comet's memory leak, OpenAI and Anthropic's profitability


Perplexity AI ▷ #sharing (3 messages):

Sora 2, Brain Waves, Suno


Perplexity AI ▷ #pplx-api (6 messages):

Deep research high for API, Delete API Group


Unsloth AI (Daniel Han) ▷ #general (515 messages🔥🔥🔥):

Quantization accuracy loss, Character tokenization with multi-token prediction, Fine-tuning DeepSeek-OCR on a T4 GPU, Grok Code Fast, Unsloth GGUFs locally via Docker


Unsloth AI (Daniel Han) ▷ #introduce-yourself (7 messages):

AI Engineers, Intelligent voice agents, GPT-powered assistants, LLMs in Robotics, AI Projects


Unsloth AI (Daniel Han) ▷ #off-topic (534 messages🔥🔥🔥):

Any DAW, Model Training, AI, GPU


Unsloth AI (Daniel Han) ▷ #help (174 messages🔥🔥):

Dynamic Quantization Support, Training Vision Language Models on limited VRAM, Unsloth installation problems, GPU Utilization and Memory Management, Unsloth with function calling


Unsloth AI (Daniel Han) ▷ #research (3 messages):

Sparse Autoencoders (SAEs), AMD Hardware, RLVR pretraining


OpenRouter ▷ #announcements (1 message):

Sherlock Think Alpha, Sherlock Dash Alpha, 1.8M context window, Multimodal Support, Tool Calling


OpenRouter ▷ #app-showcase (91 messages🔥🔥):

vero-eval OSS Tool, Agent-based LLM voting, Errno 5 Backend Error, Agent searches


OpenRouter ▷ #general (574 messages🔥🔥🔥):

Gemini Pro 2.5 vs Regulatory Documents, Document Uploading Issues, Deepseek Replies at the Top, Sherlock Stealth Model, Gemini 2.0 Flash and Video Inputs


OpenRouter ▷ #new-models (2 messages):


OpenRouter ▷ #discussion (41 messages🔥):

Claude's Structured Outputs, Qwen 3 VL Video Support, Replicate Joins Cloudflare, OpenRouter-Cloudflare Relationship, Grok 4.1 Announcement


GPU MODE ▷ #general (30 messages🔥):

Nvidia Driver, CUDA kernels, LoRA training in vLLM, GPU clusters maintenance, L1/L2 footprint ratio in CUDA


GPU MODE ▷ #triton-gluon (6 messages):

Triton build RAM usage, Ninja build system issues, Python and C/C++ build systems


GPU MODE ▷ #cuda (53 messages🔥):

B200 Memory Latency, Cutlass v4.3.0, NV-HBI on B200, B200 Bandwidth, SM120


GPU MODE ▷ #cool-links (14 messages🔥):

AI Performance Engineering, Compiler Optimization, CUDA Class, Josh Holloway


GPU MODE ▷ #jobs (3 messages):

Mercor Hiring, PPoPP 2026 AEC Volunteers


GPU MODE ▷ #beginner (15 messages🔥):

CUDA and VS 2022 on Windows, Dual Booting Windows with Ubuntu, CUDA under WSL, Nsight Compute resources, Magnus and Arun talks


GPU MODE ▷ #youtube-recordings (3 messages):

Lecture Slides Request, Paulius Assistance


GPU MODE ▷ #jax-pallas-mosaic (1 message):

Mosaic-TPU, Pallas access levels


GPU MODE ▷ #torchao (1 message):

version 0.14.1, version 0.13.0, nsys


GPU MODE ▷ #off-topic (7 messages):

Rassolnik recipe, RTX 5090 Black Screen Issues, Signed Magnitude Negative Zeros, Succinct Y Combinator Application


GPU MODE ▷ #irl-meetup (1 message):

SC25 meetup, AI projects, HPC projects


GPU MODE ▷ #rocm (5 messages):

AMD GPU MMA, AMD Architect WMMA doc


GPU MODE ▷ #intel (3 messages):

Intel Sycl-TLA, Bank Width


GPU MODE ▷ #self-promotion (12 messages🔥):

NVFP4 GEMV Kernel Implementation, CuTeDSL Improvements, GEMV/Split-K Optimization, Data Infrastructure Treatise


GPU MODE ▷ #thunderkittens (2 messages):

HIP reimplementation, tinygrad advantages, UOps coding


GPU MODE ▷ #gpu模式 (11 messages🔥):

Triton, CUDA Python, vllm, sglang, InfiniTensor


GPU MODE ▷ #submissions (153 messages🔥🔥):

nvfp4_gemv leaderboard, Torch 2.1.0 Leak, NVIDIA performance improvements


GPU MODE ▷ #factorio-learning-env (3 messages):

December 5th Meeting, Google Meet


GPU MODE ▷ #amd-competition (6 messages):

MI300, ROCm Kernels, HuggingFace kernel-builder and kernels libraries, HFxAMD partnership


GPU MODE ▷ #cutlass (2 messages):

Arithmetic Tuple Tensors, TMA Tensors, Scaled Basis Visualization


GPU MODE ▷ #multi-gpu (12 messages🔥):

UCC vs NCCL, UCX Collectives, NCCL Debugging, Multi-GPU on Single Node


GPU MODE ▷ #opencl-vulkan (1 message):

erichallahan: https://www.phoronix.com/news/NVK-Cooperative-Matrix-Perf


GPU MODE ▷ #helion (6 messages):

My_kernel function implementation using helion, Differences between Torch and Triton semantics


GPU MODE ▷ #nvidia-competition (259 messages🔥🔥):

tcgen05.mma, GEMV, Cutlass Issues, Submission Deadline, Job Opportunities


GPU MODE ▷ #hf-kernels (2 messages):

ROCm kernels, Hugging Face blog post


GPU MODE ▷ #robotics-vla (35 messages🔥):

VLA adapter experiments with Qwen3-VL, Fine-tuning Pi0, Feetech servos, Action representations for RL, VLA-0 paper reproduction


LM Studio ▷ #general (257 messages🔥🔥):

LM Studio DVD inference, MCP gateway SDK, Langchain criticisms, Qwen3Vls perceptiveness, LM Studio image resolution settings


LM Studio ▷ #hardware-discussion (370 messages🔥🔥):

NV-Link bridges, Turing vs Ampere VRAM performance, RTX 2000 Value, Qwen 4B q8 benchmark, Extension Cord Safety


Cursor Community ▷ #general (495 messages🔥🔥🔥):

Cursor Tab Key Gift, GPT-5 High issues, GPT-5 Codex disappointment, Cursor Pro Plan Limits, Figma Designs with Cursor


OpenAI ▷ #ai-discussions (255 messages🔥🔥):

Sora 2 inconsistencies, GPT 5.1 Woes, Nano Banana 2 Release, AI for Windows Gaming, FiveTrainAI and Sentience


OpenAI ▷ #gpt-4-discussions (20 messages🔥):

GPT-5.1 Memory Issues, GPT Model for Exam Preparation, Harmony Response Format, Story Generation Limitations, GPT-5.1 Speed Comparison


OpenAI ▷ #prompt-engineering (7 messages):

Sora Prompts, Epistemic Laziness, LLM Benevolence


OpenAI ▷ #api-discussions (7 messages):

Sora 1, Mass ping detection, Epistemic Laziness Toxicity


Yannick Kilcher ▷ #general (172 messages🔥🔥):

Anthropic's PR Stunts, GPUs in geopolitical conflict, Claude-code comparison


Yannick Kilcher ▷ #paper-discussion (7 messages):

Circuit Sparsity, Exploration vs Exploitation


Yannick Kilcher ▷ #ml-news (73 messages🔥🔥):

AI Sidebar Disappointment, Firefox vs Brave, Lithium Niobate Challenges, Photonics and Computing, Peter Thiel dumps AI stock


Moonshot AI (Kimi K-2) ▷ #general-chat (231 messages🔥🔥):

Kimi K2's roleplay, Kimi API jailbreaks, Claude's message limit, GLM 4.6, Ernie 5 parameters


HuggingFace ▷ #general (197 messages🔥🔥):

HuggingChat Pricing, AI Generated Videos, TRL GOLD Trainer, AI and Screenshot Manipulation, Human Centric AI


HuggingFace ▷ #i-made-this (8 messages🔥):

Open Source Rust Coding TUI, Memory Bank MCP Server, RAG/Agents Evaluation Tool, RAG Boilerplate Repo, Architecting Agentic AI


HuggingFace ▷ #reading-group (1 message):

Semantic Chunking, Proposition Methods, Clever Chunking


HuggingFace ▷ #computer-vision (1 message):

its_nmt05: Can anybody suggest some SOTA segmentation masking models in the present....


HuggingFace ▷ #gradio-announcements (1 message):

Gradio 6, Gradio 6 launch, Gradio 6 release


HuggingFace ▷ #agents-course (6 messages):

Hugging Face Agentic AI Course, HF Token and 401 Error, GAIA Benchmark Task Files


Modular (Mojo 🔥) ▷ #general (5 messages):

default struct values, separate trait impl, static fields, owned value to immut reference, Mojo Roadmap


Modular (Mojo 🔥) ▷ #mojo (152 messages🔥🔥):

Immut vs Read, GPU programming: hardware tracking and scheduling overhead, Mojo's MAX graph compiler vs. torch.compile, @always_inline("builtin") hack, Int <-> UInt conversion in Mojo nightly


Modular (Mojo 🔥) ▷ #max (1 message):

hasanabukaram: Can I infer DeepSeek-OCR with Max, using CPU only?


Latent Space ▷ #ai-general-chat (149 messages🔥🔥):

Vercel's Internal AI Agents, Neolab Seed Rounds, Factory Ultra Plan Pricing, Azure AI Foundry Quality Issues, xAI Grok CLI Agent


Eleuther ▷ #general (61 messages🔥🔥):

hardware recommendations for local llm machine, attention-free transformer variants, NeurIPS 2025


Eleuther ▷ #research (42 messages🔥):

Reasoning Data Placement (Pre/Mid/Post Training), Recurrent Model Conversion, Transformers with Tied Weights


Eleuther ▷ #scaling-laws (15 messages🔥):

Transformers vs RNNs, Transformers vs State Space Models, Attention Mechanism, Linear Attention, Domain-Specific Compression


Eleuther ▷ #interpretability-general (6 messages):

Sparse Autoencoders on Attention Heads, Interpretability as Biology, New Papers, Emerging Methods in Interpretability


Eleuther ▷ #multimodal-general (1 message):

yolito92: 7 000 000 file json ready https://huggingface.co/datasets/YoloMG/ZeronexWikiEnglishfull


Nous Research AI ▷ #announcements (1 message):

Cline, Hermes 4


Nous Research AI ▷ #general (60 messages🔥🔥):

Model Purchase, Academic vs Industrial Neurips, Vector Graphics by LLMs, Amazon's Nova Premier, AWS Bedrock Experience


Nous Research AI ▷ #ask-about-llms (6 messages):

Similarity Score Tests, Uncensored MoEs, Josiefied Models, Crime Prevention, Alignment Post-Training


Nous Research AI ▷ #interesting-links (1 message):

teknium: https://fxtwitter.com/cline/status/1989432694867193988?s=46


Nous Research AI ▷ #research-papers (3 messages):

Agentic AI Frameworks, Clustering Hidden Layers Ablations


DSPy ▷ #show-and-tell (1 message):

Viksit article, DSPy updates


DSPy ▷ #papers (5 messages):

GEPA, DSPy, LLM training techniques, Practical applications of LLMs


DSPy ▷ #general (28 messages🔥):

Self promotion instabans, Prompt engineering competition, DSPy GEPA Optimization, Model training


tinygrad (George Hotz) ▷ #general (29 messages🔥):

NeurIPS attendance, uop mapping confirmation, CPU multithreading with OpenMP, tinybox performance


Manus.im Discord ▷ #general (16 messages🔥):

Chat Mode Disappearance, Pro Subscriber Privileges, Credit Inconsistencies, AI Lending Bubble, Private Chats for Pro Users


aider (Paul Gauthier) ▷ #general (10 messages🔥):

MCP Server Setup, Custom Shell for /test and /run, Greedy Decoding for Model Testing


aider (Paul Gauthier) ▷ #questions-and-tips (5 messages):

Terminal rendering of URLs, OpenRouter API key issues, Image Enhancement model struggles, Model Architectures, Loss Functions


MCP Contributors (Official) ▷ #general (14 messages🔥):

2025-11-25 RC Frozen, SEP Merging, Official HTTP Server Implementation for MCP, MCP SDK Discussion