Frozen AI News archive

Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/MATH-500

**DeepSeek r1** still leads the race for "open o1" models but has yet to release weights, while **Junyang Lin** released **QwQ**, a **32B open-weight model** that outperforms **GPT-4o** and **Claude 3.5 Sonnet** on reasoning benchmarks. QwQ appears to be a fine-tuned version of **Qwen 2.5 32B**, emphasizing sequential search and reflection for complex problem-solving. **SambaNova** promotes its RDUs as superior to GPUs for inference, highlighting the industry's shift from training to inference. On Twitter, **Hugging Face** announced CPU deployment for llama.cpp instances, **Marker v1** was released as a faster and more accurate document-conversion tool, and **Agentic RAG** developments focus on integrating external tools and advanced LLM chains for improved response accuracy. The open-source AI community sees growing momentum, with models like **Flux** gaining popularity and a broader shift toward multi-modal models spanning image, video, audio, and biology.

Canonical issue URL

AI News for 11/27/2024-11/28/2024. We checked 7 subreddits, 433 Twitters and 29 Discords (198 channels, and 2864 messages) for you. Estimated reading time saved (at 200wpm): 341 minutes. You can now tag @smol_ai for AINews discussions!

In the race for "open o1", DeepSeek r1 (our coverage here) still has the best results, but has not yet released weights. An exhausted-sounding Junyang Lin made a sudden late release today of QwQ, weights, demo and all:

image.png

Quite notably, this 32B open-weight model fully trounces GPT-4o and Claude 3.5 Sonnet on every benchmark shown.

Categorizing QwQ is an awkward task: it makes enough vague handwaves at sampling-time scaling to get /r/LocalLlama excited:

image.png

But the model weights themselves show that it looks like a Qwen 32B model (probably Qwen 2.5, our coverage here), so perhaps it has simply been finetuned to take "time to ponder, to question, and to reflect", to "carefully examin[e] their work and learn from mistakes". "This process of careful reflection and self-questioning leads to remarkable breakthroughs in solving complex problems". All of which are vaguely ChatGPT-esque descriptions and do not constitute a technical report, but the model is real, live, and downloadable, which says a lot. The open "reasoning traces" demonstrate how it has been tuned to do sequential search:

image.png

image.png

image.png

A fuller technical report is coming, but this is impressive if it holds up... perhaps the real irony is that Reflection 70B (our coverage here) wasn't wrong, just early...
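The "ponder, question, and reflect" behavior described above reads like a plain sequential generate-critique-revise loop at sampling time. Here is a toy sketch of that control flow (hypothetical, not QwQ's actual training or inference recipe; `generate` stands in for any LLM call):

```python
from typing import Callable

def reflect_and_revise(
    generate: Callable[[str], str],
    question: str,
    max_rounds: int = 3,
) -> str:
    """Toy sequential-search loop: draft an answer, ask the model to
    critique its own work, and revise until the critique finds no issues
    (or the round budget runs out)."""
    answer = generate(f"Question: {question}\nAnswer step by step.")
    for _ in range(max_rounds):
        critique = generate(
            f"Question: {question}\nProposed answer: {answer}\n"
            "List any mistakes, or reply NO ISSUES."
        )
        if "NO ISSUES" in critique:
            break  # the self-check passed; stop searching
        answer = generate(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite a corrected answer."
        )
    return answer
```

The point of the sketch is that nothing architecturally new is required: the "reasoning trace" is just more tokens spent checking and redoing work before committing to an answer.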


[Sponsored by SambaNova] Inference is quickly becoming the main function of AI systems, replacing model training. It’s time to start using processors that were built for the task. SambaNova’s RDUs have some unique advantages over GPUs in terms of speed and flexibility.

swyx's comment: RDUs are back! If a simple 32B autoregressive LLM like QwQ can beat 4o and 3.5 Sonnet, that is very good news for the alternative compute providers, who can optimize the heck out of this standard model architecture for shockingly fast/cheap inference.
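Back-of-envelope on why standard-architecture inference rewards specialized hardware: single-stream decoding of a dense model is typically memory-bandwidth-bound, since every generated token must stream all the weights through the chip once. A rough sketch (illustrative numbers, not any vendor's specs):

```python
def decode_tokens_per_sec(params_b: float, bytes_per_param: float,
                          bandwidth_gbs: float) -> float:
    """Rough single-stream decode ceiling for a dense autoregressive LLM:
    speed ≈ memory bandwidth / model size in bytes, because each token
    requires one full pass over the weights."""
    model_bytes = params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / model_bytes

# A 32B model in fp16 (2 bytes/param) on ~2 TB/s of memory bandwidth
# (an illustrative, assumed figure):
print(decode_tokens_per_sec(32, 2.0, 2000))  # → 31.25 tokens/sec ceiling

# The same model quantized to 4-bit (0.5 bytes/param) quadruples it:
print(decode_tokens_per_sec(32, 0.5, 2000))  # → 125.0
```

This is why a fixed, standard 32B architecture is attractive to alternative compute providers: the bottleneck is weight streaming, which hardware can be built to do very fast.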


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

Theme 1. Hugging Face and Model Deployments

Theme 2. Open Source AI Momentum

Theme 3. NVIDIA and CUDA Advancements

Theme 4. Impact of VC Practices and AI Industry Insights

Theme 5. Multimodal Model Development

Theme 6. Memes and Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. QwQ-32B: Qwen's New Reasoning Model Matches O1-Preview

Theme 2. Qwen2.5-Coder-32B AWQ Quantization Outperforms Other Methods

Theme 3. Cost-Effective Hardware Setups for 32B Model Inference

Theme 4. NVIDIA Star-Attention: 11x Faster Long Sequence Processing

Other AI Subreddit Recap

/r/MachineLearning, /r/OpenAI, /r/StableDiffusion, /r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

Theme 1. Claude Gets Model Context Protocol for Direct System Access

Theme 2. ChatGPT Voice Makes 15% Cold Call Conversion Rate

Theme 3. OpenAI's $1.5B Softbank Investment & Military Contracts Push

Theme 4. Local LLaMa-Mesh Integration Released for Blender


AI Discord Recap

A summary of Summaries of Summaries by O1-preview

Theme 1. AI Models Break New Ground in Efficiency and Performance

Theme 2. AI Tools and Infrastructure Level Up

Theme 3. AI Community Grapples with Ethical Quandaries

Theme 4. Users Wrestle with AI Tool Growing Pains

Theme 5. Big Bucks and Moves in the AI Industry


PART 1: High level Discord summaries

Modular (Mojo 🔥) Discord


Cursor IDE Discord


OpenAI Discord


Nous Research AI Discord


Eleuther Discord


OpenRouter (Alex Atallah) Discord


Interconnects (Nathan Lambert) Discord


Perplexity AI Discord


aider (Paul Gauthier) Discord


Unsloth AI (Daniel Han) Discord


Stability.ai (Stable Diffusion) Discord


Notebook LM Discord


Cohere Discord


Latent Space Discord


GPU MODE Discord


LlamaIndex Discord


tinygrad (George Hotz) Discord


Torchtune Discord


OpenInterpreter Discord


Axolotl AI Discord


MLOps @Chipro Discord


LAION Discord


Gorilla LLM (Berkeley Function Calling) Discord


LLM Agents (Berkeley MOOC) Discord


Mozilla AI Discord


AI21 Labs (Jamba) Discord


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Modular (Mojo 🔥) ▷ #mojo (435 messages🔥🔥🔥):

Mojo performance issues, Memory management in Mojo, Mutable aliasing and origins, Function signatures and origins

Links mentioned:


Modular (Mojo 🔥) ▷ #max (4 messages):

MAX Engine vs ONNX, MAX Graph API, AI Model Inference

Link mentioned: Get started with MAX Graph | Modular Docs: Learn how to build a model graph with our Mojo API for inference with MAX Engine.


Cursor IDE ▷ #general (332 messages🔥🔥):

Cursor Agent Performance, Cursor Version Updates, User Experiences with Cursor, Model Context Protocol (MCP), Markdown Issues and Bug Fixes

Links mentioned:


OpenAI ▷ #ai-discussions (95 messages🔥🔥):

Sora Video Generator Leak, ChatGPT Image Analysis Capabilities, Discord Community Engagement, Translation Challenges, Usage of AI in Content Creation

Link mentioned: Public Access to Open AI's Sora Video Generator just Leaked...: In this video, I discuss the unexpected leak of OpenAI's Sora, an advanced AI video generation tool. The leak was reportedly initiated by artists protesting ...


OpenAI ▷ #gpt-4-discussions (85 messages🔥🔥):

Accessing ChatGPT for Free, User Experience with ChatGPT, ChatGPT's Reliability and Validity, Using Files with GPT, Community Interaction and Humor


OpenAI ▷ #prompt-engineering (62 messages🔥🔥):

Empirical Prompting Research, AI Phone Calls, Model Testing Frameworks, Error Identification in AI, General Problem-Solving Strategies


OpenAI ▷ #api-discussions (62 messages🔥🔥):

Empirical Prompting Research, AI Phone Call Agents, Testing and Consistency in AI, IVR vs AI Interaction, RAG for File Referencing


Nous Research AI ▷ #general (85 messages🔥🔥):

OLMo Model Updates, GPU Rental Experiences, Nous Hermes Model Comparisons, Qwen Reasoning Model Release, Issues with Crypto Scams

Links mentioned:


Nous Research AI ▷ #ask-about-llms (9 messages🔥):

Test Time Training, ARC Prize, Table Question Answering


Nous Research AI ▷ #research-papers (93 messages🔥🔥):

MH-MoE Model Efficiency, Star Attention Mechanism, DALL-E Variational Bound Issues, Financial Economics and Bayesian Stats, Algorithmic Trading Experiences

Links mentioned:


Nous Research AI ▷ #interesting-links (4 messages):

Karpathy bump, TL;DR newsletter, Priompt to Python porting, Prompt design tools

Link mentioned: GitHub - zenbase-ai/py-priompt: Prompt design in Python: Prompt design in Python. Contribute to zenbase-ai/py-priompt development by creating an account on GitHub.


Nous Research AI ▷ #research-papers (93 messages🔥🔥):

MH-MoE paper, Star Attention paper, DALL-E variational bounds, Conditional independence in ML, Bayesian stats in finance

Links mentioned:


Eleuther ▷ #general (6 messages):

FSDP Spaghetti Internals, Dynamic Structs, Knowledge Conflict


Eleuther ▷ #research (276 messages🔥🔥):

RWKV-7 developments, SSM and graphs curvature, Mamba 2 architecture, Gradient descent optimization, Curvature and vertex degree analogy

Links mentioned:


Eleuther ▷ #interpretability-general (1 messages):

Research updates


Eleuther ▷ #lm-thunderdome (1 messages):

RunPod server configurations, OpenAI completions endpoint, Llama 3.2 model usage

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Gemini Flash 1.5, Provider Routing, Load Balancing, Grok Vision Beta

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (136 messages🔥🔥):

Jamba 1.5 model, AI21 Labs support, EVA Qwen2.5 pricing, Claude API issues, OpenRouter functionality

Links mentioned:


OpenRouter (Alex Atallah) ▷ #beta-feedback (4 messages):

Custom API Keys Access Requests


Interconnects (Nathan Lambert) ▷ #news (61 messages🔥🔥):

QwQ-32B-Preview, Olmo Model Differences, PRM Utility in Scaling, Tülu vs Olmo Performance, Demo Availability

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (49 messages🔥):

Bsky dataset controversy, Impact on social media research, User blocking trends, Dataset releases on Bluesky, Public accessibility of online posts

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (14 messages🔥):

Model insights, O1 LLM impact, Upcoming content, DesuAnon's new archive formats, 2024 AI illustrations

Links mentioned:


Interconnects (Nathan Lambert) ▷ #reads (2 messages):

Low-bit quantization in LLMs, Deepseek's AI advancements, CEO Liang Wenfeng's background, High-Flyer's role in Deepseek, Compute resources in AI development

Links mentioned:


Interconnects (Nathan Lambert) ▷ #posts (2 messages):



Perplexity AI ▷ #general (102 messages🔥🔥):

Perplexity support issues, Discount offerings, Image generation capabilities, Model selection benefits, Competitors in the search/chat hybrid area

Links mentioned:


Perplexity AI ▷ #sharing (5 messages):

AI Soundscapes, Cognitive Behavioral Therapy, Oldest Alphabetic Writing, Bluesky vs Zuckerberg, Black Friday Debate


Perplexity AI ▷ #pplx-api (4 messages):

Perplexity API financial data sources, Reddit citation issues, GitHub projects, Perplexity engine depreciation

Link mentioned: GitHub - dawid-szewc/perplexity-cli: 🧠 A simple command-line client for the Perplexity API. Ask questions and receive answers directly from the terminal! 🚀🚀🚀: 🧠 A simple command-line client for the Perplexity API. Ask questions and receive answers directly from the terminal! 🚀🚀🚀 - dawid-szewc/perplexity-cli


aider (Paul Gauthier) ▷ #general (84 messages🔥🔥):

Scripting Aider with Python, Sonnet in Aider Development, Benchmark Results for Aider, Cedarscript Discussion, QwQ Model Performance

Link mentioned: QwQ: Reflect Deeply on the Boundaries of the Unknown: GITHUB HUGGING FACE MODELSCOPE DEMO DISCORDNote: This is the pronunciation of QwQ: /kwju:/ , similar to the word “quill”.What does it mean to think, to question, to understand? These are t...


aider (Paul Gauthier) ▷ #questions-and-tips (26 messages🔥):

PDF Support in Sonnet, Refactoring with Aider, Aider Commands and Features, Next.js Folder Structure Issues, Whisper API for Transcription

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (46 messages🔥):

Axolotl vs Unsloth, Fine-tuning pre-trained models, Multiple models inference, Formatting for fine-tuning datasets, Embeddings model fine-tuning

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (17 messages🔥):

RTX 3090 Pricing, GPU Hosting Solutions, Docker Containers in Hosting, Demand for Higher VRAM GPUs


Unsloth AI (Daniel Han) ▷ #help (40 messages🔥):

Finetuning Local Models, Multi-GPU Support, Evaluating Models for EM and F1-Score, Model Quantization, Using Prompt Styles for Mistral-Nemo

Links mentioned:


Unsloth AI (Daniel Han) ▷ #research (2 messages):

Equation Correction, Order of Operations

Link mentioned: QwQ: Reflect Deeply on the Boundaries of the Unknown: GITHUB HUGGING FACE MODELSCOPE DEMO DISCORDNote: This is the pronunciation of QwQ: /kwju:/ , similar to the word “quill”.What does it mean to think, to question, to understand? These are t...


Stability.ai (Stable Diffusion) ▷ #general-chat (99 messages🔥🔥):

Wildcard Definitions, Image Generation Workflows, ControlNet Functionality, High-Quality Image Creation, SD Plugin Updates

Link mentioned: Definition of WILD CARD: an unknown or unpredictable factor; one picked to fill a leftover playoff or tournament berth after regularly qualifying competitors have all been determined… See the full definition


Notebook LM Discord ▷ #use-cases (9 messages🔥):

NotebookLM Experiments, AI Video Showcase, Model Comparisons, Satirical Writing, AI Interaction Insights

Links mentioned:


Notebook LM Discord ▷ #general (72 messages🔥🔥):

Notebook LM support issues, Notebook sharing limitations, AI content generation, Podcast length customization, Networking concerns

Links mentioned:


Cohere ▷ #discussions (7 messages):

Cohere Community, Welcoming New Members


Cohere ▷ #questions (68 messages🔥🔥):

Cohere API use with LiteLLM, Cohere model integration challenges

Links mentioned:


Cohere ▷ #projects (1 messages):

Full Stack AI Engineering, Web Application Development, AI-driven Solutions, Containerization and Deployment, Model Training and Deployment


Latent Space ▷ #ai-general-chat (55 messages🔥🔥):

PlayAI funding, OLMo 2 release, SmolVLM introduction, Deepseek AI developments, Generative AI in enterprise

Links mentioned:


GPU MODE ▷ #general (4 messages):

Poll Improvement Suggestions, General Setup for Courses, GPU Access for Personal Use


GPU MODE ▷ #cuda (3 messages):

Kernel Fusion, Model Traces, Reducing CUDA Kernel Launch Overheads

Link mentioned: Tweet from mike64_t (@mike64_t): @memorypaladin @ID_AA_Carmack @ezyang If this theory holds, changing the launch bounds should do something similar.


GPU MODE ▷ #torch (9 messages🔥):

FSDP multiple ranks, FLOPS counting challenges, GPU performance discrepancies, Math library workspace issues, Efficient use of FLOP counters

Link mentioned: torchtune/torchtune/training/_distributed.py at b5d2e6372017c163914b13b2514f29914e5dbb84 · pytorch/torchtune: PyTorch native finetuning library. Contribute to pytorch/torchtune development by creating an account on GitHub.


GPU MODE ▷ #algorithms (4 messages):

LoLCATs paper, ThunderKittens kernel, Model Throughput Issues, Linearized Attention Performance

Link mentioned: Linearizing LLMs with LoLCATs: no description found


GPU MODE ▷ #beginner (11 messages🔥):

#pragma unroll usage, Machine specifications for GPU work, NVIDIA vs AMD GPUs, Importance of recent GPU architecture


GPU MODE ▷ #llmdotc (2 messages):

CUDA course on freeCodeCamp, cublaslt performance


GPU MODE ▷ #intel (1 messages):

binarysoloist: First Google result for meteor lake NPU says it’s shared memory


GPU MODE ▷ #liger-kernel (5 messages):

Motherboard Replacement, GPU Performance, CMOS Battery Check, Pull Request Comments


GPU MODE ▷ #🍿 (2 messages):

CUDA bot usage, VS Code extension for AI coding, GitHub Copilot customization

Links mentioned:


GPU MODE ▷ #thunderkittens (3 messages):

FP8 support, ThunderKittens kernels

Links mentioned:


LlamaIndex ▷ #blog (4 messages):

Azure OpenAI endpoints, CXL memory for RAG, Quality-aware documentation chatbot, MSIgnite announcements, LlamaParse functionalities


LlamaIndex ▷ #general (18 messages🔥):

BM25 retriever with Postgres, Loading index in Milvus, Ollama API independence, Pydantic model extraction with o1 models, Document hashing comparison


tinygrad (George Hotz) ▷ #general (10 messages🔥):

TinyCloud Infrastructure, FPGA Backend Development, Cloud Access for Tinygrad Contributors, Intel/ARC Support Concerns, Tinybox Performance Focus

Link mentioned: Tweet from the tiny corp (@tinygrad): We will have 9 tinybox reds (54x 7900XTX) up in a test cloud by the end of the year. (stable with our custom driver) Will be free for tinygrad contributors to use.Using it will be as simple as running...


tinygrad (George Hotz) ▷ #learn-tinygrad (11 messages🔥):

GPU Radix Sort Optimization, Handling Edge Cases, Sorting Algorithm Selection, Vectorization Techniques, Support for Bitonic Sort

Link mentioned: tinygrad/examples/stunning_mnist.py at 84f96e48a1bb8826d868ad19ea34ce2deb019ce1 · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ - tinygrad/tinygrad


Torchtune ▷ #dev (15 messages🔥):

Educational Chatbot Development, Torchtune Compatibility, LoRA Single-Device Recipes, Torchtune Commit Milestone, Activation Offloading and Memory Efficiency


OpenInterpreter ▷ #general (8 messages🔥):

Normal Mode vs OS Mode, OS Mode requirements, CLI vs GUI functionality, Open Interpreter Point API


OpenInterpreter ▷ #ai-content (6 messages):

MCP tool feedback, Installed servers & tools, Cheatsheets for MCP

Links mentioned:


Axolotl AI ▷ #general (5 messages):

SmolLM2-1.7B, Transformers.js v3 Release, Frontend LLM Tasks, Axolotl Full Fine Tuning, Qwen 2.5 Model Configuration

Links mentioned:


MLOps @Chipro ▷ #events (2 messages):

Feature Store Webinar, Multi-Agent Framework Bootcamp

Links mentioned:


MLOps @Chipro ▷ #general-ml (2 messages):

LLMOps resource, Large Language Models impact

Link mentioned: LLMOps Part 1: Introduction: The world is experiencing a transformative wave driven by large language models (LLMs). These advanced AI models, capable of understanding and generating human-quality text, are changing interactions ...


LAION ▷ #general (4 messages):

Audio Dataset Captioning, Faster Whisper Batching, Script for Captioning, Audio Joining Strategy

Link mentioned: GitHub - SYSTRAN/faster-whisper: Faster Whisper transcription with CTranslate2: Faster Whisper transcription with CTranslate2. Contribute to SYSTRAN/faster-whisper development by creating an account on GitHub.


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (3 messages):

Llama 3.2 System Prompt, Multi Turn Categories Evaluation, Leaderboard Score Changes, Error Logs Observations


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):

Quiz Score Notifications, Confirmation Emails


Mozilla AI ▷ #announcements (1 messages):

Hidden States Unconference, Local RAG Application Workshop, ESM-1 Protein Language Model Discussion, San Francisco Demo Night, Data Bias Seminar


AI21 Labs (Jamba) ▷ #general-chat (1 messages):

Jamba 1.5 Mini Model, Function Calling Issues, OpenRouter Performance, Password Change Request


{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}