Frozen AI News archive

DeepSeek v3: 671B fine-grained MoE trained on 14.8T tokens for ~$5.5M of compute

**DeepSeek-V3** has launched: a **671B-parameter MoE** (37B activated per token) trained on **14.8T tokens** that matches or beats **GPT-4o** and **Claude 3.5 Sonnet** on many benchmarks. Training took only **2.788M H800 GPU-hours**, versus roughly **30.8M GPU-hours** for **Llama 3.1 405B**, a major gain in compute efficiency and cost. The model weights are open and available on **Hugging Face**, with API support. Innovations include native FP8 mixed-precision training, Multi-Head Latent Attention (MLA), distillation from synthetic reasoning data, pruning and healing for MoEs with up to **256 routed experts**, and a new multi-token prediction objective enabling lookahead token planning. Research highlights also cover the **OREO** method and **Natural Language Reinforcement Learning (NLRL)** for multi-step reasoning and agent control.
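To make the multi-token prediction objective concrete: instead of only predicting token t+1 at position t, the model also predicts tokens further ahead and averages the cross-entropies. Below is a toy NumPy sketch of that loss, under the simplifying assumption of one independent prediction head per future offset (`mtp_loss` and its argument shapes are illustrative, not DeepSeek's actual sequential-MTP-module implementation):

```python
import numpy as np

def mtp_loss(logits, tokens, depth=2):
    """Toy multi-token prediction loss.

    logits: (T, depth, V) array -- at each position t, one row of logits
            per future offset d = 1..depth (a stand-in for MTP heads).
    tokens: (T,) array of token ids.
    Positions without a full lookahead window are simply skipped.
    """
    T, D, V = logits.shape
    assert D == depth
    losses = []
    for d in range(1, depth + 1):
        for t in range(T - d):
            z = logits[t, d - 1]
            p = np.exp(z - z.max())       # stable softmax
            p /= p.sum()
            losses.append(-np.log(p[tokens[t + d]]))  # CE on token t+d
    return float(np.mean(losses))
```

The extra lookahead terms give the model a denser training signal per sequence; at inference the additional predictions can also seed speculative decoding.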

Canonical issue URL

AI News for 12/25/2024-12/26/2024. We checked 7 subreddits, 433 Twitters and 32 Discords (215 channels, and 5486 messages) for you. Estimated reading time saved (at 200wpm): 548 minutes. You can now tag @smol_ai for AINews discussions!


As teased over the Christmas break, DeepSeek v3 is here (our previous coverage of DeepSeek v2 here). The benchmarks are as good as you've come to expect from China's frontier open model lab:


(more details on aider and bigcodebench)

But the training details are even better:
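The headline training detail is native FP8 mixed precision with fine-grained (per-block) scaling factors. As a rough sketch of why per-block scales preserve precision, here is a fake-quantization round trip that scales each small block by its absmax and rounds to a coarse uniform grid (a uniform grid is a simplification; real FP8 E4M3 has a non-uniform grid, and `fake_fp8_roundtrip` is a hypothetical helper, not DeepSeek's kernel):

```python
import numpy as np

def fake_fp8_roundtrip(x, block=4, levels=240):
    """Per-block fake quantization: scale each block by its absmax,
    round to +/- `levels` uniform steps, then dequantize.
    Returns the reconstructed array for error inspection."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        b = x[i:i + block]
        scale = np.max(np.abs(b))
        if scale == 0.0:
            out[i:i + block] = 0.0
            continue
        q = np.round(b / scale * levels)       # quantize to coarse grid
        out[i:i + block] = q / levels * scale  # dequantize with block scale
    return out
```

Because each block gets its own scale, a single outlier only degrades its own block rather than the whole tensor, which is the intuition behind fine-grained FP8 scaling.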


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Developments and Releases

AI Research Techniques and Benchmarks

Open Source AI vs Proprietary AI

AI Infrastructure and Compute Resources

Immigration and AI Talent Policies

Memes and Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek V3 Release: Technical Innovations and Benchmarks

Theme 2. Cost Efficiency of DeepSeek V3 vs Competition

Theme 3. FP8 Training Breakthrough in DeepSeek V3

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. OpenAI O1 Model Impacting Financial Markets

Theme 2. Debates Surrounding O1 Pro Mode's Usefulness

Theme 3. OpenAI's Latest Developments and Tools Overview

Theme 4. ChatGPT Downtime and User Impact


AI Discord Recap

A summary of Summaries of Summaries by o1-2024-12-17

Theme 1. DeepSeek V3 Takes Center Stage

Theme 2. Code Editors & IDE Woes

Theme 3. AI Powers Creative & Collaborative Work

Theme 4. Retrieval, Fine-Tuning, and HPC Upscaling

Theme 5. Key Tech & Performance Fixes


PART 1: High-level Discord summaries

Codeium (Windsurf) Discord


Cursor IDE Discord


aider (Paul Gauthier) Discord


Nous Research AI Discord


OpenRouter (Alex Atallah) Discord


LM Studio Discord


Stability.ai (Stable Diffusion) Discord


OpenAI Discord


Stackblitz (Bolt.new) Discord


Unsloth AI (Daniel Han) Discord


Perplexity AI Discord


Latent Space Discord


Interconnects (Nathan Lambert) Discord


GPU MODE Discord


Notebook LM Discord


LlamaIndex Discord


Eleuther Discord


tinygrad (George Hotz) Discord


Cohere Discord


Modular (Mojo 🔥) Discord


DSPy Discord


LLM Agents (Berkeley MOOC) Discord


OpenInterpreter Discord


Nomic.ai (GPT4All) Discord


LAION Discord


MLOps @Chipro Discord


The Axolotl AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Codeium (Windsurf) ▷ #content (1 messages):

Windsurf AI

Link mentioned: Tweet from Windsurf (@windsurf_ai): What exactly is Windsurf? Watch how we dared to innovate by breaking every industry convention 🌊


Codeium (Windsurf) ▷ #discussion (433 messages🔥🔥🔥):

Windsurf Performance Issues, Cascade Base Model Concerns, Integration with Remote Hosts, User Experience Feedback, Login and API Errors

Links mentioned:


Codeium (Windsurf) ▷ #windsurf (869 messages🔥🔥🔥):

Windsurf Issues, Cascade Performance, User Experiences, AI Model Performance, Project Development Challenges

Links mentioned:


Cursor IDE ▷ #general (744 messages🔥🔥🔥):

DeepSeek V3 performance, Cursor IDE and DeepSeek integration, Agent mode and token consumption, Challenges with UI design in Next.js, Comparison of different AI models

Links mentioned:


aider (Paul Gauthier) ▷ #announcements (1 messages):

Aider v0.70.0 release, Analytics opt-in feature, Error handling improvements, Model support enhancements

Link mentioned: Release history: Release notes and stats on aider writing its own code.


aider (Paul Gauthier) ▷ #general (557 messages🔥🔥🔥):

DeepSeek V3 vs O1 Pro, Model Comparisons: Claude vs DeepSeek, Using Aider with DeepSeek, Challenges in Code Implementation, Context Limitations in LLMs

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (49 messages🔥):

Aider configuration with aliases, DeepSeek Chat V3 performance, Repo-map functionality, Model combinations in Aider, Managing API keys in config files

Links mentioned:


aider (Paul Gauthier) ▷ #links (2 messages):

BigCodeBench Leaderboard, GitDiagram for Visualization, GitIngest for Codebases

Links mentioned:


Nous Research AI ▷ #general (324 messages🔥🔥):

DeepSeek V3 Release, Linux Mint Experience, Text-to-Video Model Comparisons, URL Moderation API Challenges, Inference Costing and Deployment

Links mentioned:


Nous Research AI ▷ #ask-about-llms (2 messages):

NotebookLM Inline Sourcing


Nous Research AI ▷ #research-papers (2 messages):

Differentiable Cache Augmentation, DeepSeek-V3

Links mentioned:


Nous Research AI ▷ #research-papers (2 messages):

Differentiable Cache Augmentation, DeepSeek V3

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Web Search for LLMs, Price Cuts on Models, New Endpoints API, Deepseek v3 Launch

Links mentioned:


OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

3D Game Generation Tool, AI Chat Terminal (ACT)

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (301 messages🔥🔥):

DeepSeek V3 Feedback, OpenRouter Chat Performance, DeepSeek Pricing, API Limitations, Model Comparisons

Links mentioned:


LM Studio ▷ #general (153 messages🔥🔥):

LM Studio Model Performance, AI Roleplaying Game Management, Memory Management Issues, Implementation of RAG for PAM, Model Context Length Limitations

Links mentioned:


LM Studio ▷ #hardware-discussion (101 messages🔥🔥):

X99 motherboard performance, GPU utilization for LLMs, Hardware recommendations for AI training, Multi-GPU setups, Low VRAM models for video generation

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (226 messages🔥🔥):

AI Image Generation Techniques, ComfyUI Usage Tips, Stable Diffusion Model Comparisons, Video Generation Capabilities, NSFW Protections in LoRA

Links mentioned:


OpenAI ▷ #ai-discussions (183 messages🔥🔥):

OpenAI Outage and Alternatives, DeepSeek V3 Performance, Model Comparisons, ChatGPT Competitors, Acronym Understanding in LLMs

Link mentioned: High error rates for ChatGPT, APIs, and Sora: no description found


OpenAI ▷ #gpt-4-discussions (33 messages🔥):

GPT-O3 Release, ChatGPT Down Status, Using GPTs in RPGs, Canvas Window Issues, Eslint Configuration


OpenAI ▷ #prompt-engineering (1 messages):

madame_architect: Why would a minute not be ok?


OpenAI ▷ #api-discussions (1 messages):

madame_architect: Why would a minute not be ok?


Stackblitz (Bolt.new) ▷ #prompting (7 messages):

ProductPAPI, Anthropic Concerns, Direct Code Modification, Claude Load Issues


Stackblitz (Bolt.new) ▷ #discussions (198 messages🔥🔥):

Issues with Bolt token usage, Building applications using Bolt, Feature requests and feedback for Bolt, Community support and collaboration, Tool limitations and user experiences

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (132 messages🔥🔥):

QVQ Model Launch, Fine-Tuning Llama Models, DeepSeek V3 Discussion, Nvidia Driver Issues, Dataset Formatting for AI Training

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (4 messages):

Sprint mode query, Coding datasets for instruction-tuning, Personal training experience for thesis


Unsloth AI (Daniel Han) ▷ #help (37 messages🔥):

SFT DPO Evaluation, Fine-tuning Llama 3.2 Vision, Using Unsloth with CPU, GGUF Conversion Issues, Model Performance Discrepancies

Links mentioned:


Unsloth AI (Daniel Han) ▷ #research (5 messages):

Stella recommendations, Mixed Bread models, Benchmarking and finetuning


Perplexity AI ▷ #general (134 messages🔥🔥):

Perplexity AI usage concerns, Feedback on AI models, Job inquiries in AI, AI model comparisons, Subscription and access issues

Links mentioned:


Perplexity AI ▷ #sharing (14 messages🔥):

NASA touches the Sun, Murder Hornets eradicated, Solar paint for EV charging, AI model from India, Body-heat powered wearables

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (4 messages):

Payment Processing, Virtual Cards, OpenRouter Credits, Perplexity Models


Latent Space ▷ #ai-general-chat (103 messages🔥🔥):

DeepSeek V3, OpenAI outages, ChatGPT memory improvements, RL training for LLM reasoning, Anduril partnership with OpenAI

Links mentioned:


Latent Space ▷ #ai-announcements (3 messages):

2024 in Synthetic Data, 2024 in Agents, AI Engineer Summit NYC, Event Calendar Updates

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (45 messages🔥):

DeepSeek V3 Launch, Multi-Token Prediction, Reward Model Techniques, Performance Comparisons, Model Training Techniques

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (5 messages):

Deepseek's Multi-head Latent Attention Mechanism, Implementations and Inference Libraries, Deepseek V2 Paper Insights, Deepseek V3 Inference Code

Link mentioned: DeepSeek-V3/inference/model.py at main · deepseek-ai/DeepSeek-V3: Contribute to deepseek-ai/DeepSeek-V3 development by creating an account on GitHub.


Interconnects (Nathan Lambert) ▷ #ml-drama (15 messages🔥):

QvQ License Update, Bluesky Safety Concerns, AI Backlash from Data Scientists

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (6 messages):

Meta paper on self-awareness in language models, Copyright lawsuits against AI companies, Anthropic and copyright issues, Public benefit ethos of AI companies

Link mentioned: Every AI Copyright Lawsuit in the US, Visualized: WIRED is following every copyright battle involving the AI industry—and we’ve created some handy visualizations that will be updated as the cases progress.


Interconnects (Nathan Lambert) ▷ #memes (4 messages):

Bluesky performance, New LLM release from DeepSeek

Links mentioned:


Interconnects (Nathan Lambert) ▷ #nlp (3 messages):

Monte Carlo Tree Search, Iterative Preference Learning, Reasoning in LLMs

Link mentioned: Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning: We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by ...


Interconnects (Nathan Lambert) ▷ #reads (8 messages🔥):

Effective RL Training for LLMs, DPO vs PPO, Reasoning in RL, Viewing Parties for Lectures

Link mentioned: YouTube: no description found


GPU MODE ▷ #general (12 messages🔥):

Learning CUDA Programming, DETRs and PyTorch, Shared Memory in CUDA, DeepSeek-V3 Training, Earning via Telegram Strategies

Links mentioned:


GPU MODE ▷ #triton (8 messages🔥):

Casting Issues in Triton, Device Printing in Colab, Infinity Feature in Triton, Triton Recompilation, Scam Alert

Links mentioned:


GPU MODE ▷ #cuda (14 messages🔥):

TMA vs cp.async for GEMM, DETRs and PyTorch inquiries, Performance requirements for WGMMA, CUTLASS discussion on Hopper structured sparse GEMM, Earning methods shared on social media

Link mentioned: CUTLASS 3.6.0 · NVIDIA/cutlass · Discussion #2013: Hopper structured sparse GEMM. FP16 FP8 INT8 TF32 A refactor to the CUTLASS 3.x convolution kernel::ConvUniversal API to bring it in line with gemm::GemmUniversal. Now the 3.x convolution API is no...


GPU MODE ▷ #torch (3 messages):

Guard Functions Impact, Earning Strategies

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #algorithms (1 messages):

Earning Strategies, Telegram Outreach

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #cool-links (2 messages):

Character.AI Inference Optimization, Earning Money Online, Telegram Offers

Links mentioned:


GPU MODE ▷ #beginner (5 messages):

Learning CUDA and Triton, vLLM Token Throughput Analysis, Sequence Stacking in Attention Mechanisms, Optimized Attention Implementations, Earning Opportunities

Links mentioned:


GPU MODE ▷ #pmpp-book (3 messages):

PMPP lectures, Earning strategies

Links mentioned:


GPU MODE ▷ #jax (1 messages):

Earning $100k quickly, Investment schemes, Telegram outreach

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #torchao (1 messages):

Earning opportunities, Telegram outreach, Profit sharing

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #sequence-parallel (1 messages):

Earning $100k, Profit reimbursement, Telegram contact

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #off-topic (1 messages):

Earning $100k in 72 hours, Charles William's proposition, Reimbursement of profits, Telegram contact

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #irl-meetup (1 messages):

Earning $100k, Telegram Contact, Profit Sharing Strategy

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #hqq-mobius (1 messages):

Earning Strategies, Telegram Networking

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #triton-viz (1 messages):

Earning $100k strategy, Telegram outreach

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #llmdotc (1 messages):

Earning $100k in 72 hours, Profit-sharing model, Telegram outreach, Investment opportunities, Financial advice

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #rocm (1 messages):

Earning opportunities, Profit-sharing scheme, Telegram outreach

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #intel (1 messages):

Earning $100k, Charles William's Offer, Telegram Outreach

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #lecture-qa (1 messages):

Earning $100k in 72 hours, Reimbursement profit model, Telegram contact for details

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #bitnet (7 messages):

No backpropagation training method, Energy-efficient model training, Random walk sampling technique, Discussion on gradient methods

Links mentioned:


GPU MODE ▷ #arm (1 messages):

Earning $100k, Profits reimbursement, Networking on Telegram

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #sparsity-pruning (2 messages):

Sparsification in PyTorch, Earning Strategies

Links mentioned:


GPU MODE ▷ #webgpu (1 messages):

Earning $100k, Profit Sharing Model, Telegram Outreach

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #liger-kernel (1 messages):

Earning Strategies, Telegram Networking

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #metal (2 messages):

Running .air files on iPad, Fundraising strategies

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #avx (1 messages):

Earning $100k swiftly, Profit sharing model, Telegram outreach

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #🍿 (1 messages):

Earning $100k in 72 hours, Profit-sharing business model, Telegram outreach

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #edge (1 messages):

$100k in 72 hours, Profit Reimbursement, Telegram Outreach

Link mentioned: Charles William: Spreading the wealth around the world.


GPU MODE ▷ #arc-agi-2 (9 messages🔥):

Oreo Code Release, Hugging Face TRL Library, ARC-AGI-2 Repository, Chollet's Views on VLMs, 1D Task Generators

Links mentioned:


Notebook LM Discord ▷ #use-cases (7 messages):

Podcast integration with Google News, AI-generated podcasts, Summarizing Pathfinder adventures, Audio Overviews of news articles, Chatbots in everyday scenarios

Links mentioned:


Notebook LM Discord ▷ #general (54 messages🔥):

PDF Upload Issues, Podcast Customization, Language Settings, Feature Requests, AI Podcast Sharing

Links mentioned:


LlamaIndex ▷ #blog (1 messages):

Report Generation Agent, LlamaParse, LlamaCloud


LlamaIndex ▷ #general (36 messages🔥):

LlamaIndex and OpenAI integration, DocumentContextExtractor proposals, Tokenization and truncation issues, Generating LlamaIndex documentation, Payroll PDF parsing solutions


LlamaIndex ▷ #ai-discussion (3 messages):

Unstructured RAG, LangChain, Unstructured IO, Athina AI, LlamaIndex

Link mentioned: End-to-End Guide: Implementing Unstructured RAG Systems: Learn the complete process for implementing Unstructured RAG systems. Boost AI performance with this comprehensive Athina AI Hub Original guide!


Eleuther ▷ #general (16 messages🔥):

Hugging Face Checkpoints, Fine-Tuning VLMs, Loss Calculation in Trainer


Eleuther ▷ #research (17 messages🔥):

Latency reduction techniques for LLMs, Performance of LLMs in engineering applications, Comparative analysis of LLMs and optimized models, Self-improvement in LLMs, Insights on token prediction objectives

Links mentioned:


Eleuther ▷ #interpretability-general (6 messages):

GPT-2 Token Activations, BOS Token Discussion


Eleuther ▷ #multimodal-general (1 messages):

Encoder-Free VLMs, Video VLMs, Encoder Efficiency, Fuyu Model Series, EVE Model

Link mentioned: GitHub - baaivision/EVE: [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models: [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models - baaivision/EVE


tinygrad (George Hotz) ▷ #general (8 messages🔥):

Proof in Lean Bounty, BITCAST Const Folding, Matching Engine Performance Bounties, Tinygrad Updates, Performance Optimization

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (31 messages🔥):

Performance Comparisons, Tinygrad Model Integration, Beam Search Efficiency, GPU Compatibility, Kernel Caching

Link mentioned: fish-speech/fish_speech/models/text2semantic/llama.py at main · fishaudio/fish-speech: SOTA Open Source TTS. Contribute to fishaudio/fish-speech development by creating an account on GitHub.


Cohere ▷ #discussions (13 messages🔥):

Christmas Greetings, Re-ranker Pricing Inquiry, AI and ML Learning Journey

Links mentioned:


Cohere ▷ #questions (6 messages):

CMD-R updates, R7B beginnings, HuggingFace finetunes, User feedback


Cohere ▷ #cmd-r-bot (15 messages🔥):

LLM University, Command R model, Command R+ performance


Cohere ▷ #projects (1 messages):

Voice to Voice chat app, Music Generation using Generative AI, DNN-VAD, NLP projects, ASR project


Modular (Mojo 🔥) ▷ #general (6 messages):

io_uring networking, Mojo swag, Modular merchandise


Modular (Mojo 🔥) ▷ #mojo (19 messages🔥):

Importing String Module Issues, StringRef and Crash Causes, Testing for EOF in Read Until Delimiter, Concerns About Copyable Traits

Link mentioned: Commits · mahiro21h/mojo: The Mojo Programming Language. Contribute to mahiro21h/mojo development by creating an account on GitHub.


Modular (Mojo 🔥) ▷ #max (2 messages):

Modular Stack Kernel, MAX vs XLA Compile Times


DSPy ▷ #show-and-tell (1 messages):

PyN8N, DSLModel, AI Workflow Creation

Link mentioned: no title found: no description found


DSPy ▷ #general (12 messages🔥):

NotebookLM inline sourcing, Jekyll glossary script, Typing.TypedDict usage, Pydantic for output fields design

Link mentioned: A script to generate a glossary of key terms from your Jekyll posts. We're using DSPy to handle LLM interactions; it helps with boilerplate prompt context and parsing responses into Pydantic objects. To run this, put this script in a folder named 'scripts' (or whatever) in your Jekyll site directory. Then plug in your Anthropic API key (or point DSPy to the LLM endpoint of your choice). It will output a YAML file named 'glossary.yaml' to your '_data' directory.: A script to generate a glossary of key terms from your Jekyll posts. We're using DSPy to handle LLM interactions; it helps with boilerplate prompt context and parsing responses into Pydantic o...


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (8 messages🔥):

Certificate Distribution, Certificate Declaration Form, Next Course Start Date


OpenInterpreter ▷ #general (7 messages):

Open Interpreter API, OCR functionality, Desktop version release, Voice to Voice chat app, QvQ with Open-Interpreter

Link mentioned: no title found: no description found


Nomic.ai (GPT4All) ▷ #general (3 messages):

UI Features, Copying Code, Keyboard Shortcuts


LAION ▷ #general (2 messages):

TTS dataset creation, Audio segmentation, Using Whisper for transcription


MLOps @Chipro ▷ #general-ml (1 messages):

ML ops frameworks, HPC environments, Guild AI stability, DIY ops framework






{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}