Frozen AI News archive

not much happened today

**DeepSeek-R1 and DeepSeek-V3** continue to make waves. The reported **1.5M-sample instruction-tuning dataset** includes roughly **600,000 reasoning** and **200,000 non-reasoning SFT samples** (so the two SFT buckets account for about 800K of the total). The models post strong **benchmark results** and can be deployed on-premise via collaborations with **Dell** and **Hugging Face**. Training costs are estimated at around **$5.5M to $6M**, with efficient hardware utilization on **8xH100 servers**. Separately, the **International AI Safety Report** highlights risks such as **malicious use**, **malfunctions**, and **systemic risks**, including **AI-driven cyberattacks**. Industry leaders like **Yann LeCun** and **Yoshua Bengio** weigh in on market reactions, AI safety, and ethics, with emphasis on AI's role in creativity and on economic incentives.
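The headline training-cost estimate reduces to one multiplication. A back-of-envelope sketch, assuming the ~2.788M GPU-hours and the $2/GPU-hour rental rate that the DeepSeek-V3 technical report itself uses:

```python
# Rough reproduction of the widely cited DeepSeek-V3 training-cost figure.
# Both inputs are the report's own assumptions, not measured invoices.
gpu_hours = 2.788e6        # ~2.788M GPU-hours for the full training run
usd_per_gpu_hour = 2.0     # assumed rental price per GPU-hour
cost = gpu_hours * usd_per_gpu_hour
print(f"${cost / 1e6:.3f}M")  # $5.576M, the low end of the $5.5M-$6M range
```

Note the report excludes research and ablation runs from this figure, which is one reason quoted estimates vary within the range above.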

Canonical issue URL

AI News for 1/28/2025-1/29/2025. We checked 7 subreddits, 433 Twitters and 34 Discords (225 channels and 4890 messages) for you. Estimated reading time saved (at 200wpm): 549 minutes. You can now tag @smol_ai for AINews discussions!
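For the curious, the "reading time saved" metric is just word count divided by reading speed; the word count below is back-derived from the stated numbers, and the per-message average is an illustrative figure, not a published stat:

```python
# The newsletter's metric: minutes_saved = total_words / words_per_minute.
# total_words is inferred here by inverting the stated 549 min at 200 wpm.
wpm = 200
minutes_saved = 549
total_words = wpm * minutes_saved
print(total_words)                # 109800 words scanned across sources
print(round(total_words / 4890))  # ~22 words per message, on average
```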

Rumors of Grok 3 and o3-mini continue to swirl.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

DeepSeek Developments and Performance

AI Model Training, Costs, and Hardware

Open Source AI and Deployment

AI Safety, Risks, and Ethics

AI Industry Insights and Comparisons

Memes/Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Confusion over DeepSeek R1 Models and Distillations

Theme 2. Speculation on US Ban of DeepSeek and Market Impact

Theme 3. DeepSeek API Challenges Amidst DDoS Attacks

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. OpenAI's Allegation: DeepSeek Leveraged Their Model

Theme 2. Qwen 2.5 Max vs GPT-4o: Price and Performance Clash

Theme 3. Gemini 2's Flash Thinking: Evolution in AI Speed


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Exp (gemini-2.0-flash-exp)

Theme 1: DeepSeek R1 Model Mania: Performance, Problems, and Promise

Theme 2: Model Deployment and Hardware Headaches

Theme 3: AI Tools, Frameworks, and Their Quirks

Theme 4: Training Techniques and Emerging Models

Theme 5: AI Ethics, Data, and the Future


PART 1: High-level Discord summaries

Unsloth AI (Daniel Han) Discord


OpenAI Discord


LM Studio Discord


aider (Paul Gauthier) Discord


Perplexity AI Discord


Nous Research AI Discord


Codeium (Windsurf) Discord


OpenRouter (Alex Atallah) Discord


Interconnects (Nathan Lambert) Discord


Cursor IDE Discord


Yannick Kilcher Discord


Eleuther Discord


GPU MODE Discord


Stability.ai (Stable Diffusion) Discord


Stackblitz (Bolt.new) Discord


MCP (Glama) Discord


Nomic.ai (GPT4All) Discord


Notebook LM Discord


Latent Space Discord


Cohere Discord


LLM Agents (Berkeley MOOC) Discord


Modular (Mojo 🔥) Discord


Torchtune Discord


Axolotl AI Discord


LlamaIndex Discord


MLOps @Chipro Discord


DSPy Discord


OpenInterpreter Discord


tinygrad (George Hotz) Discord


LAION Discord


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (584 messages🔥🔥🔥):

Unsloth AI performance and functionalities, Training with deep learning models, Reinforcement Learning advancements, Fine-tuning models with synthetic datasets, Dynamic quantization for efficient modeling

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (24 messages🔥):

Federated Learning, LaMDA sentience claims, Consciousness and Sentience, AI Roleplaying, Deepseek use in workplace

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (131 messages🔥🔥):

DeepSeek R1 model handling, Model training issues and optimizations, Qwen2.5-VL support updates, Ollama and llama.cpp functionalities, Running models on various hardware

Links mentioned:


Unsloth AI (Daniel Han) ▷ #research (3 messages):

AGI breakthroughs, Cybergod paper, Auto-download links controversy


OpenAI ▷ #ai-discussions (404 messages🔥🔥🔥):

DeepSeek vs OpenAI, Censorship in AI, Using Multiple AI Models, AI in Creative Writing, Real-Time Functionality with AI

Links mentioned:


OpenAI ▷ #gpt-4-discussions (30 messages🔥):

Invisible zero width space characters, Custom GPT link output issues, GPT memory and context limitations, Contradictions in GPT responses, User Memory feature challenges


OpenAI ▷ #prompt-engineering (1 messages):

o3-mini, owl-palm tree riddle


OpenAI ▷ #api-discussions (1 messages):

o3-mini, owl-palm tree riddle


LM Studio ▷ #general (247 messages🔥🔥):

DeepSeek R1 Models, LM Studio Functionality, RAG Implementation and Performance, User Experience with LLMs, Learning Resources for LLM Optimization

Links mentioned:


LM Studio ▷ #hardware-discussion (152 messages🔥🔥):

LLM Inference Speed, Hardware Requirements for DeepSeek, Using Models on Apple Silicon, Performance of ML Models, Handling CSV Data with LLMs


aider (Paul Gauthier) ▷ #general (329 messages🔥🔥):

DeepSeek API Issues, Qwen 2.5 Max, Sonnet as Editor, Test Driven Development (TDD), Pricing and Spending on AI Models

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (56 messages🔥🔥):

Aider context and file management, Model performance and speed, Using conventions for code style, Architect Mode workflow, Troubleshooting token limits

Links mentioned:


aider (Paul Gauthier) ▷ #links (1 messages):

apcameron: Have a look at this project. https://github.com/huggingface/open-r1


Perplexity AI ▷ #announcements (2 messages):

Sonar Reasoning API, DeepSeek R1 on Mac App


Perplexity AI ▷ #general (316 messages🔥🔥):

DeepSeek R1 queries, Perplexity Pro subscription, Model availability and usage, API key and usage, Web and iOS features

Links mentioned:


Perplexity AI ▷ #sharing (13 messages🔥):

Java 23 SDK Update, DeepSeek vs OpenAI O1, F-35 Fighter Jet Incident, Leafcutter Ants Cultivation, Alibaba's New Model

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (10 messages🔥):

Sonar Reasoning performance, Feedback on reasoning search, Sonar model specifications, Issues with reasoning outputs, Sources and citations


Nous Research AI ▷ #general (298 messages🔥🔥):

MoE models performance, Nous Research funding, DeepSeek R1 availability, AI reasoning and output quality, Speculation on stock predictions

Links mentioned:


Nous Research AI ▷ #ask-about-llms (6 messages):

Ollama, Local AI Model Options, CLI vs GUI


Nous Research AI ▷ #research-papers (2 messages):

Mixture-of-Experts Models, Autonomy-of-Experts Paradigm

Link mentioned: Autonomy-of-Experts Models: Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation b...
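Since the paper blurb compresses the mechanism, here is a minimal sketch of the conventional top-k router it describes: score every expert, softmax, keep only the top-k. The dimensions, the top-2 choice, and the random weights are all toy assumptions, not the paper's setup.

```python
# Sketch of conventional top-k MoE routing (the design the Autonomy-of-Experts
# paper critiques): a learned router scores experts per token, and only the
# top_k experts' parameters are activated. All sizes here are illustrative.
import math
import random

random.seed(0)
d_model, n_experts, top_k = 8, 4, 2

def route(token, router_w):
    # Router logits: one dot product per expert.
    logits = [sum(t * w for t, w in zip(token, col)) for col in router_w]
    # Numerically stable softmax over experts.
    z = max(logits)
    exps = [math.exp(l - z) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the top_k experts: this is the "partial parameters" activation.
    ranked = sorted(range(n_experts), key=lambda i: -probs[i])
    return ranked[:top_k], probs

token = [random.gauss(0, 1) for _ in range(d_model)]
router_w = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
experts, probs = route(token, router_w)
print(experts)               # the 2 expert indices this token activates
print(round(sum(probs), 6))  # softmax mass always sums to 1.0
```

The paper's proposal removes this separate router and lets experts self-select, but the top-k gate above is the baseline it argues against.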


Nous Research AI ▷ #interesting-links (1 messages):

tudorboto: "Intel i5/AMD Ryzen 5 or mightier", does an M3 go in the "mightier" category?




Codeium (Windsurf) ▷ #discussion (87 messages🔥🔥):

Windsurf account issues, DeepSeek integration, Codeium extension setup, User experience concerns, Flex credits and pricing

Links mentioned:


Codeium (Windsurf) ▷ #windsurf (193 messages🔥🔥):

Issues with Windsurf performance, Sonnet LLM criticism, Cascade functionality problems, User frustrations with AI assistance, Feedback on pricing and value

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

DeepSeek R1, Chutes, Perplexity's Sonar, Sonar-Reasoning

Link mentioned: DeepSeek R1 - API, Providers, Stats: DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Ru...


OpenRouter (Alex Atallah) ▷ #general (277 messages🔥🔥):

OpenRouter User Experiences, DeepSeek Model Performance, Model Communication and Pricing, Image Generation Discussions, Translation Model Recommendations

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (53 messages🔥):

DeepSeek Database Exposure, Dario Amodei's Thoughts on AI Models, Community Reactions to Model Performance, Analysis of R1 and R1-Zero Models, Concerns about AI Model Transparency

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (24 messages🔥):

OpenAI's lockdown mode, Concerns over O3 launch timing, Meta's interest in DeepSeek, Grok3 development, Model pricing issues

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (41 messages🔥):

DeepSeek R1, Llama 4 Development, Grok 3 and O3-mini Release, ChatGPT Revenue Insights, Vulnerability Reports

Links mentioned:


Interconnects (Nathan Lambert) ▷ #memes (10 messages🔥):

Flawed Benchmark Debate, Zizek Voice Interpretation, Colorful Language in AGI Discourse

Link mentioned: Tweet from Teortaxes▶️ (DeepSeek🐳 Cheerleader since 2023) (@teortaxesTex): tbh I hate SV AGI bros. Their whorish mystifications and clinging to minor technical secrets. Their creepy need to lull people into false sense of safety, then terrify with visions of AGI doom. Their ...


Interconnects (Nathan Lambert) ▷ #reads (35 messages🔥):

DeepSeek's Papers, Liang Wenfeng Interview, Mixture-of-Experts (MoE), Multi-Token Prediction (MTP), Deepseek v2 and v3 Papers

Links mentioned:


Interconnects (Nathan Lambert) ▷ #posts (42 messages🔥):

DeepSeek's impact, OpenAI's formal math direction, LLMs as verifiers, Community engagement around reasoning models

Link mentioned: Tweet from Nathan Lambert (@natolambert): Why reasoning models will generalize: DeepSeek R1 is just the tip of the iceberg of rapid progress. People underestimate the long-term potential of "reasoning." https://buff.ly/4haoAtt


Interconnects (Nathan Lambert) ▷ #policy (24 messages🔥):

DeepSeek IP Concerns, ChatGPT Token Usage, Inference Cost of AI Models, Export Restrictions on AI Chips

Links mentioned:


Cursor IDE ▷ #general (219 messages🔥🔥):

DeepSeek Updates, Cursor IDE Bugs, Model Comparisons, Usage Limitations, User Experiences

Links mentioned:


Yannick Kilcher ▷ #general (180 messages🔥🔥):

Softmax Variations, Deep Reinforcement Learning Challenges, RTX 5090 Release Discussions, Performance Metrics in AI, Community Engagement Issues

Links mentioned:


Yannick Kilcher ▷ #paper-discussion (15 messages🔥):

DeepSeek claims, OpenAI vs DeepSeek, Data usage controversy, Model distillation debates, Cyber attack implications

Link mentioned: OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole From Us: OpenAI shocked that an AI company would train on someone else's data without permission or compensation.


Yannick Kilcher ▷ #agents (3 messages):

PydanticAI, Qwen2 VL performance, Multimodal model advantages


Yannick Kilcher ▷ #ml-news (14 messages🔥):

DeepSeek AI technologies, O3-mini launch, AI computing trends, Claude 3.5 training cost, Italy's regulation on AI

Links mentioned:


Eleuther ▷ #general (60 messages🔥🔥):

Mordechai Rorvig's Book Project, Protein-Ligand Binding Research, Test Time Compute Models, Generative Models for Molecules, DeepSeek Architecture and Inference Framework

Links mentioned:


Eleuther ▷ #research (82 messages🔥🔥):

High Update Ratio Tricks in RL, Min-P Sampling Method, Exploration vs. Exploitation in RL, Fastfood Transform in Kernel Methods, Generalization in SFT vs. RL

Links mentioned:


Eleuther ▷ #interpretability-general (55 messages🔥🔥):

Generalization Benchmarking, Sparse Autoencoders, Seed Dependency in ML Models, Robustness of Initialization, Mechanistic Permutability

Links mentioned:


Eleuther ▷ #gpt-neox-dev (5 messages):

Vocabulary Size Configuration, Intermediate Size Logic, Model Export Size Mismatch, Optimizer Configuration Issues

Link mentioned: EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries - EleutherAI/gpt-neox


GPU MODE ▷ #general (12 messages🔥):

GPU Direct Storage, Tensor Weight Compression, Memory Snapshotting

Links mentioned:


GPU MODE ▷ #cuda (19 messages🔥):

CUDA type punning, RTX Blackwell architecture, Memory alignment in CUDA, Memcopy performance optimization

Link mentioned: NVIDIA GeForce RTX 5090 Graphics Cards: Powered by the NVIDIA Blackwell architecture.


GPU MODE ▷ #torch (8 messages🔥):

PyTorch on GB200s, Container availability for PyTorch, Merging PRs permissions, Scaled MM API

Link mentioned: Scaled MM API: Scaled MM API. GitHub Gist: instantly share code, notes, and snippets.


GPU MODE ▷ #announcements (1 messages):

GTC 2023, CUDA Developer Meetup, Low Level Technical Track for CUDA Programming, GPU MODE Event at GTC

Links mentioned:


GPU MODE ▷ #cool-links (40 messages🔥):

Tom Yeh's Multi-Head Attention Lecture, FP4 Training Framework for LLMs, Microscaling in DeepSeek, Llama Training Codebase

Links mentioned:


GPU MODE ▷ #beginner (10 messages🔥):

Working group suggestions, Training models for chess LLM, Collaborators on HF server, DiT training run completion


GPU MODE ▷ #bitnet (1 messages):

leiwang1999_53585: we'll add some examples of bwd kernels 🙂


GPU MODE ▷ #self-promotion (1 messages):

Llama training, Minimal codebase

Link mentioned: GitHub - ahxt/speed_llama3: Contribute to ahxt/speed_llama3 development by creating an account on GitHub.


GPU MODE ▷ #thunderkittens (1 messages):

Thunderkitten community enthusiasm, Hardware feature support requests, Distributed Shared Memory (DSM), Threadblock to SM scheduling, FlexAttention blog


GPU MODE ▷ #arc-agi-2 (89 messages🔥🔥):

Dynamic Evaluation in Reasoning Tasks, Chess Puzzles and Strategic Reasoning, Wikipedia Game Proposal, Explainability in AI, Utilizing Inference Engines for Training

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (120 messages🔥🔥):

ComfyUI vs Forge, Model Performance and Workflows, Image Generation Tools, User Interface Preferences, Character Generation in Stable Diffusion

Links mentioned:


Stackblitz (Bolt.new) ▷ #announcements (1 messages):

Bolt updates, Export and Import Handling

Link mentioned: Tweet from bolt.new (@boltdotnew): Bolt 🧠 update: Smart Imports. 'export default' might not be the most thrilling part of your codebase. But it is important! The latest update in Bolt's engine ensures that all imports and exp...


Stackblitz (Bolt.new) ▷ #prompting (2 messages):

Backend suggestions, Firebase learning experience


Stackblitz (Bolt.new) ▷ #discussions (110 messages🔥🔥):

GitHub OAuth Disconnection, Bolt App Development Support, Token Usage in Bolt, Error Handling in Bolt, Custom Domains with Supabase and Netlify

Links mentioned:


MCP (Glama) ▷ #general (74 messages🔥🔥):

Goose Client Impressions, MCP Server for Google Sheets, DeepSeek Integration, Ideal LLM Client Features, Collaborative Development of LLM Tools

Links mentioned:


MCP (Glama) ▷ #showcase (20 messages🔥):

Codename Goose, lüm AI for mental health, mcp-agent framework, Show HN trending, Google integration agents

Links mentioned:


Nomic.ai (GPT4All) ▷ #general (82 messages🔥🔥):

DeepSeek R1 Distill models, CUDA and CPU performance, Template optimization for DeepSeek, LM Studio usage, Acknowledge new R1 releases

Links mentioned:


Notebook LM Discord ▷ #use-cases (11 messages🔥):

Using NotebookLM for Environmental Engineering, Risks of Using NotebookLM as a Repository, Converting Notes to Source for Data Comparison, Maximum File Size Limits in NotebookLM

Link mentioned: Frequently Asked Questions - NotebookLM Help: no description found


Notebook LM Discord ▷ #general (70 messages🔥🔥):

NotebookLM Button Issues, Documentation Limit Queries, Audio Podcast Capabilities, Using LinkedIn Profiles as Sources, Translating Notes and Audio

Links mentioned:


Latent Space ▷ #ai-general-chat (60 messages🔥🔥):

DeepSeek's R1-Zero, Huawei chips usage, OpenAI revenue dynamics, Sourcegraph enterprise agents, Microsoft Copilot rollout

Links mentioned:


Cohere ▷ #discussions (17 messages🔥):

Welcome to New Regulars, Color Change Excitement, Event Awareness, Appreciation for Cohere Designers, Community Engagement


Cohere ▷ #api-discussions (4 messages):

command-r-plus model issues, Model version specifications, User experience with model changes


Cohere ▷ #cmd-r-bot (6 messages):

Safety Modes Overview, Contextual Safety Mode, Strict Safety Mode, No Safety Mode, Cohere Documentation Links


Cohere ▷ #projects (12 messages🔥):

Rerveting Efforts Reasoning Prompt, Aya 8b, Markdown Formatting, Clipboard Management, Image Analysis


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (20 messages🔥):

Certificate eligibility for non-students, Hackathon availability, Group participation in application track, Project policy details, MOOC curriculum clarifications

Link mentioned: no title found: no description found


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (5 messages):

Lecture Transcripts, Lecture Slides, Stake Airdrop

Links mentioned:


LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (1 messages):

Stake Airdrop, Rewards for Stakers, Limited-time Event

Link mentioned: no title found: no description found


Modular (Mojo 🔥) ▷ #general (6 messages):

Modular as a tools company, PyTorch community engagement, Channel sharing etiquette


Modular (Mojo 🔥) ▷ #announcements (2 messages):

Discord Changes, Branch Changes


Modular (Mojo 🔥) ▷ #mojo (10 messages🔥):

Mojo LSP Server Parameters, Mojo Mention in TIOBE, VS Code Extension Features, Mojo Roadmap Update


Torchtune ▷ #general (2 messages):

Office Hours Announcement, Upcoming features discussion, Library Improvements, Incentive with Banana Bread


Torchtune ▷ #dev (13 messages🔥):

DPO metrics aggregation, TRL vs. Torchtune debugging, Loss normalization in DPO, Open PR for DPO metrics, Community debugging efforts

Link mentioned: pytorch/torchtune: PyTorch native post-training library. Contribute to pytorch/torchtune development by creating an account on GitHub.


Torchtune ▷ #papers (2 messages):

Imagen, Image2Txt, Chatbot


Axolotl AI ▷ #general (8 messages🔥):

Multi-Turn KTO, RLHF new member assignment, NeurIPS manuscript, Axolotl usage challenges


LlamaIndex ▷ #blog (2 messages):

Agentic web scraping, Multimodal financial report generation


LlamaIndex ▷ #general (5 messages):

GUI Differences, LlamaCloud Waitlist, Confluence DataSource Grayed Out


MLOps @Chipro ▷ #events (1 messages):

MLOps Workshop, Feature Store on Databricks, Databricks and Unity Catalog, Featureform, Best Practices in Feature Engineering

Link mentioned: MLOps Workshop: Building a Feature Store on Databricks: Join our 1-hr webinar with Featureform's founder to learn how to empower your data by using Featureform and Databricks!


MLOps @Chipro ▷ #general-ml (3 messages):

AI replacing developers, Perception of AI advancements, AI wrappers improvement


DSPy ▷ #papers (1 messages):

Auto-Differentiation in LLMs, Manual Prompting, LLM Workflows


DSPy ▷ #general (1 messages):

scruffalubadubdub: Ayeee merged an hr ago. Thank you


OpenInterpreter ▷ #general (2 messages):

Goose Overview, Goose Features, User Feedback on Goose

Link mentioned: codename goose | codename goose: Your open source AI agent, automating engineering tasks seamlessly.


tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):

Learn Git Branching Style for Tinygrad, Tinygrad Basics, Coding Puzzles, Code Architecture

Link mentioned: Learn Git Branching: An interactive Git visualization tool to educate and challenge!


LAION ▷ #general (1 messages):

spirit_from_germany: How is it going? 🙂





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}