Frozen AI News archive

not much happened today

**Huawei chips** are highlighted in a diverse AI news roundup covering **NVIDIA's** stock rebound, new open music foundation models like **Local Suno**, and competitive AI models such as **Qwen 2.5 Max** and **DeepSeek V3**. The release of **DeepSeek Janus Pro**, a multimodal LLM with image generation capabilities, and advancements in **reinforcement learning** and **chain-of-thought reasoning** are noted. Discussions include GPU rebranding with **NVIDIA's H6400 GPUs**, data center innovations, and enterprise AI applications like crypto APIs in hedge funds. *"DeepSeek R1's capabilities"* and *"Qwen 2.5 models added to applications"* are key highlights.

AI News for 1/27/2025-1/28/2025. We checked 7 subreddits, 433 Twitters and 34 Discords (225 channels, and 6553 messages) for you. Estimated reading time saved (at 200wpm): 656 minutes. You can now tag @smol_ai for AINews discussions!

no title story but a bunch of small ones


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Developments and Comparisons

Reinforcement Learning and Reasoning

AI Infrastructure and Compute

AI in Enterprises and Applications

Open-source AI and API Integrations

AI Infrastructure and Compute


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek-R1 Runs Inference on Huawei's 910C Chips

Theme 2. DeepSeek-R1: Efficient Training Costs Explored

Theme 3. DeepSeek Censorship: A Comparative Analysis

Theme 4. Janus Pro 1B: In-browser Multimodal AI Innovation

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. DeepSeek R1 Challenges OpenAI’s Reinforcement Learning Dominance

Theme 2. DeepSeek R1 Censorship Sparking Debates on Bias

Theme 3. Government Integration: OpenAI's ChatGPT Gov Announcement

Theme 4. DeepSeek Training Cost Controversy: $6 Million Claim Dissected


AI Discord Recap

A summary of Summaries of Summaries by o1-preview-2024-09-12

Theme 1: DeepSeek R1 Shakes the AI World

Theme 2: Qwen's New Models Take Center Stage

Theme 3: AI Reasoning Models and Open-Source Innovations

Theme 4: AI Hardware and Infrastructure Under Spotlight

Theme 5: User Challenges and Experiences with AI Tools


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Perplexity AI Discord


aider (Paul Gauthier) Discord


Cursor IDE Discord


OpenAI Discord


Nous Research AI Discord


LM Studio Discord


Yannick Kilcher Discord


Codeium (Windsurf) Discord


OpenRouter (Alex Atallah) Discord


Eleuther Discord


Interconnects (Nathan Lambert) Discord


Stackblitz (Bolt.new) Discord


Stability.ai (Stable Diffusion) Discord


MCP (Glama) Discord


Latent Space Discord


Notebook LM Discord Discord


GPU MODE Discord


LLM Agents (Berkeley MOOC) Discord


Nomic.ai (GPT4All) Discord


Torchtune Discord


LlamaIndex Discord


Modular (Mojo 🔥) Discord


tinygrad (George Hotz) Discord


Cohere Discord


LAION Discord


Axolotl AI Discord


OpenInterpreter Discord


Gorilla LLM (Berkeley Function Calling) Discord


DSPy Discord


MLOps @Chipro Discord


Mozilla AI Discord


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (1010 messages🔥🔥🔥):

Dynamic Quantization of DeepSeek R1, DeepSeek Model Parameters, Training and Fine-Tuning Models, Ollama Compatibility, Quantization Effects on Performance

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (29 messages🔥):

Unsloth vs Unclothe, NVIDIA and market reactions, Federated Learning and asynchronous training, AI voices ethically shared, Development of ryfai app

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (201 messages🔥🔥):

Issues running Unsloth on various models, Model fine-tuning processes, Quantization and deployment techniques, Using datasets with Unsloth models, Troubleshooting errors in Unsloth setup

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (1 messages):

DeepSeek, Operator gist

Link mentioned: DeepSeek-R1 Mastery: Build Your Ultimate AI Assistant: DeepSeek-R1 Mastery: Build Your Ultimate AI Assistant - DeepSeekR1AssistantAICareerPathDev.md


Unsloth AI (Daniel Han) ▷ #research (13 messages🔥):

Embeddings and Vector Precision, Azure OpenAI Assistants Code Interpreter, Azure Databricks AI Agent Framework, ReAct Agents with Code Sandbox, Sandbox Execution Environments

Links mentioned:


Perplexity AI ▷ #general (624 messages🔥🔥🔥):

DeepSeek R1 limitations, Comparison with OpenAI models, DeepSeek performance and usage, Perplexity app updates, User experiences with AI models

Links mentioned:


Perplexity AI ▷ #sharing (17 messages🔥):

AI-Developed Drugs, JFK and MLK Files Declassified, Epic Games Disabling, Custom Enclosures, Plantin Typeface

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (1 messages):

Sonar response issues, Cost difference between Sonar and Sonar-Pro


aider (Paul Gauthier) ▷ #general (401 messages🔥🔥):

DeepSeek API issues, Using Aider with models, Qwen 2.5-Max, Groq performance, Token usage and cost

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (128 messages🔥🔥):

DeepSeek API issues, Ollama model usage, Aider configuration, Benchmarking LLMs, ChatGPT integration

Links mentioned:


Cursor IDE ▷ #general (517 messages🔥🔥🔥):

DeepSeek R1 vs V3, Cursor updates, Experiences with coding models, Quantization effects, Using different AI models

Links mentioned:


OpenAI ▷ #ai-discussions (466 messages🔥🔥🔥):

DeepSeek vs. OpenAI, AI Consciousness Debate, Censorship in AI, Competitive Pricing in AI Models, User Experiences with AI Models

Links mentioned:


OpenAI ▷ #gpt-4-discussions (2 messages):

Custom GPTs URL Output, Using Zero Width Space for Links


OpenAI ▷ #prompt-engineering (21 messages🔥):

Feeding content to models, Impersonating authors, Using AI for advanced search, Cost and time for training AI, Believability of model's answers


OpenAI ▷ #api-discussions (21 messages🔥):

Feeding content to AI, Impersonating authors, Using AI as an advanced search tool, Challenges in training models, Costs and time for training AI


Nous Research AI ▷ #general (496 messages🔥🔥🔥):

Nous Psyche, DeepSeek Models, Reasoning in AI, Stock Predictions, Business Applications of AI

Links mentioned:


Nous Research AI ▷ #ask-about-llms (4 messages):

Developing a Local AI Assistant, Learning Resources for AI Development, Optimizing Learning Velocity


Nous Research AI ▷ #interesting-links (7 messages):

Qwen2.5-VL model, YuE music generation model, AI assistants explained, DeepSeek and Operator usage

Links mentioned:


LM Studio ▷ #general (308 messages🔥🔥):

DeepSeek R1 Distilled Models, Model Performance and Comparison, Quantization Techniques, Tooling and Web Browsing Capabilities, Model Compatibility with Hardware

Links mentioned:


LM Studio ▷ #hardware-discussion (96 messages🔥🔥):

DeepSeek-R1 Model Performance, GPU Detection Issues, SSD NVMe Speed Impacts, Best Specs for 70B Model, Unified RAM on Apple Devices

Links mentioned:


Yannick Kilcher ▷ #general (282 messages🔥🔥):

DeepSeek and model performance, Data privacy concerns with AI, VRAM requirements for large models, Benchmark manipulation in AI research, Trends in AI model development

Links mentioned:


Yannick Kilcher ▷ #paper-discussion (47 messages🔥):

Janus-Pro Release, Qwen2.5-VL Launch, DeepSeek Advancements, Emu Learning Algorithms, Quantized Model Development

Links mentioned:


Yannick Kilcher ▷ #ml-news (24 messages🔥):

DeepSeek's Janus-Pro Model, Trump's Tariffs on Chips, Qwen 2.5 Model, Mistral Acquisition Rumors, AI Data Protection in Italy

Links mentioned:


Codeium (Windsurf) ▷ #discussion (103 messages🔥🔥):

Cascade Errors, Windsurf Authentication Issues, Type-Checking Integration, DeepSeek Model Integration, Credit Management Concerns

Links mentioned:


Codeium (Windsurf) ▷ #windsurf (189 messages🔥🔥):

Windsurf Cascade Issues, DeepSeek Model Addition, User Prompt Credits Explanation, Errors and Internal Problems in Cascade, Cascade Base Model Functionality

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Amazon Nova models, Amazon Bedrock operational issues, Model availability


OpenRouter (Alex Atallah) ▷ #general (255 messages🔥🔥):

DeepSeek Provider Issues, Gemini Video Support, Model Speed Comparisons, OpenRouter Usage, Provider Pricing Context

Links mentioned:


Eleuther ▷ #general (79 messages🔥🔥):

GRPO implementation discussions, DeepSeek training cost analysis, LLM reasoning abilities, Job opportunities in LLM research, Neuroscience and AI interpretations

Links mentioned:


Eleuther ▷ #research (112 messages🔥🔥):

GRPO and Momentum Matrices, Model-Based Reinforcement Learning, Muesli Method Comparisons, YuE Music Generation Model, Privileged Bases in Transformers

Links mentioned:


Eleuther ▷ #scaling-laws (2 messages):

Compute and Curvature, Scaling Impacts, Inductive Bias in Parametric Models


Eleuther ▷ #lm-thunderdome (5 messages):

scbench, zeroSCROLLS, longbench, LM Evaluation Harness, MLX methods


Eleuther ▷ #multimodal-general (1 messages):

Janus flow paper, Rectified flow objective, Image generation tasks


Interconnects (Nathan Lambert) ▷ #news (59 messages🔥🔥):

DeepSeek V3 and competition, OpenAI's new offerings, Qwen licensing and development, ChatGPT Gov announcement, Open Thoughts project and partnerships

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (5 messages):

DeepSeek optimizations, CUDA limitations, Reinforcement learning, DeepSeek implications, Technical report findings

Link mentioned: DeepSeek FAQ: DeepSeek has completely upended people’s expectations for AI and competition with China. What is it, and why does it matter?


Interconnects (Nathan Lambert) ▷ #ml-drama (18 messages🔥):

Liang Wenfeng Meme Game, DeepSeek vs Qwen Team Dynamics, Meme Coin Launch Speculation, Qwen2.5-Max Model Release, AI Community Discussions

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (36 messages🔥):

DeepSeek R1 Launch, Qwen 2.5-Max Performance, AI Customer Applications, Laptop Purchase Decision, Influencer Impact on Academic Figures

Links mentioned:


Interconnects (Nathan Lambert) ▷ #memes (10 messages🔥):

DeepSeek, Google release delay, AI job market, ChatGPT, AI misconceptions

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rl (5 messages):

Open-Instruct integration, vLLM maintenance, OpenRLHF framework


Interconnects (Nathan Lambert) ▷ #cv (5 messages):

Qwen Licensing Issues, Qwen Model Variants


Interconnects (Nathan Lambert) ▷ #reads (13 messages🔥):

DeepSeek-R1 release, AI model comparisons, Hawaii's military population, Jay Alammar's illustrations, Challenges in understanding complex models

Link mentioned: The Illustrated DeepSeek-R1: A recipe for reasoning LLMs


Interconnects (Nathan Lambert) ▷ #posts (40 messages🔥):

DeepSeek's Mainstream Attention, OpenAI's Formal Math Direction, LLMs as Verifiers in Math, Upcoming Post by Nat Lambert

Link mentioned: Tweet from Nathan Lambert (@natolambert): Why reasoning models will generalize. DeepSeek R1 is just the tip of the ice berg of rapid progress. People underestimate the long-term potential of “reasoning.” https://buff.ly/4haoAtt


Stackblitz (Bolt.new) ▷ #announcements (1 messages):

Terminal Errors Detection, Bolt Integration

Link mentioned: Tweet from bolt.new (@boltdotnew): Bolt 🧠 update: Terminal Errors Detection. Some errors are hard to catch: they happen in the terminal where we don't often look. Bolt is now tightly integrated with your app's development environ...


Stackblitz (Bolt.new) ▷ #prompting (4 messages):

Improver Prompt Limitations, Frontend Prototype Constraints, User Experience with Prompt Improver


Stackblitz (Bolt.new) ▷ #discussions (135 messages🔥🔥):

Stripe Integration Challenges, Workflow and Project Management with Bolt, AI for Title Generation from Images, Node Version Updates in Bolt, Community Support and Collaboration

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (138 messages🔥🔥):

Janus Model Opinions, AMD Support for Stable Diffusion, Hardware Recommendations for AI Work, Upscalers in Stable Diffusion, DeepSeek Model Comparisons

Links mentioned:


MCP (Glama) ▷ #general (112 messages🔥🔥):

Goose Client, MCP Server Issues, DeepSeek Pricing, Integration with Home Assistant, Token Usage Monitoring

Links mentioned:


Latent Space ▷ #ai-general-chat (96 messages🔥🔥):

Qwen 2.5-Max Launch, DeepSeek R1 vs. Qwen 2.5, Open Source AI Developments, TSMC Tariffs Impact on AI, Huawei Chips in AI Applications

Links mentioned:


Notebook LM Discord ▷ #announcements (1 messages):

NotebookLM Collaboration Features, User Feedback, Product Interviews, Survey Participation

Link mentioned: Sharing and collaboration in NotebookLM: Hello! Thanks for filling out this form. We are looking to learn a bit more about how to make the sharing and collaboration functionality in NotebookLM better and we would love to hear from you! If yo...


Notebook LM Discord ▷ #use-cases (17 messages🔥):

NotebookLM customization, DeepSeek AI advancements, Voice synthesis inconsistencies, Using large documents with LLM, Character traits in AI prompts

Links mentioned:


Notebook LM Discord ▷ #general (77 messages🔥🔥):

User Role Clarification, Podcast Features and Limitations, NotebookLM Language and Export Issues, Gemini and Audio Generation, Citation and Reference Management

Links mentioned:


GPU MODE ▷ #general (22 messages🔥):

Minimizing Startup Times for LLMs, Optimizations for Model Loading, Utilizing Modal's RAM Snapshots, GPU Direct Storage (GDS) Considerations, Torch Distributed Package

Links mentioned:


GPU MODE ▷ #cuda (16 messages🔥):

Grace Hopper architecture, Jupyter Lab setup issues, CUDA pointer alignment, H100 PCIe/SXM card, GH200 rental rates

Link mentioned: PyTorch: no description found


GPU MODE ▷ #torch (9 messages🔥):

FP8 Conversion, FP8 Stochastic Rounding, PyTorch GB200 Support, CUDA 12.8 Compatibility

Links mentioned:


GPU MODE ▷ #cool-links (2 messages):

DeepSeek-R1 Release, Matrix Multiplication Challenge

Links mentioned:


GPU MODE ▷ #bitnet (1 messages):

Tile Lang, BitBLAS repo


GPU MODE ▷ #thunderkittens (1 messages):

ThunderKittens Improvements, Testing Kernel Performance, Generalization of Tests

Link mentioned: ThunderKittens/kernels/torch_scaled/gentests.py at main · HazyResearch/ThunderKittens: Tile primitives for speedy kernels. Contribute to HazyResearch/ThunderKittens development by creating an account on GitHub.


GPU MODE ▷ #arc-agi-2 (38 messages🔥):

Reasoning Gym Pull Request, Apple Dataset License Concerns, Generating Templates for GSM8K, OpenRLHF Trial Runs, Multi-Licensing and Copyright

Links mentioned:


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (60 messages🔥🔥):

Fall Semester Class SP24, Lecture Slides Availability, Research Track Eligibility, Hackathon Information, Application Track Collaborations

Link mentioned: CS294/194-280 Advanced Large Language Model Agents: Spring 2025


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (12 messages🔥):

Slide Deck Availability, YouTube Lecture Feedback, Using NotebookLM for Research Tools

Links mentioned:


Nomic.ai (GPT4All) ▷ #general (42 messages🔥):

Chat Template Errors, DeepSeek Implementation, GPT4All Roadmap, Model Options in GPT4All, LocalDocs File Uploads

Links mentioned:


Torchtune ▷ #dev (31 messages🔥):

Running Distributed Recipes, Issues with 'torchrun', Distributed Init Protocols, Multinode Setup on Mac, Debugging Torch Distributed

Links mentioned:


Torchtune ▷ #papers (1 messages):

Model Comparisons, Image Analysis


LlamaIndex ▷ #blog (2 messages):

DeepSeek-R1 API Integration, SOFTIQ SaaS App, Tender Analysis Efficiency

Link mentioned: DeepSeek - LlamaIndex: no description found


LlamaIndex ▷ #general (18 messages🔥):

LlamaReport Documentation, Pull Request Review, RAG Retrieval in Reasoning Models, FastAPI Event Streaming

Links mentioned:


Modular (Mojo 🔥) ▷ #general (6 messages):

Documentation Status, DeepSeek vs Modular Debate


Modular (Mojo 🔥) ▷ #announcements (1 messages):

MAX repo changes, Mojo repo updates


Modular (Mojo 🔥) ▷ #mojo (10 messages🔥):

Documentation Outage, Mojo Code Issues, Garbage References in Code, Callback Capturing Behavior, String Captures Clobbered

Link mentioned: tree.mojo: GitHub Gist: instantly share code, notes, and snippets.


tinygrad (George Hotz) ▷ #general (6 messages):

tinygrad PR 8781, Python CUDA Emulator for FP8, MathTrait and SimpleMathTrait Unification, Tests for View.stride() and View.flip(), Bounty questions

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (10 messages🔥):

Tensor.isclose and Tensor.allclose methods, Negative stride interpretation, Git Branching tutorials for tinygrad

Link mentioned: Learn Git Branching: An interactive Git visualization tool to educate and challenge!


Cohere ▷ #discussions (2 messages):

Greetings in the Discord, User Interaction


Cohere ▷ #api-discussions (8 messages🔥):

Model Response Quality, Error 500 from Classify Endpoint, Finetuned Model Details, Model Versioning, Command R+ Model Updates


LAION ▷ #general (6 messages):

Updating speech parameter settings, AI agent consultancy

Link mentioned: Google Colab: no description found


LAION ▷ #research (1 messages):

Compute budget claims, MoE training efficiency, Llama3 GPU hours comparison


Axolotl AI ▷ #general (4 messages):

H200 Sale Strategy, Multi-Turn KTO Discussion


OpenInterpreter ▷ #general (1 messages):

OpenInterpreter Skills, Import Skills Configuration


OpenInterpreter ▷ #O1 (2 messages):

API base functionality, Source code modifications


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (2 messages):

System prompts injection, Weights and biases for tracing

Link mentioned: gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/constant.py at main · ShishirPatil/gorilla: Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls) - ShishirPatil/gorilla


DSPy ▷ #general (1 messages):

GitHub PR for DSPy, Poetry lock issue

Link mentioned: Fix poetry lock by chenmoneygithub · Pull Request #6755 · stanfordnlp/dspy: resolve #6644


MLOps @Chipro ▷ #events (1 messages):

DeepSeek Performance, Cost Comparison with ChatGPT, Live Workshop, Real App Building

Link mentioned: What's the hype about DeepSeek?🐬 · Zoom · Luma: The AI world is in a frenzy! A new open-source model from China is outperforming ChatGPT and Claude in benchmarks, and it's 20-30 times cheaper. Is this the…


Mozilla AI ▷ #announcements (1 messages):

FOSDEM 2025, Open-source collaboration



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}