Frozen AI News archive

Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking

**Google DeepMind** officially launched the **Gemini 2.0** models, including **Flash**, **Flash-Lite**, and **Pro Experimental**, with **Gemini 2.0 Flash** outperforming **Gemini 1.5 Pro** while being **12x cheaper** and supporting **multimodal input** and a **1 million token context window**. **Andrej Karpathy** released a **3h31m** video deep dive into **large language models**, covering **pretraining**, **fine-tuning**, and **reinforcement learning** with examples like **GPT-2** and **Llama 3.1**. A free course on **Transformer architecture** was introduced by **Jay Alammar**, **Maarten Gr**, and **Andrew Ng**, focusing on **tokenizers**, **embeddings**, and **mixture-of-expert models**. **DeepSeek-R1** reached **1.2 million downloads** on **Hugging Face** with a detailed **36-page technical report**. **Anthropic** increased rewards to **$10K** and **$20K** for its jailbreak challenge, while the **BlueRaven** extension was updated to hide Twitter metrics for unbiased engagement.


AI News for 2/4/2025-2/5/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (210 channels, and 5481 messages) for you. Estimated reading time saved (at 200wpm): 571 minutes. You can now tag @smol_ai for AINews discussions!

Gemini 2.0 has been "here" since December (our coverage here), but now we can officially count Gemini 2.0 Flash's prices as "real", and put them up on our Pareto frontier chart:

image.png

We will grant that raw intelligence charts like these mean increasingly less and will probably die this year, because they cannot accurately describe the multimodal input AND output capabilities of these releases, nor their coding ability, nor the 1-2M token long context, as Sundar Pichai demonstrates:

image.png

Of particular note is the cost effectiveness of the new "Flash Lite", as well as the very slight price hike that Gemini 2.0 Flash has vs 1.5 Flash.

Curiously enough, the competitive dynamics of OpenAI "mogging" Google releases seem to have stayed in 2024.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek VL2 Small Launch and R1's Benchmark Success

Theme 2. Google's AI Policy Shift on Weapons and Surveillance Use

Theme 3. Gemma 3 Announcement and Community Reactions

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Nvidia's CUDA Strategy: Catalyst to AI's Evolution

Theme 2. ByteDance and Google Advance AI Frontiers

Theme 3. Debating Open Source in AI: A Look at DeepSeek and More


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. Gemini 2.0 Model Family: Performance and Integration

Theme 2. Coding IDEs and AI Assistants: Feature Comparisons and User Feedback

Theme 3. Advanced Model Training and Optimization Techniques

Theme 4. Open Source and Community in AI Development

Theme 5. Reasoning Model Benchmarks and Performance Analysis


PART 1: High level Discord summaries

aider (Paul Gauthier) Discord


Unsloth AI (Daniel Han) Discord


Codeium (Windsurf) Discord


Stability.ai (Stable Diffusion) Discord


Cursor IDE Discord


Perplexity AI Discord


Eleuther Discord


OpenRouter (Alex Atallah) Discord


Interconnects (Nathan Lambert) Discord


LM Studio Discord


HuggingFace Discord


Yannick Kilcher Discord


OpenAI Discord


Nous Research AI Discord


Modular (Mojo 🔥) Discord


Notebook LM Discord


Torchtune Discord


Latent Space Discord


Nomic.ai (GPT4All) Discord


MCP (Glama) Discord


GPU MODE Discord


LlamaIndex Discord


LLM Agents (Berkeley MOOC) Discord


Cohere Discord


tinygrad (George Hotz) Discord


Gorilla LLM (Berkeley Function Calling) Discord


DSPy Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

aider (Paul Gauthier) ▷ #general (827 messages🔥🔥🔥):

Aider and deep learning models, Model performance comparisons, Integration of AI tools, Project management with AI, Limitations of LLMs in complex tasks

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (53 messages🔥):

Aider Configuration Issues, OpenRouter Compatibility, Using Multiple Models, Git Commit Issues with Aider, Running Commands with Aider

Links mentioned:


aider (Paul Gauthier) ▷ #links (1 messages):

epicureus: gemini 2.0 on lmsys https://lmarena.ai


Unsloth AI (Daniel Han) ▷ #general (537 messages🔥🔥🔥):

Dynamic Quantization, Using DeepSeek Locally, GRPO Training, Model Comparison, Layer Quantization Strategy

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (17 messages🔥):

Regex for Dates, Uploading Jupyter Notebooks, LLM Model Breakdown, Blogging Platforms for LLM, GRPO Support

Link mentioned: Google Colab: no description found


Unsloth AI (Daniel Han) ▷ #help (94 messages🔥🔥):

Instructions for Using Unsloth Models, DeepSeek in Oobabooga, Training Configuration Suggestions, Dynamic Quantization in LLMs, Using vLLM and SGLang for Model Inference

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (7 messages):

CPT with Unsloth, DeepSeek model age, Math versions for Qwen, Making AI accessible

Link mentioned: unsloth (Unsloth AI): no description found


Codeium (Windsurf) ▷ #announcements (2 messages):

Gemini 2.0 Flash, Windsurf Next Beta

Links mentioned:


Codeium (Windsurf) ▷ #discussion (34 messages🔥):

Credits and Tool Use, Qodo Concerns, Runic Open-Source Framework, Codeium Plugin Issues, Windsurf vs Codeium

Link mentioned: GitHub - livingstonlarus/runic: An open-source framework that enhances Large Language Models (LLMs) with Long-Term Memory (LTM) and Retrieval-Augmented Generation (RAG). Ideal for AI coding assistants and other applications, it enables LLMs to retain context, adapt over time, and access up to date information, ensuring more intelligent and context-aware interactions.


Codeium (Windsurf) ▷ #windsurf (476 messages🔥🔥🔥):

Issues with Windsurf Credits, Gemini Model Discussions, Windsurf vs Cursor Feature Comparison, User Experience Feedback, Suggestions for Windsurf Improvements

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #announcements (1 messages):

Introductory Message, Community Engagement Initiatives, Feature Request Board, Progress Sharing from Researchers


Stability.ai (Stable Diffusion) ▷ #general-chat (399 messages🔥🔥):

Latency and Model Compatibility, Training and Fine-Tuning Challenges, Upcoming AI Models and Architectures, Community Tools for AI, Spamming in Discord Channels

Links mentioned:


Cursor IDE ▷ #general (365 messages🔥🔥):

Cursor IDE Features, MCP Server Integration, Gemini 2.0 Pro Model, Voice Dictation in Coding, Mobile IDE Usability

Links mentioned:


Perplexity AI ▷ #general (312 messages🔥🔥):

Changes in Perplexity AI, Gemini 2.0 Flash Release, Model Access Issues, Pro Subscription Confusions, Feedback and User Experience

Links mentioned:


Perplexity AI ▷ #sharing (11 messages🔥):

US Iron Dome Proposal, California Secession Bid, Asteroids and Life, Quantum Mechanics and Consciousness, Electricity Types


Perplexity AI ▷ #pplx-api (5 messages):

Sonar Reasoning Pro, Perplexity API Cost Management, Image Uploading in Perplexity API


Eleuther ▷ #general (114 messages🔥🔥):

Discussion on ML Theory and Convex Optimization, Harmonic Loss vs Cross-Entropy Loss, Machine Learning Background and Collaborations, Insights on Diffusion Models, Challenges in Statistical Background for ML

Links mentioned:


Eleuther ▷ #research (210 messages🔥🔥):

Harmonic loss as alternative to CE loss, VideoJAM framework for motion generation, Activation functions in neural networks, Evaluation of various optimizer techniques, Modified ReLU approaches

Links mentioned:


Eleuther ▷ #gpt-neox-dev (2 messages):

Effective Batch Size Strategies, Weight Decay Application in Training


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

DeepSeek R1 Nitro, Downtime Incident

Link mentioned: Tweet from OpenRouter (@OpenRouterAI): Significantly better uptime and speed on our DeepSeek R1 Nitro endpoint. Seeing 97% of requests now fully complete, with a finish reason. Try it! 👇


OpenRouter (Alex Atallah) ▷ #general (298 messages🔥🔥):

OpenRouter downtime, API errors and rate limits, Gemini 2.0 updates, Provider routing and pricing, Community support and troubleshooting

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (156 messages🔥🔥):

Gemini 2.0 updates, Mistral's new offerings, AI benchmarking performance, GitHub Copilot updates

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (19 messages🔥):

Sama hiring robotics engineers, Softbank AGI deadline, Krutrim licensing controversy, Anthropic jailbreak challenge, Community perspectives on AI development

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (68 messages🔥🔥):

DeepSeek R1 Launch, OpenAI's Sora Tool, Nvidia Digits Interest, GitHub Pages Certificate Issues, AI Model Performance Discussion

Links mentioned:


Interconnects (Nathan Lambert) ▷ #memes (5 messages):

Vibe Coding, Blackout Poetry in AI, Clarifying Questions in Deep Research

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rl (3 messages):

Skepticism towards RL datasets, Democratization of Reinforcement Learning


Interconnects (Nathan Lambert) ▷ #reads (14 messages🔥):

JAX usage, The Nvidia Way, Amazon operations, Grok shipping, Nvidia company culture


Interconnects (Nathan Lambert) ▷ #posts (2 messages):

SnailBot News, Return of RL World


LM Studio ▷ #general (208 messages🔥🔥):

LM Studio Usage, Model Compatibility, Hardware Requirements, Vulkan Support, GPT-Researcher Integration

Links mentioned:


LM Studio ▷ #hardware-discussion (39 messages🔥):

Performance with 3070 and 8700K, M4 Max capability, GPU pricing and availability, PCIe configurations for inference, RAM and VRAM requirements for models

Link mentioned: Can Your Computer Run This LLM?: no description found


HuggingFace ▷ #general (176 messages🔥🔥):

AI for Math Contributions, DeepSeek Model Performance, AI Art Style Transfer, Hugging Face Spaces Updates, LLM Benchmarking

Links mentioned:

Smart…": no description found

app.py · m-ric/open_Deep-Research at main: no description found

Sneakers (1992): My Voice Is My Passport: Sneakers. Dir. Phil Alden Robinson. Universal Studios, 1992. This short clip is intended to serve as an illustration for an entry on WNYC Radio's "On The Medi...


HuggingFace ▷ #i-made-this (6 messages):

Modified ESN Simulation, New Paper on Arxiv, Securade.ai HUB, TinyRAG System

Links mentioned:


HuggingFace ▷ #reading-group (4 messages):

Event Timing Confirmation, Description Approval, Upcoming Event Excitement


HuggingFace ▷ #computer-vision (1 messages):

Image Classification Models, ResNet50 Fine-tuning, Publishing Sector Insights


HuggingFace ▷ #gradio-announcements (3 messages):

Office Hours Announcement, Gradio Contribution Video

Link mentioned: How to make your very FIRST open-source contribution (feat. Gradio): One of the questions we get asked most often is: "how do I even start contributing to open-source software?" We recorded a video walkthrough fixing a real bug...


HuggingFace ▷ #smol-course (2 messages):

Updated NLP Course, Current NLP Course Limitations


HuggingFace ▷ #agents-course (15 messages🔥):

Agents Course Registration, Python Coding Skills for Course, Python Learning Resources, Tools for 2D Plane to Python Code, Finetuning Models for AI Agents

Links mentioned:


HuggingFace ▷ #open-r1 (1 messages):

HuggingFace Repo Testing, Hardware for Inference


Yannick Kilcher ▷ #general (144 messages🔥🔥):

NURBS vs Meshes, AI Reasoning Models, Perspective and Transformation, Topology in 3D Modeling, Dynamic vs Static Use Cases

Links mentioned:


Yannick Kilcher ▷ #paper-discussion (32 messages🔥):

Harmonic Loss Paper, AI Peer Review Improvements, Error Bars in AI Research, VideoJAM Analysis, Jailbreaking AI Systems

Links mentioned:


Yannick Kilcher ▷ #ml-news (17 messages🔥):

OpenAI's sentry gun incident, Gemini model updates, Flash thinking vs Pro thinking, Gemini 2.0 Flash performance, Leaderboard for AI models

Links mentioned:


OpenAI ▷ #annnouncements (3 messages):

WhatsApp ChatGPT features, Deep research update, YouTube video 'Refreshed'

Link mentioned: Refreshed.: Check out https://openai.com/ to see more.


OpenAI ▷ #ai-discussions (92 messages🔥🔥):

DeepSeek Privacy Concerns, Midjourney vs. Flux, ChatGPT and Reasoning, Gemini 2.0 Features, Deep Research Availability

Links mentioned:


OpenAI ▷ #gpt-4-discussions (7 messages):

Testing New Features, o3mini Inspiration


OpenAI ▷ #prompt-engineering (3 messages):

Statistics Analysis Prompt Design, Rhetorical Argument Structure in Writing, Sprite Sheet Generation for Animation, Character Design in Sprite Sheets


OpenAI ▷ #api-discussions (3 messages):

Statistics Analysis Techniques, Rhetoric Argument Construction, Sprite Sheet Generation


Nous Research AI ▷ #general (96 messages🔥🔥):

Data Availability in Distributed Training, Hermes Reasoner Insights, OpenAI vs DeepSeek Model Performance, AI Backlash and Crypto Relation, DeepResearch from OpenAI

Links mentioned:


Nous Research AI ▷ #ask-about-llms (1 messages):

O3-mini prompt crafting


Nous Research AI ▷ #research-papers (3 messages):

Pretraining papers, Acknowledgment of authors, Hardware infrastructure team


Nous Research AI ▷ #interesting-links (2 messages):

Liger-Kernel PR #553, Deep Dive into LLMs

Links mentioned:


Nous Research AI ▷ #research-papers (3 messages):

Pretraining Papers Authorship, Importance of Hardware Infra Team


Modular (Mojo 🔥) ▷ #general (7 messages):

Closed Source Compiler, Open Sourcing Timeline, MLIR Dialects and Passes, Function Level Lowering

Link mentioned: Modular milestones: GPUs, 2024 reflections, and the road ahead 🚀: In this extra special community meeting, we reflected on 2024's progress and shared updates on: 🧑‍🚀 MAX 24.6, featuring MAX GPU! 🔥 Our overall approach to M...


Modular (Mojo 🔥) ▷ #mojo (87 messages🔥🔥):

Mojo Standard Library, Function Overloading in Mojo, Async Function Handling, Script Struct Implementation, Buffer Handling in APIs

Links mentioned:


Notebook LM ▷ #use-cases (10 messages🔥):

AI in Legal Practice, Case Study Assistance, Deposition Summaries, Contract Review Experiments, Document Drafting Automation

Link mentioned: Demonstration of how Avatars can add value as digital labor to expand the paralegal team: We add avatars to this contract review app to make the redlining analysis more engaging and to differentiate the product.Avatars by www.simli.com


Notebook LM ▷ #general (84 messages🔥🔥):

NotebookLM access issues, NotebookLM Plus activation, Uploading files and sources, Audio overview features, Spreadsheet integration

Links mentioned:


Torchtune ▷ #general (54 messages🔥):

Torchtune vs Unsloth Performance, Kolo Docker Tool, FastAPI and Next.js Interface for Torchtune, GRPO Implementation, Custom Script Integration in Torchtune

Links mentioned:


Torchtune ▷ #dev (37 messages🔥):

Ladder-residual architecture, Distributed generation issues, FSDP synchronization challenges, Full DPO Distributed PR checks, Performance optimization of generation

Links mentioned:


Latent Space ▷ #ai-general-chat (75 messages🔥🔥):

OpenAI SWE Agent, OmniHuman video generation, Figure's Independence from OpenAI, Gemini 2.0 Flash release, Mistral AI Rebranding

Links mentioned:


Nomic.ai (GPT4All) ▷ #announcements (1 messages):

GPT4All v3.9.0 Release, LocalDocs Fix, DeepSeek-R1 Update, Windows ARM Improvements, New Model Support


Nomic.ai (GPT4All) ▷ #general (62 messages🔥🔥):

ReAG - Reasoning Augmented Generation, Self-hosting GPT4All, Local models for NSFW content, User Interface Bugs, Datalake Concerns

Links mentioned:


MCP (Glama) ▷ #general (50 messages🔥):

ChatGPT Pro Subscription, MCP Excel File Manipulation, Playwright/Puppeteer Automation, GitHub MCP Usage, Home Assistant MCP Client/Server Support

Link mentioned: servers/src/puppeteer/index.ts at evaboot · isaacphi/servers: Model Context Protocol Servers. Contribute to isaacphi/servers development by creating an account on GitHub.


MCP (Glama) ▷ #showcase (6 messages):

Sage Smithery Integration, MCP Tools Support for Claude, PulseMCP Use Cases Launch

Links mentioned:


GPU MODE ▷ #general (8 messages🔥):

Huggingface L40S performance comparison, Janus-Pro-7B results, EvaByte architecture, Autoregressive image generation, Byte transformers in image modeling

Links mentioned:


GPU MODE ▷ #triton (2 messages):

tl.gather function, Triton installation, Installing from source

Link mentioned: GitHub - triton-lang/triton: Development repository for the Triton language and compiler


GPU MODE ▷ #cuda (5 messages):

GPU Invalidations, Microbenchmarking Techniques, WGMMA Layouts, AI Compute Efficiency

Link mentioned: GPUs Go Brrr: how make gpu fast?


GPU MODE ▷ #torch (5 messages):

BlockMask support for .state_dict(), Flex Attention, Torch Save and Load


GPU MODE ▷ #cool-links (3 messages):

OmniHuman framework, FlowLLM for material discovery, Video generation from images, Generative models in research

Links mentioned:


GPU MODE ▷ #jobs (2 messages):

Part-Time AI Software & Hardware Optimization Engineer, Modal Serverless Computing, GPU Performance Engineering

Links mentioned:


GPU MODE ▷ #torchao (4 messages):

Torchao and torch.compile compatibility, PyTorch issue discussion, Community engagement on GitHub

Link mentioned: Compiled nn.Module with tensor subclass can't be moved to another device · Issue #141548 · pytorch/pytorch: 🐛 Describe the bug import torch aten = torch.ops.aten class Subclass(torch.Tensor): def new(cls, data): return torch.Tensor._make_wrapper_subclass(cls, data.shape, dtype=data.dtype, device=data.....


GPU MODE ▷ #off-topic (3 messages):

AI in Gaming, General-Purpose Robotic Models, AI-powered Fax Services

Links mentioned:


GPU MODE ▷ #liger-kernel (1 messages):

Granite 3 models, Llama 3 models, PR #558

Link mentioned: Support Granite 3.0 and 3.1 models by JamesKunstle · Pull Request #558 · linkedin/Liger-Kernel: Granite 3.(0,1) models are Llama-architecture models with some different scaling terms in various places. This commit adds granite model patching for decoder-only granite 3 models (not multimodal) ...


GPU MODE ▷ #reasoning-gym (10 messages🔥):

CompositeDataset PR, gsm-symbolic cross-checks, laptop repair, requirements-dev updates, generator issues in gsm-symbolic


LlamaIndex ▷ #blog (3 messages):

Deepseek exploration, Building RAG applications, Gemini 2.0 launch, Gemini integration

Link mentioned: Gemini 2.0 is now available to everyone: We’re announcing new updates to Gemini 2.0 Flash, plus introducing Gemini 2.0 Flash-Lite and Gemini 2.0 Pro Experimental.


LlamaIndex ▷ #general (20 messages🔥):

Timeout implementation in LlamaIndex, Function calling with Qwen-2.5, Streaming text in AgentWorkflow, Using OpenAILike with vLLM, Tool call streaming limitations

Links mentioned:


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (14 messages🔥):

MOOC Certificate Delays, Quiz 1 and Quiz 2 Availability, Technical Issues with Quizzes, Certificate Request Process

Link mentioned: Quiz 1 - Inference-Time Techniques w/ Xinyun Chen (1/27): INSTRUCTIONS: Each of these quizzes is completion based, however we encourage you to try your best for your own education! These quizzes are a great way to check that you are understanding the course m...


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (2 messages):

Lecture 1 Recording, Professional Captioning

Link mentioned: CS 194/294-280 (Advanced LLM Agents) - Lecture 1, Xinyun Chen: no description found


Cohere ▷ #discussions (10 messages🔥):

Migrating to Embed v3 Light, Cohere Moderation Model, Chat Feature Fees, Cohere Free API


Cohere ▷ #api-discussions (2 messages):

Conversational Memory, Java API usage, Support Ticket


Cohere ▷ #projects (2 messages):

Rule Enforcement, Apology and Acknowledgment


tinygrad (George Hotz) ▷ #general (10 messages🔥):

tinygrad 0.10.1 issues, NixOS specificities, Compiler flags and warnings, Debugging improvements

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):

tinygrad base operations, kernel implementations


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (3 messages):

API model requirements, Leaderboards for models, Authentication mechanisms

Link mentioned: gorilla/berkeley-function-call-leaderboard/CONTRIBUTING.md at main · ShishirPatil/gorilla: Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls) - ShishirPatil/gorilla


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 messages):

Raft Method, Llama 3.1 7B, Synthetic Data for Training, Fine-tuning, RAG Implementation


DSPy ▷ #examples (2 messages):

Chain of Agents, DSPy Way

Link mentioned: Tweet from Sergii Guslystyi (@JuiceSharp): http://x.com/i/article/1887191253370216450



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}