Frozen AI News archive

ModernBert: small new Retriever/Classifier workhorse, 8k context, 2T tokens

**Answer.ai/LightOn** released **ModernBERT**, an updated encoder-only model with an **8k-token context**, trained on **2 trillion tokens** including code. It comes in **149M/395M-parameter** sizes and reports state-of-the-art performance on retrieval, NLU, and code tasks. It features **Alternating Attention** layers that mix global and local attention. **Gemini 2.0 Flash Thinking** debuted at #1 in Chatbot Arena, and the **o1 model** topped reasoning benchmarks. **Llama** downloads surpassed **650 million**, doubling in 3 months. **OpenAI** launched desktop app integrations with voice capabilities. **Figure** delivered its first humanoid robots commercially. Also highlighted: advances in robotics simulation, and **Genesis**, a new physics engine claiming to run **430,000x faster than real time**.

AI News for 12/18/2024-12/19/2024. We checked 7 subreddits, 433 Twitters and 32 Discords (215 channels, and 4745 messages) for you. Estimated reading time saved (at 200wpm): 440 minutes. You can now tag @smol_ai for AINews discussions!

After months of teasing from Jeremy Howard, the Answer.ai/LightOn team released ModernBERT today, an update to the classic BERT from 2018.

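The Hugging Face blog post goes into more detail on why this is useful. Trying the model takes only a few lines with the `transformers` fill-mask pipeline:

```python
import torch
from transformers import pipeline
from pprint import pprint

# Load ModernBERT-base into a masked-language-model ("fill-mask") pipeline.
# bfloat16 roughly halves memory use compared with float32.
pipe = pipeline(
    "fill-mask",
    model="answerdotai/ModernBERT-base",
    torch_dtype=torch.bfloat16,
)

# The model predicts the most likely tokens for the [MASK] position.
input_text = "One thing I really like about the [MASK] newsletter is its ability to summarize the entire AI universe in one email, consistently, over time. Don't love the occasional multiple sends tho but I hear they are fixing it."
results = pipe(input_text)
pprint(results)  # candidate fills, each with a score, token id, token string and completed sequence
```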

One of the MANY interesting details disclosed in the paper is the use of Alternating Attention layers, which mix global and local attention in the same way Noam Shazeer did at Character.AI (see our earlier coverage).

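To make the alternating-attention idea concrete, here is a minimal pure-Python sketch of per-layer attention masks. It assumes the configuration we understand ModernBERT to use (global attention on every third layer, starting with the first, and a 128-token sliding window, i.e. 64 tokens on each side, on the remaining layers). `layer_is_global`, `attention_mask` and `keys_scored` are hypothetical helpers for illustration, not ModernBERT's actual implementation.

```python
# Minimal sketch of alternating global/local attention masks.
# Assumptions: every 3rd layer (0, 3, 6, ...) is global, and local layers
# use a 128-token sliding window (64 tokens on each side of the query).


def layer_is_global(layer_idx: int, global_every: int = 3) -> bool:
    """Hypothetical helper: layers 0, 3, 6, ... attend over the full sequence."""
    return layer_idx % global_every == 0


def attention_mask(seq_len: int, layer_idx: int, window: int = 128,
                   global_every: int = 3) -> list[list[bool]]:
    """Hypothetical helper returning a seq_len x seq_len boolean mask.

    mask[i][j] is True when query token i may attend to key token j.
    Encoders are bidirectional, so there is no causal masking here.
    """
    if layer_is_global(layer_idx, global_every):
        # Global layer: every token sees every other token.
        return [[True] * seq_len for _ in range(seq_len)]
    half = window // 2
    # Local layer: each token only sees neighbours within `half` positions.
    return [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]


def keys_scored(seq_len: int, n_layers: int, window: int = 128,
                global_every: int = 3) -> int:
    """Count query-key pairs scored across all layers (a rough cost proxy)."""
    total = 0
    for layer in range(n_layers):
        mask = attention_mask(seq_len, layer, window, global_every)
        total += sum(sum(row) for row in mask)
    return total


if __name__ == "__main__":
    # Tiny example: 8 tokens, window of 4 (2 tokens on either side).
    local = attention_mask(8, layer_idx=1, window=4)
    print([sum(row) for row in local])  # → [3, 4, 5, 5, 5, 5, 4, 3]
    print(keys_scored(8, n_layers=3, window=4))  # → 132 (64 global + 34 + 34 local)
```

Under these assumed defaults at an 8,192-token context, a local layer scores at most 129 keys per query instead of 8,192, so two out of every three layers stay cheap while the global layers still let information flow across the whole sequence.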


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

All recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Releases and Performance

Major Company News

Technical Developments

Memes and Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Bamba: Inference Efficient Hybrid Mamba2 Model

Theme 2. Genesis: Generative Physics Engine Breakthrough

Theme 3. Slim-Llama ASIC Processor's Efficiency Leap

Theme 4. Gemini 2.0 Flash Thinking Experimental Release

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Gemini 2.0 Flash Thinking released, outperforming older models

Theme 2. NotebookLM incorporates interactive podcast feature


AI Discord Recap

A summary of Summaries of Summaries by o1-2024-12-17

Theme 1. Fierce Model Wars and Bold Price Cuts

Theme 2. Multi-GPU and Fine-Tuning Frenzy

Theme 3. Agents, RAG, and RLHF Breakthroughs

Theme 4. AI Tools for Coding Take Center Stage

Theme 5. Fresh Libraries and Open-Source Adventures


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Cursor IDE Discord


Codeium (Windsurf) Discord


Interconnects (Nathan Lambert) Discord


OpenAI Discord


Eleuther Discord


Perplexity AI Discord


aider (Paul Gauthier) Discord


Stackblitz (Bolt.new) Discord


Notebook LM Discord


Stability.ai (Stable Diffusion) Discord


GPU MODE Discord


Latent Space Discord


OpenInterpreter Discord


LM Studio Discord


OpenRouter (Alex Atallah) Discord


Nous Research AI Discord


Modular (Mojo 🔥) Discord


DSPy Discord


LlamaIndex Discord


Nomic.ai (GPT4All) Discord


tinygrad (George Hotz) Discord


Cohere Discord


Torchtune Discord


LLM Agents (Berkeley MOOC) Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Axolotl AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (352 messages🔥🔥):

Unsloth's Multi-GPU Support, Llama 3.3 Fine-Tuning, SGLang vs. vLLM, Sales Strategy, FFT Support


Unsloth AI (Daniel Han) ▷ #off-topic (117 messages🔥🔥):

Adapters vs Models, Fine-tuning Challenges, Learning Resources for Fine-tuning, Instruction Tuning Limitations, Model Merging Techniques


Unsloth AI (Daniel Han) ▷ #help (468 messages🔥🔥🔥):

Fine-tuning LLMs, RAG (Retrieval-Augmented Generation), Quantization, Using Google Colab and Kaggle for model training, JSON data formatting for models


Cursor IDE ▷ #general (706 messages🔥🔥🔥):

Cursor 0.44.4 Release, O1 vs Sonnet 3.5 Performance, Website Builders vs Custom Code, Gemini-1206 Capabilities, The Role of College for Startups


Codeium (Windsurf) ▷ #discussion (65 messages🔥🔥):

Flex credits rollover, Using repoprompt in Windows, Integrating features from GitHub, Codeium extension issues, Windsurf user experience


Codeium (Windsurf) ▷ #windsurf (509 messages🔥🔥🔥):

Windsurf performance issues, Cline + Gemini usage, Codeium support and features, Model comparisons, Credit management in AI tools


Interconnects (Nathan Lambert) ▷ #news (193 messages🔥🔥):

Gemini 2.0 Flash Thinking, OpenAI updates, Researcher departures, Search engine competition, Reasoning models


Interconnects (Nathan Lambert) ▷ #ml-drama (16 messages🔥):

o1 model discussion, Chollet's analogies, Subbarao/Miles Brundage incident, Francois Chollet's grumpiness, Interconnects engagement


Interconnects (Nathan Lambert) ▷ #random (167 messages🔥🔥):

Stripe Tax Implementation, Substack Revenue Model and Tax Concerns, CPA Recommendations for Tax Filing, Digital Services and VAT Compliance, Challenges for International Taxation


Interconnects (Nathan Lambert) ▷ #memes (2 messages):

Interactive AI in Game Shows, Social Media Reactions


Interconnects (Nathan Lambert) ▷ #rl (1 messages):

kevin_nejad: it's interesting (but not obvious) such behaviour emerges purely from RL training


Interconnects (Nathan Lambert) ▷ #lectures-and-projects (7 messages):

RLHF Book, Typos Correction, Fundamentals Review


Interconnects (Nathan Lambert) ▷ #posts (1 messages):

natolambert: Yeah. Students coming up and wanting to take photos is so cute ❤️


OpenAI ▷ #annnouncements (1 messages):

12 Days of OpenAI, ChatGPT work enhancements

Link mentioned: YouTube


OpenAI ▷ #ai-discussions (310 messages🔥🔥):

ChatGPT and integration, Google's AI developments, YouTube clone project, Software engineering automation, AI benchmarks and capabilities


OpenAI ▷ #gpt-4-discussions (8 messages🔥):

Editing GPTs, Project Folder Limitations, Support Channels, Pro Package Tool Issues


Eleuther ▷ #general (146 messages🔥🔥):

FSDP and Tensor Parallelism, EleutherAI Token Controversy, Natural Attention Optimizer, Debugging Training Models, Causal Masking in Attention


Eleuther ▷ #research (123 messages🔥🔥):

Microsoft Research Ethics, Koopman Theory and Neural Networks, Diffusion vs Autoregressive Models, Plagiarism Concerns in ML Research, Research Submissions and Oversight


Eleuther ▷ #interpretability-general (5 messages):

Independence of Neural Network Activations, Pre-image Reconstruction Methods, Steered vs Unsteered Sparse Autoencoders, Out-of-Distribution (OOD) Evaluation


Perplexity AI ▷ #general (254 messages🔥🔥):

Perplexity AI updates, You.com features, Gemini models, Student discounts, Referral systems


Perplexity AI ▷ #sharing (7 messages):

EU Funds Starlink Rival, Plants Cry, Law of the Few, Magic Spell Hypothesis, Tornado Alley

Link mentioned: YouTube


aider (Paul Gauthier) ▷ #general (222 messages🔥🔥):

Gemini models, Aider integration, MCP functionality, OpenAI access issues, Jira task automation


aider (Paul Gauthier) ▷ #questions-and-tips (11 messages🔥):

Using multiple OpenAPI services, Gemini Flash 2.0 issues, Architect mode features, Adding files in a fuzzy way, Project planning models


aider (Paul Gauthier) ▷ #links (9 messages🔥):

GitHub Copilot Chat, Aider Composer VSCode Extension, Diff Edits Preference

Link mentioned: Announcing GitHub Copilot Free · GitHub Changelog


Stackblitz (Bolt.new) ▷ #announcements (1 messages):

Bolt Supabase Integration

Link mentioned: Tweet from StackBlitz (@stackblitz): 📢 Announcing: Supabase<>Bolt integration! No manual setup: just click, connect, and it's done!


Stackblitz (Bolt.new) ▷ #prompting (14 messages🔥):

Bolt project setup, Issues with .env file, Direct uploads from Figma, Application review process


Stackblitz (Bolt.new) ▷ #discussions (182 messages🔥🔥):

Bolt Issues and Feedback, Community Support and Resources, Supabase Integration, Functionality and Token Use, User Experience with Bolt


Notebook LM Discord ▷ #announcements (1 messages):

Interactive Mode for Audio Overviews


Notebook LM Discord ▷ #use-cases (17 messages🔥):

NotebookLM video generation, Interactive podcast feature, Podcast editing workflows, Connection of MySQL database to NotebookLM, YouTube content creation


Notebook LM Discord ▷ #general (144 messages🔥🔥):

Notebook LM Interactive Mode, Audio Overview Pronunciation, Notebook Features Across Notebooks, User Feedback on New UI, Experimental Use of AI in Storytelling


Stability.ai (Stable Diffusion) ▷ #general-chat (102 messages🔥🔥):

Running SDXL on Ubuntu, ComfyUI Issues, AI Image and Video Quality, Quantum Computing Conversations, Civitai Website Issues


GPU MODE ▷ #general (58 messages🔥🔥):

Coil Whine, GPU Performance & Choices, Bottlenecking Debate, VRChat VRAM Needs, Next-Gen GPU Pricing


GPU MODE ▷ #triton (4 messages):

tl.dot input shape requirements, AMD GPU performance vs PyTorch, Nvidia Hopper warp-specialization deletion, Triton performance optimization


GPU MODE ▷ #cuda (5 messages):

cudaMemcpy performance, CUTLASS tma_store_wait function behavior, Documentation on TMA operations


GPU MODE ▷ #torch (1 messages):

0x000ff4: any one contributin to keras/pytorch?


GPU MODE ▷ #algorithms (21 messages🔥):

Genesis AI, Sim2Real Technology, CARLA Simulator Update, Synthetic Data Generation for Autonomous Driving, Dexterous Task Applications


GPU MODE ▷ #cool-links (1 messages):

Image Analysis, User Concerns


GPU MODE ▷ #jobs (1 messages):

MatX hiring, LLM accelerator ASIC development, Low level compute kernel author roles, ML performance engineer roles, In-person work culture

Link mentioned: MatX: faster chips for LLMs. Come work with us! Whether we're working...


GPU MODE ▷ #sparsity-pruning (1 messages):

Sparsity Design, Sparsifier Functionality, Sparsify Kernel Optimization, Demo for Sparsify Usage


GPU MODE ▷ #self-promotion (1 messages):

alma Python Package, Model Benchmarking, PyTorch Conversion Options

Link mentioned: GitHub - saifhaq/alma


GPU MODE ▷ #thunderkittens (1 messages):

kimishpatel: what i cam here for 🙂


GPU MODE ▷ #arc-agi-2 (5 messages):

Cost of GPUs on Vast AI, Generative Flow Networks, ARC Prize Daily Puzzle, Training Smaller Models, Synthesizing Riddles

Link mentioned: ARC Prize - Play the Game: Easy for humans, hard for AI. Try ARC-AGI.


Latent Space ▷ #ai-general-chat (87 messages🔥🔥):

AI Agentic Systems, Gemini 2.0 Flash Thinking, Databricks Funding, ModernBERT Release, Alec Radford Departure from OpenAI


OpenInterpreter ▷ #general (67 messages🔥🔥):

OpenInterpreter 1.0 updates, Running commands in server mode, Google Gemini 2.0 multimodal, Local vs server command execution, OS mode functionality


OpenInterpreter ▷ #O1 (1 messages):

O1 Channel Exploration, Understanding documentation


LM Studio ▷ #general (62 messages🔥🔥):

LM Studio Model Loading Issues, Mobile Access to LM Studio, GPU Driver Problems, Image Input Models for LM Studio, Known Issues with AMD Drivers


LM Studio ▷ #hardware-discussion (3 messages):

Silicon Chips Performance, Benchmark Comparisons


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Price reductions, Market competition


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

AI Ecosystem Maps, Crowdsourced AI Enablement Stack

Link mentioned: GitHub - daytonaio/ai-enablement-stack: A Community-Driven Mapping of AI Development Tools


OpenRouter (Alex Atallah) ▷ #general (62 messages🔥🔥):

DeepSeek Models, OpenRouter Issues, Model and API Discussion, Data Management, User Experience Feedback


OpenRouter (Alex Atallah) ▷ #beta-feedback (1 messages):

Programmatic feature requests, Provider API integration


Nous Research AI ▷ #general (47 messages🔥):

GitHub Copilot Free Tier, Granite 3.1-8B-Instruct Model, LM Studio for Local LLMs, Model Context Protocol Testing, Gemini Flash Thinking Experimental


Nous Research AI ▷ #ask-about-llms (2 messages):

Agent Message Formatting, Fine-Tuning Dataset Consistency


Nous Research AI ▷ #interesting-links (2 messages):

Genesis Project, Generative Physics Engine, Open Source Robotics Simulation


Modular (Mojo 🔥) ▷ #mojo (37 messages🔥):

Mojo Indexing and Casting, SIMD Keying in Dict, Running Mojo on Android, Python Integration Ideas, Negative Indexing Debate

Link mentioned: [BUG] Segfault if using a struct based on SIMD as key in Dict · Issue #3781 · modularml/mojo: Bug description: When using a struct containing a sufficiently large SIMD as key in a Dict, a segmentation fault is encountered. Steps to reproduce: Execute the following code: from collections impor...


DSPy ▷ #general (28 messages🔥):

Synthetic Data Primer, Rate Limiting in DataBricks, DSPy Signature Outputs, Provisioned Throughput Costs, LiteLLM Proxy Layer


LlamaIndex ▷ #blog (3 messages):

Multi-agent systems, Vectara RAG capabilities, AI journey survey

Link mentioned: State of AI Developer Survey: Share your experiences, challenges, and insights, and help shape the future of AI-driven innovation.


LlamaIndex ▷ #general (23 messages🔥):

HuggingFaceEmbedding model loading, Azure OpenAI embedding rate limits, TextNode insert errors


LlamaIndex ▷ #ai-discussion (1 messages):

Vision Parse, PDF to Markdown


Nomic.ai (GPT4All) ▷ #announcements (1 messages):

Data Mapping Series, Scalable Graphics, Embeddings, Dimensionality Reduction, Unstructured Data

Link mentioned: Data Maps, Part 4: Why Are Web Browsers The Best Data Browsers?


Nomic.ai (GPT4All) ▷ #general (17 messages🔥):

Nomic BERT issue, Code Interpreter Pull Request, Loading System Messages, GGUF File Issues, Device Requirements


tinygrad (George Hotz) ▷ #general (16 messages🔥):

TinyChat Installation Issues, Tiktoken Replacement Discussions, Scroll Direction Bug Report, Bounty Project Engagement, Layout Notation Insights


tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):

khaner2162: Hi why does scheduler # realize before expand or unsafe pad ops?


Cohere ▷ #discussions (6 messages):

Introduction of Ikuo618, Reminder about channel etiquette


Cohere ▷ #questions (2 messages):

Platform Availability


Cohere ▷ #api-discussions (3 messages):

Cohere API pricing, API keys types, Rate limits for endpoints

Link mentioned: API Keys and Rate Limits — Cohere: This page describes Cohere API rate limits for production and evaluation keys.


Cohere ▷ #cmd-r-bot (1 messages):

ikuo618: hi..................!


Cohere ▷ #projects (1 messages):

benny0917: Good looking product <@799853279017173033> congrats!


Torchtune ▷ #general (6 messages):

Torchtune Phi 4 Support, New Contributor Role, Implementation Differences Between Phi 3 and Phi 4

Link mentioned: torchtune.models, torchtune 0.4 documentation


Torchtune ▷ #papers (2 messages):

Asynchronous RLHF, Post-Training Techniques, Model Safety and Robustness


LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (1 messages):

Hackathon submission deadline, LLM Agents Hackathon, Final reminders, Project submissions, Last-minute questions








{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}