
Meta BLT: Tokenizer-free, Byte-level LLM

**Meta AI** introduces the **Byte Latent Transformer (BLT)**, a tokenizer-free architecture that dynamically forms byte patches to allocate compute where it is needed, matching **Llama 3** on standard benchmarks and beating it on character-level tasks such as the CUTE benchmark. The model was trained on approximately **1 trillion tokens'** worth of data and features a three-block transformer design with local and global components. This approach challenges traditional tokenization and may enable new multimodal capabilities such as direct file interaction without retrieval-augmented generation. Additionally, **Microsoft** announced the **Phi-4 14B** parameter model achieving state-of-the-art results on STEM and reasoning benchmarks, surpassing **GPT-4o**. **DeepSeek AI** launched new vision-language models based on their MoE architecture with sizes ranging from **1.0B to 27B** parameters. **OpenAI** released a new Projects feature for ChatGPT, and **Cohere** introduced its smallest and fastest model, **Command R7B**. **Anthropic** published research on "Best-of-N Jailbreaking" vulnerabilities across text, vision, and audio models. Industry discussion highlights a trend of shrinking frontier LLM sizes, with **GPT-4**, at approximately **1.8 trillion parameters**, dwarfing newer models.


AI News for 12/12/2024-12/13/2024. We checked 7 subreddits, 433 Twitters and 31 Discords (209 channels and 6703 messages) for you. Estimated reading time saved (at 200wpm): 741 minutes. You can now tag @smol_ai for AINews discussions!

On a day with a monster $250m fundraise and Ilya declaring the end of pretraining, we are glad that Meta delivered a paper with some technical meat: Byte Latent Transformer: Patches Scale Better Than Tokens.


The abstract is very legible. In contrast to previous byte-level work like MambaByte, BLT uses dynamically formed patches that are encoded to latent representations. As the authors say: "Tokenization-based LLMs allocate the same amount of compute to every token. This trades efficiency for performance, since tokens are induced with compression heuristics that are not always correlated with the complexity of predictions. Central to our architecture is the idea that models should dynamically allocate compute where it is needed. For example, a large transformer is not needed to predict the ending of most words, since these are comparably easy, low-entropy decisions compared to choosing the first word of a new sentence. This is reflected in BLT’s architecture (§3) where there are three transformer blocks: two small byte-level local models and a large global latent transformer."
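To make the dynamic-compute idea concrete, here is a minimal sketch of entropy-based patching, in the spirit of the paper but not taken from it: a small byte-level model scores how surprising each next byte is, and a new patch starts only where entropy spikes. The `entropy_model` callable, the 4-bit threshold, and the toy model below are all illustrative assumptions.

```python
import math

def next_byte_entropy(probs):
    """Shannon entropy (in bits) of a next-byte distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_patches(byte_seq: bytes, entropy_model, threshold: float = 4.0):
    """Group bytes into patches, opening a new patch wherever the small
    byte-level model is 'surprised' (next-byte entropy > threshold).
    Low-entropy stretches (e.g. the tail of a common word) merge into one
    long patch, so the large global transformer runs fewer steps."""
    patches, current = [], []
    for i, b in enumerate(byte_seq):
        probs = entropy_model(byte_seq[:i])  # P(next byte | prefix)
        if current and next_byte_entropy(probs) > threshold:
            patches.append(bytes(current))
            current = []
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

def toy_model(prefix: bytes):
    """Stand-in for the small local model: uncertain right after a space
    (start of a new word), confident everywhere else."""
    if not prefix or prefix.endswith(b" "):
        return [1 / 256] * 256       # uniform: 8 bits of entropy
    probs = [0.001 / 255] * 256      # peaked: ~0.02 bits of entropy
    probs[ord("e")] = 0.999
    return probs

print(entropy_patches(b"the cat sat on the mat", toy_model))
# -> word-aligned patches: [b'the ', b'cat ', b'sat ', b'on ', b'the ', b'mat']
```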


The authors trained this on ~1T tokens' worth of data and compared it with their house model, Llama 3. It holds up surprisingly well on standard benchmarks, and it does much better on tasks that usually trip up tokenizer-based models, such as the CUTE benchmark.

What's next: scale this up? Is it worth throwing everything we know about tokenization out the window? What about long-context, retrieval, and IFEval-type capabilities?

Byte-level transformers may even unlock NEW kinds of multimodality, as /r/localllama explains:

An example of such a new possibility is "talking to your PDF," where you really do exactly that, with no RAG and no chunking, by feeding the data directly to the model. You can think of all kinds of other crazy use cases for a model that natively accepts common file types.
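To make the contrast with RAG concrete, here is a hypothetical sketch of that workflow. BLT ships no public inference API, so `model.generate` and its byte-in/byte-out signature are assumptions, and the whole file would have to fit in the model's context:

```python
def ask_about_file(model, path: str, question: str) -> bytes:
    """No parsing, chunking, embedding, or retrieval step: the file's raw
    bytes simply become part of the prompt for a byte-native model."""
    with open(path, "rb") as f:
        doc = f.read()                       # e.g. a PDF, fed verbatim
    prompt = doc + b"\n\nQ: " + question.encode("utf-8") + b"\nA: "
    return model.generate(prompt)            # hypothetical byte-in/byte-out API
```

The engineering burden shifts from the retrieval pipeline to the model's context length, which is exactly the long-context question raised above.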


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

Here are the key topics from the Twitter discussions, organized by category:

New Model & Research Announcements

Product Launches & Updates

Industry Discussion & Analysis

Memes & Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Phi-4 Release: Benchmarks Shine but Practicality Questioned

Theme 2. Andy Konwinski's $1M Prize for Open-Source AI on SWE-bench

Theme 3. GPU Capabilities Unearthed: How Rich Are We?

Theme 4. Meta's Byte Latent Transformer Redefines Tokenization

Other AI Subreddit Recap

/r/machinelearning, /r/openai, /r/stablediffusion, /r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

Theme 1. Gemini 2.0: Google's Multimodal Breakthrough

Theme 2. Limitations on Advanced Voice Mode Usage


AI Discord Recap

A summary of Summaries of Summaries by o1-mini

Theme 1. AI Model Performance and Innovations

Theme 2. Integration and Tooling Enhancements for Developers

Theme 3. AI Model Development Techniques and Optimizations

Theme 4. Product Updates and Announcements from AI Providers

Theme 5. Community Engagement and Support Issues


PART 1: High-level Discord summaries

Codeium / Windsurf Discord


Notebook LM Discord


aider (Paul Gauthier) Discord


Cursor IDE Discord


Eleuther Discord


OpenAI Discord


LM Studio Discord


Latent Space Discord


Bolt.new / Stackblitz Discord


GPU MODE Discord


Nous Research AI Discord


Unsloth AI (Daniel Han) Discord


Cohere Discord


Perplexity AI Discord


OpenRouter (Alex Atallah) Discord


Modular (Mojo 🔥) Discord


Interconnects (Nathan Lambert) Discord


LLM Agents (Berkeley MOOC) Discord


Stability.ai (Stable Diffusion) Discord


OpenInterpreter Discord


LlamaIndex Discord


DSPy Discord


tinygrad (George Hotz) Discord


Torchtune Discord


MLOps @Chipro Discord


Mozilla AI Discord


The Axolotl AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-channel summaries and links

{% if medium == 'web' %}

Codeium / Windsurf ▷ #discussion (136 messages🔥🔥):

Codeium subscription concerns, Internal errors with Claude, Cascade usage issues, Windsurf integration with Git, Workspace AI rules for C# projects

Links mentioned:


Codeium / Windsurf ▷ #windsurf (734 messages🔥🔥🔥):

Windsurf, Sonnet 3.5 vs 4o, Windsurf global rules, AI and copyright, Prompt crafting

Links mentioned:


Notebook LM Discord ▷ #announcements (2 messages):

NotebookLM Update, Audio Overview Interaction, NotebookLM Plus Features, 3-Panel Interface, New Sharing Features

Link mentioned: Rocket Engine Test Future In Space GIF - Discover & Share GIFs: Click to view the GIF


Notebook LM Discord ▷ #use-cases (58 messages🔥🔥):

NotebookLM Customization, AI in Creative Processes, Language Processing in NotebookLM, Multilingual AI Performance, Educational Use of NotebookLM

Links mentioned:


Notebook LM Discord ▷ #general (506 messages🔥🔥🔥):

NotebookLM updates, Interactive Audio Overviews, NotebookLM Plus, New UI features, Language support

Links mentioned:


aider (Paul Gauthier) ▷ #announcements (1 messages):

Aider v0.69.0 Release, Gemini Flash 2.0 Support, New Slash Commands, Multiline Chat Feature, Analytics Opt-in

Links mentioned:


aider (Paul Gauthier) ▷ #general (506 messages🔥🔥🔥):

Aider workflows, Gemini model performance, Using ChatGPT with Aider, Fine-tuning models for coding, LLM leaderboard comparisons

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (58 messages🔥🔥):

Aider file management, Obsidian integration for project planning, Fast Apply model discussion, Claude AI comparison, Rust-analyzer integration with Aider

Links mentioned:


Cursor IDE ▷ #general (426 messages🔥🔥🔥):

Cursor AI vs Windsurf, User Payment Issues, Model Options and Usage, Development Experiences, AI Performance Observations

Links mentioned:


Eleuther ▷ #general (13 messages🔥):

GPU Cluster Disruption, Creative Grading Methods, Fantasy Character Dataset Project

Link mentioned: Instructions: no description found


Eleuther ▷ #research (334 messages🔥🔥):

Uncertainty in Modeling, Continuous vs Discrete Representations, Philosophy of Mathematics, Complexity in Physics, Interpretation of Probability

Links mentioned:


Eleuther ▷ #interpretability-general (2 messages):

Inverse Mechanistic Interpretability, RASP


Eleuther ▷ #lm-thunderdome (1 messages):

Logging samples in models


OpenAI ▷ #annnouncements (1 messages):

Projects in ChatGPT, 12 Days of OpenAI

Link mentioned: Projects—12 Days of OpenAI: Day 7: Kevin Weil, Drew Schuster, and Thomas Dimson introduce and demo Projects.


OpenAI ▷ #ai-discussions (280 messages🔥🔥):

Sora access issues, ChatGPT subscription frustrations, Comparisons between AI models, Quality of AI-generated content, Local AI implementations

Link mentioned: GitHub - AlignAGI/Alignment: Promoting global awareness and action for ethical AI alignment and safeguarding humanity against AI self-replication risks. Includes research, frameworks, and open-source resources.: Promoting global awareness and action for ethical AI alignment and safeguarding humanity against AI self-replication risks. Includes research, frameworks, and open-source resources. - AlignAGI/Alig...


OpenAI ▷ #gpt-4-discussions (3 messages):

Rollout Speed


OpenAI ▷ #prompt-engineering (4 messages):

Prompt Complexity, Response Time with Non-Logical Questions


OpenAI ▷ #api-discussions (4 messages):

Prompt Complexity, Response Delays in o1


LM Studio ▷ #general (74 messages🔥🔥):

MacBook Pro M4 Pro capabilities, Model training and performance, Model loading issues, Multi-modality models, LLMs usage and configuration

Link mentioned: GitHub - rasbt/LLMs-from-scratch: Implement a ChatGPT-like LLM in PyTorch from scratch, step by step: Implement a ChatGPT-like LLM in PyTorch from scratch, step by step - rasbt/LLMs-from-scratch


LM Studio ▷ #hardware-discussion (171 messages🔥🔥):

GPU Purchase Considerations, Performance of AMD vs Intel, Power Supply Units (PSUs), Memory Overclocking, Model Training and Resource Requirements

Links mentioned:


Latent Space ▷ #ai-general-chat (60 messages🔥🔥):

OpenAI Projects 12 Days, Pika 2.0 Release, NotebookLM Updates, Qwen 2.5 Turbo, Sonnet Performance in WebDev Arena

Links mentioned:


Latent Space ▷ #ai-announcements (1 messages):

Windsurf, Codeium, AI IDEs, Scaling in AI Development

Links mentioned:


Latent Space ▷ #ai-in-action-club (182 messages🔥🔥):

NeurIPS webcrawl, Prompt engineering discussion, SillyTavern utilization, AI functions in Python, Local model applications

Links mentioned:


Bolt.new / Stackblitz ▷ #prompting (5 messages):

Prompting Bolt to erase memory, Using API references in prompts, Best practices for code reviews


Bolt.new / Stackblitz ▷ #discussions (214 messages🔥🔥):

Bolt integration issues, Supabase and Stripe integration, Help requests for Bolt, User onboarding in Bolt, Feedback on Bolt features

Links mentioned:


GPU MODE ▷ #general (119 messages🔥🔥):

SSD Recommendations, GPU Computation, Sequence Packing, Tensor Operations, Batched Matrix Multiplication

Link mentioned: GitHub - gouthamk16/AttogradDB: AttogradDB is a simple and efficient vector store designed for document embedding and retrieval tasks.: AttogradDB is a simple and efficient vector store designed for document embedding and retrieval tasks. - gouthamk16/AttogradDB


GPU MODE ▷ #triton (4 messages):

Fused Attention by Triton, Full Flash Attention Kernel Debugging, TRITON_INTERPET and Data Types


GPU MODE ▷ #cuda (3 messages):

GPU Glossary Release

Link mentioned: GPU Glossary: A glossary of terms related to GPUs.


GPU MODE ▷ #torch (9 messages🔥):

Lora Training Fast Kernels, Gradient Calculation Issues, Quantization-Aware Training Implementation, Quantization Operations and STE, Quantizing Weights and Activations

Links mentioned:


GPU MODE ▷ #cool-links (8 messages🔥):

Trillium TPU launch, Gemini 2.0, Meta's AI advancements, Differentiable Tokenizers, YouTube on GPU optimization

Links mentioned:


GPU MODE ▷ #off-topic (1 messages):

High-quality video game datasets, Labeled actions in gaming, Keyboard/mouse inputs in datasets


GPU MODE ▷ #lecture-qa (1 messages):

CUDA Performance Checklist, Data Coalescing, Block Size Impact


GPU MODE ▷ #liger-kernel (2 messages):

Liger Talk Proposal, Future Plans for Liger


GPU MODE ▷ #self-promotion (19 messages🔥):

GPU Glossary Collaboration, CPU Offload for Single-GPU Training, Tensor Cores vs CUDA Cores, H100 GPU Specifications, Synchronization Issues in PyTorch

Links mentioned:


GPU MODE ▷ #🍿 (3 messages):

Markdown blog version, Evaluation perf improvements, Adding search tool, Sharing content formats


GPU MODE ▷ #arc-agi-2 (34 messages🔥):

ARC riddle approaches, Transduction vs Induction, ARC augmentation strategies, In-context RL exploration, Research resources sharing

Links mentioned:


Nous Research AI ▷ #general (147 messages🔥🔥):

Livestream and Recording for Talks, Speculative Decoding Discussion, Adapters and Model Training, Difficult Token Correction, Data Datasets for AI

Links mentioned:


Nous Research AI ▷ #ask-about-llms (41 messages🔥):

Nous Research Llama Instruct, Mac M3 Air for LLMs, Machine Learning Study Paths, Quantization of Hermes Models, Open-source Coding LLMs

Link mentioned: Tweet from ray🖤🇰🇷 (@yoobinray): If you want to self study ML without wasting time here is the definitive guide since I've gotten so many DMs on this topic: Just answer this one question: do you want to be a cracked researcher or...


Nous Research AI ▷ #interesting-links (9 messages🔥):

Phi-4 Language Model, DeepSeek-VL2 Launch, Meta's Tokenization Breakthrough, GPU Glossary Introduction, Byte Latent Transformer

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (108 messages🔥🔥):

Quantization in LLMs, Phi-4 Release, Command R7B Performance, Multi-GPU Support in Unsloth, Vision Models and Quantization

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (27 messages🔥):

Fine Tuning Llama 3.3 70B, Multi-GPU Training Suggestions, Unsloth Model vs Llama Model, Nemo Context Length Update, Kaggle Training Environment

Links mentioned:


Unsloth AI (Daniel Han) ▷ #research (1 messages):

Llama 3.2 1B specifications, Embedding weights, Model experimentation


Cohere ▷ #discussions (72 messages🔥🔥):

Command R updates, Cohere API usage, Server status issues, User experiences with Command models, Cohere documentation resources

Links mentioned:


Cohere ▷ #announcements (1 messages):

Command R7B, Model Performance, Hugging Face Release, Cohere collaboration

Links mentioned:


Cohere ▷ #questions (18 messages🔥):

Structured JSON example, 403 API errors, 7B model performance, Rerank vs Embed, PR for documentation

Links mentioned:


Cohere ▷ #api-discussions (12 messages🔥):

Community Accessibility Issues, ClientV2 Installation, Cohere Python Library, Model Card Discrepancy

Links mentioned:


Cohere ▷ #cmd-r-bot (25 messages🔥):

Cohere Bot Resurgence, Differences Between Models, Understanding Embed vs Rerank, Emotion-Concealing Robot Traits


Perplexity AI ▷ #announcements (1 messages):

Campus Strategist program, Spring 2025 cohort, International expansion


Perplexity AI ▷ #general (119 messages🔥🔥):

O1 Mini Status, Perplexity Pro User Experience, Image Generation in Perplexity, Pro Subscription Issues, Custom Web Sources in Spaces

Links mentioned:


Perplexity AI ▷ #sharing (3 messages):

Iambic Pentameter, Psych Major Interest, Master and The Emissary, Samsung's Project Moohan


Perplexity AI ▷ #pplx-api (4 messages):

Perplexity API and Website Differences, Closed Beta Access Inquiry, Domain Filter Request


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Model Provider Filtering, API Uptime Issues

Link mentioned: Tweet from OpenRouter (@OpenRouterAI): OpenRouter recovered over 1.8 million requests for closed-source LLMs in the last 2 days. Quoting Zsolt Ero (@hyperknot): Interesting side effect of this "AI Launch Week" is that all providers...


OpenRouter (Alex Atallah) ▷ #general (77 messages🔥🔥):

Gemini Flash 2.0 Feedback, Euryale Model Issues, Using API Keys, Creative Writing Model Comparison, Synthetic Datasets in Pretraining

Links mentioned:


OpenRouter (Alex Atallah) ▷ #beta-feedback (9 messages🔥):

Access to custom provider keys, Integration beta feature, API Keys provision


Modular (Mojo 🔥) ▷ #general (2 messages):

Mojo memes, Hints from OpenAI


Modular (Mojo 🔥) ▷ #announcements (1 messages):

Friday swag challenge, Modular milestones event


Modular (Mojo 🔥) ▷ #mojo (81 messages🔥🔥):

Mojo Language Features, Community Perceptions of Mojo, Networking Capabilities in Mojo, Performance Comparisons between CPUs and GPUs, Use of Mojo in Electrical Engineering


Interconnects (Nathan Lambert) ▷ #events (3 messages):

Meeting Location, Event Coordination


Interconnects (Nathan Lambert) ▷ #news (24 messages🔥):

Microsoft Phi-4 announcement, Skepticism around Phi models, LiquidAI funding, DeepSeek VL2 release, AMD's role in LiquidAI's development

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (26 messages🔥):

Bitter Lesson Discontent, AI Companies' Approach to Personal Topics, Empathy in AI Marketing, Misunderstandings of AI in Academia

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (4 messages):

Qwen models, WebDev Arena Leaderboard, Hugging Face Account Compromise

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rlhf (6 messages):

Stream of Search video, Advantage-Induced Policy Alignment (APA), Reward hacking discussions

Link mentioned: Stream of Search (SoS): Learning to Search in Language (COLM Oral 2024): Authors: Kanishk Gandhi, Denise H J Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah Goodman. Language models are rarely shown fruitful mistake...


Interconnects (Nathan Lambert) ▷ #cv (8 messages🔥):

Twitter feeds on VLMs, MVLM posts and university courses, Merve from Huggingface

Links mentioned:


Interconnects (Nathan Lambert) ▷ #reads (9 messages🔥):

Claude AI's SEO Usage, Tulu 3 Post-Training Techniques, Trends in Language Model Sizes, Flash Models and MOEs

Links mentioned:


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (56 messages🔥🔥):

Certificate Declaration Form, Lab Submission Deadlines, Quizzes and Articles, Public Notion Links, Certificate Distribution Timeline


Stability.ai (Stable Diffusion) ▷ #general-chat (46 messages🔥):

WD 1.4 Model Performance, Local Video AI Models Communities, Tag Generation with Taggui, Stable Diffusion XL Inpainting, Image Generation with ComfyUI

Links mentioned:


OpenInterpreter ▷ #general (33 messages🔥):

Nvidia NIM API Setup, Custom API in Open Interpreter, Token Limit Confusions, Development Branch Improvements, Max-budget Implementation


OpenInterpreter ▷ #ai-content (1 messages):

Meta's Byte Latent Transformer, Language Modeling in Sentence Representation

Link mentioned: no title found: no description found


LlamaIndex ▷ #blog (3 messages):

LlamaCloud multimodal pipeline, LlamaParse parsing instructions, RAG application tutorial


LlamaIndex ▷ #general (19 messages🔥):

Function calling defaults, Prompt engineering vs frameworks, AWS Valkey support, Creating a query engine on vector store

Link mentioned: What is Valkey? – Valkey Datastore Explained - Amazon Web Services: no description found


LlamaIndex ▷ #ai-discussion (1 messages):

Langchain, MegaParse, Document Parsing, AI Artistry

Link mentioned: Integrating Langchain with MegaParse: Unlocking Seamless Document Parsing: Ankush k Singal


DSPy ▷ #show-and-tell (3 messages):

DSPy framework, LLM applications, Categorization task example, User feedback on DSPy, GitHub resource link

Link mentioned: Pipelines & Prompt Optimization with DSPy: Writing about technology, culture, media, data, and all the ways they interact.


DSPy ▷ #general (9 messages🔥):

DSPy optimizers, AI as a platypus, Exploring new technologies, Learning resources, Prompt optimization in NLP

Links mentioned:


DSPy ▷ #examples (5 messages):

Claude Sonnet prompt optimization, Outdated dspy examples, Documentation for VLM examples


DSPy ▷ #colbert (2 messages):

Cohere v3, Colbert v2, Building scalable AI systems

Link mentioned: Building Scalable Systems with DAGs and Serverless for RAG | APAC Office Hours: Jason and Dan lead an APAC office hours session exploring complex challenges in building AI systems, from router implementations to managing conversation his...


tinygrad (George Hotz) ▷ #general (6 messages):

Tinygrad performance benchmark, Kernel search experience, BEAM configuration


Torchtune ▷ #general (3 messages):

Torchtune update, Type hinting in Python, Ruff functionality


MLOps @Chipro ▷ #events (1 messages):

Next-Gen Retrieval Strategies, Advanced Agent Runtimes, Model Management at Scale, Dynamic Prompt Engineering, AI Safety & Compliance

Link mentioned: Emerging Architectures Webinar | TensorOps: no description found


Mozilla AI ▷ #announcements (1 messages):

Mozilla Builders Demo Day, Event Acknowledgment, Social Media Recap

Link mentioned: Tweet from Mozilla Builders 🔧 (@mozillabuilders): We have chiseled ourselves out of our Demo Day cocoons just in time to write the world's most interesting recap. Seriously, it was spectacular — a confluence of amazing people and incredible techn...






{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}