Frozen AI News archive

BitNet was a lie?

**Scaling laws for quantization** have been revised by a group led by Chris Ré, who analyzed over **465 pretraining runs** and found that the benefits of lower precision plateau around FP6. Lead author **Tanishq Kumar** highlights that longer training and more data make models *more* sensitive to quantization, which helps explain the difficulties quantizing models like **Llama-3**. **Tim Dettmers**, author of QLoRA, warns that the era of efficiency gains from low-precision quantization is ending, signaling a shift from scaling up to optimizing what we already have. Additionally, **Alibaba** announced **Qwen 2.5-Coder-32B-Instruct**, which matches or surpasses **GPT-4o** on coding benchmarks, and open-source initiatives like **DeepEval** for LLM testing are gaining traction.


AI News for 11/11/2024-11/12/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (217 channels, and 2286 messages) for you. Estimated reading time saved (at 200wpm): 281 minutes. You can now tag @smol_ai for AINews discussions!

In a growing literature of post-Chinchilla papers, enthusiasm for quantization reached its zenith this summer with the BitNet paper (our coverage here), which proposed a quantization scheme as severe as ternary (-1, 0, 1), aka 1.58 bits. A group of grad students under Chris Ré has now extended Chinchilla scaling laws to quantization across 465+ pretraining runs and found that the benefits level off at FP6.
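For intuition on what "ternary, aka 1.58 bits" means: each weight is mapped to one of three values {-1, 0, +1} (log2(3) ≈ 1.58 bits of information) plus a shared scale. Below is a minimal sketch of absmean-style ternary rounding in the spirit of BitNet b1.58 — an illustration of the idea, not the paper's actual training procedure:

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Map weights to ternary codes {-1, 0, +1} with a per-tensor scale.

    Absmean-style recipe: scale by the mean absolute value, then
    round and clip to [-1, 1]. Illustrative sketch only.
    """
    scale = np.abs(w).mean() + eps           # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)  # ternary codes
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximation of the original weights."""
    return q * scale

w = np.array([0.9, -0.05, -1.3, 0.4])
q, s = ternary_quantize(w)
# q contains only values from {-1, 0, 1}; w is approximated by q * s
```

The severity of this scheme is exactly what the new scaling-law results call into question: below roughly 6 bits, the compute saved no longer pays for the quality lost.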


Lead author Tanishq Kumar notes:

Below is a fixed language model overtrained significantly to various data budgets up to 30B tokens, then post-train quantized afterwards. This demonstrates how more pretraining FLOPs do not always lead to better models served in production.
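The mechanism at issue is post-train quantization (PTQ): train in high precision, then round the weights down to a lower bit-width for serving. A minimal sketch of symmetric round-to-nearest weight PTQ is below — real systems use per-channel scales and clipping calibration, but even this toy version shows reconstruction error climbing steeply once precision drops below about 6 bits:

```python
import numpy as np

def ptq_uniform(w, bits=4):
    """Symmetric round-to-nearest post-train quantization of weights.

    Uses a single max-based scale per tensor; returns the
    dequantized (served) weights. Illustrative sketch only.
    """
    qmax = 2 ** (bits - 1) - 1               # e.g. 7 for 4-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                         # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
errors = {bits: float(np.mean((w - ptq_uniform(w, bits)) ** 2))
          for bits in (8, 6, 4, 2)}
# MSE grows rapidly as bit-width shrinks: errors[2] >> errors[8]
```

Kumar's result is that this degradation is not fixed: the longer a model is pretrained, the more its weights come to encode fine-grained structure that rounding destroys, so overtraining can make the *served* model worse.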


QLoRA author Tim Dettmers notes the end of the quantized-scaling "free lunch" even more starkly: "Arguably, most progress in AI came from improvements in computational capabilities, which mainly relied on low-precision for acceleration (32 -> 16 -> 8 bit). This is now coming to an end. Together with physical limitations, this creates the perfect storm for the end of scale. From my own experience (a lot of failed research), you cannot cheat efficiency. If quantization fails, then sparsification fails too, and so do other efficiency mechanisms. If this is true, we are close to optimal now. With this, there are only three ways forward that I see... All of this means that the paradigm will soon shift from scaling to 'what can we do with what we have'. I think the paradigm of 'how do we help people be more productive with AI' is the best mindset forward."


[Sponsored by SambaNova] Take a few hours this week to build an AI agent in SambaNova’s Lightning Fast AI Hackathon! They’re giving out $10,000 in total prizes to the fastest, slickest and most creative agents. The competition ends November 22 - get building now!

Swyx commentary: $10k for an ONLINE hackathon is great money for building that fast AI Agent that you've been wanting!


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Models and Tools

AI Governance and Ethics

AI Applications

Developer Infrastructure and Tools

AI Research and Insights

Memes and Humor

Community and Events


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Qwen2.5-Coder 32B Release: Community Reception and Technical Breakdown

Theme 2. ExllamaV2 Introduces Vision Model Support with Pixtral

Theme 3. Exploring Binary Vector Embeddings: Speed vs. Compression

Theme 4. Qwen 2.5 Technical Benchmarks: Hardware and Platform Strategy

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Claude 3.5 Opus Coming Soon: Anthropic CEO Confirms

Theme 2. Qwen2.5-Coder-32B Matches Claude: Open Source Milestone

Theme 3. ComfyUI Video Generation: New Tools & Capabilities

Theme 4. AI Content Generation on Reddit: Growing Trend & Concerns


AI Discord Recap

A summary of Summaries of Summaries by O1-mini

AI Language Models Battle for Supremacy

Optimization Techniques Revolutionize Model Training

Deployment and Inference Get a Boost with New Strategies

APIs and Tools Streamline AI Development

Scaling Laws and Datasets Challenge AI Research


PART 1: High level Discord summaries

Eleuther Discord


Perplexity AI Discord


Unsloth AI (Daniel Han) Discord


aider (Paul Gauthier) Discord


Interconnects (Nathan Lambert) Discord


OpenRouter (Alex Atallah) Discord


OpenAI Discord


Modular (Mojo 🔥) Discord


tinygrad (George Hotz) Discord


Notebook LM Discord Discord


Latent Space Discord


GPU MODE Discord


Cohere Discord


HuggingFace Discord


LlamaIndex Discord


LAION Discord


Gorilla LLM (Berkeley Function Calling) Discord


LLM Agents (Berkeley MOOC) Discord


OpenAccess AI Collective (axolotl) Discord


DSPy Discord


OpenInterpreter Discord


Torchtune Discord


AI21 Labs (Jamba) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Eleuther ▷ #general (35 messages🔥):

  • Reduced Clicks in Workflow Design
  • Evaluation of AI Models
  • Emotional Intelligence in AI
  • Text-MIDI Multimodal Datasets
  • User Feedback in AI Development

Eleuther ▷ #research (271 messages🔥🔥):

  • Gradient Descent and Optimization
  • Muon and Feature Learning
  • Second Order Methods
  • Newton's Method
  • Saddle Points in Optimization

Links mentioned:


Eleuther ▷ #scaling-laws (10 messages🔥):

  • Scaling Laws Investigation
  • Learning Rate Adjustment
  • Line Search Techniques
  • Gradient Descent Dynamics

Links mentioned:


Eleuther ▷ #lm-thunderdome (5 messages):

  • Custom Task Issues
  • Limit Samples in Evaluation
  • Metrics in YAML Configuration

Perplexity AI ▷ #general (230 messages🔥🔥):

  • Perplexity Technical Issues
  • User Experience with Perplexity
  • Perplexity Pro Subscription Details
  • Perplexity Model Comparison
  • Feedback on Mac App and Features

Links mentioned:


Perplexity AI ▷ #sharing (12 messages🔥):

  • US NATO Membership
  • Bitcoin Market Predictions
  • TSMC Chip Shipments
  • China vs US Trade War
  • AI Winter Trends

Link mentioned: TSMC Halts Chinese Chip Shipments, Beatles Make AI History with Grammy Noms, and How the Body Sto...: What would you like to see more of? Let us know! (https://www.buzzsprout.com/twilio/text_messages/2302487/open_sms) In today's episode, we explore TSMC's sig...


Perplexity AI ▷ #pplx-api (3 messages):

  • Pplx API DailyBot Custom Command Editor
  • AI Limitations
  • Webhook Implementation
  • CodeSandBox VM Usage

Unsloth AI (Daniel Han) ▷ #general (135 messages🔥🔥):

  • Qwen 2.5 Coder finetuning
  • Use of datasets for improvement
  • Unsloth model fixes
  • Function calling in models
  • Chat history and memory retention

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (27 messages🔥):

  • Diet Choices
  • Meal Frequency
  • Keto Diet

Unsloth AI (Daniel Han) ▷ #help (49 messages🔥):

  • Model Saving and Checkpoints
  • RAM Usage During Training
  • Fine-tuning Practices
  • Data Formatting for Models
  • Training Dataset Size and Performance

Link mentioned: Errors | Unsloth Documentation: To fix any errors with your setup, see below:


Unsloth AI (Daniel Han) ▷ #showcase (7 messages):

  • Integration Calls
  • Inference Strategies
  • Fast Apply Model
  • Community Interaction

Links mentioned:


Unsloth AI (Daniel Han) ▷ #research (14 messages🔥):

  • Tuning thoughts vs. outputs
  • Analysis of wrong outputs
  • Chain of Thought (COT) errors
  • Generating profound tweets

aider (Paul Gauthier) ▷ #general (155 messages🔥🔥):

  • Qwen 2.5 Coder Performance
  • Aider Installation and Usage
  • Model Comparison
  • Context Handling in Aider
  • Feature Suggestions for Aider

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (49 messages🔥):

  • Aider Configuration Warnings
  • OpenRouter API Usage
  • Benchmarking Models
  • Ping Settings in Aider
  • Architect Mode Functionality

Links mentioned:


aider (Paul Gauthier) ▷ #links (2 messages):

  • Copilot Edits
  • Cursor and SupermavenAI Partnership

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (62 messages🔥🔥):

  • Qwen 2.5 Coder
  • Dario Amodei on AI Scaling
  • Nous Research Forge API
  • Anthropic Team Updates
  • OpenAI o1 Release

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (7 messages):

  • SPARC model
  • VLM techniques
  • Claude's OCR capabilities
  • Recent VLM articles
  • Finbarr blog

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (1 messages):

an1lam: Semi-relevant but a bit funny, I think I walked by Gary last night on the street


Interconnects (Nathan Lambert) ▷ #random (2 messages):

  • ICLR Review Process
  • Reviewer Feedback

Interconnects (Nathan Lambert) ▷ #nlp (5 messages):

  • Neural Notes episode
  • Stanford MIPRO optimizers
  • Eugene Charniak Memorial Symposium
  • Automated prompt optimization

Link mentioned: Neural Notes: The future of language model optimization: In this episode of Neural Notes, Vertex Ventures US investors Sandeep Bhadra and Simon Tiu talk to Krista Opsahl-Ong, PhD Candidate at Stanford's AI Lab (SAI...


Interconnects (Nathan Lambert) ▷ #reads (62 messages🔥🔥):

  • Scaling laws and model quantization
  • Dylan Patel's inference insights
  • AI local running challenges
  • Performance expectations of LLaMA models
  • Impact on datacenter infrastructure

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (8 messages🔥):

  • Qwen2.5 Coder 32B
  • Gemini models updates
  • Scheduled Downtime

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (107 messages🔥🔥):

  • Gemini 1.5 Flash updates
  • Qwen 2.5 Coder performance
  • Anthropic's computer use tool
  • Model knowledge limitations
  • OpenRouter pricing and features

Links mentioned:


OpenRouter (Alex Atallah) ▷ #beta-feedback (6 messages):

  • Custom Provider Keys Access

OpenAI ▷ #ai-discussions (50 messages🔥):

  • TTS alternatives
  • Application development with AI
  • Qwen2.5-Coder model
  • KitchenAI project
  • Language model quirks

Links mentioned:


OpenAI ▷ #gpt-4-discussions (10 messages🔥):

  • Access Issues with ChatGPT
  • Report on Blocking Agencies
  • Frustrations with DALL-E Image Generation
  • Disappearing Document Issues in aiHa GPT

OpenAI ▷ #prompt-engineering (21 messages🔥):

  • Prompt design for GPT-4o mini
  • Structured outputs technique
  • Resources for prompt engineering
  • Engagement and relevance in transcripts

OpenAI ▷ #api-discussions (21 messages🔥):

  • Prompt engineering for GPT models
  • Clip selection techniques
  • Structured output usage
  • Scratchpad technique

Modular (Mojo 🔥) ▷ #general (41 messages🔥):

  • Nvidia CUDA in WSL2
  • Higher Level Interoperability
  • WASI and Edge Computing
  • Application Plugins and Performance
  • CRABI ABI Proposal

Links mentioned:


Modular (Mojo 🔥) ▷ #mojo (58 messages🔥🔥):

  • Mojo Installation Issues
  • Mojo Subreddits
  • Benchmark Module Functionality
  • Dynamic Module Importing
  • Standard Library Contributions

Links mentioned:


tinygrad (George Hotz) ▷ #general (77 messages🔥🔥):

  • Hailo Model Quantization
  • ASM2464PD Chip Specifications
  • USB4 PCIe Converter Development
  • Audio Recording Formats
  • Tinygrad Distributed Systems

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):

  • Parallelization without sharding
  • Model serialization on GPU
  • Pattern matcher assistance

Link mentioned: tinygrad-notes/20241112_pm.md at main · mesozoic-egg/tinygrad-notes: Tutorials on tinygrad. Contribute to mesozoic-egg/tinygrad-notes development by creating an account on GitHub.


Notebook LM Discord ▷ #use-cases (34 messages🔥):

  • Using NotebookLM for summarization
  • Experimenting with podcasts and avatars
  • Issues with textbook uploads
  • KATT for fact-checking
  • Potential of NotebookLM in AI discussions

Links mentioned:


Notebook LM Discord ▷ #general (40 messages🔥):

  • Exporting Notebooks as PDF
  • Unofficial API for NotebookLM
  • Document Upload Limitations
  • Notebook Centralization Workflows
  • Audio File Upload Issues

Links mentioned:


Latent Space ▷ #ai-general-chat (51 messages🔥):

  • Dario Amodei Interview
  • Magentic-One Framework
  • Context Autopilot
  • Writer Series C Funding
  • Supermaven Joins Cursor

Links mentioned:


Latent Space ▷ #ai-announcements (3 messages):

  • Dust XP1
  • OpenAI Journey
  • Voice Questions for Recap Pod
  • AI Agent Infrastructure
  • SaaS and AI Software Impact

Links mentioned:


GPU MODE ▷ #general (14 messages🔥):

  • GPU Memory vs Speed
  • Cloud GPU Providers
  • Building CUTLASS on Lambda Cloud
  • XOR Tensor Cores in Beamforming
  • Multiple GPUs for Memory Concerns

Link mentioned: Pricing | Vast.ai: View the pricing for popular GPUs on Vast.ai


GPU MODE ▷ #triton (3 messages):

  • Slack additions
  • Triton Puzzle discussion
  • Working group for puzzles

GPU MODE ▷ #cool-links (2 messages):

  • Efficient Deep Learning Systems
  • AOT Compilation Features

Links mentioned:


GPU MODE ▷ #beginner (1 messages):

pondering_wanderer: helllo all


GPU MODE ▷ #off-topic (9 messages🔥):

  • Image Generation
  • Prompt Engineering
  • Food Models
  • AI Interactions
  • Bot Verification

GPU MODE ▷ #triton-puzzles (3 messages):

  • Triton Puzzles
  • Triton Kernel Coding
  • Block Mapping Implementation
  • Tensor Copying

GPU MODE ▷ #liger-kernel (3 messages):

  • Batch Normalization Challenges
  • Multi-GPU Synchronization

GPU MODE ▷ #self-promotion (1 messages):

  • WebGPU
  • Surfgrad
  • Autograd Engine Optimization

Links mentioned:


GPU MODE ▷ #🍿 (5 messages):

  • Bot testing methods
  • Job queue implementation
  • Channel dynamics

Cohere ▷ #discussions (22 messages🔥):

  • Command R discontinuation concerns
  • aya_collection dataset inconsistencies
  • Forest fire prediction AI project
  • Dataset translation quality
  • AI application discussions

Cohere ▷ #announcements (1 messages):

  • Research Prototype Beta Testing
  • Feedback on Writing Tools

Link mentioned: Research Prototype - Early Beta Sign Up Form: Thank you for your interest in participating in the beta testing phase of our research prototype — a tool designed to help users tackle research and writing tasks such as: creating complex reports, do...


Cohere ▷ #questions (3 messages):

  • AI Assistant with RAG
  • Cohere Dashboard Login
  • Organizational ID Usage

Links mentioned:


Cohere ▷ #api-discussions (9 messages🔥):

  • Cohere API /rerank Issues
  • Return_documents Argument Removal
  • Troubleshooting API Changes
  • Python Async Client Usage
  • Unexpected API Behavior

Cohere ▷ #projects (2 messages):

  • Sharing tools
  • Community Engagement

Cohere ▷ #cohere-toolkit (2 messages):

  • ICS Calendar Support
  • File Content Viewing

HuggingFace ▷ #cool-finds (4 messages):

  • Home AI Server for LLMs
  • NeurIPS 2024 Graph Neural Networks
  • Phase Transitions in Image Denoising
  • Ultra Realistic AI Models
  • E-commerce and AI Fashion

Links mentioned:


HuggingFace ▷ #i-made-this (12 messages🔥):

  • Qwen2.5 Coder Performance
  • Mochi -1-preview Video Generator
  • Ecommerce Embedding Models
  • AutoML Application
  • OSS Prompt Management

Links mentioned:


HuggingFace ▷ #reading-group (2 messages):

  • Reading Group Announcement
  • Paper on Arxiv
  • Authors of the Paper

Link mentioned: Consent in Crisis: The Rapid Decline of the AI Data Commons: General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, ...


HuggingFace ▷ #NLP (5 messages):

  • Evaluation metrics for Langchain SQL Agent
  • Agent trajectory evaluation
  • Fast-langdetect usage

HuggingFace ▷ #diffusion-discussions (1 messages):

  • Diffusers Library Schedulers
  • Inheritance from nn.Module

LlamaIndex ▷ #blog (3 messages):

  • PursuitGov transformation
  • Using ColPali as a re-ranker
  • Cohere multimodal embeddings

LlamaIndex ▷ #general (12 messages🔥):

  • Next release date
  • Automating workflow processes
  • FastAPI and streaming responses
  • SSE with FastAPI
  • Testing LlamaIndex workflows

Link mentioned: v0.11.23 by logan-markewich · Pull Request #16919 · run-llama/llama_index: no description found


LAION ▷ #general (4 messages):

  • Gorilla Marketing in AI
  • Air Conditioner Object Detection Project

Link mentioned: Harambe America GIF - Harambe America Murica - Discover & Share GIFs: Click to view the GIF


LAION ▷ #research (7 messages):

  • GitChameleon
  • SCAR
  • NVIDIA paper on frequency noise
  • Sparse Autoencoders
  • Code generation models

Links mentioned:


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (11 messages🔥):

  • Test Cases Overwriting
  • Qwen-2.5 Invalid AST Issues
  • Raw Output Format Confusion
  • Quantized Fine-tuned Models Evaluation

LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (1 messages):

  • LLM Agents MOOC Hackathon
  • LambdaAPI Demos
  • Hackathon Sign-ups
  • Innovative LLM Agents Tracks

Link mentioned: LLM Agents MOOC Hackathon - Lambda Labs Workshop: no description found


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (5 messages):

  • Google Forms Confirmation
  • OpenAI Org Credits

LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (4 messages):

  • NVIDIA's Embodied AI Presentation
  • Ethics of AI Rights
  • Normative Alignment Discussion
  • Veteran's Day Cancellation

OpenAccess AI Collective (axolotl) ▷ #general (1 messages):

  • FOSDEM AI DevRoom
  • Low-level AI Engineering
  • AI Project Collaboration
  • Fine-tuning Presentations
  • Sponsorship and Travel Stipends

Link mentioned: FOSDEM 2025 - Low-Level AI Engineering & Hacking Dev Room: Explore the new "Low-Level AI Hacking & Engineering" Dev Room at FOSDEM, featuring open-source projects powering the AI industry. Submit a session or become a sponsor for this innovative...


OpenAccess AI Collective (axolotl) ▷ #general-help (8 messages🔥):

  • Fine-tuning with Axolotl
  • Tokenization Configuration
  • Default System Prompts

DSPy ▷ #general (7 messages):

  • Annotations in dspy signatures
  • Usage of custom types in outputs

OpenInterpreter ▷ #general (3 messages):

  • Linux Mint installation
  • Microsoft Copilot interaction
  • Interpreter CLI issues

Torchtune ▷ #general (1 messages):

whynot9753: update: we will probably have a DCP PR from pytorch folks tomorrow 🙂


AI21 Labs (Jamba) ▷ #general-chat (1 messages):

ag8701347: Please allow us to continue using our fine-tuned models.





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}