Frozen AI News archive

Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1

**Nvidia** researchers updated their **Minitron** results for **Meta's Llama 3.1** with a paper demonstrating the effectiveness of combining **weight pruning** and **knowledge distillation** to reduce training costs: only the largest model in a family is trained from scratch, and smaller models are derived from it via pruning and distillation. The process involves teacher correction, activation-based pruning (favoring width pruning), and retraining with a KL-divergence distillation loss, yielding better-performing models at comparable sizes, though distillation incurs some accuracy tradeoffs. Additionally, **AI21 Labs** launched **Jamba 1.5**, a hybrid SSM-Transformer MoE model with large context windows and multilingual support. **Anthropic** updated **Claude 3** with LaTeX rendering and prompt caching. **Dracarys**, an open-source coding-focused LLM, was released in 70B and 72B sizes, showing improved coding performance. The **Mistral-NeMo-Minitron 8B** model outperforms **Llama 3.1 8B** and **Mistral 7B** on the Hugging Face leaderboard, further highlighting the benefits of pruning and distillation. Finally, research on prompt optimization reveals the complexity of prompt search spaces and the surprising effectiveness of simple algorithms like AutoPrompt/GCG.


AI News for 8/22/2024-8/23/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (214 channels, and 2531 messages) for you. Estimated reading time saved (at 200wpm): 284 minutes. You can now tag @smol_ai for AINews discussions!

We've obliquely mentioned the 4B and 8B Minitron models (Nvidia's distillations of Llama 3.1 8B) a couple of times in recent weeks, but there's now a nice, short 7-pager from Sreenivas & Muralidharan et al. (authors of last month's Minitron paper) updating their earlier results for Llama 3.1:


This is important because it offers some insight into Llama 3, given Nvidia's close relationship with Meta:

"training multiple multi-billion parameter models from scratch is extremely time-, data- and resource-intensive. Recent work [1] has demonstrated the effectiveness of combining weight pruning with knowledge distillation to significantly reduce the cost of training LLM model families. Here, only the biggest model in the family is trained from scratch; other models are obtained by successively pruning the bigger model(s) and then performing knowledge distillation to recover the accuracy of pruned models.


The main steps:

  1. Teacher correction: lightly finetuning the teacher model on the target dataset to be used for distillation, using ∼127B tokens.
  2. Depth or width pruning: using "a purely activation-based importance estimation strategy that simultaneously computes sensitivity information for all the axes we consider (depth, neuron, head, and embedding channel) using a small calibration dataset and only forward propagation passes". Width pruning consistently outperformed depth pruning in their ablations (a minimal sketch of this scoring follows the list).
  3. Retraining with distillation: "real" KD, i.e., a KL-divergence loss on teacher and student logits (see the second sketch below).
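
To make step 2 concrete, here's a minimal sketch of what a purely activation-based importance score could look like for MLP neurons, using only forward passes over a small calibration set. This is an illustration under assumptions, not the paper's implementation: the Llama-style module path, the L2 aggregation, and the single-layer scope are all ours, and the paper additionally scores depth, heads, and embedding channels.

```python
# Hypothetical sketch of activation-based width-pruning scores (step 2).
# Assumes a Hugging Face Llama-style model; module paths are illustrative.
import torch

@torch.no_grad()
def mlp_neuron_importance(model, calib_loader, layer_idx, device="cuda"):
    """Score each intermediate MLP neuron of one layer by its activation
    energy over a small calibration set -- forward passes only, no gradients."""
    mlp = model.model.layers[layer_idx].mlp  # assumed Llama-style path
    scores = None

    def hook(module, args, output):
        nonlocal scores
        # output: (batch, seq_len, intermediate_size) post-activation values
        s = output.float().pow(2).sum(dim=(0, 1))
        scores = s if scores is None else scores + s

    handle = mlp.act_fn.register_forward_hook(hook)
    for batch in calib_loader:
        model(input_ids=batch["input_ids"].to(device))
    handle.remove()
    return scores.sqrt()  # low-scoring neurons are width-pruning candidates

# e.g. keep the top-k neurons when shrinking intermediate_size:
# keep = mlp_neuron_importance(model, loader, layer_idx=0).topk(k=8192).indices
```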

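And for step 3, a minimal sketch of the logits-level distillation loss: a forward KL divergence between the teacher's and student's next-token distributions. The temperature scaling and token flattening are common conventions we've assumed, not details from the paper.

```python
# Hypothetical sketch of the KD objective (step 3): KL divergence between
# teacher and student logits. Temperature is an assumed convention.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Forward KL(teacher || student) averaged over all tokens."""
    t = temperature
    vocab = student_logits.size(-1)
    s = F.log_softmax(student_logits.reshape(-1, vocab) / t, dim=-1)
    p = F.softmax(teacher_logits.reshape(-1, vocab) / t, dim=-1)
    # F.kl_div(input=log-probs, target=probs) computes KL(target || input)
    return F.kl_div(s, p, reduction="batchmean") * t * t

# Usage sketch: the (teacher-corrected) teacher is frozen, targets only.
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# loss = kd_loss(student(input_ids).logits, teacher_logits)
# loss.backward()
```
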
This produces models that are better across the board at comparable sizes:


The distillation is far from lossless, however; the paper does not make it easy to read off the deltas, but there are footnotes at the end on the tradeoffs.



{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Releases and Developments

AI Research and Techniques

AI Applications and Tools

AI Development and Industry Trends


AI Reddit Recap

/r/LocalLlama Recap

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity

AI and Machine Learning Advancements

AI-Generated Content and Tools

Robotics and AR Technology

AI-Related Discussions and Humor

Feature Requests for AI Tools


AI Discord Recap

A summary of Summaries of Summaries by Claude 3.5 Sonnet

1. AI Model Releases and Benchmarks

2. AI Development Tools and Frameworks

3. AI Research and Technical Advancements

4. AI Industry News and Events

5. AI Safety and Ethics Discussions


PART 1: High level Discord summaries

LM Studio Discord


Nous Research AI Discord


HuggingFace Discord


Stability.ai (Stable Diffusion) Discord


aider (Paul Gauthier) Discord


Latent Space Discord


OpenRouter (Alex Atallah) Discord


Modular (Mojo 🔥) Discord


Perplexity AI Discord


OpenAccess AI Collective (axolotl) Discord


OpenAI Discord


Eleuther Discord


Cohere Discord


Interconnects (Nathan Lambert) Discord


LAION Discord


LangChain AI Discord


DSPy Discord


OpenInterpreter Discord


Gorilla LLM (Berkeley Function Calling) Discord


AI21 Labs (Jamba) Discord


MLOps @Chipro Discord


tinygrad (George Hotz) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

LM Studio ▷ #general (545 messages🔥🔥🔥):

  • LM Studio 0.3.0
  • LM Studio 0.3.0 UI
  • LM Studio 0.3.0 Bugs
  • LM Studio Server
  • LM Studio RAG

Links mentioned:


LM Studio ▷ #hardware-discussion (66 messages🔥🔥):

  • GPU offloading
  • Llama 3.1
  • CPU performance
  • Apple Silicon
  • Model size and performance

Link mentioned: GitHub - tlkh/asitop: Perf monitoring CLI tool for Apple Silicon.


Nous Research AI ▷ #announcements (1 messages):

  • Nous Research Merch Store

Link mentioned: Nous Research


Nous Research AI ▷ #general (288 messages🔥🔥):

  • Hermes 3
  • Mistral
  • Mode Collapse
  • LLM's Insanity
  • Synthetic Data Generation

Links mentioned:


Nous Research AI ▷ #ask-about-llms (12 messages🔥):

  • AI Agent GitHub Repositories
  • Langchain and CrewAI
  • Building Your Own AI Agent
  • Drama Engine Framework
  • LLM Autocomplete Tool

Link mentioned: GitHub - Write-with-LAIKA/drama-engine: A Framework for Narrative Agents.


HuggingFace ▷ #announcements (1 messages):

  • Offensive Security
  • Deep Learning Courses
  • Unity ML Agents
  • Garfield Dataset
  • Tensor Parallelism

Links mentioned:


HuggingFace ▷ #general (232 messages🔥🔥):

  • RTX 6000
  • HuggingFace Payment Issues
  • OpenAI Platform Changes
  • GPTs Agents
  • Model Merging

Links mentioned:


HuggingFace ▷ #today-im-learning (6 messages):

  • HF Work
  • Neuralink Work
  • Efficient Models

HuggingFace ▷ #cool-finds (1 messages):

this_is_prince: https://github.com/All-Hands-AI/OpenHands


HuggingFace ▷ #i-made-this (11 messages🔥):

  • LogLLM
  • RYFAI
  • Writer Framework
  • Unsloth
  • NeuroSync

Links mentioned:


HuggingFace ▷ #reading-group (3 messages):

  • Alignment Techniques Reading Group

HuggingFace ▷ #NLP (3 messages):

  • Data Splitting
  • HF Dataset Homogeneity
  • SQL Summarization

HuggingFace ▷ #diffusion-discussions (25 messages🔥):

  • Flux Pipeline
  • torch.compile
  • fp8 checkpoints
  • Model Loading Speed
  • Hugging Face Snapshots

Stability.ai (Stable Diffusion) ▷ #general-chat (199 messages🔥🔥):

  • SDXL vs SD1.5
  • prompting techniques
  • consistency issue
  • comfyUI and Flux
  • GPU Ram Issues

Links mentioned:


aider (Paul Gauthier) ▷ #announcements (1 messages):

  • Aider 0.52.0

aider (Paul Gauthier) ▷ #general (129 messages🔥🔥):

  • fzf support
  • Aider training set
  • DSPY, TEXTGRAD, TRACE
  • aider co-op thread
  • diff vs diff-fenced

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (53 messages🔥):

  • Repo Map
  • Groq API Key
  • Token Optimization
  • Aider's Chat Modes
  • Aider as a Chatbot

Links mentioned:


aider (Paul Gauthier) ▷ #links (5 messages):

  • Cursor
  • Aider
  • OpenAI's Composer
  • AI Code Generation
  • AutoToS

Links mentioned:


Latent Space ▷ #ai-general-chat (58 messages🔥🔥):

  • Autogen Lead
  • ThePrimeagen
  • Cursor
  • AI Regulations
  • Inflection

Links mentioned:


Latent Space ▷ #ai-announcements (1 messages):

  • AI Engineer Meetup in London
  • Speakers at the Meetup
  • AI Engineer World's Fair

Link mentioned: Tweet from Damien C. Tanner (@dctanner): We're bringing a slice of @swyx's AI Engineer World's Fair to London! Evening of 12 September is the first AI Engineer London Meetup. Hear from 4 amazing speakers: @maximelabonne, @rovio...


Latent Space ▷ #ai-in-action-club (53 messages🔥):

  • Duplicate Topics
  • Similar Topics
  • Taxonomy Synthesis
  • GPT Researcher
  • Embedland

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

  • Model Deprecation
  • Yi Model
  • Hermes Model
  • Mistral Model
  • Llama 2

OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

  • OpenRouter Team's work

OpenRouter (Alex Atallah) ▷ #general (104 messages🔥🔥):

  • OpenRouter Pricing
  • OpenRouter Token Counting
  • OpenRouter Model Deprecations
  • Llama 2
  • Grok 2

Links mentioned:


Modular (Mojo 🔥) ▷ #general (57 messages🔥🔥):

  • Mojo Licensing
  • Mojo and Max
  • Modular's Business Model
  • Heterogenous Compute

Modular (Mojo 🔥) ▷ #mojo (28 messages🔥):

  • Mojo Community Welcome
  • Async in Mojo
  • Mojo's HTTP Implementation
  • Mojo's Versioning and Stability
  • Mojo's Memory Management

Link mentioned: Network protocols, sans I/O — Sans I/O 1.0.0 documentation: no description found


Modular (Mojo 🔥) ▷ #max (3 messages):

  • Modular Max Installation Issues
  • M1 Max Compatibility
  • Modular Clean Command

Perplexity AI ▷ #general (43 messages🔥):

  • Perplexity's internals
  • Twitter sources
  • Perplexity Pro source count
  • Generating images from Perplexity
  • Perplexity's future

Link mentioned: Black Topographic Map Images - Free Download on Freepik: Find & Download Free Graphic Resources for Black Topographic Map. 20,000+ Vectors, Stock Photos & PSD files. ✓ Free for commercial use ✓ High Quality Images. #freepik


Perplexity AI ▷ #sharing (14 messages🔥):

  • Perplexity AI Bot
  • Shareable Threads
  • MrBeast

OpenAccess AI Collective (axolotl) ▷ #general (11 messages🔥):

  • Phi 3.5
  • QLORA + FSDP
  • Pretraining
  • Data Structure

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (33 messages🔥):

  • Gradients Issue
  • Chat Template and Special Tokens
  • Phi_3 Chat Template
  • Resize_token_embeddings_to_32x
  • ChatML

Link mentioned: Mode-aware chat templates for distinct training and inference behaviors · Issue #33096 · huggingface/transformers: Feature request Implement mode-aware chat templates for distinct training and inference behaviors Proposed Solution To resolve this, I propose adding a new variable called template_mode to indicate...


OpenAccess AI Collective (axolotl) ▷ #general-help (5 messages):

  • SmolLM
  • Mamba
  • Mamba Training
  • Cosmo2 Tokenizer
  • BOS/EOS Token

Links mentioned:


OpenAI ▷ #ai-discussions (20 messages🔥):

  • GPT-3.5 vs GPT-4
  • GPT-2 vs GPT-3.5
  • Email Automation
  • SwarmUI
  • OpenAI Finetuning API

OpenAI ▷ #gpt-4-discussions (10 messages🔥):

  • GPTs Knowledge Files
  • GPTs formatting
  • GPTs formatting and style
  • ChatGPT GPTs

OpenAI ▷ #prompt-engineering (7 messages):

  • ChatGPT Playground Limitations
  • GPT Output Token Limits

OpenAI ▷ #api-discussions (7 messages):

  • ChatGPT drawing complex equations
  • Playground limitations
  • ChatGPT's roles in automation
  • ChatGPT output token limits

Eleuther ▷ #general (7 messages):

  • Prompt Engineering for Multi-Turn Messages
  • SmolLM Model
  • Mamba Model
  • Training from Scratch
  • BOS Token Usage

Links mentioned:


Eleuther ▷ #research (22 messages🔥):

  • Model Distillation
  • Model Compression
  • Positional Embeddings in Graphs
  • Research Projects
  • Tree and Digraph Embeddings

Link mentioned: The Vizier Gaussian Process Bandit Algorithm: Google Vizier has performed millions of optimizations and accelerated numerous research and production systems at Google, demonstrating the success of Bayesian optimization as a large-scale service. O...


Eleuther ▷ #lm-thunderdome (2 messages):

  • Llama 406B on Slurm
  • Multiple Choice Evals
  • ChatGPT4o
  • Anthropic APIs
  • Claude

Link mentioned: lm-evaluation-harness/jobs/scripts/submit/models_XXL/eval_llama31_instruct_405B_smartt.sh at main · DCGM/lm-evaluation-harness: A framework for few-shot evaluation of language models. - DCGM/lm-evaluation-harness


Cohere ▷ #discussions (18 messages🔥):

  • Cohere API Error
  • Multimodal LLM
  • Cohere Schema Object
  • Prompt Tuner

Cohere ▷ #questions (4 messages):

  • Cohere Pricing
  • Tokenization
  • Cohere API
  • Oracle APEX
  • Command R Models

Links mentioned:


Cohere ▷ #api-discussions (5 messages):

  • Command R+ via HTTP
  • Structured Outputs

Link mentioned: Chat Non-streaming — Cohere: Generates a text response to a user message. To learn how to use the Chat API with Streaming and RAG follow our Text Generation guides .


Cohere ▷ #projects (2 messages):

  • Cohere Models on Hugging Face Hub

Interconnects (Nathan Lambert) ▷ #news (17 messages🔥):

  • AI burnout
  • AI powerusers
  • Model Generations
  • Twitter fatigue

LAION ▷ #general (9 messages🔥):

  • Infinite Generative Youtube
  • TTS for Low Resource Languages
  • WhisperSpeech Semantic Tokens for ASR

LAION ▷ #research (1 messages):

  • SmolLM
  • Mamba
  • Cosmo2-tokenizer
  • BOS Tokens

Links mentioned:


LangChain AI ▷ #general (2 messages):

  • Graph memory
  • Memory saving

LangChain AI ▷ #langchain-templates (5 messages):

  • LangChain Prompting
  • SQL Query Generation
  • LangChain Documentation
  • Chain Inspection

Links mentioned:


LangChain AI ▷ #share-your-work (1 messages):

  • Writer Framework
  • Hugging Face Spaces
  • Docker Deployment

Link mentioned: Using Writer Framework with Hugging Face Spaces: no description found


DSPy ▷ #announcements (1 messages):

okhattab: https://lu.ma/03f7pesv


DSPy ▷ #papers (1 messages):

mrauter: https://arxiv.org/abs/2408.11326


DSPy ▷ #general (5 messages):

  • Adalflow
  • DSpy vs Textgrad vs Adalflow

Link mentioned: Get Started — AdalFlow: The Library to Build and Auto-Optimize LLM Task Pipelines: no description found


OpenInterpreter ▷ #general (7 messages):

  • Open Interpreter brand guidelines
  • Phi-3.5-mini
  • Qwen2
  • Python screen clicking script
  • Data Analytics masterclass

Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (4 messages):

  • Gorilla Leaderboard
  • Huggingface Leaderboard
  • Llama-3.1-Storm-8B

Link mentioned: [BFCL] Adding Llama-3.1-Storm-8B model handler by akshita-sukhlecha · Pull Request #598 · ShishirPatil/gorilla: Llama-3.1-Storm-8B model was recently released. This PR adds model handler for Llama-3.1-Storm-8B.


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (3 messages):

  • REST API testing
  • Test pairs
  • Gorilla leaderboard

Links mentioned:


AI21 Labs (Jamba) ▷ #announcements (1 messages):

  • Jamba 1.5
  • SSM-Transformer architecture
  • Long context handling
  • Speed
  • Quality

Links mentioned:


AI21 Labs (Jamba) ▷ #jamba (4 messages):

  • Jamba Fine-Tuning
  • Jamba Model Filtering

AI21 Labs (Jamba) ▷ #general-chat (2 messages):

  • API Rate Limits

MLOps @Chipro ▷ #events (3 messages):

  • NVIDIA AI Summit India
  • AI Safety
  • Demo-Jam Hackathon

Link mentioned: Join NVIDIA AI Summit 2024: October 23–25, Mumbai, India


tinygrad (George Hotz) ▷ #general (2 messages):

  • tinygrad mypyc compilation





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}