Frozen AI News archive

Gemma 2 2B + Scope + Shield

**Gemma 2 2B**, a 2 billion parameter model trained on **2 trillion tokens** and distilled from a larger, unnamed LLM, has been released by **Google DeepMind**; it shows strong leaderboard performance despite weakness in math. The Gemma 2 series, including the 9B and 27B models, has been popular since its June release. The team also released 400 SAEs (sparse autoencoders) for interpretability, inspired by **Anthropic**'s research, along with ShieldGemma, a finetuned classifier that outperforms Meta's LlamaGuard at harm detection. Meanwhile, **Meta AI** announced that **Llama-3.1-405B** reached #3 on the Overall Arena leaderboard and released **SAM 2**, a video and image segmentation model with significant speed improvements. **OpenAI** is rolling out advanced Voice Mode to Plus users. **Perplexity AI** launched a Publishers Program with major media partners and a status page. **NVIDIA** introduced Project GR00T for scaling robot data using Apple Vision Pro and generative simulation. Interest in quantization for compressing LLMs is growing, and LLM-as-a-Judge implementations from Vicuna, AlpacaEval, and G-Eval highlight the effectiveness of simple prompts and domain-specific evaluation.
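The "simple prompts" point about LLM-as-a-Judge can be made concrete with a minimal harness sketch. This is illustrative, not the actual Vicuna/AlpacaEval/G-Eval code: the judge model call is stubbed out, and only the prompt construction and verdict parsing (the parts where simplicity pays off) are shown.

```python
# Minimal LLM-as-a-Judge harness sketch (hypothetical; not the
# Vicuna/AlpacaEval/G-Eval implementations). The judge model call itself is
# omitted -- only prompt construction and verdict parsing are shown.
import re

JUDGE_PROMPT = (
    "You are a strict evaluator. Rate the assistant's answer to the question "
    "on a 1-10 scale for helpfulness and accuracy.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with 'Rating: <n>' and a one-sentence justification."
)

def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the template; prompts this simple work surprisingly well."""
    return JUDGE_PROMPT.format(question=question, answer=answer)

def parse_rating(judge_reply: str):
    """Extract the numeric verdict from the judge model's reply, if any."""
    m = re.search(r"Rating:\s*(\d+)", judge_reply)
    return int(m.group(1)) if m else None
```

Domain-specific evaluation typically amounts to swapping the rubric sentence in the template, not changing the harness.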

Canonical issue URL

AI News for 7/30/2024-7/31/2024. We checked 7 subreddits, 384 Twitters and 28 Discords (249 channels, and 2824 messages) for you. Estimated reading time saved (at 200wpm): 314 minutes. You can now tag @smol_ai for AINews discussions!

The knowledge distillation metagame is getting out of hand. Gemma 2 9B and 27B have been winning hearts since their release in June (our coverage), following Google I/O in May (our coverage).

Gemma 2 2B is finally out (why was it delayed again?). With 2 trillion tokens of training behind a 2B model distilled from a larger, unnamed LLM, Gemma 2 2B is looking very strong on both the HF v2 Leaderboard (terrible at MATH but very strong on IFEval) and LMSYS.
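For readers new to the distillation metagame: the core objective is the student matching the teacher's softened token distribution rather than just the hard labels. A toy pure-Python sketch of that loss at a single vocabulary position (an assumed textbook formulation; Google has not published Gemma 2 2B's exact recipe):

```python
# Toy sketch of a knowledge-distillation objective: KL divergence between the
# teacher's and student's temperature-softened next-token distributions.
# Assumed textbook form (Hinton et al.-style), not Gemma's published recipe.
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over raw logits at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def distill_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student); zero iff the student matches the teacher."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

In real training this term is computed per token across the whole sequence and usually mixed with the ordinary cross-entropy loss.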

In the spirit of Anthropic's interpretability research (our coverage here), the Gemma team also released 400 SAEs covering the 2B and 9B models. You can learn more on Neuronpedia, where we had fun rolling our own "Golden Gate Gemma".

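For the unfamiliar, the architecture behind these SAEs can be sketched in a few lines: an overcomplete ReLU autoencoder trained to reconstruct residual-stream activations under an L1 sparsity penalty. Sizes and initialization below are toy assumptions, not DeepMind's actual configuration:

```python
# Rough sketch of a sparse autoencoder (SAE) for interpretability: encode an
# activation vector into a wider, mostly-zero feature vector, then decode it
# back. Toy sizes and random weights -- not the released Gemma Scope SAEs.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32          # toy sizes; real SAEs are far wider
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode to sparse features, then decode back to activation space."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU zeroes most features
    x_hat = f @ W_dec + b_dec
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that enforces sparsity."""
    f, x_hat = sae_forward(x)
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.abs(f).sum()
    return recon + sparsity
```

The interpretability payoff is that individual features in `f` often fire on human-recognizable concepts, which is exactly what tricks like "Golden Gate Gemma" exploit by clamping one feature during generation.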

There's also ShieldGemma, a finetuned Gemma 2 classifier for key areas of harm that beats Meta's LlamaGuard.

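A safety classifier like this typically slots into a moderation pipeline as a score-and-threshold step. A hypothetical usage sketch (the category names follow ShieldGemma's announced harm areas, but the scoring call and policy logic are illustrative, not the real API):

```python
# Hypothetical moderation policy around a ShieldGemma-style safety classifier:
# the model scores a message per harm category, and policy code flags anything
# over a threshold. Category names and thresholds are illustrative.
HARM_CATEGORIES = [
    "dangerous_content",
    "harassment",
    "hate_speech",
    "sexually_explicit",
]

def flag_harms(scores: dict, threshold: float = 0.5) -> list:
    """Return the harm categories whose probability exceeds the threshold."""
    return [c for c in HARM_CATEGORIES if scores.get(c, 0.0) > threshold]
```

In practice the thresholds would be tuned per category against a labeled harm benchmark, which is where the LlamaGuard comparison above comes from.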


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Updates and Releases

AI Research and Development

AI Tools and Platforms

Industry and Career News


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Open Source AI and Democratization of Large Language Models

Theme 2. Advanced Prompting Techniques for Enhanced LLM Performance

Theme 3. Optimizing Ternary Models for Faster AI Inference

All AI Reddit Recap

/r/machinelearning, /r/openai, /r/stablediffusion, /r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI-Generated Media and Visual Technologies

AI and Privacy Concerns

AI Regulation and Policy


AI Discord Recap

A summary of Summaries of Summaries

1. LLM Advancements and Benchmarking

2. Model Performance Optimization and Benchmarking

3. Fine-tuning Challenges and Prompt Engineering Strategies

4. Open-Source AI Developments and Collaborations

5. Multimodal AI and Generative Modeling Innovations


PART 1: High level Discord summaries

HuggingFace Discord


Unsloth AI (Daniel Han) Discord


Nous Research AI Discord


CUDA MODE Discord


LM Studio Discord


Stability.ai (Stable Diffusion) Discord


OpenAI Discord


Eleuther Discord


Perplexity AI Discord


Modular (Mojo 🔥) Discord


OpenRouter (Alex Atallah) Discord


Interconnects (Nathan Lambert) Discord


Cohere Discord


Latent Space Discord


OpenAccess AI Collective (axolotl) Discord


LlamaIndex Discord


Torchtune Discord


LangChain AI Discord


OpenInterpreter Discord


DSPy Discord


tinygrad (George Hotz) Discord


MLOps @Chipro Discord


LLM Finetuning (Hamel + Dan) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

HuggingFace ▷ #announcements (1 messages):

  • Llama 3.1 Launch
  • Argilla 2.0 Features
  • Peft v0.12.0 Release
  • Inference-as-a-Service with Nvidia
  • New AutoTrain Task for VLM Finetuning

Links mentioned:


HuggingFace ▷ #general (395 messages🔥🔥):

  • Knowledge Distillation
  • Community Interactions
  • AI Training Techniques
  • Fine-tuning Models
  • Dialectal Language Processing

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):

vandutech: You're welcome! Glad you found it useful. and thank you for the feedback.


HuggingFace ▷ #cool-finds (1 messages):

  • Quantizing Diffusion Models
  • Transformer-based Diffusion Backbones
  • High-resolution Text-to-Image Generation
  • Memory Requirements in Large Models

Link mentioned: Memory-efficient Diffusion Transformers with Quanto and Diffusers: no description found


HuggingFace ▷ #i-made-this (12 messages🔥):

  • SAM v2 Model Updates
  • Trivia Question Generation with LLM
  • Palmyra Domain-Specific Models
  • Article Summary on Instruction Hierarchy
  • Llama.cpp Utilization

Links mentioned:


HuggingFace ▷ #computer-vision (2 messages):

  • Hugging Face ML Tasks
  • Face Recognition Task

Link mentioned: Tasks - Hugging Face: no description found


HuggingFace ▷ #NLP (4 messages):

  • Seq2Seq tasks limitations
  • Referenceless metrics
  • Finetuning models

HuggingFace ▷ #diffusion-discussions (5 messages):

  • Knowledge Distillation of 7B Model
  • State-of-the-Art Image Generation
  • Integrating Ollama RAG with WhatsApp
  • Using ONNX Models in Android Apps

Unsloth AI (Daniel Han) ▷ #general (213 messages🔥🔥):

  • Gemma 2 model updates
  • Multigpu support progress
  • Using Lora with fine-tuning
  • Issues with 4bit merging
  • Installation challenges with Unsloth

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (2 messages):

  • MegaBeam-Mistral
  • Long-context benchmarks

Link mentioned: aws-prototyping/MegaBeam-Mistral-7B-512k · Hugging Face: no description found


Unsloth AI (Daniel Han) ▷ #help (135 messages🔥🔥):

  • Quantization Methods
  • Hugging Face API Errors
  • Model Fine-Tuning
  • Installation Issues with Unsloth
  • Inference Consistency

Links mentioned:


Unsloth AI (Daniel Han) ▷ #community-collaboration (6 messages):

  • Unsloth Inference Integration
  • Model Evaluation Strategy
  • Translation Model Readiness

Link mentioned: GitHub - kmahorker/unsloth-hf-inference: Custom Handler for Unsloth Inference with HuggingFace Inference Endpoints: Custom Handler for Unsloth Inference with HuggingFace Inference Endpoints - kmahorker/unsloth-hf-inference


Unsloth AI (Daniel Han) ▷ #research (2 messages):

  • HDMI Eavesdropping
  • Continual Pre-training Insights
  • Sailor Language Models
  • Learning Rate Trade-offs
  • Replay Ratio Dynamics

Links mentioned:


Nous Research AI ▷ #research-papers (1 messages):

_paradroid: https://arxiv.org/abs/2407.04620


Nous Research AI ▷ #off-topic (1 messages):

not_lain: they finally updated the desktop app I can now add a status on my profile


Nous Research AI ▷ #general (330 messages🔥🔥):

  • SOTA image generation
  • LLM reasoning capabilities
  • Gemma 2B performance
  • Dynamic memory systems
  • Q* implementation

Links mentioned:


Nous Research AI ▷ #ask-about-llms (10 messages🔥):

  • Hugging Face Code Generation Leaderboard
  • Mistral Prompting Issues
  • BigCodeBench Leaderboard
  • Character Card Specifications

Links mentioned:


Nous Research AI ▷ #reasoning-tasks-master-list (11 messages🔥):

  • Website Rendering
  • Netlify Automation
  • Subdomain Discussion
  • Domain Unification

Link mentioned: Configure external DNS for a custom domain: Configure an external DNS provider to point your domain to our platform. You can use external DNS for a subdomain or apex domain you registered externally.


CUDA MODE ▷ #general (35 messages🔥):

  • What is Accel?
  • Learning materials for distributed training
  • Tinyboxes shipment update
  • IRL keynotes recording confirmation
  • Challenges with Llama 3.1 inference

Links mentioned:


CUDA MODE ▷ #triton (1 messages):

  • Code Reflection
  • Triton Programming Model
  • OpenJDK Project Babylon
  • GPU Programming

Link mentioned: Exploring Triton GPU programming for neural networks in Java: no description found


CUDA MODE ▷ #torch (13 messages🔥):

  • CUDA Memory Alignment
  • torch.compile on Google Colab
  • Non-blocking Data Transfer Issues
  • Pinned Memory Usage in LLM Inference

Links mentioned:


CUDA MODE ▷ #cool-links (1 messages):

mobicham: https://arxiv.org/abs/2407.09717


CUDA MODE ▷ #jobs (1 messages):

  • ML Performance Optimization
  • Zoox team expansion

CUDA MODE ▷ #pmpp-book (1 messages):

  • Ampere A100 Architecture
  • Warp Processing Efficiency

CUDA MODE ▷ #torchao (13 messages🔥):

  • Quantized Training Recipes
  • Post-Training Quantization
  • Low Bit Optimizers
  • FP8 Support
  • Tutorial Format Discussion

Links mentioned:


CUDA MODE ▷ #hqq (2 messages):

  • Apple's LoRA Adapter Discoveries
  • Llama 3.1-8B Instruct Model Performance

Links mentioned:


CUDA MODE ▷ #llmdotc (199 messages🔥🔥):

  • SwiGLU performance
  • FP8 challenges
  • RoPE integration
  • Llama 3 implementation
  • Hyperparameter tuning

Links mentioned:


CUDA MODE ▷ #bitnet (2 messages):

  • Ternary models speed boosts
  • Ternary-int8 dot product performance
  • CPU vs CUDA performance

Link mentioned: Reddit - Dive into anything: no description found


CUDA MODE ▷ #webgpu (8 messages🔥):

  • WebGPU Overview
  • gpu.cpp Usage
  • Real-time Multimodal Integration
  • Hybrid Model Computation
  • Local Device Computation

CUDA MODE ▷ #cudamode-irl (11 messages🔥):

  • Event Registration
  • Compute Access
  • Funding for GPUs
  • Participant Engagement
  • Venue Details

LM Studio ▷ #announcements (1 messages):

  • Vulkan support update
  • OpenCL deprecation
  • ROCm support

Link mentioned: configs/Extension-Pack-Instructions.md at main · lmstudio-ai/configs: LM Studio JSON configuration file format and a collection of example config files. - lmstudio-ai/configs


LM Studio ▷ #general (143 messages🔥🔥):

  • LM Studio Updates
  • Training and Model Usage
  • AI Conversation Management
  • Installation Issues
  • Model Support and Configurations

Links mentioned:


LM Studio ▷ #hardware-discussion (78 messages🔥🔥):

  • Intel graphics support issues
  • Vulkan support rollout
  • GPU offloading and model performance
  • Challenges with upgrading hardware
  • RAM usage discrepancies between GPUs

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (212 messages🔥🔥):

  • Training Loras for TV characters
  • Model performance and GPU recommendations
  • Using ComfyUI and Auto1111
  • Image generation issues
  • Creative upscaling in Automatic1111

Links mentioned:


OpenAI ▷ #ai-discussions (110 messages🔥🔥):

  • OpenAI's advanced voice mode
  • DALL-E vs Imagen comparisons
  • STT and TTS latency
  • Emotional intelligence in AI
  • AI tools for school work

Links mentioned:


OpenAI ▷ #gpt-4-discussions (14 messages🔥):

  • Custom GPT concerns
  • ChatGPT memory updates
  • Alpha tester selection

OpenAI ▷ #prompt-engineering (4 messages):

  • GPT-4o function performance
  • Language preferences in the community
  • Prompt engineering platforms

OpenAI ▷ #api-discussions (4 messages):

  • GPT-4o functions performance
  • Language diversity in the community
  • Best platforms for prompt engineering

Eleuther ▷ #announcements (1 messages):

  • Sparse Autoencoders
  • Evaluating Text Explanations
  • Open Source Library for Auto-Interpreted Features
  • Cost Efficiency in Feature Interpretation

Links mentioned:


Eleuther ▷ #general (73 messages🔥🔥):

  • Open Source AI Policies
  • GoldFinch Architecture
  • Deepfake Concerns
  • Genomic Data Processing
  • LLM Performance Comparisons

Links mentioned:


Eleuther ▷ #research (36 messages🔥):

  • SAE publication feedback
  • Diffusion models discussions
  • Random number generation on tensor cores
  • Manifold vs graph similarity metrics
  • Training of system prompt style models

Links mentioned:


Eleuther ▷ #scaling-laws (1 messages):

  • Knowledge Distillation
  • 7B Model Hyperparameters
  • Compute Resources for Distillation

Eleuther ▷ #interpretability-general (6 messages):

  • Gemma Scope
  • ICML Workshop Recording

Links mentioned:


Eleuther ▷ #lm-thunderdome (13 messages🔥):

  • lm-eval Zeroshot
  • GPQA processing discrepancies
  • lm-eval launch script
  • super_glue task
  • sts-b subtask omission

Eleuther ▷ #gpt-neox-dev (1 messages):

  • GPT-NeoX library papers
  • Azure power needs study
  • MIT CogSci lab research
  • Hierarchical transformers
  • Low-latency multimodal models

Perplexity AI ▷ #general (70 messages🔥🔥):

  • Paid User Concerns
  • WordPress Partnership
  • Perplexity Labs Issues
  • Advertising on Perplexity
  • Chart Creation in Perplexity

Links mentioned:


Perplexity AI ▷ #sharing (2 messages):

  • Simulation Hypothesis
  • Perplexity AI Skills

Links mentioned:


Perplexity AI ▷ #pplx-api (49 messages🔥):

  • API model discrepancies
  • Citation request delays
  • Model deprecation
  • Search index issues
  • Response quality concerns

Links mentioned:


Modular (Mojo 🔥) ▷ #general (14 messages🔥):

  • Mojo community feedback
  • Mojo presentation guidelines
  • Mojo as a C replacement
  • Type comparison in Mojo

Modular (Mojo 🔥) ▷ #mojo (100 messages🔥🔥):

  • Mojo String Implementation
  • Function Reflection in Mojo
  • Mojo Database Drivers
  • Mojo and MLIR Integration
  • Mojo Max License Concerns

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (70 messages🔥🔥):

  • LLM Tracking Challenges
  • Aider's LLM Leaderboard
  • 4o Mini Performance Discussion
  • NSFW Model Recommendations
  • OpenRouter Cost Comparison

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (26 messages🔥):

  • Gemma 2 2B performance
  • Model releases and competition
  • Distillation in AI models
  • Turbo-sloppofication comment
  • Inside detail on model naming

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (2 messages):

  • Llama 3.1 model performance
  • Inference provider differences
  • Benchmarking challenges
  • Twitter discussions about Llama 3.1

Link mentioned: Llama 3.1: Same model, different results. The impact of a percentage point.: no description found


Interconnects (Nathan Lambert) ▷ #random (4 messages):

  • Anime PFP Feed
  • Llama 3.1 Scores
  • Article Timing

Interconnects (Nathan Lambert) ▷ #memes (3 messages):

  • Open Name Discussion

Link mentioned: Tweet from Delip Rao e/σ (@deliprao): Yes, but only one has “open” in their name.


Interconnects (Nathan Lambert) ▷ #nlp (19 messages🔥):

  • Subbarao Kambhampati's work
  • Intrinsic self-correction in LLMs
  • Benchmarking reasoning trajectories
  • LLM limitations in reasoning and planning
  • Critique of LLM self-correction

Links mentioned:


Interconnects (Nathan Lambert) ▷ #posts (7 messages):

  • Visibility of Text in Screenshots
  • Compression Issues
  • Message Clarity

Cohere ▷ #discussions (41 messages🔥):

  • Google Colab Cohere API
  • Cohere Agentic Build Day
  • Rerank API for document relevance
  • OpenAI and Hugging Face contributions
  • Community support and feedback

Links mentioned:


Cohere ▷ #announcements (1 messages):

  • Agent Build Day
  • Learning from Cohere Experts
  • Agent Demo Competition
  • Integrating Human Oversight
  • Cohere RAG Capabilities

Link mentioned: Agent Build Day by Cohere x AgentOps · Luma: Learn how to leverage Cohere's foundation models, Command, Embed, and Rerank, to build enterprise-grade agentic systems that use tools to connect to external…


Cohere ▷ #questions (14 messages🔥):

  • Rerank API 403 Error
  • Internship Application Status
  • Training Models for Dialect Generation

Cohere ▷ #cohere-toolkit (2 messages):

  • Community Toolkit Activation
  • Docker Compose Configuration
  • Development Environment Setup

Latent Space ▷ #ai-general-chat (56 messages🔥🔥):

  • OpenAI's synthetic data rumors
  • Gemma 2 2B model performance
  • Llama 3.1 evaluation differences
  • alphaXiv for arXiv papers
  • InternLM's MindSearch framework

Links mentioned:


Latent Space ▷ #ai-announcements (1 messages):

  • AgentInstruct
  • AutoEvolInstruct
  • Apple Intelligence Paper
  • LLM Paper Club

Link mentioned: LLM Paper Club (MSR special: AgentInstruct/Orca 3, AutoEvolInstruct) · Zoom · Luma: @sam, @vibhu, @alpay will be guiding us through AgentInstruct (https://arxiv.org/abs/2407.03502) and AutoEvolInstruct (https://arxiv.org/abs/2406.00770)! For…


OpenAccess AI Collective (axolotl) ▷ #general (4 messages):

  • Quantization in LLMs
  • Axolotl early stopping features
  • Manual termination of training runs
  • Gema2b discussion

Link mentioned: A Visual Guide to Quantization: Exploring memory-efficient techniques for LLMs


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (7 messages):

  • Gemma-2-27b Tuning Config
  • Roles_to_train in chat_template
  • Default roles to train fix
  • Logging verbosity adjustments

Link mentioned: fix roles to train defaults and make logging less verbose by winglian · Pull Request #1801 · axolotl-ai-cloud/axolotl: This fixes an issue where everything was getting ignored except for the final token with basic defaults.


OpenAccess AI Collective (axolotl) ▷ #general-help (44 messages🔥):

  • Training QLora and Lora models
  • Challenges in training a support AI
  • Fine-tuning Llama models
  • Retrieval Augmented Generation (RAG)
  • Data cleaning for conversation datasets

OpenAccess AI Collective (axolotl) ▷ #replicate-help (1 messages):

  • Serverless GPUs
  • AI infrastructure developments
  • Dynamic market trends
  • Deployment experiences
  • Cold starts and autoscaling

Link mentioned: Serverless GPU Part 2 Benchmarking: A Comprehensive Comparison of Performance & Pricing: Dive into an in-depth review of Serverless GPU platforms. Explore cold-start times, integration challenges, pricing comparison and auto-scaling capabilities. Make informed choices with our detailed an...


LlamaIndex ▷ #blog (3 messages):

  • MLflow in LlamaIndex
  • AI21 Labs' Jamba-Instruct model
  • Open-source contributions
  • Async functionality for BedrockConverse
  • Token improvements

Link mentioned: feat: ✨ Implement async functionality in BedrockConverse by AndreCNF · Pull Request #14326 · run-llama/llama_index: Description Implement async methods for the BedrockConverse LLM. Fixes #10714 Fixes #14004 New Package? Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.m...


LlamaIndex ▷ #general (49 messages🔥):

  • MLflow integration issues
  • vLLM documentation PR
  • RAG observability concerns
  • Llama product naming confusion
  • RagApp alternatives

Link mentioned: http://127.0.0.1:3000: no title found: no description found


LlamaIndex ▷ #ai-discussion (2 messages):

  • Full-Document Retrieval
  • Medium Content Concerns

Torchtune ▷ #general (12 messages🔥):

  • LLAMA_3 model outputs
  • Generation parameters
  • Top_p and Frequency_penalty settings
  • Temperature settings impact
  • Quality comparison between deployments

Link mentioned: AI Playground | Compare top AI models side-by-side: Chat and compare OpenAI GPT, Anthropic Claude, Google Gemini, Llama, Mistral, and more.


Torchtune ▷ #dev (25 messages🔥):

  • ChatPreferenceDataset Updates
  • FSDP and QAT Compatibility
  • Parameter Naming Discussion
  • Merging PRs
  • FSDP2 Capabilities

Links mentioned:


LangChain AI ▷ #general (20 messages🔥):

  • Google Gemini context caching
  • Streaming tokens from an agent
  • LangChain errors and issues
  • Using LangChain tools

Links mentioned:


LangChain AI ▷ #share-your-work (2 messages):

  • SWE Agent Guide
  • Palmyra-Fin-70b
  • Palmyra-Med-70b
  • frameworks like CrewAI, AutoGen, LangChain, LLamaIndex

Links mentioned:


LangChain AI ▷ #tutorials (2 messages):

  • SWE Agents Guide
  • AI Long-Term Memory Solutions

Links mentioned:


OpenInterpreter ▷ #general (10 messages🔥):

  • Open Interpreter Workflow
  • OS Mode Requirements
  • 4o Mini Compatibility
  • Eye Tracking Technology
  • Visor Technology Impact

Link mentioned: Tweet from Humanoid History (@HumanoidHistory): The future in space, illustrated by Tatsushi Morimoto, Robert McCall, Günter Radtke, and John Berkey.


OpenInterpreter ▷ #O1 (10 messages🔥):

  • 01 Server Installation on Ubuntu 22.04
  • Custom Instructions for 01
  • Community Engagement with 01
  • Accessing Pre-Order Information
  • Poetry Version Discussion

OpenInterpreter ▷ #ai-content (2 messages):

  • Perplexica
  • Llama-3
  • Open Source AI
  • AI-powered Search Engines

Links mentioned:


DSPy ▷ #papers (1 messages):

pavl_p: Sounds like they integrated dspy with a symbolic learner. Exciting!


DSPy ▷ #general (15 messages🔥):

  • DSPy Module Penalty System
  • Launching on ProductHunt
  • Using DSPy for Product Development
  • Cache Management in DSPy
  • Schema-Aligned Parsing Proposal

Links mentioned:


tinygrad (George Hotz) ▷ #general (7 messages):

  • UCSC Colloquium Talk
  • OpenCL Resource Errors
  • Brazilian AI Investment Plan
  • Discord Rules Reminder

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (6 messages):

  • jit compilation
  • step function optimization

MLOps @Chipro ▷ #general-ml (4 messages):

  • Goldman Sachs report
  • General AI interest
  • Recommendation Systems

LLM Finetuning (Hamel + Dan) ▷ #general (1 messages):

  • Trivia App Development
  • LLM Usage in Gaming
  • User Engagement Statistics

Link mentioned: FastHTML page: no description found






{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}