Frozen AI News archive

Reflection 70B, by Matt from IT Department

**Reflection Tuning** technique has been used by a two-person team from **Hyperwrite** and **Glaive** to finetune **llama-3.1-70b**, showing strong performance improvements with minimal synthetic data. The approach builds on the concept of adding `thinking` and `reflection` steps to outputs, related to the **Chain of Thought** method. Despite some criticisms like contamination concerns, worse coding performance, and reliance on system prompts, the model has received positive reception and comparisons to **claude-3.5-sonnet**. The work highlights efficient instruction tuning and synthetic data generation for large models.

Canonical issue URL

AI News for 9/5/2024-9/6/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (214 channels, and 2813 messages) for you. Estimated reading time saved (at 200wpm): 304 minutes. You can now tag @smol_ai for AINews discussions!

We were going to wait til next week for the paper + 405B, but the reception has been so strong (with VentureBeat cover story) and the criticisms mostly minor so we are going to make this the title story even though it technically happened yesterday, since no other story comes close.

image.png

TL;DR a two person team; Matt Shumer from Hyperwrite (who has no prior history of AI research but is a prolific AI builder and influencer) and Sahil Chaudhary from Glaive finetuned Llama 3.1 70B (though context is limited) using a technique similar to a one year old paper, Reflection-Tuning: Recycling Data for Better Instruction-Tuning: image.png

Matt hasn't yet publicly cited the paper, but it almost doesn't matter because the process is retrospectively obvious to anyone who understands the broad Chain of Thought literature: train LLMs to add thinking and reflection sections to their output before giving a final output.

image.png

This is basically "Let's Think Step By Step" in more formal terms, and is surprising to the extent that the Orca series of models (our coverage here) already showed that Chain of Thought could be added to Llama 1/2/3 and would work:

image.png

It would seem that Matt has found the ideal low hanging fruit because nobody bothered to take a different spin on Orca + generate enough synthetic data (we still don't know how much it was, but it couldn't have been that much given the couple dozen person-days that Matt and Sahil spent on it) to do this until now.

The criticisms have been few and mostly not fatal:

After a day of review, the overall vibes remain very strong - with /r/localLlama reporting that even 4bit quantizations of Reflection 70B are doing well, and Twitter reporting riddles and favorable comparisons with Claude 3.5 Sonnet that it can be said to at least pass the vibe check if not as a generally capable model, but on enough reasoning tasks to be significant.

More information can be found on this 34min livestream conversation and 12min recap with Matthew Berman.

All in all, not a bad day for Matt from IT.

image.png


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

LLM Training & Evaluation

Open-Source Models & Research

AI Tools & Applications

AI Alignment & Safety

Memes & Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Advancements in LLM Quantization and Efficiency

Theme 2. Reflection-70B: A Novel Fine-tuning Technique for LLMs

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Model Developments and Releases

AI Industry and Market Dynamics

AI Applications and Innovations


AI Discord Recap

A summary of Summaries of Summaries by Claude 3.5 Sonnet

1. LLM Advancements and Benchmarking

2. Model Optimization Techniques

3. Open-source AI Developments

4. AI Infrastructure and Deployment


PART 1: High level Discord summaries

HuggingFace Discord


Stability.ai (Stable Diffusion) Discord


Unsloth AI (Daniel Han) Discord


LM Studio Discord


Nous Research AI Discord


Latent Space Discord


OpenAI Discord


Eleuther Discord


OpenInterpreter Discord


Modular (Mojo 🔥) Discord


CUDA MODE Discord


Interconnects (Nathan Lambert) Discord


Perplexity AI Discord


tinygrad (George Hotz) Discord


Torchtune Discord


LlamaIndex Discord


OpenAccess AI Collective (axolotl) Discord


Cohere Discord


DSPy Discord


LAION Discord


LangChain AI Discord


LLM Finetuning (Hamel + Dan) Discord


Gorilla LLM (Berkeley Function Calling) Discord


Alignment Lab AI Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

HuggingFace ▷ #announcements (1 messages):

  • Vision Language Models
  • Tau LLM Training Optimization
  • African Language Models
  • No-Code AI Model Tasks
  • Selective Fine-Tuning of Language Models

HuggingFace ▷ #general (258 messages🔥🔥):

  • Code Generation Evaluations
  • Model Training Issues
  • Data Handling for Training
  • Fine-tuning and Pre-training
  • Performance Analysis of Models

Links mentioned:


HuggingFace ▷ #today-im-learning (8 messages🔥):

  • Understanding Attention Mechanism in Transformers
  • Discussions on Cross-posting
  • Using AI for Tutoring Kids
  • Creating a Python Microservice with Ollama

HuggingFace ▷ #cool-finds (2 messages):

  • Elasticsearch
  • Vespa Search Engine

Link mentioned: Tweet from Jo Kristian Bergum (@jobergum): Goodbye Elasticsearch, Hello Vespa Search Engine 👀


HuggingFace ▷ #i-made-this (14 messages🔥):

  • Pro-Pretorian Computer Vision System
  • Interactive Model Comparator
  • Chess Puzzle Visualization
  • Tau LLM Series Update

Links mentioned:


HuggingFace ▷ #reading-group (1 messages):

noaroggendorff: <@&1078351789843292311>


HuggingFace ▷ #core-announcements (1 messages):

  • Optimizing Flux and Cog
  • Diffusion models
  • TorchAO

Link mentioned: GitHub - sayakpaul/diffusers-torchao: End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).: End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training). - sayakpaul/diffusers-torchao


HuggingFace ▷ #computer-vision (2 messages):

  • Medical dataset for disease detection
  • Training Nougat and Donut

HuggingFace ▷ #NLP (4 messages):

  • OOM errors during evaluation
  • DeepSpeed configuration for evaluation
  • Custom Dataset for evaluation
  • GPU distribution techniques

HuggingFace ▷ #diffusion-discussions (10 messages🔥):

  • Flux img2img Pipeline
  • SD3 vs. SDXL models
  • ControlNets for SDXL
  • Auto Class Recommendations
  • Memory Optimizations

Link mentioned: Flux: no description found


Stability.ai (Stable Diffusion) ▷ #general-chat (274 messages🔥🔥):

  • ControlNet and Model Usage
  • Flux vs. SDXL for Image Generation
  • Scams and Online Safety
  • Tagging and Workflow in ComfyUI
  • Integration of Extensions in Forge

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (189 messages🔥🔥):

  • Congratulations on Y Combinator backing
  • Unsloth AI functionality and support
  • Models for synthetic data generation
  • Reflection model performance
  • Hardware requirements for Unsloth

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (10 messages🔥):

  • Evolution of Unsloth
  • Emoji Communication
  • App Promotion

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (43 messages🔥):

  • Unsloth Library Installation
  • Kaggle Competition Constraints
  • Phi 3.5 Fine Tuning
  • Gemma-2-27B Loading Issues
  • Mistral 7B Domain Limitation

Link mentioned: Qwen2 error when loading from checkpoint · Issue #478 · unslothai/unsloth: Works as expected when loading the base model, but when a LoRA checkpoint is loaded in place of the base model, unsloth returns: Unsloth cannot patch Attention layers with our manual autograd engin...


Unsloth AI (Daniel Han) ▷ #showcase (2 messages):

  • Comparison Reports
  • YouTube Explanations

Unsloth AI (Daniel Han) ▷ #research (1 messages):

  • Message Duplication
  • Channel Oversight

LM Studio ▷ #general (142 messages🔥🔥):

  • Image API options
  • Reflection Llama-3.1 70B updates
  • LM Studio issues
  • Scraping data with local LLMs
  • Accessing Llama 3.1 405B model

Links mentioned:

Not using a lookup table anymore makes it match q4_0 speed.


LM Studio ▷ #hardware-discussion (59 messages🔥🔥):

  • Apple Event Announcement
  • Mac Studio Performance Concerns
  • NVIDIA RTX 3090 Performance with NVLink
  • LMStudio Boot Time Issues
  • NAS Usage with Apple Devices

Links mentioned:


Nous Research AI ▷ #general (190 messages🔥🔥):

  • Reflection 70B Model
  • Hermes 3 and Llama 3.1 API Usage
  • Benchmarking reflection and ICL performance
  • MCTS and PRM Techniques
  • Quantization Issues

Links mentioned:


Nous Research AI ▷ #ask-about-llms (1 messages):

  • DeepSeek v2.5
  • Coding improvements

Nous Research AI ▷ #interesting-links (1 messages):

teknium: https://x.com/alexandr_wang/status/1832147956562284987?s=46


Latent Space ▷ #ai-general-chat (52 messages🔥):

  • OpenAI's $2000 Subscription Model
  • Reflection 70B Model Performances
  • Speculative Decoding in Inference
  • New Text-to-Music Models
  • AI Scientist Testing Challenges

Links mentioned:


Latent Space ▷ #ai-in-action-club (76 messages🔥🔥):

  • AI Code Editors
  • Handling Errors in Engineering
  • Tools for Code Automation
  • Collaboration with AI
  • Fine-tuning Models

Links mentioned:


OpenAI ▷ #ai-discussions (80 messages🔥🔥):

  • Perplexity usage
  • RunwayML controversy
  • Reflection model testing
  • Luma Dream Machine preferences
  • OpenAI tokens availability

Links mentioned:


OpenAI ▷ #gpt-4-discussions (10 messages🔥):

  • Rate Limit Issues
  • Custom GPT Sharing Problems
  • Browser Compatibility

OpenAI ▷ #prompt-engineering (10 messages🔥):

  • Incorporating tool calls
  • Prompt library location
  • Creative prompt usage

OpenAI ▷ #api-discussions (10 messages🔥):

  • Incorporating Tool Calls
  • Prompt Library Location
  • Buffer Content Prompt

Eleuther ▷ #general (97 messages🔥🔥):

  • Academic Lab Opportunities
  • Universal Transformers
  • Recurrence in Neural Networks
  • Computational Resource Challenges
  • Independence in Research

Link mentioned: Universal Transformers: Recurrent neural networks (RNNs) sequentially process data by updating their state with each new data point, and have long been the de facto choice for sequence modeling tasks. However, their inherent...


Eleuther ▷ #research (5 messages):

  • Momentum-based Optimizers
  • Reinforcement Learning Automation
  • Gradient Cosine Similarity
  • Consecutive Gradient Analysis

Links mentioned:


Eleuther ▷ #lm-thunderdome (2 messages):

  • Reusing Model Outputs
  • lm-evaluation-harness

Link mentioned: GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness


Eleuther ▷ #gpt-neox-dev (2 messages):

  • Hugging Face RoPE Implementation Compatibility
  • Training Model for 1 Epoch

OpenInterpreter ▷ #general (74 messages🔥🔥):

  • Open Interpreter birthday celebration
  • Skills functionality in OI
  • Feedback on 01 app performance
  • Fulcra app availability
  • Beta testing for OI

Links mentioned:


OpenInterpreter ▷ #O1 (8 messages🔥):

  • Beta role for desktop
  • Open Interpreter 01 issues
  • Audio device inquiry

Modular (Mojo 🔥) ▷ #general (13 messages🔥):

  • 404 on values page
  • Integration of C and Mojo
  • Company culture link update

Links mentioned:


Modular (Mojo 🔥) ▷ #mojo (68 messages🔥🔥):

  • Mojo async functionality
  • Use of DType as Dict key
  • Improvements in constructor usage
  • Wrapper for pop.array
  • MLIR and IR generation in Mojo

Link mentioned: 2023 LLVM Dev Mtg - Mojo 🔥: A system programming language for heterogenous computing: 2023 LLVM Developers' Meetinghttps://llvm.org/devmtg/2023-10------Mojo 🔥: A system programming language for heterogenous computingSpeaker: Abdul Dakkak, Chr...


CUDA MODE ▷ #general (4 messages):

  • Reflection 70B model
  • Reflection Tuning technique
  • Together's custom kernel performance

Links mentioned:


CUDA MODE ▷ #triton (9 messages🔥):

  • Debugging tips for Triton
  • MLIR_ENABLE_DUMP
  • TRITON_INTERPRET
  • Triton vs Marlin comparison
  • Quantum zero effects

Link mentioned: GitHub - triton-lang/triton at 7480ef5028b724cb434b7841b016c6d6debf3b84: Development repository for the Triton language and compiler - GitHub - triton-lang/triton at 7480ef5028b724cb434b7841b016c6d6debf3b84


CUDA MODE ▷ #torch (1 messages):

  • TorchDynamo Cache Lookup
  • Performance Issues with Large Models
  • torch/nn/modules/container.py

CUDA MODE ▷ #cool-links (3 messages):

  • NVIDIA Generative AI Teaching Kit
  • Efficient Machine Learning Course
  • Model Compression Techniques
  • Llama2-7B Deployment

Links mentioned:


CUDA MODE ▷ #jobs (3 messages):

  • Citadel Securities hiring
  • Liquid AI remote roles
  • CUDA Mode awareness

Link mentioned: Liquid AI jobs: Job openings at Liquid AI


CUDA MODE ▷ #beginner (9 messages🔥):

  • Image Convolution Optimization
  • Control Divergence vs Arithmetic
  • Triton Kernels for LLM Training

Link mentioned: Lecture 28: Liger Kernel - Efficient Triton Kernels for LLM Training: Byron Hsu presents LinkedIn's open-source collection of Triton kernels for efficient LLM training.TIMESTAMPS00:00 Host Opening00:22 Main Focus01:18 Outline03...


CUDA MODE ▷ #jax (1 messages):

0ut0f0rder: Thanks!


CUDA MODE ▷ #torchao (14 messages🔥):

  • Batch Size Limitations in FP16 x INT8 Matmul
  • Torch Compiler Performance Issues
  • Torchao Installation Errors

Link mentioned: Unbreak build after #621 by andrewor14 · Pull Request #826 · pytorch/ao: no description found


CUDA MODE ▷ #off-topic (9 messages🔥):

  • Avoiding Burnout Strategies
  • Personal Projects for Productivity
  • Flow State in Programming
  • Work-Life Balance
  • New System Torture Test Script

Links mentioned:


CUDA MODE ▷ #llmdotc (6 messages):

  • Small Talk on llm.c in Yerevan
  • Innovative Uses of llm.c
  • NCCL Multi-GPU Training
  • Scaling on GPUs

Link mentioned: NCCL only multi-gpu multi-node training without MPI by chinthysl · Pull Request #426 · karpathy/llm.c: Scheduling jobs using Slurm seems much easier in a multi-node training setup compared to setting up MPI for the cluster. This draft contains the changes to use mpirun for single-node training and S...


CUDA MODE ▷ #liger-kernel (5 messages):

  • Multimodal Convergence Tests
  • Liger's Swiglu Kernels performance
  • Together AI's GPU Clusters
  • Performance comparison against cuBLAS
  • Kernel optimization strategies

Link mentioned: Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection: no description found


Interconnects (Nathan Lambert) ▷ #news (43 messages🔥):

  • Reflection Llama-3.1 70B
  • Glaive data usage
  • Model performance
  • Hype around LLMs
  • Feedback on self-reflection prompts

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (5 messages):

  • HuggingFace Numina
  • Math benchmarks
  • CHAMP benchmark
  • Research queries

Link mentioned: CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities: Recent large language models (LLMs) have shown indications of mathematical reasoning ability on challenging competition-level problems, especially with self-generated verbalizations of intermediate re...


Interconnects (Nathan Lambert) ▷ #random (16 messages🔥):

  • Reliability of Fireworks and Together
  • GitHub organization takedowns
  • Standardization of AI chat logs
  • Embarrassment in AI interactions
  • Chat templates for AI models

Perplexity AI ▷ #general (42 messages🔥):

  • Getting into Tech with No Experience
  • Bing Copilot Capabilities
  • Perplexity AI Referral Program
  • Web3 Innovation Job Opportunities

Link mentioned: Tweet from Perplexity (@perplexity_ai): New merch for students 🔜 Just one way to get it: refer your friends to Perplexity! Share more, get more: http://perplexity.ai/backtoschool


Perplexity AI ▷ #sharing (11 messages🔥):

  • Sutskever's SSI funding
  • Volkswagen ChatGPT integration
  • AI-powered worldbuilding
  • NFL 2024 season kickoff
  • Vehicle-to-everything tech

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (2 messages):

  • pplx-api memory usage
  • Telegram bot memory storage

tinygrad (George Hotz) ▷ #general (15 messages🔥):

  • Bounty Questions
  • Tinygrad Pricing
  • Server Relevance
  • Code Readability
  • Guidelines Acknowledgment

Link mentioned: How To Ask Questions The Smart Way: no description found


tinygrad (George Hotz) ▷ #learn-tinygrad (18 messages🔥):

  • PHI operation confusion
  • MultiLazyBuffer features
  • Sharded buffer behavior
  • Discussion on SDXL inference
  • Understanding Tensor views

Links mentioned:


Torchtune ▷ #general (2 messages):

  • Gemma 2 model
  • Links to resources
  • Model information

Link mentioned: google/gemma-2-9b · Hugging Face: no description found


Torchtune ▷ #dev (28 messages🔥):

  • Multimodal Generation Handling
  • Flex Attention for Document Masking
  • INT8 Mixed-Precision Training
  • TransformerDecoder Configuration
  • GitHub PRs for Generation Overhaul

Links mentioned:


LlamaIndex ▷ #blog (4 messages):

  • llama-deploy launch
  • agentic system deployment example
  • Running Reflection 70B
  • advanced agentic RAG pipelines

LlamaIndex ▷ #general (21 messages🔥):

  • PandasQueryEngine issues
  • Customer support chatbot integration
  • NeptuneDatabaseGraphStore bug
  • Cohere reranker in Azure

Link mentioned: Node Postprocessor - LlamaIndex: no description found


OpenAccess AI Collective (axolotl) ▷ #general (18 messages🔥):

  • Reflection Llama-3.1 70B
  • Synthetic Dataset Generation
  • Model Thinking Space
  • Fine-tuning Challenges
  • ReAct CoT Technique

Link mentioned: mattshumer/Reflection-Llama-3.1-70B · Hugging Face: no description found


OpenAccess AI Collective (axolotl) ▷ #general-help (2 messages):

  • Fine-tuning Llama 3.1
  • GPU requirements for Lora finetuning

OpenAccess AI Collective (axolotl) ▷ #community-showcase (2 messages):

  • SmileyLlama
  • Chemical Language Model
  • Molecule Design

Link mentioned: Tweet from Axolotl (@axolotl_ai): SmileyLlama, a fine-tuned Chemical Language Model to design molecules from properties specified in the prompt. An SFT+DPO model on par with other pure CLM's, but built with Axolotl.


Cohere ▷ #discussions (15 messages🔥):

  • Cohere resources
  • Anthropic library usage
  • Embed-multilingual-light-v3.0 on Azure

Links mentioned:


Cohere ▷ #questions (2 messages):

  • RAG citations
  • Text files as knowledge base

DSPy ▷ #show-and-tell (3 messages):

  • Chroma DB Setup
  • Weaviate Examples
  • Jupyter Notebooks for Server-Client Communication

DSPy ▷ #papers (3 messages):

  • Importance of Names
  • Collaborative Learning
  • AI in Education
  • MAIC Proposal
  • Online Course Evolution

Link mentioned: Paper page - From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents: no description found


DSPy ▷ #general (2 messages):

  • Reflection 70B
  • Routing LLMs by Query
  • TPU Speed and Pricing

Link mentioned: Tweet from Matt Shumer (@mattshumer_): I'm excited to announce Reflection 70B, the world’s top open-source model. Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes. 405B coming next week ...


LAION ▷ #general (5 messages):

  • SwarmUI
  • User Interface Design
  • Bane Meme

Links mentioned:


LAION ▷ #research (3 messages):

  • Reflection 70B
  • LLM Self-Correction
  • Lucidrains Transfusion Implementation
  • 405B Model Release

Links mentioned:


LangChain AI ▷ #general (6 messages):

  • Deploying ReAct agent on GCP
  • LangChain Callbacks system
  • Cerebras with LangChain
  • Decoding streams from .astream_events

Link mentioned: Callbacks | 🦜️🔗 LangChain: Head to Integrations for documentation on built-in callbacks integrations with 3rd-party tools.


LLM Finetuning (Hamel + Dan) ▷ #general (5 messages):

  • RAG system improvements
  • Embedding model usage
  • Hybrid search
  • Metadata and reranking

Link mentioned: pymilvus/examples/hello_hybrid_sparse_dense.py at master · milvus-io/pymilvus: Python SDK for Milvus. Contribute to milvus-io/pymilvus development by creating an account on GitHub.


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (1 messages):

  • XLAM system prompt
  • OSS models comparison

Gorilla LLM (Berkeley Function Calling) ▷ #discussion (3 messages):

  • Testing API server
  • Adding models to leaderboard
  • Gorilla leaderboard

Link mentioned: gorilla/berkeley-function-call-leaderboard at main · ShishirPatil/gorilla: Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls) - ShishirPatil/gorilla


Alignment Lab AI ▷ #general (1 messages):

knut09896: hi there





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}