Frozen AI News archive

RouteLLM: RIP Martian? (Plus: AINews Structured Summaries update)

**LMSys** introduces RouteLLM, an open-source router framework trained on **preference data** from Chatbot Arena, achieving **cost reductions of over 85% on MT Bench, 45% on MMLU, and 35% on GSM8K** while maintaining **95% of GPT-4's performance**. This approach surpasses previous task-specific routing by learning from preference data with data augmentation, and beats existing commercial routing solutions by 40%. The update highlights advances in **LLM routing**, **cost-efficiency**, and **model performance optimization** across multiple models rather than single-model or MoE-level improvements. Additionally, the AI Twitter recap notes the **Gemma 2 model family** as a top open model, the **Block Transformer architecture** for improved inference throughput, and a proposal for a fully Software 2.0 computer vision system by **karpathy**.

Canonical issue URL

AI News for 6/28/2024-7/1/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (419 channels, and 6896 messages) for you. Estimated reading time saved (at 200wpm): 746 minutes. You can now tag @smol_ai for AINews discussions!

Remember the Mistral Convex Hull of April, and then the DeepSeekV2 win of May? The cost-vs-performance efficiency frontier is being pushed out again, but not at the single-model or MoE level; this time it is across all models:

image.png

The headline feature to note is this sentence: "We trained four different routers using public data from Chatbot Arena and demonstrate that they can significantly reduce costs without compromising quality, with cost reductions of over 85% on MT Bench, 45% on MMLU, and 35% on GSM8K as compared to using only GPT-4, while still achieving 95% of GPT-4’s performance."

The idea of LLM routing isn't new; model-router was a featured project at the "Woodstock of AI" meetup in early 2023 and subsequently raised a sizable $9m seed round off that concept. However, those routing solutions were based on task-specific routing, the idea that different models are better at different tasks, which stands in direct contrast with syntax-based MoE routing.

LMSys' new open-source router framework, RouteLLM, innovates by training its routers on preference data from the Arena, predicting which model a user would prefer for a given prompt. They also augment the Arena data to further improve their routing gains:

image.png

Perhaps most brutally, LMSys claims to beat existing commercial solutions by 40% at the same performance.

image.png
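The routing idea above can be sketched as a threshold rule: a learned scorer estimates the probability that the strong model's answer would be preferred for a given prompt, and the request only goes to the expensive model when that probability clears a threshold. A minimal illustration follows; the scorer here is a toy stand-in, not LMSys' trained router:

```python
# Minimal sketch of threshold-based LLM routing (illustrative only).
# A learned router would replace toy_win_probability(); the threshold
# trades cost (fraction of calls hitting the strong model) against quality.

def toy_win_probability(prompt: str) -> float:
    """Hypothetical scorer: complexity markers and length favor the strong model."""
    complexity_markers = ("prove", "derive", "step by step", "code")
    score = 0.2 + 0.1 * sum(m in prompt.lower() for m in complexity_markers)
    return min(score + min(len(prompt), 500) / 1000, 1.0)

def route_prompt(prompt: str, threshold: float = 0.5) -> str:
    """Return which model tier should serve this prompt."""
    return "strong" if toy_win_probability(prompt) >= threshold else "weak"

if __name__ == "__main__":
    print(route_prompt("What is 2+2?"))                                      # simple -> weak
    print(route_prompt("Prove the theorem step by step and write code for it"))  # complex -> strong
```

Lowering the threshold routes more traffic to the strong model, recovering quality at higher cost; RouteLLM's contribution is learning the scorer from Arena preference data rather than hand-tuning heuristics like this one.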

SPECIAL AINEWS UPDATE: Structured Summaries

We have revised our core summary code to use structured output, focusing on 1) better topic selection, 2) separation between fact and opinion/reaction, and 3) better linking and highlighting. You can see the results below. The summaries have become more verbose with this update, but we hope the structure makes them more scannable; our upcoming web version will also be easier to navigate.

image.png
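Structured output of this kind is typically enforced with a schema the model must fill in. A minimal sketch of what such a summary record could look like (field names here are illustrative, not our actual schema):

```python
# Illustrative schema for one structured summary item, keeping verifiable
# facts separate from opinion/reaction and carrying explicit links for
# highlighting. Field names are hypothetical, not the real AINews schema.
from dataclasses import dataclass, field

@dataclass
class SummaryItem:
    topic: str                      # selected topic for the bullet
    facts: list[str]                # verifiable statements
    reactions: list[str]            # community opinion / commentary
    links: list[str] = field(default_factory=list)  # sources to highlight

    def render(self) -> str:
        """Render one scannable newsletter-style bullet."""
        fact_part = " ".join(self.facts)
        reaction_part = f" (reaction: {'; '.join(self.reactions)})" if self.reactions else ""
        return f"- **{self.topic}**: {fact_part}{reaction_part}"

item = SummaryItem(
    topic="RouteLLM",
    facts=["LMSys released an open-source router trained on Arena preference data."],
    reactions=["seen as a threat to commercial routers"],
    links=["https://example.com/routellm"],
)
print(item.render())
```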


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI Models and Architectures

AI Agents and Reasoning

AI Applications

Memes and Humor


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity. Comment crawling works now but still has lots of room for improvement!

Humor/Memes

AI Art

AI Scaling and Capabilities

AI Models and Benchmarks


AI Discord Recap

A summary of Summaries of Summaries

  1. Model Training, Quantization, and Optimization:

    • Adam-mini Optimizer: Cuts optimizer VRAM by 45-50%, achieving performance on par with AdamW without the excessive memory overhead, useful for large models like Llama 70B and GPT-4.
    • Hugging Face's new low-precision inference boosts transformer pipeline performance. Aimed at models like SD3 and PixArt-Sigma, it improves computational efficiency.
    • CAME Optimizer: Memory-efficient optimization. Shows better or comparable performance with reduced memory need, beneficial for stable diffusion training.
  2. New AI Models and Benchmarking:

    • Gemma 2 demonstrates mixed performance but shows potential against models like Phi3 and Mistral, pending further optimization.
    • Claude 3.5 faces contextual retention issues despite high initial expectations; alternative models like Claude Opus perform reliably.
    • Persona Hub leverages diverse data applications to skyrocket MATH benchmark scores, proving synthetic data's efficacy in broader AI applications.
  3. Open-Source AI Tools and Community Engagement:

    • Rig Library: Integrates fully with Cohere models, aimed at Rust developers with $100 feedback rewards for insights.
    • LlamaIndex introduces its best Jina reranker yet and provides a comprehensive tutorial for hybrid retrieval setups, promising advancements in retrieval pipelines.
    • Jina Reranker: A new hybrid retriever tutorial details combining methods for better performance, allowing integration with tools like Langchain and Postgres.
  4. Technical Challenges and Troubleshooting:

    • BPE Tokenizer Visualizer helps understand tokenizer mechanics in LLMs, inviting community feedback to refine the tool.
    • Database Queue Issues plague Eleuther and Hugging Face models' benchmarking efforts, urging users to consider alternatives like vLLM for better efficiency.
    • Training GPT Models across multiple systems: Discussions emphasized handling GPU constraints and optimizing scales for effective resource usage.
  5. AI in Real-World Applications:

    • Featherless.ai launches to provide serverless access to LLMs at a flat rate, facilitating easy AI persona application development without GPU setups.
    • DeepSeek Coder V2 is highly praised for efficiently solving complex calculus and coding tasks.
    • Computer Vision in Healthcare: Exploring agentic hospitals using CV, emphasizing integration of compute resources to enhance patient care and reduce administrative workloads.
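The Adam-mini savings claimed in point 1 follow from simple arithmetic: AdamW keeps two full fp32 moment tensors (m and v) per parameter, while Adam-mini replaces the per-parameter second moment with roughly one value per parameter block. A back-of-envelope estimate (block count is a simplified assumption, not the paper's exact partitioning):

```python
# Back-of-envelope optimizer-state memory, in GB, for an n-parameter model.
# AdamW stores two fp32 moments per parameter; Adam-mini keeps the full
# first moment but shrinks the second moment to one scalar per block,
# which is where the ~45-50% optimizer-state saving comes from.

BYTES_FP32 = 4

def adamw_state_gb(n_params: int) -> float:
    return 2 * n_params * BYTES_FP32 / 1e9           # m and v, both full size

def adam_mini_state_gb(n_params: int, n_blocks: int) -> float:
    return (n_params + n_blocks) * BYTES_FP32 / 1e9  # full m, tiny blockwise v

n = 70_000_000_000     # a 70B-parameter model
blocks = 10_000        # illustrative block count; real partitioning is per head/layer
print(f"AdamW:     {adamw_state_gb(n):.0f} GB")          # 560 GB
print(f"Adam-mini: {adam_mini_state_gb(n, blocks):.0f} GB")  # 280 GB, ~50% less
```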

PART 1: High level Discord summaries

HuggingFace Discord


Unsloth AI (Daniel Han) Discord


LM Studio Discord


Stability.ai (Stable Diffusion) Discord


CUDA MODE Discord


Perplexity AI Discord


Nous Research AI Discord


Modular (Mojo 🔥) Discord


OpenRouter (Alex Atallah) Discord


Latent Space Discord


LangChain AI Discord


Interconnects (Nathan Lambert) Discord


OpenInterpreter Discord


LlamaIndex Discord


OpenAccess AI Collective (axolotl) Discord


Eleuther Discord


tinygrad (George Hotz) Discord


LAION Discord


LLM Finetuning (Hamel + Dan) Discord


Cohere Discord


Torchtune Discord


AI Stack Devs (Yoko Li) Discord


Mozilla AI Discord


MLOps @Chipro Discord


Datasette - LLM (@SimonW) Discord


DiscoResearch Discord


The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

HuggingFace ▷ #announcements (1 messages):

Link mentioned: Gemma2:27B First Test ! How Can it be THAT Bad ?!: Let's test the biggest version (27B) of the gemma2 release an hour ago by Google with ollama


HuggingFace ▷ #general (952 messages🔥🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (7 messages):

Links mentioned:


HuggingFace ▷ #cool-finds (11 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (17 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (30 messages🔥):

Links mentioned:


HuggingFace ▷ #core-announcements (1 messages):


HuggingFace ▷ #computer-vision (12 messages🔥):

Links mentioned:


HuggingFace ▷ #NLP (9 messages🔥):

Link mentioned: Top 10 Deep Learning Algorithms intro in 1 min: Welcome to our deep dive into the Top 10 Deep Learning Algorithms! In this video, we break down each algorithm with a concise 10-word explanation. Perfect fo...


HuggingFace ▷ #diffusion-discussions (10 messages🔥):


Unsloth AI (Daniel Han) ▷ #general (1031 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (14 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (170 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (2 messages):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #research (10 messages🔥):

Links mentioned:


LM Studio ▷ #💬-general (332 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (221 messages🔥🔥):

Links mentioned:


LM Studio ▷ #announcements (1 messages):

Links mentioned:


LM Studio ▷ #🧠-feedback (13 messages🔥):


LM Studio ▷ #🎛-hardware-discussion (149 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (14 messages🔥):

Link mentioned: Futurama Angry GIF - Futurama Angry - Discover & Share GIFs: Click to view the GIF


LM Studio ▷ #amd-rocm-tech-preview (29 messages🔥):

Links mentioned:


LM Studio ▷ #🛠-dev-chat (11 messages🔥):

Link mentioned: Issues · lmstudio-ai/lmstudio.js: LM Studio TypeScript SDK (pre-release public alpha) - Issues · lmstudio-ai/lmstudio.js


Stability.ai (Stable Diffusion) ▷ #general-chat (716 messages🔥🔥🔥):

Links mentioned:


CUDA MODE ▷ #general (1 messages):


CUDA MODE ▷ #triton (4 messages):

Link mentioned: The Log-Sum-Exp Trick: no description found


CUDA MODE ▷ #torch (24 messages🔥):

Links mentioned:


CUDA MODE ▷ #cool-links (3 messages):

Link mentioned: AI Engineer World’s Fair 2024 — GPUs & Inference Track: https://twitter.com/aidotengineer


CUDA MODE ▷ #pmpp-book (2 messages):


CUDA MODE ▷ #torchao (42 messages🔥):

Link mentioned: FlexAttention API by drisspg · Pull Request #121845 · pytorch/pytorch: Summary This PR adds a new higher-order_op: templated_attention. This op is designed to extend the functionality of torch.nn.fucntional.scaled_dot_product_attention. PyTorch has efficient pre-wri...


CUDA MODE ▷ #off-topic (2 messages):

Link mentioned: Next-door 10x Software Engineer [FULL]: no description found


CUDA MODE ▷ #llmdotc (473 messages🔥🔥🔥):

Links mentioned:


CUDA MODE ▷ #rocm (1 messages):

Link mentioned: Nscale Benchmarks: AMD MI300x GPUs with GEMM tuning improves throughput and latency by up to 7.2x: Optimising AI model performance: vLLM throughput and latency benchmarks and GEMM Tuning with rocBLAS and hipBLASlt


CUDA MODE ▷ #sparsity (1 messages):


Perplexity AI ▷ #announcements (2 messages):


Perplexity AI ▷ #general (367 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (16 messages🔥):

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (19 messages🔥):

Links mentioned:


Nous Research AI ▷ #research-papers (3 messages):

Links mentioned:


Nous Research AI ▷ #datasets (4 messages):

Links mentioned:


Nous Research AI ▷ #off-topic (10 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (60 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (6 messages):


Nous Research AI ▷ #rag-dataset (163 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #world-sim (31 messages🔥):


Modular (Mojo 🔥) ▷ #general (7 messages):


Modular (Mojo 🔥) ▷ #announcements (1 messages):

Link mentioned: GitHub - Benny-Nottonson/Mojo-Marathons: Contribute to Benny-Nottonson/Mojo-Marathons development by creating an account on GitHub.


Modular (Mojo 🔥) ▷ #ai (32 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #🔥mojo (149 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (7 messages):


Modular (Mojo 🔥) ▷ #🏎engine (6 messages):


Modular (Mojo 🔥) ▷ #nightly (28 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #mojo-marathons (9 messages🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (3 messages):

Link mentioned: LLM Rankings | OpenRouter: Language models ranked and analyzed by usage across apps


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

wyverndryke: These are amazing you guys! 😄


OpenRouter (Alex Atallah) ▷ #general (155 messages🔥🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #일반 (1 messages):

salnegyeron: It's not bad, but Cohere's Aya23 seemed better.


Latent Space ▷ #ai-general-chat (74 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (1 messages):


Latent Space ▷ #ai-in-action-club (34 messages🔥):

Links mentioned:


LangChain AI ▷ #general (97 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #share-your-work (7 messages):

Links mentioned:


LangChain AI ▷ #tutorials (3 messages):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (40 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (6 messages):

Link mentioned: How Cohere will improve AI Reasoning this year: Aidan Gomez, CEO of Cohere, reveals how they're tackling AI hallucinations and improving reasoning abilities. He also explains why Cohere doesn't use any out...


Interconnects (Nathan Lambert) ▷ #ml-drama (23 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (5 messages):


Interconnects (Nathan Lambert) ▷ #memes (7 messages):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rl (4 messages):


OpenInterpreter ▷ #general (33 messages🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (42 messages🔥):

Links mentioned:


LlamaIndex ▷ #blog (7 messages):


LlamaIndex ▷ #general (65 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (1 messages):


OpenAccess AI Collective (axolotl) ▷ #general (54 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (11 messages🔥):


Eleuther ▷ #general (9 messages🔥):

Link mentioned: Home: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth


Eleuther ▷ #research (46 messages🔥):

Links mentioned:


Eleuther ▷ #lm-thunderdome (7 messages):

Link mentioned: Tweet from Lysandre (@LysandreJik): Last week, Gemma 2 was released. Since then, implems have been tuned to reflect the model performance: pip install -U transformers==4.42.3 We saw reports of tools (transformers, llama.cpp) not being...


Eleuther ▷ #gpt-neox-dev (1 messages):

arctodus_: Thanks! This is what I was looking for. Will take a look.


tinygrad (George Hotz) ▷ #general (27 messages🔥):

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (34 messages🔥):

Links mentioned:


LAION ▷ #general (25 messages🔥):


LAION ▷ #research (31 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #general (16 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (4 messages):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #hugging-face (7 messages):


LLM Finetuning (Hamel + Dan) ▷ #replicate (1 messages):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #ankurgoyal_textsql_llmevals (9 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #paige_when_finetune (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #axolotl (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #zach-accelerate (1 messages):

Link mentioned: Hugging Face Accelerate: Making Device-Agnostic ML Training and Inference Easy... - Zachary Mueller: Hugging Face Accelerate: Making Device-Agnostic ML Training and Inference Easy at Scale - Zachary Mueller, Hugging FaceHugging Face Accelerate is an open-sou...


LLM Finetuning (Hamel + Dan) ▷ #fireworks (1 messages):

1dingyao: Hi <@466291653154439169>,

adingyao-41fa41 is for your assistance please.

Many thanks!


LLM Finetuning (Hamel + Dan) ▷ #east-coast-usa (1 messages):

saharn_34789: Anyone in NY? if meet up in Boston, I will manage.


LLM Finetuning (Hamel + Dan) ▷ #career-questions-and-stories (4 messages):


LLM Finetuning (Hamel + Dan) ▷ #openpipe (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #bergum_rag (5 messages):


Cohere ▷ #general (32 messages🔥):

Links mentioned:


Cohere ▷ #project-sharing (3 messages):

Links mentioned:


Torchtune ▷ #general (22 messages🔥):

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #app-showcase (1 messages):

mikhail_ee: Some fresh locations from https://Hexagen.World


AI Stack Devs (Yoko Li) ▷ #ai-companion (12 messages🔥):

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (2 messages):


Mozilla AI ▷ #llamafile (12 messages🔥):

Links mentioned:


MLOps @Chipro ▷ #events (8 messages🔥):

Links mentioned:


Datasette - LLM (@SimonW) ▷ #ai (1 messages):

dbreunig: Only since 5-19, but you can definitely see the pack catching up at the top


DiscoResearch ▷ #general (1 messages):

Links mentioned:




{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}