Frozen AI News archive

o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release

**OpenAI's o1-preview** model has hit a milestone: it matched the editors' top daily AI news stories 100% accurately without human intervention, consistently beating models from **Anthropic** and **Google** (and **Llama 3**) in vibe check evaluations. **OpenAI** models now hold the top 4 slots on the **LMsys** leaderboard, with rate limits rising to **500-1000 requests per minute**. In open source, **Alibaba's Qwen 2.5** suite surpasses **Llama 3.1** at the 70B scale, and the updated closed **Qwen-Plus** models beat **DeepSeek V2.5** while still trailing the leading American frontier models. **Kyutai** released the open weights of **Moshi**, its realtime voice model, featuring a unique streaming neural architecture with an "inner monologue." **Weights & Biases** introduced **Weave**, an LLM observability toolkit for logging, tracing, and evaluating LLM calls, turning prompting from an art into more of a science. The news also highlights upcoming events like the **WandB LLM-as-judge hackathon** in San Francisco. *"o1-preview consistently beats out our vibe check evals"* and *"OpenAI is gradually raising rate limits by the day."*


AI News for 9/17/2024-9/18/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (221 channels, and 1591 messages) for you. Estimated reading time saved (at 200wpm): 176 minutes. You can now tag @smol_ai for AINews discussions!

We humans at Smol AI have been dreading this day.

For the first time ever, an LLM has been able to 100% match and accurately report what we consider to be the top stories of the day without our intervention. (See the AI Discord Recap below.)

Perhaps more interesting for model trainers: o1-preview consistently beats out our vibe check evals. Every AINews daily run is a bakeoff between OpenAI, Anthropic, and Google models (you can see traces in the archives; we briefly tried Llama 3 too, but it consistently lost), and o1-preview has won basically every day since its introduction (with no specific tuning beyond ripping out instructor's hidden system prompts).
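In spirit, the daily bakeoff amounts to running every candidate model over the same prompt and keeping the highest-scoring output. A minimal sketch of that loop, with hypothetical stand-in model functions and a toy scorer (this is an illustration, not our actual pipeline):

```python
# Minimal "bakeoff" sketch: run each candidate on the same prompt and
# keep the output that scores highest on some rubric.
# All model functions and the scorer below are hypothetical stand-ins.

def bakeoff(prompt, models, score):
    """models: dict of name -> generate(prompt) -> str; score: str -> float."""
    outputs = {name: generate(prompt) for name, generate in models.items()}
    winner = max(outputs, key=lambda name: score(outputs[name]))
    return winner, outputs[winner]

if __name__ == "__main__":
    models = {
        "o1-preview": lambda p: "headline A; headline B; headline C",
        "claude":     lambda p: "headline A; headline B",
        "gemini":     lambda p: "headline A",
    }
    # Toy scorer: count how many expected headlines the summary recovered.
    score = lambda out: out.count("headline")
    winner, best = bakeoff("Summarize today's top AI stories.", models, score)
    print(winner)  # -> o1-preview
```

In practice the scorer is the hard part; ours is a human vibe check over the traced outputs, not a string count.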

We now have LMsys numbers on o1-preview and -mini to quantify the vibe checks.


The top 4 slots on LMsys are now taken by OpenAI models. Demand has been high even as OpenAI is gradually raising rate limits by the day, now up to 500-1000 requests per minute.

Over in open source land, Alibaba's Qwen caught up to DeepSeek with its own Qwen 2.5 suite of general, coding, and math models, showing better numbers than Llama 3.1 at the 70B scale.


Alibaba also updated its closed Qwen-Plus models to beat DeepSeek V2.5, though they still come up short of the American frontier models.

Finally, Kyutai, which teased its realtime voice model Moshi in July (complete with some entertaining/concerning mental breakdowns in the public demo), has released the open weights model as promised, along with details of its unique streaming neural architecture that displays an "inner monologue".


Live demo remains at https://moshi.chat, or try locally with:

```
pip install moshi_mlx
python -m moshi_mlx.local_web -q 4
```

[This week's issues brought to you by Weights & Biases Weave!]: Look, we'll be honest: many teams know Weights & Biases only as the best ML experiment tracking software in the world and aren't even aware of our new LLM observability toolkit, Weave. So if you're reading this and you're making any LLM calls in production, why not give Weave a try? With 3 lines of code you can log and trace all inputs, outputs, and metadata between your users and LLMs, and with our evaluation framework you can turn your prompting from an art into more of a science.
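The core idea behind this kind of LLM observability is simple: wrap each model call so its inputs, outputs, and latency are recorded automatically. A minimal pure-Python sketch of the pattern (this illustrates the concept, not Weave's actual API; `call_llm` is a hypothetical stand-in for a real model call):

```python
import functools
import time

TRACES = []  # in a real tool these records would stream to a backend

def traced(fn):
    """Record inputs, output, and latency for every call to `fn`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "op": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def call_llm(prompt):
    # Stand-in for a real model call.
    return f"summary of: {prompt}"
```

Every call to `call_llm` now leaves a trace record behind, which is what makes downstream evaluation over real traffic possible.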

Check out the report on building a GenAI-assisted automatic story illustrator with Weave.

swyx's Commentary: I'll be visiting the WandB LLM-as-judge hackathon this weekend in SF with many friends from the Latent Space/AI Engineer crew hacking with Weave!



{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Updates and Releases

AI Development and Tools

AI Research and Benchmarks

AI Education and Resources

AI Applications and Demonstrations


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. T-MAC: Energy-efficient CPU backend for llama.cpp

Theme 2. Qwen2.5-72B-Instruct: Performance and content filtering

Theme 3. Latest developments in Vision Language Models (VLMs)

Theme 4. Mistral Small v24.09: New 22B enterprise-grade model

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Model Advancements and Research

AI Applications and Demonstrations

Industry and Infrastructure Developments

Philosophical and Societal Implications


AI Discord Recap

A summary of Summaries of Summaries by o1-preview

Theme 1. New AI Models Take the Stage

Theme 2. Turbocharging Model Fine-Tuning

Theme 3. Navigating AI Model Hiccups

Theme 4. AI Rocks the Creative World

Theme 5. AI Integration Boosts Productivity


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Stability.ai (Stable Diffusion) Discord


OpenRouter (Alex Atallah) Discord


aider (Paul Gauthier) Discord


Perplexity AI Discord


HuggingFace Discord


Nous Research AI Discord


CUDA MODE Discord


Eleuther Discord


OpenAI Discord


Cohere Discord


LM Studio Discord


Latent Space Discord


Torchtune Discord


Interconnects (Nathan Lambert) Discord


OpenInterpreter Discord


LlamaIndex Discord


LangChain AI Discord


Modular (Mojo 🔥) Discord


LAION Discord


DSPy Discord


OpenAccess AI Collective (axolotl) Discord


tinygrad (George Hotz) Discord


LLM Finetuning (Hamel + Dan) Discord


Gorilla LLM (Berkeley Function Calling) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (196 messages🔥🔥):

  • Unsloth Model Fine-Tuning
  • Qwen 2.5 Release
  • Gemma 2 Fine-Tuning Issues
  • Pytorch Conference
  • Using WSL for Installation

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (20 messages🔥):

  • Neural Network Code Generation
  • Path Specification for Llama-CPP
  • Fine-tuning Llama Models
  • LoRa Quantization Issues
  • vLLM Serving Performance

Link mentioned: unsloth/unsloth/save.py at main · unslothai/unsloth: Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth


Stability.ai (Stable Diffusion) ▷ #general-chat (161 messages🔥🔥):

  • Training LoRa Models
  • Image Generation Techniques
  • Multidiffusion Usage
  • Audio Generation Tools
  • General AI Discussions

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (126 messages🔥🔥):

  • OpenRouter issues
  • Mistral API price drops
  • Rate limits and model access
  • Backup model usage
  • LLM allocation for users

Links mentioned:


OpenRouter (Alex Atallah) ▷ #beta-feedback (21 messages🔥):

  • Fallback Model Behavior
  • API Key Management
  • Rate Limiting with Gemini Flash
  • User Implementation of Fallbacks

aider (Paul Gauthier) ▷ #general (109 messages🔥🔥):

  • Aider Performance
  • Using OpenAI Models
  • O1 Mini Feedback
  • DeepSeek Model Testing
  • OpenAI API Costs

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (17 messages🔥):

  • Marblism tool
  • Aider functionality enhancements
  • RAG system integration
  • Markdown vs XML discussion
  • User engagement with Aider

Link mentioned: Images & web pages: Add images and web pages to the aider coding chat.


aider (Paul Gauthier) ▷ #links (9 messages🔥):

  • Claude 3.5 Sonnet system prompt
  • RethinkMCTS
  • JavaScript trademark concerns
  • Fine-tuning GPT-4o
  • FlutterFlow 5.0

Links mentioned:


Perplexity AI ▷ #general (120 messages🔥🔥):

  • Perplexity Pro Model Integration
  • O1 and Reasoning Focus
  • Perplexity API vs ChatGPT
  • Challenges with Perplexity Features
  • User Experience with Extensions

Links mentioned:


Perplexity AI ▷ #sharing (7 messages):

  • Slack AI Agents
  • Lucid Electric SUV
  • Bitcoin Puzzle
  • Windows Registry Tips
  • Motorola Smartphones

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (3 messages):

  • Perplexity API consistency
  • New API features timeline

HuggingFace ▷ #announcements (1 messages):

  • Hugging Face API Docs
  • TRL v0.10 Release
  • PySpark for HF Datasets
  • Sentence Transformers v3.1
  • DataCraft Introduction

Links mentioned:


HuggingFace ▷ #general (101 messages🔥🔥):

  • Hugging Face Conference Attendance
  • JSON Output in LLMs
  • Moshi Checkpoint Release
  • ADIFY AI Playlist Generator
  • Qwen2.5 Math Demo Release

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):

  • Lunar Flu FAQ
  • Support Requests

HuggingFace ▷ #i-made-this (5 messages):

  • Mini-4B Model Release
  • Biometric Template Protection Implementation
  • Interactive World & Character Generative AI
  • Reasoning and Reflection Theories Dataset

Links mentioned:


HuggingFace ▷ #computer-vision (5 messages):

  • Open-source computer vision projects
  • Research topic exploration in CV and ML
  • AI video encoding models for live streaming
  • Python implementation for token settings

Links mentioned:


HuggingFace ▷ #NLP (4 messages):

  • Llama3 model upload
  • MLflow model registration

HuggingFace ▷ #diffusion-discussions (2 messages):

  • Image to Cartoon Models
  • AI Video Encoding for Live Streaming

Nous Research AI ▷ #general (84 messages🔥🔥):

  • NousCon Attendance
  • Hermes Tool Calling
  • Qwen 2.5 Release
  • Fine-tuning Models
  • AI Community Interaction

Links mentioned:


Nous Research AI ▷ #ask-about-llms (11 messages🔥):

  • Hermes 3 API Access
  • Open Source LLM Prompt Size
  • Gemma 2 Token Training
  • Model Parameter Calculation

Link mentioned: Unveiling Hermes 3: The First Full-Parameter Fine-Tuned Llama 3.1 405B Model is on Lambda’s Cloud: Introducing Hermes 3 in partnership with Nous Research, the first fine-tune of Meta Llama 3.1 405B model. Train, fine-tune or serve Hermes 3 with Lambda


Nous Research AI ▷ #research-papers (5 messages):

  • Research on chunking phases
  • Reverse engineering o1
  • OpenAI Strawberry speculation


CUDA MODE ▷ #triton (6 messages):

  • Triton Conference Keynote
  • Triton CPU / ARM Status
  • CUDA Community Engagement

Link mentioned: GitHub - triton-lang/triton-cpu: An experimental CPU backend for Triton: An experimental CPU backend for Triton. Contribute to triton-lang/triton-cpu development by creating an account on GitHub.


CUDA MODE ▷ #torch (1 messages):

kashimoo: is there a video or smth for navigating across chrome tracing with the pytorch profiler


CUDA MODE ▷ #algorithms (2 messages):

  • Triton
  • Fine-grained control

CUDA MODE ▷ #jax (1 messages):

betonitcso: What do you use if you want a very high throughput dataloader? Is using Grain common?


CUDA MODE ▷ #torchao (9 messages🔥):

  • torch.compile() performance
  • NVIDIA NeMo model timings
  • Llama benchmarks
  • TorchInductor performance dashboard

Link mentioned: no title found: no description found


CUDA MODE ▷ #irl-meetup (1 messages):

nahidai: Is there any GPU programming reading/work group based on SF? Would love to join


CUDA MODE ▷ #llmdotc (41 messages🔥):

  • RMSNorm kernel issues
  • Training Llama3 with FP32
  • Introduction to Torch Titan
  • FP8 stability and multi-GPU setup
  • Meeting up to discuss Llama3.1 hacks

Links mentioned:


CUDA MODE ▷ #bitnet (24 messages🔥):

  • Ternary LUT Implementation
  • Quantization Techniques
  • Performance of Llama-2 Model
  • Kernel Performance in BitNet
  • Training with int4 Tensor Cores

Links mentioned:


CUDA MODE ▷ #cudamode-irl (14 messages🔥):

  • PyTorch Conference Attendance
  • RSVP Email Status
  • Project Proposals for Hackathon
  • Mentorship in CUDA
  • IRL Hackathon Acceptance

Links mentioned:


CUDA MODE ▷ #liger-kernel (3 messages):

  • Nondeterministic Methods
  • Pixtral Support
  • Upcoming Release

Link mentioned: [Model] Pixtral Support by AndreSlavescu · Pull Request #253 · linkedin/Liger-Kernel: Summary This PR aims to support pixtral Testing Done tested model + tested monkey patch Hardware Type: 4090 run make test to ensure correctness run make checkstyle to ensure code style run ...


CUDA MODE ▷ #metal (1 messages):

.mattrix96: Just started with puzzles now!


Eleuther ▷ #general (11 messages🔥):

  • Open-Source TTS Models
  • Model Debugging
  • Image Style Changes

Links mentioned:


Eleuther ▷ #research (41 messages🔥):

  • Compression Techniques for MLRA
  • Diagram of Thought (DoT)
  • Low-Precision Training Experiments
  • Playground v3 Model Release
  • Evaluation Methods for LLM Outputs

Links mentioned:


Eleuther ▷ #scaling-laws (3 messages):

  • Training Compute-Optimal Large Language Models
  • Pythia Scaling Curves
  • Big Bench Tasks

Eleuther ▷ #interpretability-general (9 messages🔥):

  • Fourier Transforms of Hidden States
  • Pythia Checkpoints
  • Power Law Behavior in Models
  • Attention Residual Analysis

Link mentioned: interpreting GPT: the logit lens — LessWrong: This post relates an observation I've made in my work with GPT-2, which I have not seen made elsewhere. …


Eleuther ▷ #lm-thunderdome (8 messages🔥):

  • kv-cache issue workaround
  • Chain of Thought prompting with lm eval harness
  • Pending PRs for new benchmarks
  • Comments on PR improvements

Link mentioned: lm-evaluation-harness/docs/new_task_guide.md at main · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness


Eleuther ▷ #gpt-neox-dev (2 messages):

  • Model Outputs
  • Frontier Setup Progress

OpenAI ▷ #ai-discussions (44 messages🔥):

  • Custom GPT Use Cases
  • Advanced Voice Mode Update
  • Concerns Over AI Saturation
  • PDF Formatting Issues for LLMs
  • AI Content Quality Debate

Link mentioned: Reddit - Dive into anything: no description found


OpenAI ▷ #gpt-4-discussions (12 messages🔥):

  • Sharing customized GPTs
  • Automated task requirements
  • Truth-seeking with GPT
  • Reporting cross posting
  • Advanced voice mode capabilities

OpenAI ▷ #prompt-engineering (7 messages):

  • Solicitation Rules
  • GPT Store Creations

OpenAI ▷ #api-discussions (7 messages):

  • GPT Store products
  • Self-promotion rules
  • Language barriers in discussions

Cohere ▷ #discussions (56 messages🔥🔥):

  • Cohere Job Application
  • CoT-Reflections
  • O1 and Reward Models
  • Cost of Experimenting with LLMs
  • OpenAI's CoT Training

Links mentioned:


Cohere ▷ #questions (6 messages):

  • Billing Information Setup
  • VAT Concerns
  • Support Contact

LM Studio ▷ #general (49 messages🔥):

  • Markov Models
  • Training Time for Model
  • PyTorch Framework
  • LM Studio Updates
  • AI Model Recommendations

Links mentioned:


LM Studio ▷ #hardware-discussion (12 messages🔥):

  • Intel Arc multi-GPU setup
  • IPEX performance in LLM
  • NVIDIA 5000 series rumors
  • GPU pricing

Latent Space ▷ #ai-general-chat (47 messages🔥):

  • Langchain Partner Packages
  • Mistral Free Tier Release
  • Qwen 2.5 Full Release
  • Moshi Kyutai Model Release
  • Investment in Mercor

Links mentioned:


Torchtune ▷ #announcements (1 messages):

  • Torchtune 0.3 Release
  • FSDP2 Integration
  • Training-Time Speedups
  • DoRA/QDoRA Support
  • Memory Optimization Techniques

Links mentioned:


Torchtune ▷ #dev (38 messages🔥):

  • Cache Management
  • KV Caching for Models
  • Evaluating Multi-Modal Tasks
  • Pytorch Conference Updates
  • Three Day Work Weeks

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (32 messages🔥):

  • Qwen2.5 Release
  • OpenAI o1 Models Performance
  • Math Reasoning in AI
  • Knowledge Cutoff Issues

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (4 messages):

  • Transformer architecture
  • BertViz library
  • GDM LLM self-critique

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (1 messages):

xeophon.: https://x.com/agarwl_/status/1836119825216602548?s=46


Interconnects (Nathan Lambert) ▷ #memes (1 messages):

xeophon.: I love Twitter


OpenInterpreter ▷ #general (10 messages🔥):

  • 01 App Functionality
  • Automating Browser Form Tasks
  • CV Agents Experimentation

Link mentioned: GitHub - 0xrushi/cv-agents: Intelligent Resumes for Smarter Job Hunting: Intelligent Resumes for Smarter Job Hunting. Contribute to 0xrushi/cv-agents development by creating an account on GitHub.


OpenInterpreter ▷ #O1 (7 messages):

  • Beta Space Availability
  • Browser Action Issues on Windows
  • Error with Discord App Store Prompt

OpenInterpreter ▷ #ai-content (4 messages):

  • Moshi artifacts release
  • Moshi technical report
  • Moshi GitHub repository
  • Audio sync feedback

Links mentioned:


LlamaIndex ▷ #blog (1 messages):

  • RAG services deployment
  • AWS CDK
  • LlamaIndex

LlamaIndex ▷ #general (10 messages🔥):

  • Weaviate Issue Resolution
  • Open Source Contribution Process
  • Feedback on RAG Approaches

Link mentioned: [Question]: LLamaIndex and Weaviate · Issue #13787 · run-llama/llama_index: Question Validation I have searched both the documentation and discord for an answer. Question I am attempting to use llamaIndex to retrieve documents from my weaviate vector database. I have follo...


LangChain AI ▷ #general (3 messages):

  • LLMs Response Latency
  • Python and LangChain Optimizations

LangChain AI ▷ #langserve (2 messages):

  • Langserve
  • React Frontend
  • State Management
  • Python Backend

LangChain AI ▷ #share-your-work (5 messages):

  • PDF Extraction Toolkit
  • RAG Application with AWS
  • LangChain Framework

Links mentioned:


Modular (Mojo 🔥) ▷ #general (8 messages🔥):

  • BeToast Discord Compromise
  • Windows Native Support

Link mentioned: [Feature Request] Native Windows support · Issue #620 · modularml/mojo: Review Mojo's priorities I have read the roadmap and priorities and I believe this request falls within the priorities. What is your request? native support for windows. when will it be available?...


Modular (Mojo 🔥) ▷ #mojo (2 messages):

  • SIMD Conversion
  • Data Type Handling

LAION ▷ #general (6 messages):

  • State of the Art Text to Speech
  • Open Source TTS Solutions
  • Closed Source TTS Solutions

LAION ▷ #research (4 messages):

  • OmniGen
  • Nvidia open-source LLMs
  • SDXL VAE
  • Phi-3

Links mentioned:


DSPy ▷ #show-and-tell (6 messages):

  • Ruff check error
  • Interview with Sayash Kapoor and Benedikt Stroebl
  • LanceDB integration with DSpy
  • Elixir live coding
  • Typed predictors example

Links mentioned:


DSPy ▷ #general (3 messages):

  • API Key Management
  • Trust Issues with Unofficial Servers
  • Reusable RAG Pipelines
  • Multi-Company Context

OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (8 messages🔥):

  • Curriculum Learning Implementation
  • Dataset Shuffling Control

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.


tinygrad (George Hotz) ▷ #general (3 messages):

  • Tinybox setup instructions
  • Tinygrad and Tinybox integration
  • MLPerf Training with Tinyboxes

Link mentioned: tinybox - tinygrad docs: no description found


LLM Finetuning (Hamel + Dan) ▷ #general (1 messages):

  • rateLLMiter
  • API call management
  • Pip install modules

Link mentioned: GitHub - llmonpy/ratellmiter: Rate limiter for LLM clients: Rate limiter for LLM clients. Contribute to llmonpy/ratellmiter development by creating an account on GitHub.


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 messages):

  • Prompt Errors
  • Template Usage





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}