Frozen AI News archive

Cerebras Inference: Faster, Better, AND Cheaper

**Groq** led early 2024 with superfast LLM inference speeds, achieving ~450 tokens/sec for Mixtral 8x7B and 240 tokens/sec for Llama 2 70B. **Cursor** introduced a specialized code edit model hitting 1000 tokens/sec. Now, **Cerebras** claims the fastest inference with their wafer-scale chips, running **Llama3.1-8B** at 1800 tokens/sec and **Llama3.1-70B** at 450 tokens/sec at full precision, with competitive pricing and a generous free tier. **Google's Gemini 1.5** models showed significant benchmark improvements, especially Gemini-1.5-Flash and Gemini-1.5-Pro. New open-source models like **CogVideoX-5B** and **Mamba-2 (Rene 1.3B)** were released, optimized for consumer hardware. **Anthropic's Claude** now supports prompt caching, improving speed and cost efficiency. *"Cerebras Inference runs Llama3.1 20x faster than GPU solutions at 1/5 the price."*

Canonical issue URL

AI News for 8/27/2024-8/28/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (215 channels, and 2366 messages) for you. Estimated reading time saved (at 200wpm): 239 minutes. You can now tag @smol_ai for AINews discussions!

A brief history of superfast LLM inference in 2024:

It is now finally Cerebras' turn to shine. The new Cerebras Inference service touts Llama3.1-8B at 1800 tokens/s for $0.10/mtok and Llama3.1-70B at 450 tokens/s for $0.60/mtok, both at full precision. Needless to say, with full-precision quality AND unmatched speed at these prices, Cerebras is suddenly a serious player in this market. Their marketing line - "Cerebras Inference runs Llama3.1 20x faster than GPU solutions at 1/5 the price." - is not technically true: most inference providers like Together and Fireworks tend to guide people towards the quantized versions of their services, with FP8 70B priced at $0.88/mtok and INT4 70B at $0.54/mtok. Cerebras is indisputably better, but not 5x cheaper and not 20x faster.
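A quick back-of-envelope check of that "1/5 the price" claim, using only the $/mtok figures quoted above (a sketch - provider prices shift often):

```python
# Check the "1/5 the price" claim for Llama3.1-70B using the
# $/mtok figures quoted in this issue.
cerebras_fp16 = 0.60   # Cerebras, full precision
fireworks_fp8 = 0.88   # typical FP8 quantized price
together_int4 = 0.54   # typical INT4 quantized price

print(f"Cerebras vs FP8:  {fireworks_fp8 / cerebras_fp16:.2f}x cheaper")  # ~1.47x
print(f"Cerebras vs INT4: {together_int4 / cerebras_fp16:.2f}x cheaper")  # ~0.90x
```

Against the FP8 tier Cerebras is roughly 1.5x cheaper; against INT4 it is actually slightly more expensive.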

image.png

Note: they also offer a very generous free tier of 1 million tokens daily.

The secret, of course, is Cerebras' wafer-scale chips (what else would you expect them to say?). Similar to Groq's LPU argument, Cerebras says putting the entire model in SRAM is the key:
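The intuition behind the SRAM argument is the standard memory-bandwidth bound on autoregressive decoding: every generated token must stream the full weight set once, so single-stream tokens/s is capped by bandwidth divided by model size. The bandwidth and precision figures below are illustrative assumptions, not vendor specs:

```python
# Memory-bandwidth ceiling on single-stream decode speed:
#   tokens/s <= bandwidth_bytes_per_s / model_bytes
# (each new token must read every weight once).
def max_tokens_per_s(params_billion: float, bytes_per_param: int, bandwidth_tb_s: float) -> float:
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / model_bytes

# Illustrative: an HBM-class GPU (~3.35 TB/s) serving Llama3.1-70B at FP16
# tops out near 24 tokens/s for a single stream; on-chip SRAM bandwidth is
# orders of magnitude higher, which is the whole pitch.
print(round(max_tokens_per_s(70, 2, 3.35)))  # 24
```

Real deployments batch requests to amortize weight reads, so GPU providers far exceed this single-stream bound in aggregate throughput, but per-stream latency stays bandwidth-limited.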

image.png

Your move, Groq and SambaNova.


Today's sponsor: Solaris

Solaris, an office for early stage AI startups in SF, has new desk and office openings! It’s been HQ to founders backed by Nat Friedman, Daniel Gross, Sam Altman, YC and more.

Swyx's comment: I’ve been here for the last 9 months and have absolutely loved it. If you’re looking for a quality place to build the next great AI startup, book a time with the founders here, and tell them we sent you.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Updates and Benchmarks

AI Development and Infrastructure

AI Applications and Tools

AI Ethics and Regulation

Miscellaneous AI Insights


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Open-Source Text-to-Video AI: CogVideoX 5B Breakthrough

Theme 2. Advancements in Efficient AI Models: Gemini 1.5 Flash 8B

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Model Advancements and Releases

AI in Image Generation and Manipulation

Robotics and Physical AI

Scientific Breakthroughs

AI Ethics and Societal Impact


AI Discord Recap

A summary of Summaries of Summaries by GPT-4o (gpt-4o-2024-05-13)

1. LLM Advancements and Benchmarking

2. Model Performance Optimization and Benchmarking

3. Open-Source AI Developments and Collaborations

4. Multimodal AI and Generative Modeling Innovations

5. Fine-tuning Challenges and Prompt Engineering Strategies


PART 1: High level Discord summaries

HuggingFace Discord


Unsloth AI (Daniel Han) Discord


aider (Paul Gauthier) Discord


LM Studio Discord


Perplexity AI Discord


Nous Research AI Discord


OpenRouter (Alex Atallah) Discord


Eleuther Discord


LlamaIndex Discord


Torchtune Discord


OpenAI Discord


Interconnects (Nathan Lambert) Discord


Cohere Discord


Modular (Mojo 🔥) Discord


Latent Space Discord


OpenInterpreter Discord


OpenAccess AI Collective (axolotl) Discord


DSPy Discord


tinygrad (George Hotz) Discord


LAION Discord


Gorilla LLM (Berkeley Function Calling) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

HuggingFace ▷ #general (468 messages🔥🔥🔥):

  • Gamification of home training
  • Triton config
  • Loss curve
  • Model finetuning
  • GPU performance

Links mentioned:


HuggingFace ▷ #today-im-learning (6 messages):

  • Training AI on CPU
  • Training AI on Laptop
  • Colab TPU instances

Link mentioned: Google Colab: no description found


HuggingFace ▷ #cool-finds (4 messages):

  • DisTrO
  • GameNGen
  • Llama Implementation
  • WiM

Links mentioned:


HuggingFace ▷ #i-made-this (11 messages🔥):

  • RYFAI
  • Tau LLM Series
  • Streetwear Flux
  • Loadimg
  • Bielik-11B

Links mentioned:


HuggingFace ▷ #computer-vision (3 messages):

  • VAEs for Text-Image Generation
  • Transformers Library Contribution
  • Document Quality Assessment
  • Data Augmentation for Document Quality

Link mentioned: Move weight initialization for DeformableDetr · Issue #29818 · huggingface/transformers: System Info Not relevant Reproduction See Deformable Detr Modeling. Expected behavior All weight initializations should be done in _init_weights of the xxxPretrainedModel class


HuggingFace ▷ #NLP (1 message):

  • Text-Summary trends 2024
  • Specialized vs General Models
  • Llama Long Context
  • System Prompts

HuggingFace ▷ #diffusion-discussions (1 message):


Unsloth AI (Daniel Han) ▷ #general (228 messages🔥🔥):

  • VLLM on Kaggle
  • Aphrodite on Kaggle
  • VLLM on Colab
  • Mistral struggles
  • Model Merging

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (64 messages🔥🔥):

  • 8bit training
  • Unsloth Cont Pretraining
  • Dataset Size
  • Context Length
  • Model Layer Tuning

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (11 messages🔥):

  • Duet Dataset
  • Bielik-11B Model
  • Herplete-LLM-Llama-3.1-8b
  • Unsloth Community

Links mentioned:


Unsloth AI (Daniel Han) ▷ #community-collaboration (1 message):

mrdragonfox: ya well - weird people are being weird


aider (Paul Gauthier) ▷ #announcements (1 message):

  • Aider v0.54.0
  • New Gemini Models
  • Shell Command Improvements
  • Aider's Role in Development
  • Performance Enhancements

aider (Paul Gauthier) ▷ #general (119 messages🔥🔥):

  • Aider v0.54.0
  • Gemini 1.5 Pro Benchmark
  • OpenRouter vs Discord
  • Prompt Caching
  • Aider and Sonnet 3.5

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (41 messages🔥):

  • Aider on Replit
  • Commit Message Errors
  • Aider Security
  • Aider Documentation
  • Aider Repo Map

Links mentioned:


aider (Paul Gauthier) ▷ #links (6 messages):

  • GameNGen
  • Diffusion Models
  • Doom
  • Real-time Game Engines

Link mentioned: GameNGen: Diffusion Models Are Real-Time Game Engines


LM Studio ▷ #general (148 messages🔥🔥):

  • LM Studio versions
  • LM Studio on Linux
  • LM Studio on Snapdragon
  • LM Studio on AMD GPU
  • LLMs and security

Links mentioned:


LM Studio ▷ #hardware-discussion (8 messages🔥):

  • VRAM and RAM for LLMs
  • NPU vs GPU for LLMs
  • PCIE 5.0 x4 for GPU

Links mentioned:


Perplexity AI ▷ #general (101 messages🔥🔥):

  • Perplexity Pro Issues
  • Claude 3.5 Message Limit
  • Perplexity Image Upload Issues
  • Perplexity Search Quality
  • Perplexity vs ChatGPT

Link mentioned: This Fine GIF - This Is Fine - Discover & Share GIFs: Click to view the GIF


Perplexity AI ▷ #sharing (9 messages🔥):

  • Shareable Threads
  • WTF critical thinking
  • Claude Prompts
  • Australia's Right to Log Off
  • China's Renewable Energy

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (3 messages):

  • Perplexity AI Hebrew implementation
  • Perplexity API citation feature beta

Nous Research AI ▷ #general (75 messages🔥🔥):

  • DisTrO efficiency
  • Training very large LLMs
  • DPO Training
  • Model Merging
  • Hermes 3 vs Llama 3.1

Links mentioned:


Nous Research AI ▷ #ask-about-llms (23 messages🔥):

  • Finetuning with Synthetic Data
  • Hermes 3
  • Llama 3.1 8B on CPU
  • Model Size and RAM
  • Conversation Topic Tagging

Nous Research AI ▷ #research-papers (1 message):

sunhao77: https://arxiv.org/abs/2408.11029


Nous Research AI ▷ #interesting-links (4 messages):

  • Flex-attention visualization tool
  • Tiny ASIC Matrix Multiplication Implementation

Links mentioned:


Nous Research AI ▷ #rag-dataset (1 message):

draeician: I'd love to see it if you don't mind.


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

  • OpenRouter API Degradation
  • Llama 3.1 405B Update

Link mentioned: Llama 3.1 405B (base) - API, Providers, Stats: Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. Run Llama 3.1 405B (base) with API


OpenRouter (Alex Atallah) ▷ #general (89 messages🔥🔥):

  • Hyperbolic's BF16 Llama 405B
  • LMSys Leaderboard
  • OpenRouter's DeepSeek Caching
  • OpenRouter's Activity Page Bar Chart
  • Gemini Flash-8B performance

Links mentioned:


Eleuther ▷ #general (11 messages🔥):

  • Llama 3.1 405B API
  • TRL.X
  • Model Training Data
  • Monte Carlo Tree Search
  • Computer Vision Research

Links mentioned:


Eleuther ▷ #research (73 messages🔥🔥):

  • LR scaling with batch size
  • Adam vs SGD
  • MiniCPM paper
  • Infinite LRs
  • AdamWScale

Links mentioned:


LlamaIndex ▷ #blog (3 messages):

  • LlamaIndex
  • RAG
  • Workflows
  • Query Engine
  • Developer Competition

LlamaIndex ▷ #general (66 messages🔥🔥):

  • LlamaIndex's support for OpenAI models
  • LlamaIndex's openai library update
  • LlamaIndex's pydantic v2 breakages
  • GraphRAG authentication error
  • Multi-Agent system for NL to SQL

Links mentioned:


Torchtune ▷ #general (8 messages🔥):

  • QLoRA distributed training
  • FSDP1/2
  • torch.compile
  • liger kernels
  • chunkedCE

Torchtune ▷ #dev (49 messages🔥):

  • Torch.compile performance
  • Activation checkpointing (AC)
  • Model compilation granularity
  • Model building and KV cache lengths
  • Torchtune PRs

Links mentioned:


OpenAI ▷ #ai-discussions (31 messages🔥):

  • LLM Hallucination
  • GPT-4 vs. Mini
  • SearchGPT vs. Perplexity
  • AI Sentience and Emotions
  • Orion Model Access

Link mentioned: Tweet from Ignacio de Gregorio (@TheTechOasis1): http://x.com/i/article/1827379585861709824


OpenAI ▷ #gpt-4-discussions (20 messages🔥):

  • ChatGPT 4 vs ChatGPT 3.5
  • OpenAI and Google's Data Scraping
  • Custom GPT's memory feature
  • GPTs following multi-step instructions
  • Llama 3.1 vs ChatGPT

Interconnects (Nathan Lambert) ▷ #news (13 messages🔥):

  • Gemini 1.5 Flash-8B
  • Gemini 1.5 Pro
  • Gemini 1.5 Flash
  • Aistudio
  • RewardBench

Link mentioned: Tweet from Logan Kilpatrick (@OfficialLoganK): Today, we are rolling out three experimental models: - A new smaller variant, Gemini 1.5 Flash-8B - A stronger Gemini 1.5 Pro model (better on coding & complex prompts) - A significantly improved Gem...


Interconnects (Nathan Lambert) ▷ #ml-drama (1 messages):

  • AI Art Accessibility
  • AI Art as the Path Forward

Interconnects (Nathan Lambert) ▷ #random (4 messages):

  • Gemini API
  • Gemini API Rate Limits
  • Gemini API Model Availability

Interconnects (Nathan Lambert) ▷ #posts (15 messages🔥):

  • SnailBot
  • Open Source
  • Data Availability
  • Fair Use

Cohere ▷ #discussions (4 messages):

  • Cat gifs

Link mentioned: Dance Dancing GIF - Dance Dancing Dancing cat - Discover & Share GIFs: Click to view the GIF


Cohere ▷ #questions (25 messages🔥):

  • Langchain + Cohere API Errors
  • Cohere API Response Errors
  • Token Counting for Cohere API
  • Aya-23-8b Inference Time
  • Model Quantization

Link mentioned: Tokenize — Cohere: This endpoint splits input text into smaller units called tokens using byte-pair encoding (BPE). To learn more about tokenization and byte pair encoding, see the tokens page.


Cohere ▷ #projects (3 messages):

  • Persian Tourist Attractions
  • Next.js App
  • Cohere AI
  • Google Places API

Modular (Mojo 🔥) ▷ #general (8 messages🔥):

  • Mojo Circular Imports
  • Python Circular Imports
  • Mojo Compiler Optimization
  • Mojo Top Level Statements

Link mentioned: Appointments: no description found


Modular (Mojo 🔥) ▷ #mojo (21 messages🔥):

  • Mojo performance
  • Mojo named return slots
  • Mojo non-movable types
  • Mojo's Ownership Model
  • Mojo debugging

Links mentioned:


Latent Space ▷ #ai-general-chat (19 messages🔥):

  • Gemini 1.5 Flash Models
  • Anthropic's Claude 3.5 Sonnet
  • Artifacts on iOS and Android
  • Cartesia's Sonic
  • Cerebras Inference

Links mentioned:


OpenInterpreter ▷ #general (11 messages🔥):

  • interpreter custom_instructions
  • emit images in jupyter
  • jupyterbook metadata
  • cython code
  • openinterpreter development

Links mentioned:


OpenInterpreter ▷ #O1 (4 messages):

  • 01 Design & Research
  • Pre-order Status

OpenInterpreter ▷ #ai-content (2 messages):

  • Daily Bots
  • RTVI
  • Bland
  • Voice AI
  • Real-time AI

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (3 messages):

  • Apple Silicon (M3) support for Axolotl
  • Training on Apple Silicon

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (4 messages):

  • Power Scheduler
  • Learning Rate
  • Batch Size
  • Training Tokens
  • QLora FSDP

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (3 messages):

  • token analysis
  • model training

OpenAccess AI Collective (axolotl) ▷ #community-showcase (1 message):

akjindal53244: Upon eye-balling it seemed fine but we didn't perform any quantitative analysis.


DSPy ▷ #show-and-tell (1 message):

  • DSPy ImageNet moment
  • NeurIPS HackerCup 2024
  • DSPy for coding
  • Weights & Biases DSPy Talk

Link mentioned: Tweet from Connor Shorten (@CShorten30): Convolutional Neural Networks had their "ImageNet" moment when they surpassed hand-crafted image features on the immensely popular ImageNet dataset. This then sparked massive development inte...


DSPy ▷ #general (7 messages):

  • OpenAI Base URL/Model Change
  • IPython Interpreter for DSPy
  • MIPRO Interview with Krista Opsahl-Ong

Link mentioned: Tweet from Connor Shorten (@CShorten30): I am BEYOND EXCITED to publish our interview with Krista Opsahl-Ong (@kristahopsalong) from @StanfordAILab! 🔥 Krista is the lead author of MIPRO, short for Multi-prompt Instruction Proposal Optimize...


tinygrad (George Hotz) ▷ #general (1 message):

  • Tinygrad Boxes Europe Shipping

tinygrad (George Hotz) ▷ #learn-tinygrad (5 messages):

  • Tinygrad CPU
  • Tinygrad Device Count

LAION ▷ #general (3 messages):

  • LAION-aesthetic dataset

Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (1 message):

  • Llama 3.1 Benchmarking
  • Custom API for Llama 3.1
  • Inference Pipeline Benchmarking

Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 message):

  • BFCL Leaderboard
  • Model Handler Optimization
  • Function Calling Feature






{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}