Too Cheap To Meter: AI prices cut 50-70% in last 30 days

**Gemini 1.5 Flash** cut its price by approximately **70%** to **$0.075/mtok** and pairs it with a highly competitive free tier of **1 million tokens per minute**, intensifying the AI model price war. Other significant price reductions include **GPT-4o** (~50% cut to **$2.50/mtok**), **GPT-4o mini** (70-98.5% cut to **$0.15/mtok**), **Llama 3.1 405b** (46% cut to **$2.7/mtok**), and **Mistral Large 2** (62% cut to **$3/mtok**). **DeepSeek v2** introduced context caching, reducing input token costs by up to **90%** to **$0.014/mtok**. New model releases include **Llama 3.1 405b**, **Sonnet 3.5**, **EXAONE-3.0** (a 7.8B instruction-tuned model from LG AI Research), and **MiniCPM V 2.6** (a vision-language model combining SigLIP 400M and Qwen2-7B). Benchmarks show **Mistral Large** performing well on ZebraLogic and **Claude-3.5** leading LiveBench. **FlexAttention**, a new PyTorch API, simplifies and optimizes attention mechanisms. **Andrej Karpathy** analyzed RLHF, highlighting its limitations compared to traditional reinforcement learning. Google DeepMind research on compute-optimal scaling was also summarized.

AI News for 8/7/2024-8/8/2024. We checked 7 subreddits, 384 Twitters and 28 Discords (249 channels, and 2423 messages) for you. Estimated reading time saved (at 200wpm): 247 minutes. You can now tag @smol_ai for AINews discussions!

A simple list of all the price cuts in the last 30 days in AI (measured in "mtok" aka "per million tokens" - the bulk of the cost is usually input), by LMsys Elo/Rank:

Given Gemini 1.5's extremely generous free tier, every model below LMsys Rank 17 (currently including Gemma 2, Nemotron 4, GLM 4, Reka Flash, Llama 3 8B, Qwen 72B and others) is effectively dead on arrival for most individual and team use cases.

The Price-Intelligence frontier advances by another order of magnitude in another quarter.
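
To make the "mtok" unit concrete, here is a minimal sketch that turns the prices quoted in the summary above into a bill for a given token volume. It treats each quoted figure as a single flat rate per million tokens, which is a simplification (providers typically price input and output tokens separately), and the 50M-token workload is a made-up number for illustration only.

```python
# Prices in USD per million tokens ("mtok"), copied from the summary above.
# Assumption: each figure is used as one flat rate; real price sheets
# usually distinguish input and output tokens.
PRICE_PER_MTOK = {
    "Gemini 1.5 Flash": 0.075,
    "GPT-4o mini": 0.15,
    "GPT-4o": 2.50,
    "Llama 3.1 405b": 2.70,
    "Mistral Large 2": 3.00,
}


def cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * price_per_mtok


if __name__ == "__main__":
    workload = 50_000_000  # hypothetical monthly token volume
    # Print the cheapest model first.
    for model, price in sorted(PRICE_PER_MTOK.items(), key=lambda kv: kv[1]):
        print(f"{model:<18} ${cost_usd(workload, price):>8.2f}")
```

Sorting by price makes the spread between the cheapest and most expensive entries easy to see for any workload size.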


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

All recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Developments and Releases

AI Research and Insights

AI Applications and Tools

AI Ethics and Policy


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Free Access to Advanced LLMs: Llama 3.1 405B and Sonnet 3.5

Theme 2. Optimized Inference and Quantization for ARM-based Processors

Theme 3. Summarization Techniques and Model Comparison for Large Texts

Theme 4. Repurposing Mining Hardware for AI Workloads

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Model Improvements and Techniques

OpenAI Developments and Speculation

AI Model Behavior and Limitations

Community Reactions and Discussions


AI Discord Recap

A summary of Summaries of Summaries by GPT4O-Aug (gpt-4o-2024-08-06)

1. Model Performance and Optimization

2. Open Source AI Developments

3. AI Infrastructure and Market Dynamics

4. Prompt Engineering and Fine-tuning

5. AI Applications and Tools


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


HuggingFace Discord


CUDA MODE Discord


Perplexity AI Discord


Stability.ai (Stable Diffusion) Discord


LM Studio Discord


OpenAI Discord


Nous Research AI Discord


Eleuther Discord


Interconnects (Nathan Lambert) Discord


LangChain AI Discord


Latent Space Discord


LAION Discord


OpenAccess AI Collective (axolotl) Discord


Cohere Discord


LlamaIndex Discord


Torchtune Discord


Modular (Mojo 🔥) Discord


DSPy Discord


tinygrad (George Hotz) Discord


OpenInterpreter Discord


MLOps @Chipro Discord


OpenRouter (Alex Atallah) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (80 messages🔥🔥):

  • 4bit and GGUF Models
  • PPO Trainer Challenges
  • Multi-GPU Support in Unsloth
  • Continuous Batching with lmdeploy and vllm
  • Quantization of Mistral Models


Unsloth AI (Daniel Han) ▷ #off-topic (1 messages):

  • Loss Function in AI Models
  • Understanding Token Labels

Unsloth AI (Daniel Han) ▷ #help (135 messages🔥🔥):

  • Model Loading Issues
  • Dataset Processing
  • Hugging Face Integration
  • Inference Optimization
  • Colab Limitations


Unsloth AI (Daniel Han) ▷ #community-collaboration (293 messages🔥🔥):

  • Harambe tool for bug hunting
  • LLMs for URL analysis
  • Open source collaboration
  • Development challenges
  • Productivity and sleep patterns


Unsloth AI (Daniel Han) ▷ #research (9 messages🔥):

  • Prompt Classification for Evaluation
  • FlexAttention in PyTorch
  • Attention Implementation Challenges
  • Non-Contaminated Packing
  • Hugging Face Integration


HuggingFace ▷ #announcements (1 messages):

  • BiRefNet Background Removal
  • ActionGemma Model for Function Calling
  • Unity ML-Agents
  • Segment Anything Model Insights
  • ArabicWeb24 Dataset


HuggingFace ▷ #general (224 messages🔥🔥):

  • Web Search Functions in LLM Apps
  • Animated Clone Avatars
  • Performance of AI Models
  • Discord vs. Forums for Communication
  • Minecraft Server Experiences


HuggingFace ▷ #today-im-learning (4 messages):

  • Neural Network Optimization
  • AI in Healthcare
  • Embedding Serialization and Deserialization

Link mentioned: The Future of AI in Healthcare, ft. Professor Andrew Janowczyk (Emory University): Dr. Andrew Janowczyk is an Asst. Prof. at Emory Precision AI for Health Institute and a data analyst at Geneva University Hospitals. With nearly 15 years of ...


HuggingFace ▷ #cool-finds (3 messages):

  • Transformers Architecture
  • EU AI Regulations
  • AI Risk-based Regulation

Link mentioned: The EU's AI Act is now in force | TechCrunch: The European Union's risk-based regulation for applications of artificial intelligence has come into force starting from today.


HuggingFace ▷ #i-made-this (16 messages🔥):

  • EurekAI platform
  • Gemma2 9B fine-tuning
  • Flux.1 Dev Controlnet Canny
  • Text-to-Image Diffusion Models
  • TTS optimizations


HuggingFace ▷ #reading-group (4 messages):

  • Transformers Architecture
  • New Ideas in AI Experimentation

HuggingFace ▷ #computer-vision (9 messages🔥):

  • Papers with Code for Computer Vision
  • Converting Handwriting to Stroke Format
  • IAM On-Line Handwriting Database


HuggingFace ▷ #NLP (2 messages):

  • AutoProcessor Availability
  • InternLM 2.5 Features

Link mentioned: internlm/internlm2_5-7b-chat · Hugging Face: no description found


HuggingFace ▷ #diffusion-discussions (29 messages🔥):

  • Flux Transformer Training
  • Using Multiple GPUs
  • LoRA Training
  • CUDA Resource Management


HuggingFace ▷ #gradio-announcements (1 messages):

  • Gradio v4.41 Release
  • New Features in Gradio
  • Security Improvements
  • Bug Fixes
  • Documentation Enhancements

CUDA MODE ▷ #general (14 messages🔥):

  • Profiling CUDA
  • BPF wizardry
  • Nsight tools
  • eBPF for GPU monitoring

CUDA MODE ▷ #torch (31 messages🔥):

  • Attention Gym issues
  • Integration of FlexAttention
  • Torch serialization challenges
  • Flash Attention and Paged Attention connection


CUDA MODE ▷ #cool-links (1 messages):

iron_bound: the code released finally https://github.com/Aleph-Alpha/trigrams


CUDA MODE ▷ #beginner (1 messages):

  • 2D Conv Kernels
  • Constant Memory Usage
  • Dynamic Kernel Sizes

CUDA MODE ▷ #torchao (6 messages):

  • Release of torchao v0.4.0
  • Intx Tensor Subclasses Quantization
  • Update on issue #577
  • ModelRunner.run() complexity


CUDA MODE ▷ #llmdotc (179 messages🔥🔥):

  • KV Cache Implementation
  • RoPE Optimization
  • Training Efficiency
  • Fine-tuning Strategies
  • Code Cleanup and Refactoring


Perplexity AI ▷ #general (179 messages🔥🔥):

  • Perplexity Pro Limits
  • API Usability
  • Changes in Model Availability
  • User Experience with Alternatives
  • Service Stability Issues


Perplexity AI ▷ #sharing (11 messages🔥):

  • Quantum Entanglement in the Brain
  • Google Antitrust Lawsuit
  • Perplexity Pro Features
  • Microsoft's Advertising Strategies
  • Node.js Module Exports


Perplexity AI ▷ #pplx-api (17 messages🔥):

  • Perplexity API outage
  • Geo-based access issues
  • Claude outage impact
  • Non-English language incoherence
  • Google Maps URL issues

Link mentioned: Discussions: no description found


Stability.ai (Stable Diffusion) ▷ #general-chat (144 messages🔥🔥):

  • Stable Diffusion usage
  • Hardware upgrades for AI
  • Face swapping technology
  • Workflow with SAM
  • Web version recommendations for Mac


LM Studio ▷ #general (107 messages🔥🔥):

  • NVIDIA Cards Performance
  • CPU Usage Confusion
  • Model Inference with GPUs
  • LM Studio Functionality
  • Tauri vs Electron

Links mentioned:

Add optional MLP bias for ARCH_LLAMA to support Granite models. Partially addresses ggerganov/llama.cpp/issues/7116 Still needs some more changes to ...


LM Studio ▷ #hardware-discussion (35 messages🔥):

  • 4090 vs. 3080 Performance
  • VRAM and Model Training Requirements
  • Mac vs. Nvidia GPUs for AI Tasks
  • AI Clustering with Mac Mini
  • Gemma Model Performance


OpenAI ▷ #annnouncements (2 messages):

  • GPT-4o System Card
  • DALL·E 3 image creation for Free users

OpenAI ▷ #ai-discussions (55 messages🔥🔥):

  • Website Access Issues
  • Quota and Limits
  • Python SDK Issues
  • Patent and Open Source Concerns
  • Model Performance Queries

OpenAI ▷ #gpt-4-discussions (25 messages🔥):

  • API Key Authentication Issues
  • GPT-4o Features for Non-Plus Users
  • GPT-3.5 Turbo Quota Limits
  • Custom GPT Updates Pending
  • Using CSV Files with Langchain

OpenAI ▷ #prompt-engineering (11 messages🔥):

  • Self-Discover prompting strategy
  • Reverse prompting techniques
  • Custom GPTs development
  • Groq API for summary notes

OpenAI ▷ #api-discussions (11 messages🔥):

  • Prompt Engineering Strategies
  • Self-Discover Prompting
  • Custom GPT Development
  • Groq API Integration

Nous Research AI ▷ #research-papers (2 messages):

  • MindSearch AI
  • Information Seeking Systems
  • Audio Research Communities

Link mentioned: Tweet from Carlos E. Perez (@IntuitMachine): 1/n Unlocking the Web's Knowledge: An Agentic AI That Reads Between the Links In our age of information overload, finding the right answers often feels like searching for a needle in a haystack ...


Nous Research AI ▷ #off-topic (14 messages🔥):

  • Machine Learning Discord Channels
  • Nous Artist Compliments
  • Commission Work
  • Reddit Recommendations

Link mentioned: Reddit - Dive into anything: no description found


Nous Research AI ▷ #interesting-links (1 messages):

  • Tavus Phoenix
  • Real-Time Video Cloning
  • Neural Radiance Fields
  • Video Generation API

Link mentioned: Tavus | Developers: Tavus builds advanced AI models in digital replicas, lip syncing, dubbing, text-to-video, accessible to developers via APIs.


Nous Research AI ▷ #ask-about-llms (84 messages🔥🔥):

  • Upside Down Poem Generation
  • Model Comparison
  • API vs Chat Interface
  • Training Data for Tokenization
  • Server Overloads

Eleuther ▷ #general (91 messages🔥🔥):

  • Dataset compatibility with LM Harness
  • CBRN risks and model responses
  • Filtering pretraining data
  • Impact of knowledge removal on model performance
  • Journalistic challenges with AI-generated content


Eleuther ▷ #research (6 messages):

  • Perception Queries with RNN
  • Open Source Process Based Reward Models
  • Pythia Checkpoints and WandB Logs
  • Synchronizing Model Curricula

Eleuther ▷ #scaling-laws (1 messages):

brain4brain: Ohhhhhh I see, thanks for the info


Eleuther ▷ #lm-thunderdome (3 messages):

  • Model Parallelism
  • GPU Data Splitting

Interconnects (Nathan Lambert) ▷ #news (49 messages🔥):

  • Hugging Face Acquires XetHub
  • Qwen2-Math Model Release
  • AI Infrastructure Unicorns
  • OpenAI's Price Reductions
  • Text-to-Image Leaderboard


Interconnects (Nathan Lambert) ▷ #random (23 messages🔥):

  • Tokens vs Epochs
  • GPT-4 Token Count
  • Anthropic CI Debates
  • RLHF Critique
  • Recruitment Challenges

Link mentioned: Tweet from Andrej Karpathy (@karpathy): # RLHF is just barely RL Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF i...


Interconnects (Nathan Lambert) ▷ #memes (6 messages):

  • Gary Marcus Predictions
  • Audience Capture
  • Contrarian Perspectives on AI

Link mentioned: Tweet from Gary Marcus (@GaryMarcus): I just wrote a great piece for WIRED predicting that the AI bubble will in collapse in 2025, and now I wish I hadn’t. Clearly, I got the year wrong. It’s going to be days or weeks from now, not month...


Interconnects (Nathan Lambert) ▷ #rl (1 messages):

chygao: https://youtu.be/6QWuJRvMtxg?si=SYXsRvYbfcdtYLC2


Interconnects (Nathan Lambert) ▷ #posts (1 messages):

SnailBot News: <@&1216534966205284433>


LangChain AI ▷ #general (75 messages🔥🔥):

  • LangChain issues with AWS Lambda
  • Limiting chat history in LLM applications
  • Managing user-specific history in Slack RAG app
  • Comparing LangChain with other frameworks
  • Challenges with different LLMs

Link mentioned: How to trim messages | 🦜️🔗 LangChain: This guide assumes familiarity with the following concepts:


LangChain AI ▷ #share-your-work (1 messages):

_johnny1984: Stupid spambot doesn't stand a chance against my rapper AI:


Latent Space ▷ #ai-general-chat (61 messages🔥🔥):

  • GPT-4o capabilities
  • Gemini 1.5 Flash updates
  • DALL·E 3 free access
  • Mistral Agents
  • Academic papers on AI


Latent Space ▷ #ai-announcements (5 messages):

  • SAM 2 Pod Launch
  • User Statistics for SAM
  • Future Predictions for SAM 2
  • Video Content in SAM 2
  • Connections to Past Episodes


LAION ▷ #general (47 messages🔥):

  • Midjourney CEO's stance on open source
  • ASL model discussion
  • Synthetic voice dataset creation
  • Flux image generation
  • AI applications for accessibility

LAION ▷ #research (2 messages):

  • Frequency Space Analysis
  • Visual Data Comparison

OpenAccess AI Collective (axolotl) ▷ #general (3 messages):

  • Multi-backend Refactor
  • Google Gemini Price Cuts
  • H100 in the Metaverse

Link mentioned: Google Gemini Insane Price Cuts!!!: Google Gemini 1.5 Flash has some insane price cuts!🔗 Links 🔗Details - https://developers.googleblog.com/en/gemini-15-flash-updates-google-ai-studio-gemini-...


OpenAccess AI Collective (axolotl) ▷ #general-help (29 messages🔥):

  • Training Dataset Size
  • Prompt Formatting for Inference
  • LoRA Import Errors
  • Using Alpaca Format for Fine-tuning
  • Llama 3 Model Information


Cohere ▷ #discussions (8 messages🔥):

  • Querying PDFs via API
  • Retrieval-Augmented Generation
  • Cohere Documentation
  • Cohere and Fujitsu Partnership
  • Langchain Integration


Cohere ▷ #questions (5 messages):

  • Cohere embeddings error
  • RAG model with Llamaparse
  • Azure AI Search integration issues
  • 500 errors from embed endpoint
  • Using preamble ID for prompts

Link mentioned: Integrated vectorization with models from Azure AI Studio - Azure AI Search: Learn how to vectorize content during indexing on Azure AI Search with an AI Studio model.


Cohere ▷ #api-discussions (16 messages🔥):

  • Cohere-toolkit
  • Default tool activation
  • Preamble adjustment
  • Custom deployment

LlamaIndex ▷ #announcements (1 messages):

jerryjliu0: happening in 5 minutes! ^^


LlamaIndex ▷ #blog (3 messages):

  • RAG pipeline observability
  • Workflows abstraction
  • LlamaIndex Sub-Question Query Engine
  • Agent debugging with Workflows

LlamaIndex ▷ #general (20 messages🔥):

  • LongRAG paper
  • Self-routing techniques
  • Evaluation benchmarks for LLMs
  • Token size measurement
  • Using APIs with LlamaIndex


Torchtune ▷ #general (1 messages):

  • LLAMA 3 model performance
  • GitHub Issue #1285

Link mentioned: Generation quality · Issue #1285 · pytorch/torchtune: I use the LLAMA 3 8B instruct model and prompt it with "anything" and I get the below result: chat_format: null checkpointer: component: torchtune.utils.FullModelMetaCheckpointer checkpoin...


Torchtune ▷ #dev (17 messages🔥):

  • RTX A4000 and A2000 Performance
  • Memory Optimization Techniques
  • RLHF Cleanup Discussions
  • Torchchat Generation Optimizations
  • Documentation and Tutorial Plans

Link mentioned: salman-mohammadi: Weights & Biases, developer tools for machine learning


Modular (Mojo 🔥) ▷ #general (4 messages):

  • AI Infrastructure Deployment
  • Commercialization Concerns
  • Internal Tools Usage

Modular (Mojo 🔥) ▷ #mojo (13 messages🔥):

  • Running Mojo in Windows DEV Environment
  • VS Code with WSL Support
  • FancyZones Utility
  • Active Directory and Distributed Databases

Link mentioned: PowerToys FancyZones utility for Windows: A window manager utility for arranging and snapping windows into efficient layouts


DSPy ▷ #show-and-tell (1 messages):

seanchatmangpt: https://www.loom.com/share/0ffc1312c47c45fdb61a2ad00102b3da


DSPy ▷ #general (10 messages🔥):

  • Inspect for LLM observability
  • DSPy vs. Langgraph
  • Performance of optimize_signature vs. COPRO

Link mentioned: GitHub - UKGovernmentBEIS/inspect_ai: Inspect: A framework for large language model evaluations: Inspect: A framework for large language model evaluations - UKGovernmentBEIS/inspect_ai


DSPy ▷ #examples (1 messages):

  • DSPy-Multi-Document-Agent
  • requirements.txt file

DSPy ▷ #colbert (3 messages):

  • qdrant_dspy
  • ColBERT and FastEmbed


tinygrad (George Hotz) ▷ #general (6 messages):

  • DEBUG environment variable issue
  • Tinygrad Tensor Puzzles
  • getenv function ValueError


tinygrad (George Hotz) ▷ #learn-tinygrad (2 messages):

  • tinygrad notes
  • fine-tuning tutorials
  • computer algebra optimization

Link mentioned: Tutorials on Tinygrad: Tutorials on tinygrad


OpenInterpreter ▷ #general (4 messages):

  • Open Source Vision Models
  • MiniCPM-V 2.6 Performance


OpenInterpreter ▷ #O1 (2 messages):

  • Shipping updates

MLOps @Chipro ▷ #events (3 messages):

  • Llama Team Q&A
  • Poe Hackathon


MLOps @Chipro ▷ #general-ml (2 messages):

  • General AI vs Non-General AI
  • Types of AI Applications

OpenRouter (Alex Atallah) ▷ #announcements (3 messages):

  • Vercel outage
  • Anthropic error rates






{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}