Frozen AI News archive

GPT4o August + 100% Structured Outputs for All (GPT4o August edition)

**OpenAI** released the new **gpt-4o-2024-08-06** model with a **16k output-token limit** (context remains 128k) and **33-50% lower pricing** than the previous 4o-May version, featuring a new Structured Outputs API that improves output quality and reduces retry costs. **Meta AI** launched **Llama 3.1**, a **405-billion parameter** model surpassing **GPT-4** and **Claude 3.5 Sonnet** on benchmarks, alongside expanding the **Llama Impact Grant** program. **Google DeepMind** quietly released **Gemini 1.5 Pro**, outperforming **GPT-4o**, **Claude-3.5**, and **Llama 3.1** on LMSYS benchmarks and leading the Vision Leaderboard. **Yi-Large Turbo** was introduced as a cost-effective upgrade priced at $0.19 per million tokens. In hardware, **NVIDIA H100 GPUs** were highlighted by **John Carmack** for their massive AI workload power, and **Groq** announced plans to deploy **108,000 LPUs** by Q1 2025. New AI tools and techniques include **RAG (Retrieval-Augmented Generation)**, the **JamAI Base** platform for Mixture of Agents systems, and **LangSmith**'s enhanced filtering capabilities. Google DeepMind also introduced the **PEER (Parameter Efficient Expert Retrieval)** architecture.

Canonical issue URL

AI News for 8/5/2024-8/6/2024. We checked 7 subreddits, 384 Twitters and 28 Discords (249 channels, and 2423 messages) for you. Estimated reading time saved (at 200wpm): 247 minutes. You can now tag @smol_ai for AINews discussions!

It's new frontier model day again! (Blog, Simonw writeup)

As we did for 4o-mini, there are 2 issues of the newsletter today run with the exact same prompts - you are reading the one with all channel summaries generated by gpt-4o-2024-08-06, the newest 4o model released today with a 16k output-token limit (4x longer than 4o-May's 4k, but still shorter than the alpha Long Output model) and 33-50% lower pricing than 4o-May.

We happen to run AINews with structured output via the Instructor library anyway (doing "chain of thought" summaries), so swapping in the new API saved us some lines of code and, more importantly, some money on retries: since OpenAI now does constrained grammar sampling, you no longer spend any retry money/time on poorly formed JSON.
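For readers who want to see what this looks like on the wire: below is a minimal sketch of a strict Structured Outputs request body for the new model. The field names (`chain_of_thought`, `channel_summary`) are our own illustration, not the actual AINews schema; strict mode requires `additionalProperties: false` and every property listed in `required`, and the server then samples under the constrained grammar rather than validating after the fact.

```python
# Sketch of a Structured Outputs request body for gpt-4o-2024-08-06.
# Schema/field names here are illustrative, not the real AINews prompts.
import json

# JSON Schema for a "chain of thought summary" object. Strict mode
# requires additionalProperties: false and all fields in "required".
channel_summary_schema = {
    "type": "object",
    "properties": {
        "chain_of_thought": {"type": "string"},
        "summary": {"type": "string"},
    },
    "required": ["chain_of_thought", "summary"],
    "additionalProperties": False,
}

# The raw chat-completions request body; with strict json_schema mode
# the model's sampler is constrained to this grammar, so malformed-JSON
# retries are no longer needed.
request_body = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        {"role": "system", "content": "Summarize the channel discussion."},
        {"role": "user", "content": "<channel messages here>"},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "channel_summary",
            "strict": True,
            "schema": channel_summary_schema,
        },
    },
}

print(json.dumps(request_body["response_format"], indent=2))
```

If you use the official `openai` Python SDK, the same schema can instead be supplied as a Pydantic model via the new parse helper; the dict above is just the raw wire format that libraries like Instructor construct for you.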

[image]

Based on our summary vibe check and prompts, the new model seems strictly better than 4o-May (one example is picked here, but you can judge the two emails you got today for yourself):

[image]

and mostly better than 4o-mini (which we last concluded was about equivalent to, but way cheaper than, 4o-May):

[image]

The new Structured Outputs API aside (it applies to all models), we think the unexpected 4o model bump is a good thing - 4o August is effectively GPT 4.6 or 4.7, depending on how you are counting. We don't have any publicly reported ELO or benchmark metrics on this model yet, but we are willing to bet that this one will be a sleeper hit - perhaps even a sneaky launch of Q*/Strawberry?


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Updates and Benchmarks

AI Hardware and Infrastructure

AI Development and Tools

AI Research and Techniques

AI Ethics and Societal Impact

Practical AI Applications

AI Community and Education


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Architectural Innovations in AI Models

Theme 2. Advancements in Open-Source AI Models

Theme 3. Novel Applications and Capabilities of LLMs

Theme 4. Leadership Shifts in Major AI Companies

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Research and Development

AI Model Releases and Improvements

AI Industry News and Developments

Neurotech and Brain-Computer Interfaces

Memes and Humor


AI Discord Recap

A summary of Summaries of Summaries

Claude 3 Sonnet

1. LLM Advancements and Benchmarking

2. Model Performance Optimization and Inference

3. Open-Source AI Frameworks and Community Efforts

4. Multimodal AI and Generative Modeling Innovations

5. Fine-tuning Challenges and Prompt Engineering Strategies

Claude 3.5 Sonnet

1. LLM Advancements and Benchmarking

2. Inference Optimization and Hardware Advancements

3. Open Source AI and Community Collaborations

4. Multimodal AI and Creative Applications

GPT4O (gpt-4o-2024-05-13)

1. LLM Advancements and Benchmarking

2. Model Performance Optimization and Benchmarking

3. Fine-Tuning Challenges and Integration

4. Open-Source AI Developments and Collaborations

GPT4OMini (gpt-4o-mini-2024-07-18)

1. Installation Challenges in AI Tools

2. Model Performance and Optimization Discussions

3. AI Ethics and Data Practices

4. Emerging AI Projects and Collaborations

5. Advancements in AI Frameworks and Libraries

GPT4O-Aug (gpt-4o-2024-08-06)

1. AI Model Advancements

2. GPU Performance and Compatibility

3. OpenAI and Anthropic Leadership Changes

4. AI Tooling and Frameworks

5. LLM Fine-Tuning Challenges

6. Misc


PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord


Unsloth AI (Daniel Han) Discord


HuggingFace Discord


LM Studio Discord


CUDA MODE Discord


Nous Research AI Discord


Latent Space Discord


OpenAI Discord


Perplexity AI Discord


Eleuther Discord


LangChain AI Discord


Interconnects (Nathan Lambert) Discord


OpenRouter (Alex Atallah) Discord


LlamaIndex Discord


Cohere Discord


Modular (Mojo 🔥) Discord


LAION Discord


tinygrad (George Hotz) Discord


DSPy Discord


OpenAccess AI Collective (axolotl) Discord


Torchtune Discord


OpenInterpreter Discord


Mozilla AI Discord


MLOps @Chipro Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Stability.ai (Stable Diffusion) ▷ #general-chat (459 messages🔥🔥🔥):

  • Model and Tool Discussion
  • Installation Challenges
  • Lora and ControlNet Usage
  • Upscaling and Processing Techniques
  • Community and Platform Issues

Unsloth AI (Daniel Han) ▷ #general (105 messages🔥🔥):

  • MoEification in Mistral-7b
  • Issues with Unsloth fine-tuning save methods
  • Integrating Unsloth models into PPO trainer
  • Performance differences in Fine Tuned Llama3.1 inference
  • Learning resources for LLM inference

Unsloth AI (Daniel Han) ▷ #off-topic (10 messages🔥):

  • BigLlama-3.1-1T-Instruct Model
  • Pokémon AI Game Master
  • LLM leaderboards
  • Minecraft
  • ChatGPT Pokémon Prompt

Unsloth AI (Daniel Han) ▷ #help (162 messages🔥🔥):

  • Llama-3-8b-bnb 4 bit training and merging
  • GPT-4ALL and GGUF files
  • Fine-tuning Llama models on Colab
  • Exporting models to Ollama
  • Multi-GPU support for Unsloth

Unsloth AI (Daniel Han) ▷ #community-collaboration (1 messages):

  • LLaMA3 Configuration on RunPod
  • Efficient AI Resource Management

Unsloth AI (Daniel Han) ▷ #research (1 messages):

vvelo: https://fxtwitter.com/reach_vb/status/1820493688377643178


HuggingFace ▷ #announcements (1 messages):

  • Gemma 2 2B
  • Diffusers integration for FLUX
  • Magpie Ultra
  • Whisper Generations
  • llm-sagemaker Terraform module

HuggingFace ▷ #general (239 messages🔥🔥):

  • MarianMT model translation issues
  • New text to video model release
  • Audio processing with spectrograms
  • Dataset size limit increase process
  • PyTorch warnings and issues

HuggingFace ▷ #today-im-learning (3 messages):

  • Linear Algebra
  • 3D Video Analysis

HuggingFace ▷ #cool-finds (4 messages):

  • High Resolution Image Synthesis
  • Graph Integration with LLMs

HuggingFace ▷ #i-made-this (5 messages):

  • SAC Agent Training in Unity
  • Embodied Agent Platform Development
  • AniTalker Project
  • BiRefNet for Image Segmentation

HuggingFace ▷ #reading-group (5 messages):

  • LLM Reasoning Capabilities
  • OpenAI's Structured Outputs
  • Theories on LLM Reasoning Mechanisms

HuggingFace ▷ #computer-vision (4 messages):

  • Depth Estimation
  • CVPR 2022

HuggingFace ▷ #NLP (2 messages):

  • Named Entity Recognition dataset
  • JSON file search optimization

Link mentioned: NER Annotated CVs: This dataset includes 5029 annotated curriculum vitae (CV), marked with IT skill


LM Studio ▷ #general (157 messages🔥🔥):

  • RAG setup with LMStudio
  • InternLM model performance
  • Audio transcription with AI
  • Model quantization and K-V cache
  • CUDA device selection for inference

LM Studio ▷ #hardware-discussion (59 messages🔥🔥):

  • 8700G/780m IGP testing
  • NVIDIA 4090 and 5090 discussion
  • Graphics card market trends
  • GPU upgrades for LLMs
  • RTX 4090 vs 3080 performance

CUDA MODE ▷ #general (5 messages):

  • PufferLib Environment Setup
  • Reinforcement Learning Streaming
  • GPUDrive Generation Example
  • Request for Mojo Talk

CUDA MODE ▷ #torch (17 messages🔥):

  • PyTorch 2.4 with CUDA 12.4 issues
  • cublas hgemm library for Windows
  • FP16 accumulate versus FP32
  • Speed/accuracy trade-offs in cublas library
  • Inference-only library discussion

Link mentioned: GitHub - aredden/torch-cublas-hgemm: PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu: PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu - aredden/torch-cublas-hgemm


CUDA MODE ▷ #algorithms (3 messages):

  • Quantization Bits as an Optimizable Parameter
  • Accuracy Tuning for CIFAR-10

CUDA MODE ▷ #jobs (7 messages):

  • Hudson River Trading internships
  • GPU job optimization
  • Software Engineer salary at Hudson River Trading

CUDA MODE ▷ #torchao (34 messages🔥):

  • INT8 Quantization Issues
  • AffinQuantizedTensor Plans
  • TorchAO Installation Errors
  • Hardware Compatibility for Tensor Core Operations
  • GPTQ Refactor Progress

CUDA MODE ▷ #off-topic (7 messages):

  • LLaMA 3 Dataset Section
  • Prefix Chunk LLM Paper - Sarathi LLM
  • CTF Challenge using CPU
  • ChunkAttention for LLM Inference
  • SARATHI Framework

CUDA MODE ▷ #llmdotc (99 messages🔥🔥):

  • Ragged attention masks
  • Batch size and sequence length scheduling
  • Special tokens in LLaMA training
  • FlashAttention support
  • Training stability and efficiency

CUDA MODE ▷ #rocm (9 messages🔥):

  • ZLUDA 3 takedown
  • AMD claim on ZLUDA
  • Contractual obligations
  • Development permissions

CUDA MODE ▷ #cudamode-irl (2 messages):

  • Discussion about Decision Timeline
  • Adding Details to Proposals

Nous Research AI ▷ #datasets (1 messages):

  • UltraSteer-V0
  • Multi-Turn Dialogue Dataset
  • Nvidia's Reward Model
  • Fine-Grained Labeling

Link mentioned: Avelina/UltraSteer-v0 · Datasets at Hugging Face: no description found


Nous Research AI ▷ #off-topic (1 messages):

vikings7699: Has anyone here ever worked on fine tuning a model specifically for insurance sector?


Nous Research AI ▷ #general (129 messages🔥🔥):

  • Multi-dataset Model Training Issues
  • OpenAI Leadership Changes
  • Flux AI Model Performance
  • Open Medical Reasoning Tasks Project
  • MiniCPM-Llama3 VLM Capabilities

Nous Research AI ▷ #ask-about-llms (19 messages🔥):

  • Fine-tuning Libraries
  • Insurance Sector Fine-Tuning
  • Hosting Llama 450b
  • Inference Stack and Resources
  • Bottleneck in Inference/Training

Nous Research AI ▷ #reasoning-tasks-master-list (7 messages):

  • Synthetic task generation
  • Open Medical Reasoning Tasks project
  • System 2 Reasoning Link Collection

Latent Space ▷ #ai-general-chat (128 messages🔥🔥):

  • Web Dev to AI Engineer Transition
  • NVIDIA AI Scraping Controversy
  • John Schulman's Departure from OpenAI
  • OpenAI DevDay Events
  • Structured Outputs in OpenAI API

OpenAI ▷ #annnouncements (1 messages):

  • OpenAI DevDay 2023
  • Developer engagement
  • Global developer events

OpenAI ▷ #ai-discussions (86 messages🔥🔥):

  • Desktop ChatGPT App for Windows
  • OpenAI Structured Outputs
  • Llama 3.1 Model and API
  • ChatGPT Vision and 4o Mini
  • Bing AI Image Creator

Link mentioned: Assistant GPT - Can I perform knowledge retrieval from a cloud storage?: I have some files that are on my cloud storage (onedrive) and would like to perform knowledge retrieval on them. Is it possible to integrate an assistant to perform knowledge retrieval directly fro...


OpenAI ▷ #gpt-4-discussions (16 messages🔥):

  • Search GPT release
  • Photo upload limit for members
  • AI in gaming
  • GPT-4o model update
  • Structured outputs announcement

OpenAI ▷ #prompt-engineering (1 messages):

darthgustav.: Use the python tool and import data from uploads.


OpenAI ▷ #api-discussions (1 messages):

darthgustav.: Use the python tool and import data from uploads.


Perplexity AI ▷ #general (82 messages🔥🔥):

  • Issues with LLMs: GPT-4 Turbo vs. 4o
  • Content Sorting and Recommendation Engine
  • PDF Upload Errors with Perplexity AI
  • Application Stability and Feature Changes
  • Felo vs. Perplexity Pro Subscription

Perplexity AI ▷ #sharing (7 messages):

  • NVIDIA Blackwell GPUs delay
  • Digital memory and AI
  • Warhol's $26M digital portrait on YouTube
  • Navigating Perplexity AI's features

Perplexity AI ▷ #pplx-api (8 messages🔥):

  • API Data Corruption
  • API Model Deprecation
  • API Error 502 Issues

Eleuther ▷ #announcements (1 messages):

  • Mechanistic anomaly detection
  • Adversarial examples in image classifiers
  • Eleuther's quirky language models
  • Attribution patching technique

Eleuther ▷ #general (36 messages🔥):

  • SB1047 (AI Safety Act) opposition
  • Concerns with AI regulation and innovation
  • Anthropic's response to SB1047
  • AAAI conference submission relevance
  • Watermarking and AI safety laws

Eleuther ▷ #research (40 messages🔥):

  • Meta's AI network
  • Distributed AI Training at Scale
  • Search efficiency in AI models
  • Differentiability in search techniques
  • Compute-optimal inference methods

Eleuther ▷ #scaling-laws (4 messages):

  • Training Instability
  • Experiment Averaging
  • Learning Rate Adjustments

Eleuther ▷ #interpretability-general (5 messages):

  • State of SAEs
  • Research on Scaling SAEs
  • SAELens Library
  • Recent Developments in Transformer Circuits

Eleuther ▷ #lm-thunderdome (8 messages🔥):

  • lm-eval-harness usage
  • Batch size and loglikelihood_rolling
  • BOS token in evalharness
  • Benchmark names from JSON output

Link mentioned: mamba/evals/lm_harness_eval.py at main · state-spaces/mamba: Mamba SSM architecture. Contribute to state-spaces/mamba development by creating an account on GitHub.


LangChain AI ▷ #general (83 messages🔥🔥):

  • GPU Out of Memory Issues
  • LangChain Integration Questions
  • Automatic Code Review Challenges
  • LangGraph Course Recommendations
  • Mood2Music App Launch

LangChain AI ▷ #share-your-work (2 messages):

  • AgentGenesis Project
  • Open Source Collaboration

Interconnects (Nathan Lambert) ▷ #news (57 messages🔥🔥):

  • John Schulman's move to Anthropic
  • Confidential Gemini program
  • Sabbatical of Greg from OpenAI
  • Claude and Gemini comparison
  • AGI alignment perspectives

Interconnects (Nathan Lambert) ▷ #random (6 messages):

  • DALL-E vs. challengers
  • Flux Pro
  • Replicate's hosting of Flux.1
  • Comparison of image generation models

Interconnects (Nathan Lambert) ▷ #memes (1 messages):

xeophon.: https://x.com/sahir2k/status/1820791954508022019?s=46


Interconnects (Nathan Lambert) ▷ #rlhf (1 messages):

  • Data-dependency in model performance
  • Startups using noisy data
  • ICML discussion on Meta's Chameleon

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

  • GPT-4o (2024-08-06) release
  • Structured outputs with strict mode

Link mentioned: GPT-4o (2024-08-06) - API, Providers, Stats: The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more here. Run GPT-4o (2024-08...


OpenRouter (Alex Atallah) ▷ #general (62 messages🔥🔥):

  • AI Model Performance
  • GPT-4o-2024-08-06 Update
  • Token Usage and Pricing
  • Google Gemini Update
  • API Cost Calculation

LlamaIndex ▷ #announcements (1 messages):

  • Webinar with CodiumAI
  • RAG-augmented coding assistants
  • LlamaIndex for code generation

Link mentioned: LlamaIndex Webinar: Using RAG with LlamaIndex for Large-Scale Generative Coding · Zoom · Luma: Retrieval-Augmented Generation (RAG) plays a central role in achieving contextual awareness in AI-generated code, which is crucial for enterprises adopting…


LlamaIndex ▷ #blog (4 messages):

  • RabbitMQ and llama-agents
  • Second RAG-a-thon
  • Workflows feature in LlamaIndex
  • Building Multi-agents as a Service

LlamaIndex ▷ #general (49 messages🔥):

  • HuggingFace Inference API for embeddings
  • SimpleDirectoryReader PDF loading
  • Vector DB Comparison
  • Issue with function_calling.py in llama_index
  • Structured Outputs in OpenAI API

Cohere ▷ #discussions (29 messages🔥):

  • Galileo Hallucination Index
  • Open Source vs Open Weights
  • Command R Plus Licensing
  • Mistral Licensing and Access

Cohere ▷ #questions (3 messages):

  • Contacting Dennis Padilla

Cohere ▷ #cohere-toolkit (1 messages):

  • Cohere Toolkit integration
  • Switching models
  • Third-party API usage
  • OpenAI integration
  • Gemini 1.5 compatibility

Modular (Mojo 🔥) ▷ #mojo (30 messages🔥):

  • InlineList development
  • Small buffer optimization in Mojo
  • Using custom accelerators with Mojo
  • RVV support in open-source Mojo

LAION ▷ #general (18 messages🔥):

  • Leadership changes at OpenAI
  • Open-source model training challenges
  • Meta's JASCO status
  • Nullbulge controversy
  • School BUD-E voice assistant

LAION ▷ #research (8 messages🔥):

  • Val Acc Update
  • Scaling Experiments
  • Accuracy Wall discussion
  • Frequency-Phase Inquiry

Link mentioned: The Matrix Laurence Fishburne GIF - The matrix Laurence fishburne Morpheus - Discover & Share GIFs: Click to view the GIF


tinygrad (George Hotz) ▷ #general (8 messages🔥):

  • Tinygrad compatibility with Aurora
  • Intel GPU support
  • Aurora's ExaFLOP capabilities
  • FP8 Nvidia bounty requirements

tinygrad (George Hotz) ▷ #learn-tinygrad (16 messages🔥):

  • Bug in Tensor slicing
  • Buffer to DEFINE_GLOBAL mapping
  • JIT and inconsistent batch sizes
  • Computer algebra study notes
  • Multi-threading in CLANG and LLVM

Link mentioned: computer-algebra-study-notes/README.md at main · mesozoic-egg/computer-algebra-study-notes: Contribute to mesozoic-egg/computer-algebra-study-notes development by creating an account on GitHub.


DSPy ▷ #show-and-tell (6 messages):

  • Wiseflow tool
  • Golden Ret and Wiseflow integration
  • HybridAGI project release

DSPy ▷ #papers (2 messages):

  • LLM-based agents in software engineering
  • Scaling inference compute in language models

DSPy ▷ #general (7 messages):

  • MIPRO performance
  • MIPROv2 capabilities

DSPy ▷ #colbert (1 messages):

gamris: Would you recommend FastEmbed by Qdrant instead? https://github.com/qdrant/fastembed


OpenAccess AI Collective (axolotl) ▷ #general (7 messages):

  • Synthetic Data Strategy
  • SQL Examples in Llama Index
  • MD5 Hash Consistency
  • Bits and Bytes Pull Request

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (5 messages):

  • Gemma 2 27b QLoRA
  • L40S GPUs performance
  • Fast Python package installer

Link mentioned: GitHub - astral-sh/uv: An extremely fast Python package installer and resolver, written in Rust.: An extremely fast Python package installer and resolver, written in Rust. - astral-sh/uv


OpenAccess AI Collective (axolotl) ▷ #general-help (3 messages):

  • Context length adjustment in fine-tuned models
  • RoPE scaling for context length

OpenAccess AI Collective (axolotl) ▷ #announcements (1 messages):

caseus_: Office hours kicks off in an hour in <#1268285745555308649>.


Torchtune ▷ #announcements (1 messages):

  • PPO integration
  • Qwen2 model support
  • RLHF training
  • Feature requests for Torchtune

Torchtune ▷ #general (9 messages🔥):

  • Support for DPO in Llama3-8B
  • Model Prompt Differences
  • LLAMA3 Instruct Model Download

Torchtune ▷ #dev (6 messages):

  • Model Page Refactor
  • PreferenceDataset Refactor

Link mentioned: [4/n] Refactor preference dataset with transforms design by RdoubleA · Pull Request #1276 · pytorch/torchtune: Context Following the RFC in #1186, we will use the unified message_transform -> template -> tokenization data pipeline in all our datasets. This PR updates PreferenceDataset to follow t...


OpenInterpreter ▷ #general (9 messages🔥):

  • Local LLM setup issues
  • Open Interpreter security measures
  • Python version compatibility
  • Vision model recommendations

OpenInterpreter ▷ #O1 (2 messages):

  • Ollama local models setup
  • Deepgram support inquiry

Link mentioned: open-interpreter/docs/language-models/local-models/ollama.mdx at main · OpenInterpreter/open-interpreter: A natural language interface for computers. Contribute to OpenInterpreter/open-interpreter development by creating an account on GitHub.


Mozilla AI ▷ #announcements (2 messages):

  • Llamafile Updates
  • Community Survey for Gift Card
  • sqlite-vec Release Party
  • Machine Learning Paper Talks
  • Local AI AMA

Link mentioned: Discover Typeform, where forms = fun: Create a beautiful, interactive form in minutes with no code. Get started for free.


MLOps @Chipro ▷ #events (1 messages):

  • LinkedIn Engineering's ML platform transformation
  • Flyte pipelines




{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}