Frozen AI News archive

Pixtral 12B: Mistral beats Llama to Multimodality

**Mistral AI** released **Pixtral 12B**, an open-weights **vision-language model** built on a **Mistral Nemo 12B** text backbone with a 400M vision adapter, featuring a large vocabulary of **131,072 tokens** and support for **1024x1024 pixel images**. The release notably beat **Meta AI** to shipping an open multimodal model. At the Mistral AI Summit, architecture details and benchmark results were shared, showing strong OCR and screen-understanding capabilities. Additionally, **Arcee AI** announced **SuperNova**, distilled **Llama 3.1 70B & 8B** models outperforming Meta's Llama 3.1 70B Instruct on benchmarks. **DeepSeek** released **DeepSeek-V2.5**, scoring **89 on HumanEval** and surpassing **GPT-4-Turbo**, Opus, and Llama 3.1 in coding tasks. **OpenAI** plans to release **Strawberry** as part of ChatGPT soon, though its capabilities are debated. **Anthropic** introduced Workspaces for managing multiple Claude deployments with enhanced access controls.

Canonical issue URL

AI News for 9/10/2024-9/11/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (216 channels, and 3870 messages) for you. Estimated reading time saved (at 200wpm): 411 minutes. You can now tag @smol_ai for AINews discussions!

Late last night Mistral was back to its old self - unlike Mistral Large 2 (our coverage here), Pixtral was released as a magnet link with no accompanying paper or blogpost, ahead of the Mistral AI Summit today celebrating the company's triumphant first year.

VB of Huggingface had the best breakdown: Mistral released the Pixtral 12B Vision Language Model. Some notes on the release:

1. Text backbone: Mistral Nemo 12B
2. Vision Adapter: 400M
3. Uses GeLU (for vision adapter) & 2D RoPE (for vision encoder)
4. Larger vocabulary - 131,072
5. Three new special tokens - img, img_break, img_end
6. Image size: 1024 x 1024 pixels
7. Patch size: 16 x 16 pixels
8. Tokenizer support in mistral_common
9. Model weights in bf16
10. Haven't seen the inference code yet
11. Weights up on Hugging Face Hub 🤗

GG Mistral for successfully frontrunning Meta w/ Multimodal 🐐
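The image-token budget implied by those numbers works out to some quick arithmetic. Here's a back-of-the-envelope sketch; note the layout of the special tokens (one img_break per patch-row boundary, one closing img_end) is our assumption, not something confirmed by the release:

```python
# Rough token count for one image in Pixtral 12B, given the released
# hparams: 1024x1024 input, 16x16 patches.
# ASSUMPTION (not confirmed): each row of patches is separated by an
# img_break token, and the image is terminated by a single img_end token.

def pixtral_image_tokens(image_size: int = 1024, patch_size: int = 16) -> int:
    """Estimate tokens consumed by one square image."""
    patches_per_side = image_size // patch_size   # 1024 // 16 = 64
    patch_tokens = patches_per_side ** 2          # 64 * 64 = 4096 patch embeddings
    break_tokens = patches_per_side - 1           # one img_break per row boundary
    end_tokens = 1                                # closing img_end
    return patch_tokens + break_tokens + end_tokens

print(pixtral_image_tokens())  # 4096 + 63 + 1 = 4160
```

So a single full-resolution image costs on the order of 4K tokens of context, which is why the larger 131,072-entry vocabulary and long-context Nemo backbone matter for interleaved image-text prompts.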

VB rightfully points out that Mistral beat Meta to releasing an open-weights multimodal model. You can see the new ImageChunk API in the mistral-common update:

image.png

More hparams are here for those interested in the technical details.

At the Summit, Devendra Chaplot shared more details on the architecture (designed for arbitrary image sizes and interleaving)

image.png

together with impressive OCR and screen-understanding examples (with mistakes!) and favorable benchmark performance vs open-model alternatives (though some Qwen and Gemini Flash 8B numbers were off):

image.png

Still an extremely impressive feat and a well-deserved victory lap for Mistral, who also presented their model priorities and portfolio.

image.png


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Updates and Benchmarks

AI Infrastructure and Deployment

AI Development Tools and Frameworks

AI Research and Insights

Industry News and Trends


AI Reddit Recap

/r/LocalLlama Recap

apologies, our pipeline had issues today. Fixing.

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Research and Techniques

AI Model Developments and Releases

AI in Entertainment and Media

AI Industry and Research Trends


AI Discord Recap

A summary of Summaries of Summaries by GPT4O-Aug (gpt-4o-2024-08-06)

1. Model Performance and Benchmarking

2. AI and Multimodal Innovations

3. Software Engineering and AI Collaboration

4. Open-Source AI Tools and Frameworks


PART 1: High level Discord summaries

Modular (Mojo 🔥) Discord


Unsloth AI (Daniel Han) Discord


OpenAI Discord


HuggingFace Discord


aider (Paul Gauthier) Discord


LM Studio Discord


Stability.ai (Stable Diffusion) Discord


OpenRouter (Alex Atallah) Discord


CUDA MODE Discord


Interconnects (Nathan Lambert) Discord


Perplexity AI Discord


Nous Research AI Discord


Latent Space Discord


OpenInterpreter Discord


Cohere Discord


Eleuther Discord


LlamaIndex Discord


LangChain AI Discord


Torchtune Discord


DSPy Discord


tinygrad (George Hotz) Discord


OpenAccess AI Collective (axolotl) Discord


LAION Discord


LLM Finetuning (Hamel + Dan) Discord


Mozilla AI Discord


Gorilla LLM (Berkeley Function Calling) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Modular (Mojo 🔥) ▷ #general (336 messages🔥🔥):

  • User Feedback for Mojo
  • Swag Discussions
  • Mojo 24.5 Release
  • Trait Conformance and Interfaces
  • Go Interfaces

Links mentioned:


Modular (Mojo 🔥) ▷ #mojo (394 messages🔥🔥):

  • Mojo Copy Behavior
  • Ownership in Mojo
  • ExplicitlyCopyable Trait
  • Mojodojo.dev

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (608 messages🔥🔥🔥):

  • Pixtral Model Launch
  • Gemma 2 vs Llama 3.1 Performance
  • Fine-tuning Techniques
  • Unsloth Features
  • Flash Attention 2 Issues

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (1 messages):

mahiatlinux: https://www.reddit.com/r/ChatGPT/comments/1fdphr6/blowing_out_the_candles/


Unsloth AI (Daniel Han) ▷ #help (48 messages🔥):

  • Unsloth on Intel Gaudi
  • Training Loss and Dataset Size
  • Finetuning LLMs on Non-English Datasets
  • Vision Models Support
  • Using LoRa with phi-3.5

Links mentioned:


OpenAI ▷ #ai-discussions (466 messages🔥🔥🔥):

  • SWE-bench performance
  • GameNGEN capabilities
  • GPT-4o vs GPT-3.5 benchmarks
  • AI capabilities in software engineering
  • GAIA benchmark for AI

Links mentioned:


OpenAI ▷ #gpt-4-discussions (12 messages🔥):

  • Android app copying issues
  • GPT accessibility errors
  • GPT confusion and performance drops
  • Chat memory loading concerns
  • Upcoming GPT-5 release date

OpenAI ▷ #prompt-engineering (17 messages🔥):

  • Prompt Library Access
  • ECHO with ChatGPT
  • Response Variety of ChatGPT
  • Custom Instructions Impact

OpenAI ▷ #api-discussions (17 messages🔥):

  • Prompt Library Location
  • ECHO and Future Models
  • Regenerating Responses
  • Guiding GPT Outputs

HuggingFace ▷ #announcements (1 messages):

  • DeepSeek 2.5
  • Mini Omni
  • Multi-agent systems
  • Transformers.js v3
  • Reflection-Tuning

Links mentioned:


HuggingFace ▷ #general (241 messages🔥🔥):

  • HuggingFace community mapping
  • New datasets features
  • SQL integration with datasets
  • Best AI models for different purposes
  • Using cloud for model training

Links mentioned:


HuggingFace ▷ #today-im-learning (6 messages):

  • Fine-tuning Llama 2
  • PEFT for Fine-tuning
  • Computer Vision Community Course

HuggingFace ▷ #cool-finds (5 messages):

  • AI in Healthcare
  • Retrieval Augmented Generation (RAG)
  • Learning Resources on Hugging Face
  • AI Applications

Links mentioned:


HuggingFace ▷ #i-made-this (23 messages🔥):

  • NLP Dataset Release
  • Gradio Applications in R
  • Agentic Framework in Java
  • Image Similarity Demo
  • DebateThing AI Debate Generator

Links mentioned:


HuggingFace ▷ #computer-vision (2 messages):

  • CSV Image Loading
  • PyTorch DataLoader Best Practices

HuggingFace ▷ #NLP (6 messages):

  • Korean lemmatizer enhancement with AI
  • Building NLP models with PyTorch
  • Fine-tuning models on specific use cases
  • NSFW text detection datasets

Link mentioned: transformers/examples at main · huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. - huggingface/transformers


aider (Paul Gauthier) ▷ #general (141 messages🔥🔥):

  • Aider features and workflows
  • Prompt caching in Aider
  • Model performance and comparisons
  • Using Aider with tools and APIs
  • User experiences and tips with Aider

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (105 messages🔥🔥):

  • Using OpenRouter with Aider
  • OpenAI Compatibility and API Differences
  • YAML Configuration Issues
  • Aider as a Python Tool
  • Handling Git .gitignore Files in Aider

Links mentioned:


aider (Paul Gauthier) ▷ #links (3 messages):

  • Pixel art tools
  • Pixtral model release

Links mentioned:


LM Studio ▷ #general (174 messages🔥🔥):

  • Maintaining Character Consistency in AI Images
  • GPU Performance for Token Processing
  • LM Studio User Meet-up
  • Pixtral Support and Inference Code
  • LM Studio Features and Updates

Links mentioned:


LM Studio ▷ #hardware-discussion (67 messages🔥🔥):

  • AMD vs NVIDIA performance
  • Surface Studio Pro upgrades
  • RTX 4090D characteristics
  • AI model requirements
  • Benchmarking multiple GPUs

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (197 messages🔥🔥):

  • Stable Diffusion Model Comparisons
  • Text to Image Generation Techniques
  • AI Image Generation Technical Discussions
  • Hardware Recommendations for AI Training
  • Reflection LLM Overview

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

  • Novita Endpoint Outage

OpenRouter (Alex Atallah) ▷ #general (171 messages🔥🔥):

  • Tool Suggestions for Programming
  • Discussion on Hermes Model Pricing
  • Pixtral Model Capabilities
  • OpenRouter and Cursor Integration
  • Novita Service Outage

Links mentioned:


CUDA MODE ▷ #general (23 messages🔥):

  • Matmul Algorithms
  • Cudamode-IR Online Discussions
  • Neural Network Quantization
  • Interview Preparation Strategies

Link mentioned: Significant Accuracy Drop After "Custom" Activation Quantization – Seeking Debugging Suggestions: To deepen my understanding of Neural Network quantization, I’m re-implementing Post-Training Quantization (PTQ) from scratch with minimal reliance on PyTorch functions. The code can be found here: Git...


CUDA MODE ▷ #triton (7 messages):

  • Kernel Outputs Garbage with Autotune
  • Utilizing Tensor Cores in Triton
  • Support for uint4 in Triton
  • Using Cutlass for Tensor Operations

Link mentioned: cutlass/include/cute/arch/mma_sm80.hpp at main · NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines. Contribute to NVIDIA/cutlass development by creating an account on GitHub.


CUDA MODE ▷ #torch (1 messages):

  • FlexAttention speedup
  • flash_attn_varlen_func comparison

CUDA MODE ▷ #jobs (39 messages🔥):

  • OpenAI RSU Discussion
  • Secondary Markets for OpenAI Shares
  • Microsoft Investment in OpenAI
  • Collaboration Opportunities on Liquid

Link mentioned: no title found: no description found


CUDA MODE ▷ #torchao (9 messages🔥):

  • FP6 API inclusion
  • BF16 and FP16 confusion
  • Post-Training Quantization challenges
  • Release of torchao v0.5.0
  • Quantized TTS models limitations

Links mentioned:


CUDA MODE ▷ #off-topic (1 messages):

  • Neural Network Quantization
  • Post-Training Quantization (PTQ)
  • Weight Quantization
  • Activation Quantization
  • Debugging Accuracy Drop

Link mentioned: Significant Accuracy Drop After "Custom" Activation Quantization – Seeking Debugging Suggestions: To deepen my understanding of Neural Network quantization, I’m re-implementing Post-Training Quantization (PTQ) from scratch with minimal reliance on PyTorch functions. The code can be found here: Git...


CUDA MODE ▷ #llmdotc (46 messages🔥):

  • Activation Function Save
  • FP8 Custom Implementation
  • Memory Management in Optimizers
  • Tensor Scaling Approaches
  • Debugging Fused Classifier

Links mentioned:


CUDA MODE ▷ #sparsity-pruning (7 messages):

  • cuSparse usage
  • Sparse matrix multiplication
  • Compressed sensing theory

CUDA MODE ▷ #cudamode-irl (9 messages🔥):

  • Hackathon Participation
  • Multi-GPU Enhancements
  • GPU Provider Updates
  • Sponsorship
  • Cloud Credits

Links mentioned:


CUDA MODE ▷ #liger-kernel (3 messages):

  • SGD Implementation
  • Label Smoothing in FLCE

Link mentioned: Add label smoothing to FLCE and unit tests by Tcc0403 · Pull Request #244 · linkedin/Liger-Kernel: Summary Fix #243 Testing Done Hardware Type: RTX-3080 run make test to ensure correctness run make checkstyle to ensure code style run make test-convergence to ensure convergence


Interconnects (Nathan Lambert) ▷ #news (63 messages🔥🔥):

  • OpenAI Departures
  • Meta's AI Supercomputing Cluster
  • Adobe Firefly Video Model
  • Pixtral Model Performance
  • Government Bureaucracy and Automation

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (9 messages🔥):

  • Matt Shumer's Announcement
  • Reflection 70B Model Issues
  • Community Reactions
  • Transparency and Accountability

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (14 messages🔥):

  • Gemini and Cursor Integration
  • Aider vs Cursor
  • API UX frustrations
  • Anthropic's SDK Performance
  • Stripe API Subscription Issues

Interconnects (Nathan Lambert) ▷ #posts (58 messages🔥🔥):

  • Surge AI Contract Issues
  • Data Annotation Workforce
  • Google Contract Workers Unionize
  • Turing vs. Scale AI
  • RLHF for Private Models

Links mentioned:


Perplexity AI ▷ #announcements (1 messages):

  • Perplexity Pro Signup Campaign
  • Final Countdown for Signups
  • Free Month for Students

Link mentioned: Perplexity - Race to Infinity: Welcome back to school! For just two weeks, redeem one free month of Perplexity Pro on us. Refer your friends, because if your school hits 500 signups we'll upgrade that free month to an entire free y...


Perplexity AI ▷ #general (86 messages🔥🔥):

  • Perplexity subscriptions
  • Student offers
  • API features
  • Promotions and discounts
  • General user experience

Perplexity AI ▷ #sharing (17 messages🔥):

  • Neuralink Patient Update
  • SpaceX Starship Mars 2026 Target
  • Commercial Spacewalk
  • Intelligence Chiefs Update
  • Clash of Titans Insights

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (2 messages):

  • Bounce.ai
  • Perplexity API usage
  • Support request

Nous Research AI ▷ #general (74 messages🔥🔥):

  • Llama-3.1-SuperNova-Lite performance
  • Model comparisons: Hermes vs Llama
  • Distillation techniques and implications
  • Need for Hermes 3 API
  • Training small LLMs

Links mentioned:


Nous Research AI ▷ #ask-about-llms (13 messages🔥):

  • Quality Data Scaling
  • Best Small Models for Instruction Following
  • Llama-3.1-SuperNova-Lite Launch
  • Models Under 3B Parameters
  • Open LLM Leaderboard Resources

Links mentioned:


Nous Research AI ▷ #research-papers (1 messages):

  • Spatial Reasoning
  • Neuro-Symbolic AI
  • Program Search
  • Program Synthesis


Latent Space ▷ #ai-general-chat (83 messages🔥🔥):

  • Pixtral 12B model
  • Klarna's SaaS strategy
  • New AI models and tools
  • Trieve's funding round
  • Hume's Empathic Voice Interface

Links mentioned:


OpenInterpreter ▷ #general (36 messages🔥):

  • Open Interpreter Capabilities
  • Documentation Clarity
  • Early Access for Desktop App
  • Discontinuation of 01 Light
  • Exploration of Hardware with Open Interpreter

Links mentioned:


OpenInterpreter ▷ #O1 (45 messages🔥):

  • Open Interpreter installation issues
  • Mobile app requirements
  • Updating 01
  • Upcoming desktop app
  • Differences between Open Interpreter and 01

Links mentioned:


OpenInterpreter ▷ #ai-content (2 messages):

  • RAG Context from JSONL Data
  • Pheme News GitHub Repository
  • NER Process and Neo4j Loading
  • Information Differentiation in News

Link mentioned: GitHub - CodeAKrome/Pheme-News: Differentiate between mis/dis/mal-information in news using NLP to track actors and their interconnectivity with each other and world events in a holistic fashion. - CodeAKrome/Pheme-News


Cohere ▷ #discussions (52 messages🔥):

  • Cohere Integration Projects
  • Mistral Vision Model
  • Human Oversight in AI
  • Discord FAQ Bot Development

Cohere ▷ #questions (5 messages):

  • Aya-101 End-of-life
  • Fine-tuning LLMs
  • Aya-23 Release

Link mentioned: C4AI Aya 23 - a CohereForAI Collection: no description found


Cohere ▷ #api-discussions (6 messages):

  • Cohere API functionality
  • Feedback process for API
  • Polymorphic objects in JSON

Cohere ▷ #projects (1 messages):

  • AI developer seeking project

Eleuther ▷ #general (15 messages🔥):

  • lm-evaluation-harness
  • pile-t5 performance
  • benchmark paper finalization
  • huggingface implementation

Eleuther ▷ #research (4 messages):

  • Pixtral-12b-240910
  • RWKV-7 improvements
  • Dynamic state evolution in RWKV-7

Links mentioned:


Eleuther ▷ #interpretability-general (6 messages):

  • Chunking datasets
  • Performance rationale in training
  • EOS token usage

Eleuther ▷ #lm-thunderdome (1 messages):

rimanv_51850: I am preparing a pull request for a task, that will include the fix if that's ok


Eleuther ▷ #multimodal-general (3 messages):

  • image-text multimodal LLM positioning
  • pixtral

Eleuther ▷ #gpt-neox-dev (3 messages):

  • Multinode training
  • DDP across nodes
  • Global batch size impact

LlamaIndex ▷ #blog (4 messages):

  • RAG course
  • Retrieval-Augmented Generation tutorial
  • Kotaemon UI for document QA
  • AI Scheduler workshop

LlamaIndex ▷ #general (24 messages🔥):

  • Task queue for building indexes
  • QueryPipeline run_multi_with_intermediates
  • Saving vectors in ChromaDB
  • Memory management in LlamaIndex
  • Using different LLM providers

Links mentioned:


LangChain AI ▷ #general (15 messages🔥):

  • Query Generation with LLM
  • Connecting Young Entrepreneurs
  • Stripping LLM Responses
  • Building RAG Applications
  • Upstash Redis Memory Debugging

Link mentioned: langchain_core.output_parsers.string.StrOutputParser — 🦜🔗 LangChain 0.2.16: no description found


LangChain AI ▷ #share-your-work (2 messages):

  • OppyDev Update
  • Promo Codes
  • Plugin System
  • RAG System
  • Real-time Code Review

Link mentioned: Documentation - OppyDev: Watch our getting started video and learn more about how to use OppyDev's AI agent powered coding assistant


Torchtune ▷ #general (4 messages):

  • Torchtune FP16 Support
  • Qwen2 Interface Discrepancies
  • EOS ID Handling

Links mentioned:


Torchtune ▷ #dev (5 messages):

  • padded_collate utility
  • ppo recipe modifications

Link mentioned: torchtune/torchtune/data/_collate.py at eb92658a360d7a7d4ce1c93bbcf99c99a2e0943b · pytorch/torchtune: A Native-PyTorch Library for LLM Fine-tuning. Contribute to pytorch/torchtune development by creating an account on GitHub.


DSPy ▷ #papers (2 messages):

  • Sci Scope Tool
  • Evaluating AI Outputs

Link mentioned: Sci Scope: An AI generated newspaper on AI research


DSPy ▷ #general (2 messages):

  • DSPy Customizations
  • Dynamic Prompting Techniques

tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):

  • Running audio models with tinygrad
  • Whisper example
  • Getting help online

Link mentioned: How To Ask Questions The Smart Way: no description found


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (1 messages):

  • Mistral's Pixtral Release
  • Multi-modal Support
  • New Message Structure

Link mentioned: wip add new proposed message structure by winglian · Pull Request #1904 · axolotl-ai-cloud/axolotl: no description found


OpenAccess AI Collective (axolotl) ▷ #community-showcase (1 messages):

  • Speed/Performance of LLM Models
  • September 2024 LLM Testing

Link mentioned: Who ?: In this video, we are going to test the world leading LLM models in this September 2024 both in speed and performance.#tokensperseconds #GPT4o #LLM #SOTA #Cl...


LAION ▷ #general (2 messages):

  • NYX model development
  • Collaboration in AI
  • Data sourcing for large models

LLM Finetuning (Hamel + Dan) ▷ #general (1 messages):

  • Literal AI Usability
  • LLM Observability
  • LLM Evaluation
  • LLM Monitoring
  • LLM Integrations

Link mentioned: Literal AI - RAG LLM observability and evaluation platform: Literal AI is the RAG LLM evaluation and observability platform built for Developers and Product Owners.


Mozilla AI ▷ #announcements (1 messages):

  • Ground Truth Data in AI
  • Mozilla Fellowship Grants

Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (1 messages):

  • Error in Evaluation Script
  • API Credential Issues
  • Connection Problems with Urban Dictionary API




{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}