Frozen AI News archive

LMSys killed Model Versioning (gpt-4o-1120, gemini-exp-1121)

**AI News for 11/21/2024-11/22/2024** highlights the intense frontier lab race, with **OpenAI's gpt-4o-2024-11-20** and **Google DeepMind's gemini-exp-1121** trading top spots on the LMSYS leaderboard. The trend of using date-based model identifiers instead of traditional versioning is noted across leading labs, including **Anthropic**. **DeepSeek R1** is gaining attention as a potent open-source alternative, especially in the context of the AI competition between China and the US. **Gemini-Exp-1121** is praised for improvements in vision, coding, and reasoning, while **MistralAI** expands with a new Palo Alto office, signaling growth and hiring.

Canonical issue URL

AI News for 11/21/2024-11/22/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (217 channels, and 2501 messages) for you. Estimated reading time saved (at 200wpm): 237 minutes. You can now tag @smol_ai for AINews discussions!

Frontier lab race dynamics are getting somewhat ridiculous. We used to have a rule that new SOTA models always get the top spot, so we reported on Gemini Exp 1114 last week even though there was next to no useful detail on it beyond its LMSYS ranking. Yesterday OpenAI overtook it again with gpt-4o-2024-11-20, which we fortunately didn't report on (thanks to DeepSeek R1), because that model is now suspected of being a worse (but faster) one. We don't know if this is true, but it would be a very serious accusation indeed for OpenAI to effectively brand a "mini" model as a mainline model and hope we don't notice. And meanwhile, today Gemini Exp 1121 is out, again retaking the top LMSYS spot from OpenAI.

It's getting so absurd that this joke playing on the OpenAI-vs-Gemini release coincidences is somewhat plausible:

(image: the joke tweet in question)

The complete suspension of all model-release decorum is always justifiable under innocent "we just wanted to get these into the hands of devs ASAP" good intentions. But we are now in a situation where all three frontier labs (reminder: Anthropic, despite their snark, has also been playing the date-update-with-no-versioning game) ship SOTA model variants identified only by their dates rather than by their versions, all in order to keep up on LMSYS.


Are we just not doing versioning anymore? Hopefully we are, because we're still talking about o2 and gpt5 and claude4 and gemini2, but this liminal lull as the 100k clusters ramp up is a rather local minimum nobody is truly happy with.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

Theme 1. DeepSeek and Global AI Advancements

Theme 2. Model Releases and Tech Developments

Theme 3. AI Frameworks and Dataset Releases

Theme 4. Innovative AI Applications and Tools

Theme 5. Benchmarks and Industry Analysis

Theme 6. Memes/Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. M4 Max 128GB: Running 72B Models at 11 t/s with MLX

Theme 2. DeepSeek R1-Lite Preview Shows Strong Reasoning Capabilities

Theme 3. Gemini-exp-1121 Tops LMSYS with Enhanced Coding & Vision

Theme 4. Allen AI's Tulu 3: Open Source Instruct Models on Llama 3.1

Theme 5. NVIDIA KVPress: Open Source KV Cache Compression Research

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

Theme 1. Flux.1 Tools Suite Expands SD Capabilities

Theme 2. NVIDIA/MIT Release SANA: Efficient Sub-1B Parameter Diffusion Model

Theme 3. ChatGPT 4o Nov Update: Better Writing, Lower Test Scores

Theme 4. Claude Free Users Limited to Haiku as Demand Strains Capacity


AI Discord Recap

A summary of Summaries of Summaries by O1-mini

Theme 1. New AI Models Surge Ahead with Enhanced Capabilities

Theme 2. Advanced Fine-Tuning Techniques Propel Model Efficiency

Theme 3. Hardware Solutions and Performance Optimizations Drive AI Efficiency

Theme 4. APIs and Integrations Enable Custom Deployments and Enhancements

Theme 5. Comprehensive Model Evaluations and Benchmark Comparisons Illuminate AI Progress


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Interconnects (Nathan Lambert) Discord


HuggingFace Discord


OpenAI Discord


LM Studio Discord


aider (Paul Gauthier) Discord


OpenRouter (Alex Atallah) Discord


Stability.ai (Stable Diffusion) Discord


Eleuther Discord


Perplexity AI Discord


Latent Space Discord


Nous Research AI Discord


GPU MODE Discord


Notebook LM Discord Discord


LlamaIndex Discord


Cohere Discord


Modular (Mojo 🔥) Discord


Torchtune Discord


DSPy Discord


LLM Agents (Berkeley MOOC) Discord


tinygrad (George Hotz) Discord


MLOps @Chipro Discord


OpenInterpreter Discord


OpenAccess AI Collective (axolotl) Discord


LAION Discord


Mozilla AI Discord


Gorilla LLM (Berkeley Function Calling) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (497 messages🔥🔥🔥):

  • Vision support in Unsloth
  • Fine-tuning Qwen and LLaMA models
  • Dataset preparation for multimodal models
  • Licensing and legal considerations
  • Challenges with model merging and format compatibility

Links mentioned:


Unsloth AI (Daniel Han) ▷ #announcements (1 message):

  • Llama 3.2 Vision
  • Vision/Multi-modal Models
  • Google Colab Notebooks
  • Hugging Face Model Uploads
  • Fine-tuning Improvements

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (1 message):

  • Training Checkpoints

Unsloth AI (Daniel Han) ▷ #help (122 messages🔥🔥):

  • Model Training and Preprocessing
  • Fine-tuning Process
  • Vision Support
  • Using Ollama
  • Kubernetes vs SLURM for Training

Links mentioned:


Unsloth AI (Daniel Han) ▷ #research (3 messages):

  • BFloat16 impact on RoPE
  • AnchorAttention method
  • Long-context training issues

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (98 messages🔥🔥):

  • Tülu 3 release
  • Nvidia's AI Wall
  • Gemini's performance boost
  • Reinforcement Learning techniques
  • Community discussions on model ranking

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (18 messages🔥):

  • Post-training capabilities
  • SFT and new skills
  • Debate on LLM capabilities
  • Diminished philosophy section
  • Training efficiency

Interconnects (Nathan Lambert) ▷ #random (120 messages🔥🔥):

  • GPT-4o Performance Analysis
  • Perceptron AI Launch
  • Model Context Protocol by Anthropic
  • Rickrolls in AI
  • Issues with Academic AI Resources

Links mentioned:


Interconnects (Nathan Lambert) ▷ #posts (34 messages🔥):

  • RLHF vs DPO
  • Cohere and Alignment
  • SnailBot RSS Issues

HuggingFace ▷ #general (182 messages🔥🔥):

  • Hugging Face Models
  • Voice Conversion Using RVC
  • Community Interactions
  • MagicQuil Downloads
  • Model Training Issues

Links mentioned:


HuggingFace ▷ #cool-finds (10 messages🔥):

  • Custom AI Models with Handler Files
  • AI Security Research Paper
  • Automated AI Research Assistant
  • Collaborative Learning Framework in LLMs
  • Efficient Quantization Method for Attention

Links mentioned:


HuggingFace ▷ #i-made-this (9 messages🔥):

  • Neo's Red Pill Journey
  • Prompting Techniques
  • MOUSE-I Web Service
  • Cinematic Image Generation
  • loadimg Downloads Milestone

Links mentioned:


HuggingFace ▷ #reading-group (1 message):

crazypistachecat: Sory👍


HuggingFace ▷ #computer-vision (6 messages):

  • YOLO for Video Object Detection
  • Stable Diffusion
  • Autodistill Training Method

Links mentioned:


HuggingFace ▷ #NLP (9 messages🔥):

  • Pandas alternatives
  • Building local Agents
  • Fast inference frameworks
  • Scaling data processing

HuggingFace ▷ #diffusion-discussions (13 messages🔥):

  • Open Source Models like SSD-1B
  • Using SDXL in Google Colab
  • Token Embeddings in Shuttle-3

Links mentioned:


OpenAI ▷ #ai-discussions (209 messages🔥🔥):

  • AI Discussions on Censorship
  • Opinions on Agnosticism
  • Deep Web and AI Access
  • OpenAI's Censorship Policies
  • Perplexity vs ChatGPT

OpenAI ▷ #gpt-4-discussions (1 message):

grundaypress: Hi, does anyone know why the retry button is gone when using Custom GPTs?


OpenAI ▷ #prompt-engineering (3 messages):

  • Categorizing Products with GPT-4
  • Prompt Optimization
  • Prompt Caching for Efficiency

OpenAI ▷ #api-discussions (3 messages):

  • Product Categorization with GPT-4o
  • Token Optimization Strategies
  • Prompt Caching in API usage

LM Studio ▷ #general (63 messages🔥🔥):

  • Hermes 3 Model Performance
  • Cloud-Based GPU Renting
  • LLM GPU Comparisons
  • MLX Models in LMS
  • Graphics Card Recommendations

Links mentioned:


LM Studio ▷ #hardware-discussion (135 messages🔥🔥):

  • AI chip discussions
  • Performance of GPUs
  • Building a local LLM server
  • USB4 with AMD devices
  • Challenges with GPU configurations

Links mentioned:


aider (Paul Gauthier) ▷ #announcements (2 messages):

  • Qwen 2.5 Model Performance
  • Aider v0.64.0 Features
  • Model Quantization Impact
  • Slash Commands in Aider
  • Context Window and Token Costs

Links mentioned:


aider (Paul Gauthier) ▷ #general (133 messages🔥🔥):

  • Aider Leaderboard Changes
  • OpenRouter Providers and Quantization
  • Gemini Model Performance
  • DeepSeek Model Developments
  • User Experiences with AI Models

Links mentioned:

  • Advanced model settings: Configuring advanced settings for LLMs.
  • Qwen2.5 Coder 32B Instruct - API, Providers, Stats: Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Run Qwen2.5 Coder 32B Instruct with API.
  • Tweet from Zhihong Shao (@zhs05232838): Our DeepSeek reasoning model is great on code and math. Try it out! Quoting DeepSeek (@deepseek_ai): 🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! 🔍 o1-preview-...
  • [Feature]: Support OpenRouter's "provider" argument to control/select providers · Issue #6857 · BerriAI/litellm: OpenRouter supports a variety of mechanisms to select which providers you want your requests to hit. This involves passing a provider argument. Currently that causes an error: import li...
  • Provider Routing | OpenRouter: Route requests across multiple providers.
  • Models | OpenRouter: Browse models on OpenRouter.
  • Qwen on openrouter #aider #lmsys #qwen #llm #aicoding #huggingface: no description found.
  • DeepSeek V2.5 - API, Providers, Stats: DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. Run DeepSeek V2.5 with API.
  • 🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! | DeepSeek API Docs: 🔍 o1-preview-level performance on AIME & MATH benchmarks.
  • Meta: Llama 3.1 70B Instruct – Provider Status: See provider status and make a load-balanced request to Meta: Llama 3.1 70B Instruct - Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-t...
  • Tongyi Qianwen (Qwen) - Alibaba Cloud: Top-performance foundation models from Alibaba Cloud.
  • Alibaba Cloud Model Studio - Alibaba Cloud: A one-stop generative AI platform to build intelligent applications that understand your business, based on Qwen and other popular models.
  • Model List – Alibaba Cloud Model Studio Help Center (模型列表_大模型服务平台百炼): no description found.
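The litellm issue and OpenRouter Provider Routing docs above concern OpenRouter's `provider` argument, which lets a request control which upstream providers may serve it. As a minimal sketch (not from this issue's discussion: the request shape follows OpenRouter's Provider Routing docs as linked above, the model slug and provider names are illustrative assumptions, and exact field support may have changed since), a request payload pinning providers looks roughly like:

```python
import json

# Sketch of an OpenRouter chat-completions payload with provider preferences.
# Field names ("order", "allow_fallbacks") are from OpenRouter's Provider
# Routing docs; the provider names below are examples, not recommendations.
payload = {
    "model": "qwen/qwen-2.5-coder-32b-instruct",
    "messages": [
        {"role": "user", "content": "Write a binary search in Python."},
    ],
    "provider": {
        # Try these providers in this order; with fallbacks disabled, the
        # request fails rather than being routed to an unlisted provider,
        # so you always know which host (and quantization) served you.
        "order": ["Hyperbolic", "Fireworks"],
        "allow_fallbacks": False,
    },
}

# The actual call would be an OpenAI-style POST to
#   https://openrouter.ai/api/v1/chat/completions
# with an "Authorization: Bearer <OPENROUTER_API_KEY>" header.
body = json.dumps(payload)
print(body[:60])
```

Disabling fallbacks is the relevant trade-off from the channel's quantization discussion: it trades availability for certainty about which quantized variant of the model answered.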


aider (Paul Gauthier) ▷ #questions-and-tips (42 messages🔥):

  • Looping with Aider
  • Aider Caching Efficiency
  • Disabling Autocomplete in Aider
  • API Approval for Aider
  • Recommendations for Model Combinations

Links mentioned:


aider (Paul Gauthier) ▷ #links (2 messages):

  • Gemini API
  • uithub

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

  • New models release
  • High context provider selection

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (162 messages🔥🔥):

  • Mistral Model Issues
  • OpenRouter API Functionality
  • Gemini Experimental Models
  • File Upload Capabilities
  • Community Engagement in OpenRouter

Links mentioned:


OpenRouter (Alex Atallah) ▷ #beta-feedback (1 message):

  • Claude 3.5
  • Custom provider key requests

Stability.ai (Stable Diffusion) ▷ #general-chat (164 messages🔥🔥):

  • Flux performance
  • SDXL usage
  • Image generation issues
  • ControlNet functionality
  • AI model security concerns

Links mentioned:


Eleuther ▷ #general (5 messages):

  • Mamba SSM Layers
  • Data Transfer Times
  • LLM Autophagy Process
  • Evaluation Tasks for Foundational Models
  • Meetup in Wellington

Eleuther ▷ #research (48 messages🔥):

  • FlexAttentions
  • Position Encoding Techniques
  • Forgetting Transformer
  • Sparse Upcycling vs Continued Pretraining
  • Scale of LLM Training

Links mentioned:


Eleuther ▷ #scaling-laws (7 messages):

  • Scaling Laws
  • Evaluation Predictions
  • Marius Hobbhahn's Contributions
  • Meta and OpenAI's Methods
  • Cost of Scaling Law Training

Link mentioned: Observational Scaling Laws and the Predictability of Language Model Performance: Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of ...


Eleuther ▷ #lm-thunderdome (39 messages🔥):

  • lm-eval and pruned models
  • Using Groq API
  • Logits and loglikelihood in QA
  • Custom metrics in lm-harness

Links mentioned:


Eleuther ▷ #multimodal-general (1 message):

  • Multimodal benchmarks
  • LLaVA performance
  • Text recognition in images

Perplexity AI ▷ #general (83 messages🔥🔥):

  • Pro Channel Access
  • Image Creation on iOS
  • Perplexity Pro Features
  • Subscription Issues
  • Discord Support

Links mentioned:


Perplexity AI ▷ #sharing (7 messages):

  • Pokémon Data AI Model
  • Baltic Sea Cable Sabotage
  • Chicken or Egg Paradox Resolution
  • NVIDIA's Omniverse Blueprint
  • One-Person Startup Era

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (7 messages):

  • API Rate Limits
  • Using Own API Key in Perplexity
  • Session Management in Frontend Apps



Latent Space ▷ #ai-general-chat (59 messages🔥🔥):

  • Truffles hardware device
  • Vercel acquires Grep
  • Tülu 3 model release
  • Flux Tools from Black Forest Labs
  • Gemini API model updates

Links mentioned:


Nous Research AI ▷ #general (29 messages🔥):

  • AI Expert Needed Urgently
  • DeepSeek R1-Lite Specifications
  • Llama-Mesh Paper Recommendation
  • Daily LLM Drop Discussions
  • User Interaction Memory in AI

Links mentioned:


Nous Research AI ▷ #ask-about-llms (6 messages):

  • LLM Performance Improvement
  • KV Cache Limitations
  • Multi-Agent Frameworks
  • Prefix Caching
  • Prompt Caching

Nous Research AI ▷ #interesting-links (4 messages):

  • Soft Prompts vs Fine Tuning
  • CoPilot Arena Results
  • LoRA Trade-offs

GPU MODE ▷ #triton (20 messages🔥):

  • Debugging Triton Interpreter
  • Block Size Discussion
  • Triton GEMM and Bank Conflicts
  • Boolean Mask Summation Bug
  • Swizzling Techniques for Performance

Links mentioned:


GPU MODE ▷ #beginner (2 messages):

  • cuBLAS operations
  • Matrix Multiplication with cuBLAS
  • Row-major vs Column-major order

Link mentioned: cublasSgemm row-major multiplication: I'm trying to use cublasSgemm to multiplicity two non-square matrices that are stored in row-major order. I know that this function has one parameter where you can specify that if you want to tra...


GPU MODE ▷ #torchao (1 message):

  • Llama-2-70B model
  • Multi-GPU support

GPU MODE ▷ #rocm (11 messages🔥):

  • HIP Kernel Rules
  • Compilation Time for Examples
  • FP16 GEMM on MI250 GPU
  • Debugging Kernel
  • Triton GEMM on ROCm

GPU MODE ▷ #webgpu (1 message):

  • AGX machine code
  • Freedesktop tools

GPU MODE ▷ #liger-kernel (1 message):

0x000ff4: A little update on kto I am working now on the tests


GPU MODE ▷ #self-promotion (1 message):

pradeep1148: https://www.youtube.com/watch?v=XP33Vgn75lM


GPU MODE ▷ #🍿 (1 message):

  • Post-Training Techniques
  • Human Preferences in RL
  • Continual Learning
  • Constitutional AI
  • Recursive Summarization

Links mentioned:


Notebook LM Discord ▷ #use-cases (9 messages🔥):

  • NotebookLM and GitHub Repositories
  • Audio Prompt Generation
  • Using Multiple LLMs
  • Table of Contents in Code
  • ElevenLabs and Text-to-Speech AI

Links mentioned:


Notebook LM Discord ▷ #general (25 messages🔥):

  • Jensen Huang's shoutout
  • Podcast generation issues
  • Accent preferences
  • Functionality requests

Link mentioned: Reddit - Dive into anything: no description found


LlamaIndex ▷ #blog (2 messages):

  • AI agents architecture
  • Data-backed systems with Redis
  • Knowledge graph construction
  • Natural language querying
  • Memgraph integration

LlamaIndex ▷ #general (16 messages🔥):

  • LlamaParse for PDF table extraction
  • Create-Llama frontend options
  • Llama-Agents deprecation
  • NDCG calculation query
  • vLLM error and usage

Links mentioned:


Cohere ▷ #discussions (8 messages🔥):

  • 30 Days of Python
  • Capstone Project API
  • Learning Resources

Links mentioned:


Cohere ▷ #questions (3 messages):

  • Cohere Repository
  • Cohere Toolkit
  • Jupyter Notebooks
  • Contribution Guidelines

Links mentioned:


Cohere ▷ #api-discussions (5 messages):

  • Multimodal Embeddings Launch
  • Research Agent Use Case
  • Rate Limit Concerns

Link mentioned: CustomGPT.AI Researcher - Create High-Quality AI Content Based On Deep Research: Create ultra high-quality, brand-safe articles and research reports using CustomGPT.ai Researcher. Perfect for content marketing, SEO and research reports.


Cohere ▷ #projects (1 message):

rachel_47358: https://github.com/harmonydata/harmony


Modular (Mojo 🔥) ▷ #mojo (6 messages):

  • Mojo Async Progress
  • Mojo Community Channel
  • Async Runtime Overhead

Modular (Mojo 🔥) ▷ #max (9 messages🔥):

  • Moonshine ASR Model Performance
  • Mojo Script Optimization
  • CPU Utilization Observations

Links mentioned:


Torchtune ▷ #dev (9 messages🔥):

  • New guidelines for Torchtune contributors
  • Extender packages for Torchtune
  • Binary search method suggestion
  • Hands-on experience with UV
  • Optional packages feature for TorchAO

DSPy ▷ #general (7 messages):

  • Prompt Signature Modification
  • Adapter Configuration
  • Optimization across Models

LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (1 message):

  • Intel AMA Session
  • Hackathon Insights

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (4 messages):

  • Quiz 10 Release
  • Hackathon Discussion

tinygrad (George Hotz) ▷ #learn-tinygrad (4 messages):

  • int64 indexing
  • differences in ops_hip.py
  • maintenance of code
  • HIP setting in tinygrad

Links mentioned:


MLOps @Chipro ▷ #events (3 messages):

  • Event link confusion
  • Rescheduling events

OpenInterpreter ▷ #general (3 messages):

  • AI Expert Request
  • Carter Grant Seeking Opportunities

OpenAccess AI Collective (axolotl) ▷ #general (1 message):

  • MI300X GPU Issues
  • Ablation Set Runs
  • Intermittent GPU Hangs
  • ROCm GitHub Issue

Link mentioned: [Issue]: Intermittent GPU Hang HW Exception by GPU on MI300X when training with axolotl · Issue #4021 · ROCm/ROCm: Problem Description When running axolotl runs, I get intermittent GPU hangs: {'loss': 0.4589, 'grad_norm': 1.0493940198290594, 'learning_rate': 5.284132841328413e-06, 'epoc...


OpenAccess AI Collective (axolotl) ▷ #community-showcase (1 message):

volko76: Do we still need to prompt correctly ? https://youtu.be/m3Izr0wNfQc


LAION ▷ #general (2 messages):

  • Autoencoder Training

Mozilla AI ▷ #announcements (2 messages):

  • Refact.AI demo
  • Web Applets project
  • Public AI initiative

Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (1 message):

  • Llama 3.2 prompt usage

Link mentioned: Llama 3.2 | Model Cards and Prompt formats




{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}