Frozen AI News archive

Gemini (Experimental-1114) retakes #1 LLM rank with 1344 Elo

**Anthropic** released the **3.5 Sonnet** benchmark for jailbreak robustness, emphasizing adaptive defenses. **OpenAI** enhanced **GPT-4** with a new RAG technique for contiguous chunk retrieval. **LangChain** launched **Promptim** for prompt optimization. **Meta AI** introduced **NeuralFeels** with neural fields for visuotactile perception. **RichardMCNgo** resigned from **OpenAI**, highlighting concerns about **AI governance** and **theoretical alignment**. Discussions emphasized the importance of **truthful public information** and **ethical alignment** in AI deployment. The latest **Gemini** update marks a new #1 LLM amid alignment challenges. The AI community continues to focus on **benchmarking**, **prompt engineering**, and **alignment** issues.


Race dynamics is all you need.

AI News for 11/13/2024-11/14/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (217 channels, and 2424 messages) for you. Estimated reading time saved (at 200wpm): 272 minutes. You can now tag @smol_ai for AINews discussions!

Special note from the team: Thanks, Andrej! Hi to the >3k of you who just joined us! As a brief intro: we are AI News, a side project started almost a year ago, over the 2023 holiday break, to solve AI Discord overwhelm. We currently save ~15 human years of reading per day.


When Anthropic announced 3.5 Sonnet in June, they also published an oddly descriptive chart demonstrating what Dario terms a "race to the top": the world's top 3 AI labs (excluding Meta, X.ai, and 01.ai) running up benchmarks in tight lockstep. With the latest Nov 14 edition of Gemini, we can now update this chart with the fall editions of all 3 frontier models:

[Chart: benchmark progress of the three frontier labs' models, updated with the fall releases]

LMArena (formerly LMsys) explains the rank updates best:

[Image: LMArena's explanation of the rank updates]

There is no paper accompanying this update, nor is it available in the API yet, so there's unfortunately not much else to discuss here. That is normally a disqualifier for a feature story, but when we have a new #1 LLM, we have to report on it.

This update comes at a convenient time for Gemini, just as it deals with some very bizarre and alarming alignment issues.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

All recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Developments and Tools

AI Governance and Ethics

Scaling AI and Evaluation Challenges

Software Tools, Libraries, and Development Platforms

AI Research and Papers


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Nvidia RTX 5090 enters production with 32GB VRAM

Theme 2. MMLU-Pro scores: Qwen and Claude Sonnet models

Theme 3. Qwen2.5 RPMax v1.3: Creative Writing Model

Theme 4. Qwen 32B vs 72B-Ins on Leetcode Comparison

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Gemini 1.5 Pro Released - Claims Top Spot on LMSys Leaderboard

Theme 2. Undetectable ML Model Backdoors Using Digital Signatures - New Research

Theme 3. New CogVideoX-5B Open Source Text-to-Video Model Released

Theme 4. StackOverflow Traffic Plummets as AI Tools Rise


AI Discord Recap

A summary of Summaries of Summaries by o1-preview

Theme 1. AI Models Take the Spotlight: Gemini Soars and New Releases Impress

Theme 2. AI Gets Cozy with Developers: Tools Integrate into Coding Environments

Theme 3. Data Privacy Panic: GPT-4 and LAION Face Scrutiny

Theme 4. Robots Meet AI: Benchmarking Vision Language Action Models

Theme 5. Ads Crash the AI Party: Users Frown at Sponsored Questions


PART 1: High level Discord summaries

HuggingFace Discord


LM Studio Discord


Unsloth AI (Daniel Han) Discord


OpenRouter (Alex Atallah) Discord


Eleuther Discord


aider (Paul Gauthier) Discord


Nous Research AI Discord


Modular (Mojo 🔥) Discord


Perplexity AI Discord


Interconnects (Nathan Lambert) Discord


GPU MODE Discord


Notebook LM Discord


Latent Space Discord


OpenAI Discord


OpenInterpreter Discord


Cohere Discord


LlamaIndex Discord


tinygrad (George Hotz) Discord


OpenAccess AI Collective (axolotl) Discord


LAION Discord


DSPy Discord


LLM Agents (Berkeley MOOC) Discord


Gorilla LLM (Berkeley Function Calling) Discord


AI21 Labs (Jamba) Discord


Mozilla AI Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Stability.ai (Stable Diffusion) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

HuggingFace ▷ #general (392 messages🔥🔥):

  • GPT-4 Data Leak
  • Hugging Face AI Models
  • LLM Integration Hypotheticals
  • Sample Size and Model Training
  • Coffee Preferences



HuggingFace ▷ #today-im-learning (2 messages):

  • AI image generation
  • Game development
  • Bone animation in Unity
  • Project journey resources

HuggingFace ▷ #cool-finds (5 messages):

  • Platform Affiliation
  • User Trust Concerns

HuggingFace ▷ #i-made-this (51 messages🔥):

  • Benchmarking Vision Language Action Models
  • Kokoro TTS Model Updates
  • IDEFICS3_ROCO Medical Imaging Project
  • VividNode v1.7.1 Release
  • Data Mixing Script



HuggingFace ▷ #reading-group (51 messages🔥):

  • AI Reading Group Introduction
  • Questions on Mitigation
  • Public Domain Datasets
  • Technical Feasibility of Hardware Setup

HuggingFace ▷ #computer-vision (3 messages):

  • Open3D-ML
  • O3D and Its Historical Context
  • 3D Object Classification
  • LiDAR Applications
  • Point Cloud Library Usage



HuggingFace ▷ #diffusion-discussions (1 message):

  • Stable Diffusion 1.5
  • CPU performance optimization

LM Studio ▷ #general (54 messages🔥):

  • In-line LaTeX rendering in LM Studio
  • Sideloading llama.cpp
  • Running large models on limited RAM
  • Autogen and API issues
  • Nexus team performance

LM Studio ▷ #hardware-discussion (246 messages🔥🔥):

  • GPU performance with large models
  • CPUs vs GPUs for LLM workloads
  • M4 Max benchmark comparison
  • Model offloading to different hardware
  • Integrating AI in SaaS applications

Unsloth AI (Daniel Han) ▷ #general (217 messages🔥🔥):

  • Unsloth AI Training Efficiency
  • Understanding LLMs and Math
  • Editing Code with AI Tools
  • GPU Programming and Triton
  • Educational Chatbot Data Chunking



Unsloth AI (Daniel Han) ▷ #off-topic (11 messages🔥):

  • Brunch Choices
  • Diet Adjustments
  • Animal-derived Products
  • Nuts and Seeds Discussion

Unsloth AI (Daniel Han) ▷ #help (31 messages🔥):

  • Train on responses only function
  • LoRA parameters in fine-tuning
  • Dataset quality concerns
  • French chatbot model selection
  • Using LoftQ without unquantized models



Unsloth AI (Daniel Han) ▷ #community-collaboration (2 messages):

  • Harmony project
  • Open-source questionnaire harmonization
  • LLM matching competition
  • Natural Language Processing enhancements



OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

  • UnslopNemo 12B v4
  • SorcererLM
  • Inferor 12B
  • Model Status Updates
  • UI Improvements



OpenRouter (Alex Atallah) ▷ #app-showcase (5 messages):

  • GitHub open source project policies
  • WordPress Chatbot Plugin Launch
  • Companion Discord Bot Features



OpenRouter (Alex Atallah) ▷ #general (201 messages🔥🔥):

  • UnslopNemo 12B
  • DeepSeek context limitations
  • Gemini API updates
  • OpenRouter API Issues
  • AI Studio generateSpeech API



OpenRouter (Alex Atallah) ▷ #beta-feedback (7 messages):

  • Custom Provider Keys
  • Customer Integration Access

Eleuther ▷ #general (43 messages🔥):

  • Job Transitioning Challenges
  • Downloading The Pile Dataset
  • IBM's Granite and Open Source
  • Transformer Architecture Evolution
  • Hardware Developments for AI



Eleuther ▷ #research (123 messages🔥🔥):

  • Benchmarking Vision Language Action models
  • Discussion on Scaling Laws
  • Shampoo and Muon Algorithms in Optimization
  • Impact of Int8 Training
  • Usefulness of Synthetic Tasks



Eleuther ▷ #interpretability-general (4 messages):

  • Pythia model suite
  • Mixture of Experts (MoE)
  • OLMo and OLMOE comparison
  • Interpolation-focused training
  • Hyperparameter modernization

Link mentioned: Tweet from Nora Belrose (@norabelrose): If there were a mixture-of-expert version of the Pythia model suite, what sorts of questions would you want to answer with it? Should we try to exactly replicate the Pythia training setup, but with M...


Eleuther ▷ #lm-thunderdome (7 messages):

  • Eval prompt modifications
  • Official parser modifications
  • MMLU standardization
  • MMMU evaluation details



aider (Paul Gauthier) ▷ #announcements (1 message):

  • Aider v0.63.0
  • Qwen 2.5 Coder 32B Support
  • Web Command Improvement
  • Prompting Enhancements
  • Bug Fixes

aider (Paul Gauthier) ▷ #general (123 messages🔥🔥):

  • Aider enhancements
  • Qwen 2.5 Coder performance
  • Gemini experimental models
  • OpenRouter compatibility
  • CLI scripting with Aider



aider (Paul Gauthier) ▷ #questions-and-tips (29 messages🔥):

  • Installing Aider in Termux
  • Triggering Rust Analyzer in VSCode
  • Using Aider with git diff
  • Aider modes comparison
  • Aider usage tips



aider (Paul Gauthier) ▷ #links (2 messages):

  • Organizing Code for AI
  • Aider Discord Guidelines
  • Server Rule Changes

Nous Research AI ▷ #general (142 messages🔥🔥):

  • Joining Forge API Beta
  • 3D Printer Recommendations
  • Hermes Programming Insights
  • Research Project Participation
  • TEE Wallet Collation Concerns



Nous Research AI ▷ #interesting-links (6 messages):

  • Rizzler
  • Slang Translator
  • Translation Tool
  • Resume to Website Tool



Modular (Mojo 🔥) ▷ #general (1 message):

aka_afnan: Hi beautiful community i just finished basic tutorials on mojo lang.


Modular (Mojo 🔥) ▷ #mojo (120 messages🔥🔥):

  • Mojo Low-Level Syntax
  • Performance of High-Level Syntax vs C
  • Recursive Vectorization & Tail Call Optimization
  • LLVM and MLIR in Mojo
  • Importance of Language Features



Perplexity AI ▷ #general (72 messages🔥🔥):

  • Perplexity's Campus Strategist Program
  • Ads and subscription model concerns
  • Updates on model availability
  • Gemini performance in Chatbot Arena
  • API dashboard issues



Perplexity AI ▷ #sharing (8 messages🔥):

  • Perplexity AI features
  • Best mouse for work
  • Google Gemini launch
  • Sharing thread settings

Perplexity AI ▷ #pplx-api (7 messages):

  • Vercel AI SDK usage
  • Reddit citation issues
  • Search domain filter problem

Interconnects (Nathan Lambert) ▷ #news (22 messages🔥):

  • AI Agent Tool Operator Launch
  • Francois Chollet Leaves Google
  • Gemini-Exp-1114 Performance
  • ChatGPT for macOS Updates
  • Scaling Laws Theory Concerns



Interconnects (Nathan Lambert) ▷ #ml-drama (1 message):

420gunna: https://x.com/richardmcngo/status/1856843040427839804?s=46


Interconnects (Nathan Lambert) ▷ #random (18 messages🔥):

  • Qwen vs Llama performance
  • Cognitive revolution podcast
  • Simple division problems with Qwen
  • Synthetic data in model training

Interconnects (Nathan Lambert) ▷ #memes (26 messages🔥):

  • Leadership Strategies
  • Open-source AI Discussion
  • Scaling Laws in Labs
  • Discord Shop Characters



Interconnects (Nathan Lambert) ▷ #posts (9 messages🔥):

  • Andrew Carr Interview
  • Gemini 1.5 Ultra
  • Claude 3.5 Opus
  • Personas in AI
  • Scaling Realities

Link mentioned: Andrew Carr on Pushing the Boundaries of Generative AI (Beyond Text): Andrew Carr is co-founder and chief scientist at Cartwheel, where he is building text-to-motion AI models and products for gaming, film, and other creative e...


GPU MODE ▷ #general (2 messages):

  • RAPIDS cuDF

GPU MODE ▷ #triton (4 messages):

  • Kernel Design Challenges
  • Triton and Performance Tuning
  • Issues with torch.compile

Link mentioned: torch.compile breaks with Triton built from source · Issue #140423 · pytorch/pytorch: 🐛 Describe the bug torch.compile breaks with Triton built from source (as of Nov 12): How to reproduce: Build Triton from the master branch Run torch.compile with a model containing Triton modules,.....


GPU MODE ▷ #torch (2 messages):

  • Direct Access to GPU
  • Torch.compile() with DDP
  • Torch.compile() with FSDP

GPU MODE ▷ #beginner (4 messages):

  • GPU profiling tools
  • Thread creation on GPUs
  • RGB to greyscale conversion performance

GPU MODE ▷ #off-topic (1 message):

  • Feijoa dessert
  • Grilled beef patties
  • Ivan tea

GPU MODE ▷ #rocm (1 message):

leiwang1999_53585: did you use ck profiler?


GPU MODE ▷ #self-promotion (3 messages):

  • Video Length Discussions
  • Interest in Triton Content

GPU MODE ▷ #🍿 (1 message):

apaz: <@325883680419610631>
https://github.com/gpu-mode/discord-cluster-manager/issues/23


GPU MODE ▷ #thunderkittens (34 messages🔥):

  • Kernel Shared Memory
  • Matrix Multiplication Optimization
  • Dynamic Shared Memory Issues
  • CUDA Function Attributes

Link mentioned: Using maximum shared memory in Cuda: I am unable to use more than 48K of shared memory (on V100, Cuda 10.2) I call cudaFuncSetAttribute(my_kernel, cudaFuncAttributePreferredSharedMemoryCarveout, ...


GPU MODE ▷ #edge (5 messages):

  • React Native LLM Library
  • LLM Inference on Android
  • Transformer Memory Bound
  • Bitnet 1.58 A4
  • GGUF Q8 Performance

Link mentioned: GitHub - software-mansion/react-native-executorch: Contribute to software-mansion/react-native-executorch development by creating an account on GitHub.


Notebook LM Discord ▷ #use-cases (16 messages🔥):

  • Magic Book Podcast Experiment
  • Functionality Concerns with Podcast Tools
  • Mobile Version Usability Issues
  • Summarizing 'The Body Keeps the Score'
  • Connecting Old Theories with Current Events

Link mentioned: Top Shelf: Podcast · Four By One Technologies · "Top Shelf" is your go-to podcast for quick, insightful takes on today’s best-selling books. In just 15 minutes, get the gist, the gold, and a fresh pers...


Notebook LM Discord ▷ #general (40 messages🔥):

  • Privacy and Data Security on NotebookLM
  • Feature Requests for NotebookLM
  • Pronunciation Issues in NotebookLM
  • User Experience Feedback

Link mentioned: Privacy - Help: no description found


Latent Space ▷ #ai-general-chat (53 messages🔥):

  • Perplexity Ads Introduction
  • AI Agent Performance Update
  • ChatGPT Desktop App Enhancements
  • Gemini AI Feedback
  • Tech Debt and AI Impact



Latent Space ▷ #ai-announcements (1 message):

swyxio: posted on hn!


OpenAI ▷ #ai-discussions (31 messages🔥):

  • AI-Driven Computer Control
  • Lorebook for GPT
  • Changes in Mac App Interface
  • Future of AI Advancements
  • Image Tools in Copilot

OpenAI ▷ #gpt-4-discussions (11 messages🔥):

  • Using LLMs Effectively
  • Content Flags Concerns
  • Custom GPTs Usage
  • Roleplay Character Creation
  • Model Performance in Writing

OpenAI ▷ #prompt-engineering (5 messages):

  • ChatGPT capabilities
  • Retrieving information from ancient texts
  • Nostalgia for old prompting techniques
  • Model improvements in games

OpenAI ▷ #api-discussions (5 messages):

  • 9 Pillars Solutions
  • Information Retrieval from Ancient Texts
  • Advancements in Technology
  • Model Performance in Games

OpenInterpreter ▷ #general (34 messages🔥):

  • Dockerized Open Interpreter
  • Open Interpreter as a Shell Pass-Through
  • Beta App Performance
  • Worker Pool Configuration
  • Memory Store Concept

OpenInterpreter ▷ #ai-content (7 messages):

  • VividNode v1.7.1 release
  • Voice Lab framework
  • ChatGPT for macOS
  • Probabilistic computing breakthroughs



Cohere ▷ #discussions (15 messages🔥):

  • Cohere embedding models
  • Discord access issues
  • Fostering young talent in AI and robotics
  • Podcast content analysis
  • Upcoming events

Link mentioned: Consent in Crisis: The Rapid Decline of the AI Data Commons: AI Reading Group session with one of the authors of "Consent in Crisis: The Rapid Decline of the AI Data Commons".


Cohere ▷ #announcements (1 message):

  • Research Prototype Beta Program
  • Text-based Deliverables
  • User Feedback for Tool Development

Link mentioned: Research Prototype - Early Beta Sign Up Form: Thank you for your interest in participating in the beta testing phase of our research prototype — a tool designed to help users tackle research and writing tasks such as: creating complex reports, do...


Cohere ▷ #questions (2 messages):

  • Bug reporting process

Cohere ▷ #api-discussions (13 messages🔥):

  • HTTP Request Details
  • Network Error Analysis
  • Azure AI V2 API Status

Link mentioned: Cohere on Azure — Cohere: This page describes how to work with Cohere models on Microsoft Azure.


Cohere ▷ #projects (1 message):

  • Vision Language Action Models
  • Benchmarking Robotic Learning Tasks
  • SoTA VLMs like GPT-4o
  • Multimodal Action Models
  • Collaborative Research Release

Link mentioned: Tweet from harsh (@HarshSikka): Excited to share our new paper "Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks" We evaluate how well VLM & VLA models can control robots across 20 different real-wor...


LlamaIndex ▷ #blog (1 message):

  • RAGformation
  • Cloud architecture automation
  • Dynamic flow diagrams
  • Pricing estimates for architecture

LlamaIndex ▷ #general (26 messages🔥):

  • Memory for AI agents
  • Go version of LlamaIndex
  • ChromaDB ingestion issue
  • Using SentenceSplitter and SentenceWindowNodeParser
  • LlamaParse contact assistance

Link mentioned: Mem0 - LlamaIndex: no description found


tinygrad (George Hotz) ▷ #general (16 messages🔥):

  • GPU resource sharing between tinyboxes
  • MLPerf Training 4.1 results
  • Buffer transfer function in tinygrad
  • Network interactions and bottlenecks
  • PCIe bandwidth capabilities



tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):

  • Bitwise Operations in Tinygrad
  • CLANG Backend Bug Investigation
  • Tensor Gather Functionality

OpenAccess AI Collective (axolotl) ▷ #general (15 messages🔥):

  • Nanobitz Recommendations
  • Llama Event at Meta HQ
  • Tokenization Strategies
  • Optimal Dataset Size for Fine-Tuning Llama
  • Liger Kernel Improvements



LAION ▷ #general (5 messages):

  • EPOCH 58 COCK
  • LAION copyright discussion
  • Public indexing and copyright

Link mentioned: Re: LAION. Downloading 5Billion images 220TB of data permanently on external hard drives is not "Browser caching": Most on this sub are not erudite enough to have opinions about complex copyright law and yet some try to make false equivalence arguments to the...


LAION ▷ #research (5 messages):

  • Benchmarking Vision Language Action Models
  • Watermark Anything
  • AI Generators
  • 12M Public Domain Images



DSPy ▷ #show-and-tell (1 message):

  • ChatGPT for macOS
  • Integration with desktop apps
  • dspy workflows
  • Coding assistance

Link mentioned: Tweet from OpenAI Developers (@OpenAIDevs): ChatGPT 🤝 VS Code, Xcode, Terminal, iTerm2 ChatGPT for macOS can now work with apps on your desktop. In this early beta for Plus and Team users, you can let ChatGPT look at coding apps to provide be...


DSPy ▷ #general (7 messages):

  • Long-code generation with large tokens
  • Deprecation of LM assertions
  • Developing a multi-infraction LLM application

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):

  • Quiz Eligibility
  • Course Content Timeline

LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (1 message):

sheilabel: Happening today! https://www.eventbrite.ca/e/1039740199927?aff=oddtdtcreator


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (2 messages):

  • Palmyra X 004 model
  • Writer handler implementation
  • Pull Request Review

Link mentioned: [BFCL] Add support for Writer models and Palmyra X 004 by samjulien · Pull Request #755 · ShishirPatil/gorilla: This PR adds support for Writer models and our latest Palmyra X 004 to BFCL. Thank you!


AI21 Labs (Jamba) ▷ #general-chat (2 messages):

  • Legacy Model Deprecation
  • Transition to Open Source Solutions

Mozilla AI ▷ #announcements (1 message):

  • Local LLMs Workshop
  • SQLite-Vec Metadata Filtering
  • Refact.AI Autonomous Agents





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share it with a friend! Thanks in advance!

{% endif %}