Frozen AI News archive

Vision Everywhere: Apple AIMv2 and Jina CLIP v2

**Apple** released **AIMv2**, a novel vision encoder pre-trained with autoregressive objectives that achieves **89.5% accuracy on ImageNet** and integrates joint visual and textual objectives. **Jina** launched **Jina CLIP v2**, a multimodal embedding model supporting **89 languages** and high-resolution images with efficient Matryoshka embeddings reducing dimensions by **94%** with minimal accuracy loss. **Allen AI** introduced **Tülu 3** models based on **Llama 3.1** with **8B and 70B** parameters, offering **2.5x faster inference** and alignment via SFT, DPO, and RLVR methods, competing with **Claude 3.5** and **Llama 3.1 70B**. These developments highlight advances in autoregressive training, vision encoders, and multilingual multimodal embeddings.


AI News for 11/22/2024-11/23/2024. We checked 7 subreddits, 433 Twitters and 28 Discords (211 channels, and 2674 messages) for you. Estimated reading time saved (at 200wpm): 265 minutes. You can now tag @smol_ai for AINews discussions!

In line with the general theme of everyone going multimodal (Pixtral, Llama 3.2, Pixtral Large), advancements in "multimodal" (really just vision) embeddings are very foundational. This makes Apple's and Jina's releases in the past 48 hours particularly welcome.

Apple AIMv2

Their paper (GitHub here) details "a novel method for pre-training of large-scale vision encoders": pairing the vision encoder with a multimodal decoder that autoregressively generates raw image patches and text tokens.

image.png

This extends last year's AIMv1 work on vision models pre-trained with an autoregressive objective, which added T5-style prefix attention and a token-level prediction head, and managed to pre-train a 7B AIM that achieves 84.0% on ImageNet-1k with a frozen trunk.
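To make "prefix attention" concrete, here is a minimal sketch of a prefix-LM style attention mask in plain Python. It illustrates the general technique only and is not taken from Apple's code: the function name `prefix_attention_mask` and its parameters are hypothetical, and details such as how the prefix length is chosen during pre-training are not covered.

```python
def prefix_attention_mask(seq_len, prefix_len):
    """Return mask[i][j] == True when position i may attend to position j.

    Positions inside the prefix see the whole prefix (bidirectional);
    positions after it see the prefix plus all earlier positions (causal).
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    return [
        [j < prefix_len or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Illustrative sizes only: 5 patches, the first 2 form the prefix.
for row in prefix_attention_mask(seq_len=5, prefix_len=2):
    print(" ".join("1" if allowed else "0" for allowed in row))
# 1 1 0 0 0
# 1 1 0 0 0
# 1 1 1 0 0
# 1 1 1 1 0
# 1 1 1 1 1
```

Setting `prefix_len=0` gives an ordinary causal mask, and `prefix_len=seq_len` gives fully bidirectional attention, which is closer to how a frozen encoder is typically used downstream.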

The main update is introducing joint visual and textual objectives, which seem to scale up very well:

image.png

AIMv2-3B now achieves 89.5% accuracy on the same benchmark - smaller but better. The qualitative vibes are also excellent:

image.png

Jina CLIP v2

While Apple did more foundational VQA research, Jina's new CLIP descendant is immediately useful for multimodal RAG workloads. Jina released embeddings-v3 a couple of months ago and is now rolling its text encoder into its CLIP offering:

image.png

The tagline speaks to how many state-of-the-art features Jina has packed into its release: "a 0.9B multimodal embedding model with multilingual support of 89 languages, high image resolution at 512x512, and Matryoshka representations."

The Matryoshka embeddings are a particular standout: "Compressing from 1024 to 64 dimensions (94% reduction) results in only an 8% drop in top-5 accuracy and 12.5% in top-1, highlighting its potential for efficient deployment with minimal performance loss."

image.png
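For readers who have not used Matryoshka embeddings, the usual consumption pattern is to keep the first k dimensions and re-normalize before comparing vectors. The sketch below shows that pattern and checks the quoted arithmetic (1024 to 64 dimensions is a 93.75% cut, which rounds to 94%). The helper names are hypothetical and the vectors are random stand-ins, not Jina CLIP v2 outputs, so the similarity numbers it prints say nothing about the model's actual accuracy.

```python
import math
import random


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def reduction(full_dims, kept_dims):
    """Fraction of dimensions removed by truncation."""
    return 1.0 - kept_dims / full_dims


# Random stand-ins for real model outputs (hypothetical data).
rng = random.Random(0)
query = [rng.gauss(0.0, 1.0) for _ in range(1024)]
doc = [q + 0.1 * rng.gauss(0.0, 1.0) for q in query]

small_query = truncate_embedding(query, 64)
small_doc = truncate_embedding(doc, 64)

print(f"dimension reduction: {reduction(1024, 64):.2%}")  # 93.75%
print(f"cosine at 1024 dims: {cosine(query, doc):.3f}")
print(f"cosine at 64 dims:   {cosine(small_query, small_doc):.3f}")
```

Truncation only preserves quality when the model was trained with a Matryoshka objective, so that the leading dimensions carry most of the signal; applying it to an arbitrary embedding model is not expected to work as well.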


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

All recaps done by Claude 3.5 Sonnet, best of 4 runs.

1. Cutting-edge AI Model Releases and Developments: Tülu 3, AIMv2, and More

2. AI Agents Enhancements & Applications: FLUX Tools, Insights from Suno

3. AI, Science, and Society

4. Advancements in AI Ethics, Red Teaming, and Bug Fixing

5. Collaborations and Innovations in Companies and Tools

6. Memes, Humor, and Social Commentary


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek Emerges as Leading Chinese Open Source AI Company

Theme 2. Innovative Model Architectures: Marco-o1 and OpenScholar

Theme 3. System Prompts and Tokenizer Optimization Insights

Theme 4. INTELLECT-1: Distributed Training Innovation

Other AI Subreddit Recap

/r/machinelearning, /r/openai, /r/stablediffusion, /r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

Theme 1. Amazon x Anthropic $4B Investment & Cloud Partnership

Theme 2. GPT-4o Performance Regression on Technical Benchmarks

Theme 3. LTX Video: New Open Source Fast Video Generation Model

Theme 4. Chinese AI Models Emerge as Potential Competitors


AI Discord Recap

A summary of Summaries of Summaries by O1-preview

Theme 1. The AI Arms Race: New Models and Breakthroughs

Theme 2. Billion-Dollar Moves: Anthropic and Amazon Shake Hands

Theme 3. AI Accused: OpenAI Deletes Evidence in Lawsuit

Theme 4. AI Tools Get Smarter: Enhancing Development and Workflows

Theme 5. AI Art and Creativity: Machines with a (Sense of) Humor


PART 1: High level Discord summaries

Eleuther Discord


Unsloth AI (Daniel Han) Discord


LM Studio Discord


HuggingFace Discord


Latent Space Discord


OpenAI Discord


aider (Paul Gauthier) Discord


Nous Research AI Discord


OpenRouter (Alex Atallah) Discord


Perplexity AI Discord


Stability.ai (Stable Diffusion) Discord


LlamaIndex Discord


Notebook LM Discord Discord


Interconnects (Nathan Lambert) Discord


GPU MODE Discord


Cohere Discord


tinygrad (George Hotz) Discord


Modular (Mojo 🔥) Discord


Torchtune Discord


LLM Agents (Berkeley MOOC) Discord


OpenInterpreter Discord


DSPy Discord


OpenAccess AI Collective (axolotl) Discord


LAION Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Eleuther ▷ #announcements (1 message):

Revitalizing Reading Groups, Discord Forum Feature, Monthly Reading Group, YouTube Recordings, Feedback for New Groups


Eleuther ▷ #general (15 messages🔥):

N-shot prompting datasets, Quantum modeling assistance, Pre-NeurIPS gathering in SF, Vector environments in reinforcement learning, AI agent development tools



Eleuther ▷ #research (408 messages🔥🔥🔥):

Test-Time Training (TTT), Wave Network Token Representation, Learnable Positional Embeddings, RNN Extrapolation, Muon Orthogonalization



Eleuther ▷ #lm-thunderdome (13 messages🔥):

API Rate Limit Management, MCQ Implementation, Parsing CLI Arguments, Model Evaluation, Bug Fix for tokenizer_backend



Eleuther ▷ #multimodal-general (1 message):

Types of models, User preferences


Unsloth AI (Daniel Han) ▷ #general (240 messages🔥🔥):

Unsloth updates, Fine-tuning models, Mistral model performance, Image generation models, Qwen model limitations



Unsloth AI (Daniel Han) ▷ #off-topic (16 messages🔥):

Greek Yogurt Diet, Health Optimization, GPU Service Options


Unsloth AI (Daniel Han) ▷ #help (69 messages🔥🔥):

Fine-tuning Models, Inference Issues, Model Compatibility, Continued Pretraining, Tokenization Errors



LM Studio ▷ #general (153 messages🔥🔥):

Multi-GPU Processing, GPU Performance Comparisons, LM Studio Installation Issues, Model Fine-tuning vs. RAG, System Resource Requirements



LM Studio ▷ #hardware-discussion (121 messages🔥🔥):

eGPU Gang, Benchmarking LLMs with AMD GPUs, MacBook Performance for AI Tasks, Power Consumption and GPU Efficiency, Upcoming Graphics Cards



HuggingFace ▷ #general (215 messages🔥🔥):

Mamba2 Support Added, Errors with Faster-Whisper, AI Models and Language Training, Comedy by AI Models, Browser Integration with AI



HuggingFace ▷ #today-im-learning (4 messages):

Docker file with FFmpeg, Python latest version, Node.js latest version


HuggingFace ▷ #cool-finds (5 messages):

FLUX.1 Tools Release, Decentralized AI Model Training, LivePortrait Quota Issues, Classic Paper on Heuristic AI



HuggingFace ▷ #i-made-this (10 messages🔥):

IntelliBricks toolkit, Eternal AI framework, Social Receipt Generator, Cybertron v4 UNA-MGS model, Autotiktokenizer Windows support



HuggingFace ▷ #NLP (4 messages):

Discord chatbots for persona NLPs, Cerebras and Llama 3.1


HuggingFace ▷ #diffusion-discussions (17 messages🔥):

CLIP models, T5 Token Limit, Flux Tools Integration, Image Variation with FLUX.1 Redux, Serverless Implementation



Latent Space ▷ #ai-general-chat (87 messages🔥🔥):

AI Art Turing Test, Anthropic's $4 Billion Investment from AWS, LTX Video Generation Model, AI Vibrancy Rankings Tool, OpenAI's Deleted Training Findings



Latent Space ▷ #ai-in-action-club (162 messages🔥🔥):

LLM powered requirements analysis, Obsidian integration, Using AI for coding, Paranoiac-critical method, Windsurf tool improvement



OpenAI ▷ #ai-discussions (186 messages🔥🔥):

Voice Cloning, AI Accents Understanding, Bluesky vs Twitter, ChatGPT Developments, Airtable and Notion Integration


OpenAI ▷ #gpt-4-discussions (14 messages🔥):

Teaching GPT vocabulary constraints, Alternatives to Dall-E, Image generation models, Accessing free models


OpenAI ▷ #prompt-engineering (5 messages):

Maximizing AI response, Prompt engineering, Using variables in prompts, Humor in discussions


OpenAI ▷ #api-discussions (5 messages):

Effective AI Prompts, Prompt Engineering Discussion, Using Variables in Prompts


aider (Paul Gauthier) ▷ #general (155 messages🔥🔥):

Qwen models performance, Aider benchmark results, OpenRouter vs direct API access, Quantization impact on model performance, Recent investments in AI



aider (Paul Gauthier) ▷ #questions-and-tips (38 messages🔥):

Managing context in Aider, Saving chat sessions, API connection errors, Benchmarking costs and performance, File detection issues in Aider

Link mentioned: xAI: aider is AI pair programming in your terminal


aider (Paul Gauthier) ▷ #links (1 message):

Uithub Tool, AI-Driven Development, GitHub Alternatives

Link mentioned: uithub - Easily ask your LLM code questions: no description found


Nous Research AI ▷ #general (150 messages🔥🔥):

Daily LLM Drop, House of Lords Political Structure, Reasoning Datasets Quality, Alibaba's AI Developments



Nous Research AI ▷ #ask-about-llms (19 messages🔥):

Agent APIs availability, Fine-tuning and datasets, Graphical User Interfaces for LLMs, Chat interfaces for multiple models

Link mentioned: 🏡 Home | Open WebUI: Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI interface designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs...


Nous Research AI ▷ #research-papers (2 messages):

Marco-o1 model release, Research on reasoning models, Open-ended problem solving, Authors of new AI research



Nous Research AI ▷ #interesting-links (2 messages):

Agentic Translation Workflow, Few-shot Prompting, Iterative Feedback, LLM Output Refinement


Nous Research AI ▷ #research-papers (2 messages):

Marco-o1 model, ChatGPT o1 alternative, Open-ended problem-solving, Real-world reasoning models



OpenRouter (Alex Atallah) ▷ #announcements (1 message):

Claude 3.5 Haiku renaming, Model ID changes, Discord requests for models



OpenRouter (Alex Atallah) ▷ #general (118 messages🔥🔥):

Gemini Model Issues, OpenRouter API Usage, Tax on OpenRouter Credits, Prompt Engineering Strategies, Engineering Community Updates



OpenRouter (Alex Atallah) ▷ #beta-feedback (8 messages🔥):

Access to custom provider keys


Perplexity AI ▷ #general (81 messages🔥🔥):

Pro user approval process, Gemini AI experience, Perplexity extensions, AI accessibility for non-coders, AI model comparisons



Perplexity AI ▷ #sharing (15 messages🔥):

Luca - Last Universal Common Ancestor, Digital Twins, Jaguar Car Sales, AI Impact on Grammarly, Processing Techniques


Perplexity AI ▷ #pplx-api (2 messages):

API site status


Stability.ai (Stable Diffusion) ▷ #general-chat (79 messages🔥🔥):

Using Image Prompts in SDXL Lightning, Setting Parameters in WebUI, Generating Pixar-style Images, Video Fine Tuning Services, Stable Diffusion Download and Use Cases



LlamaIndex ▷ #general (63 messages🔥🔥):

Function calling in workflows, LlamaIndex security compliance, Ollama package issues, Hugging Face embedding format issues, LlamaParse parsing instructions



LlamaIndex ▷ #ai-discussion (1 message):

LlamaParse issues, Scientific article parsing, Redundant information in parsing, Document flow maintenance, Bibliography exclusion


Notebook LM Discord ▷ #use-cases (10 messages🔥):

Podcast creation with NotebookLM, YouTube videos on AI and Robotics, Feature requests for Producer Studio, Translating NotebookLM audio, Social media content generation



Notebook LM Discord ▷ #general (46 messages🔥):

Podcast API Alternatives, Language Support in Audio, Podcast Creation Limitations, NotebookLM Usage Issues, Retrieval-Augmented Generation in NotebookLM



Interconnects (Nathan Lambert) ▷ #news (19 messages🔥):

OpenAI's Copyright Lawsuit, Decentralized Training of 10B Model, Anthropic's Collaboration with AWS, Expectations for New AI Models, Cynicism in AI Development



Interconnects (Nathan Lambert) ▷ #other-papers (1 message):

420gunna: https://x.com/reach_vb/status/1859868073903423821 multimodal encoder bros ✊


Interconnects (Nathan Lambert) ▷ #ml-questions (9 messages🔥):

Flops in AI, Magic from Base Models, Reaction to Amanda's Post, Scale Research, Related Research Papers


Interconnects (Nathan Lambert) ▷ #ml-drama (5 messages):

CamelAIOrg Account Issues, OASIS Social Simulation Project, Customer Service Concerns

Link mentioned: Tweet from Guohao Li (Hiring!) 🐫 (@guohao_li): @OpenAI determined the account of our @CamelAIOrg organization for unknown reasons. It maybe related to our recent OASIS social simulation project we ran with one million agents but I am not sure: htt...


Interconnects (Nathan Lambert) ▷ #random (7 messages):

Black Market Data Sale, Benchmark Buying, Labs Operating Quickly


Interconnects (Nathan Lambert) ▷ #memes (3 messages):

Meme Formats, Language Models, Creative Content


Interconnects (Nathan Lambert) ▷ #rlhf (4 messages):

Tulu 3 Paper, On-policy vs Off-policy DPO, Online DPO Performance


Interconnects (Nathan Lambert) ▷ #posts (1 message):

SnailBot News: <@&1216534966205284433>


GPU MODE ▷ #general (1 message):

markus_41856: https://lu.ma/i8bow7sr


GPU MODE ▷ #triton (5 messages):

Triton on AMD GPUs, Swizzle for L2 Cache, Optimizing Triton Kernels, Memory Access Efficiency, MLIR Analysis



GPU MODE ▷ #torch (4 messages):

Torch Inductor Compilation Error, Custom Layer .to() Behavior



GPU MODE ▷ #algorithms (1 message):

FlashAttention, Global vs Local Exp Sum


GPU MODE ▷ #cool-links (1 message):

platers: https://research.character.ai/optimizing-ai-inference-at-character-ai-part-deux/


GPU MODE ▷ #jobs (1 message):

Flash Attention optimizations, Character AI job opening

Link mentioned: Research Engineer, ML Systems (All Industry Levels): Joining us as a Research Engineer on the ML Systems team, you’ll be working on cutting-edge ML training and inference systems, optimizing the performance and efficiency of our GPU clusters, and develo...


GPU MODE ▷ #beginner (2 messages):

ldmatrix tile sizes, torch dispatch to triton


GPU MODE ▷ #torchao (4 messages):

Quantization Schemes Benchmarking, Weight Tensors Distribution, BNB NF4 vs Marlin Performance, Weight Outliers in Projections


GPU MODE ▷ #off-topic (2 messages):

Vercel's v0 Tool, System Prompts Leak, AI Coding Assistance

Link mentioned: Reddit - Dive into anything: no description found


GPU MODE ▷ #sparsity-pruning (1 message):

Pruning techniques, Model efficiency papers, Data dependent strategies, Industrial applications of LLMs


GPU MODE ▷ #🍿 (4 messages):

GPT-2 Training Method, Discord Bot Integration, OpenCoder Paper Filtering Approach

Link mentioned: Train GPT-2 in five minutes -- for free!: Train GPT-2 in five minutes -- for free! GitHub Gist: instantly share code, notes, and snippets.


GPU MODE ▷ #edge (1 message):

NPU Acceleration, Executorch, Qualcomm NPUs


Cohere ▷ #discussions (14 messages🔥):

Cohere API front-end, Chat history editing feature

Link mentioned: imgur.com: Discover the magic of the internet at Imgur, a community powered entertainment destination. Lift your spirits with funny jokes, trending memes, entertaining gifs, inspiring stories, viral videos, and ...


Cohere ▷ #projects (1 message):

jilldomi_48896: This is super cool

https://docs.cohere.com/page/sql-agent-cohere-langchain


tinygrad (George Hotz) ▷ #general (4 messages):

SDXL casting issues, Regression concerns, Intermediate casting strategy


tinygrad (George Hotz) ▷ #learn-tinygrad (5 messages):

Custom Kernel Functions Support, Introduction to Tinygrad, Tensor Stride and View Hopping, objc_id and ctypes Behavior, Function Call Behavior in ops_metal.py

Link mentioned: tinygrad/tinygrad/runtime/ops_metal.py at master · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ - tinygrad/tinygrad


Modular (Mojo 🔥) ▷ #mojo (7 messages):

Mojo and Python integration, Async capabilities in Mojo, Performance challenges, Roadmap for Mojo enhancements, Parameterization and traits in Mojo


Torchtune ▷ #general (3 messages):

HF transfer, Download Speed Improvements, Internet Connection Impact

Link mentioned: Use hf transfer as default by felipemello1 · Pull Request #2046 · pytorch/torchtune: Context What is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here) pip install huggingface_hub[hf_transfer] HF_HUB_ENABLE_H...


Torchtune ▷ #papers (4 messages):

AI model evaluations, Statistical theory in model comparisons, Central Limit Theorem application, AI research community response

Link mentioned: A statistical approach to model evaluations: A research paper from Anthropic on how to apply statistics to improve language model evaluations


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (5 messages):

Hackathon Location, Team Registration Confirmation


LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (1 message):

danman117: really good presentation from Percy Liang this week!


OpenInterpreter ▷ #general (4 messages):

Posting Job Ads, Desktop App Release, Exponent Demo, Open Source Devin, Windsurf Feedback


OpenInterpreter ▷ #O1 (1 message):

Installing O1, Using Groq API, Free APIs on Linux


DSPy ▷ #general (3 messages):

VLMs for Invoice Processing, DSPy Support for VLMs, Complexity in Project Development


OpenAccess AI Collective (axolotl) ▷ #general (1 message):

INTELLECT-1, Decentralized training, Open-source AGI, Fine-tuning with Axolotl

Link mentioned: Tweet from Prime Intellect (@PrimeIntellect): We did it — the first decentralized training of a 10B model is complete! Trained across the US, Europe, and Asia 🌐 Post-training with @arcee_ai is underway, and a full open-source release is coming...


LAION ▷ #general (1 message):

Neural Turing Machines, Differentiable Neural Computers





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}