Frozen AI News archive

Fixing Gemma

**Google's Gemma model** was found unstable for finetuning until **Daniel Han from Unsloth AI** found and fixed 8 implementation bugs. **Yann LeCun** explained the technical details of a pseudo-random bit sequence used in adaptive equalizers, while **François Chollet** discussed the low information bandwidth of the human visual system. **Aravind Srinivas** reported that **Claude 3 Opus** showed no hallucinations in extensive testing, outperforming **GPT-4** and **Mistral-Large** in benchmarks. **Yann LeCun** also reflected on ongoing AI progress toward human-level intelligence. The community is shifting pipelines to work better with Claude models, and **Aidan Clark** shared the emotional side of ML development.

Canonical issue URL

Google's recently released Gemma model was widely known to be unstable for finetuning. Last week, Daniel Han from Unsloth found and fixed 8 bugs in the implementation, some of which are being upstreamed. There is a thread, blogpost, Google Colab, and (as of today) Hacker News commentary to follow along, with some well-deserved community love.


It is full of extremely subtle numerical precision issues, which take extreme attention to detail to notice. Kudos!
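For a flavor of how quietly this class of bug can hide, here is an illustrative example (hypothetical, not one of the actual Gemma fixes) of low-precision accumulation silently diverging from the full-precision result:

```python
import numpy as np

# Hypothetical illustration (not one of the actual Gemma fixes): summing many
# small values in float16 stalls once the running total grows large enough
# that each addend falls below half a unit in the last place.
x = np.full(10_000, 1e-3, dtype=np.float16)

fp16_sum = np.float16(0.0)
for v in x:
    fp16_sum = np.float16(fp16_sum + v)   # accumulate in float16

fp32_sum = x.astype(np.float32).sum()     # same data, float32 accumulator

print(float(fp16_sum), float(fp32_sum))   # fp16 total lands far below ~10.0
```

No single step is visibly wrong; only comparing against a higher-precision reference exposes the drift, which is exactly why such bugs demand line-by-line scrutiny.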


Table of Contents

[TOC]

PART X: AI Twitter Recap

All recaps done by Claude 3 Opus. Today's output is lightly edited by swyx. We are working on anti-hallucination, NER, and context-addition pipelines.


PART 0: Summary of Summaries of Summaries

Claude 3 Sonnet (14B?)

  1. Model Finetuning and Performance Optimization:

    • Unsloth AI discussions centered around finetuning Gemma models, dealing with special tokens, and addressing issues like OOM errors. Solutions included updating Unsloth, using pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git", and exploring Gradient Low-Rank Projection (GaLore) (ArXiv paper) for reducing memory usage during LLM training.
    • The CUDA MODE community explored techniques like thread coarsening, vectorized memory access, and CUDA profiling tools to optimize performance. Projects like ring-attention and flash decoding were discussed.
    • Answer.AI announced the ability to train 70B models locally using FSDP + QLoRA on standard GPUs like RTX 3090 (blog post).
  2. AI Model Comparisons and Benchmarking:

    • Discussions compared models like Claude Opus, GPT-4, and Mistral for coding prowess, with Claude Opus often outperforming GPT-4 in areas like SQL and Rust. Users also anticipated the release of GPT-4.5/5 and its potential improvements.
    • The DiscoResearch community explored using GPT-4 and Claude3 as judges for creative writing, developing benchmarks, and comparing models like Brezn3 and Dpo on German datasets.
    • Gemini was highlighted for its impressive performance, with a YouTube video comparing it to Claude Opus and GPT-4 Turbo, noting its superior speed and lower costs.
  3. AI Ethics, Regulation, and Societal Impact:

    • Concerns were raised about censorship and restrictions creeping into AI models like the "Claude 2 self-moderated versions." Discussions touched on balancing free expression with content moderation.
    • The impact of AI on creativity and employment was debated, with some believing AI will assist rather than replace human creativity, while others anticipated job market shifts.
    • A Slashdot article highlighted U.S. government concerns about frontier AI posing an extinction-level threat, suggesting potential regulatory measures.
  4. Open-Source AI Models and Community Contributions:

    • Anticipation grew around the open-sourcing of models like Grok by @xAI, as announced by Elon Musk's tweet.
    • Cohere introduced Command-R, a new retrieval augmented model with a 128k context window and public weight release for research (blog post).
    • Community members shared projects like Prompt Mixer for building AI prompts, an open-source AI chatbot using LangChain, and tools like claudetools for function calling with Claude 3.
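The Gradient Low-Rank Projection (GaLore) technique mentioned above can be sketched roughly as follows. This is a simplified illustration with plain SGD standing in for Adam; the actual algorithm keeps Adam statistics in the projected space and periodically refreshes the projector:

```python
import numpy as np

# Simplified GaLore sketch: keep optimizer state in a rank-r projection of the
# gradient instead of the full m x n matrix.
rng = np.random.default_rng(0)
m, n, r = 64, 32, 4
W = rng.normal(size=(m, n))           # weight matrix
G = rng.normal(size=(m, n))           # full gradient

U, _, _ = np.linalg.svd(G, full_matrices=False)
P = U[:, :r]                          # rank-r projector from the gradient's SVD
G_low = P.T @ G                       # optimizer state is r x n, not m x n
W -= 0.1 * (P @ G_low)                # project back up for the weight update

print(G_low.shape, G_low.size / G.size)  # (4, 32) 0.0625
```

The memory win comes from the optimizer state shrinking by roughly a factor of m / r, which is where the reported reduction in LLM training memory originates.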

Claude 3 Opus (8x220B?)

ChatGPT (GPT4T)


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord Summary

Finetuning Frustrations and Triumphs: Discussions focused on the challenges of finetuning Gemma, including handling special tokens and the efficacy of model loading after finetuning, pointing to potential versioning issues and the impact of adapter precision. Recommendations included reinstalling xformers with pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git" to address errors, and updating Unsloth as a possible fix for OOM errors.

Unsloth Giveaways and Growth: The Unsloth community celebrated the implementation of multi-GPU support (oKatanaaa/unsloth) and the release of a new FSDP + QLoRA system by Answer.AI for training 70B models on gaming GPUs. A knowledge sharing exercise for Unsloth finetuned models on Kaggle identified key bugs and fixes, and the community also recognized contributors' support on Ko-fi.
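Back-of-envelope arithmetic (a rough sketch that ignores activations, LoRA adapter state, and framework overhead) shows why 4-bit quantization plus FSDP sharding makes a 70B model plausible on two 24 GB gaming GPUs:

```python
# Rough memory math for FSDP + QLoRA on a 70B model. Ignores activations,
# LoRA optimizer state, and framework overhead, so treat as a lower bound.
params = 70e9
bytes_per_param = 0.5                 # 4-bit (NF4) quantized weights
total_gb = params * bytes_per_param / 1e9
per_gpu_gb = total_gb / 2             # FSDP shards weights across 2 GPUs

print(total_gb, per_gpu_gb)           # 35.0 17.5 -- under 24 GB per card
```

Neither the 35 GB of quantized weights nor the 17.5 GB per-GPU shard fits in a single consumer card at 16-bit precision, which is the gap the Answer.AI system closes.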

Boosting Productivity with Unsloth AI: Ghost 7B v0.9.1 advanced in reasoning and language, ranking 3rd on VMLU's leaderboard and accessible on huggingface.co. Another significant achievement was reported by @lee0099, demonstrating Unsloth AI's optimizations resulting in a 2x speedup and 40% memory reduction during LLM fine-tuning with no loss in accuracy.

Celebrating AI Contributions and Cutting-edge Updates: The Unsloth AI community shared updates and insights, including the new 0.43.0 release of bitsandbytes adding FSDP support. AI2 Incubator's provision of $200 million in AI compute to startups was highlighted, and discussions around OpenAI's transparency also surfaced.

Welcoming Winds and Gear for Growth: New Unsloth community members were directed to essential information channels, while suggestions for Unsloth advancements included integrating features from Llama-factory. The prominence of the GaLore thread was acknowledged, and a GitHub project named GEAR was shared, showcasing an efficient KV cache compression recipe for generative inference (GEAR on GitHub).
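A toy sketch of the KV-cache compression idea behind GEAR (my reading of the project description, not the paper's exact recipe): quantize the cache coarsely, then keep a low-rank approximation of the quantization residual to recover most of the lost detail:

```python
import numpy as np

# Toy KV-cache compression sketch inspired by GEAR (not the paper's exact
# recipe): uniform 8-bit quantization plus a rank-r correction of the residual.
rng = np.random.default_rng(0)
K = rng.normal(size=(128, 64))                # cached keys: seq_len x head_dim

scale = np.abs(K).max() / 127
K_q = np.round(K / scale).astype(np.int8)     # quantized cache (1 byte/entry)
residual = K - K_q.astype(np.float64) * scale # what quantization lost

U, s, Vt = np.linalg.svd(residual, full_matrices=False)
r = 4
resid_lr = (U[:, :r] * s[:r]) @ Vt[:r]        # rank-r residual correction

K_hat = K_q.astype(np.float64) * scale + resid_lr
print(np.linalg.norm(K - K_hat) < np.linalg.norm(residual))  # correction helps
```

The stored state is the int8 cache plus two thin factors, far smaller than the original float cache, while the rank-r term strictly shrinks the reconstruction error.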


OpenAI Discord Summary


LM Studio Discord Summary


Perplexity AI Discord Summary

Perplexity's Context Retention Struggles: Users expressed frustrations over Perplexity AI's context handling ability, with complaints about it defaulting to base knowledge responses and subsequent requests for refunds. Concerns were raised about transparency after the removal of the 32k context length from the roadmap.

Confusion Around API Token Limits: Queries about the maximum output token length for new models, and the absence of the expected 32k context length feature from the roadmap, sparked discussion amid concerns that documentation inconsistencies might affect API usage and projects like an Alexa-like personal assistant.

New Users Navigate the Pro Plan: New Perplexity Pro users were confused about redeeming promo subscriptions and using the API conservatively to avoid depleting credits, leading to requests for clear guidance on usage tracking.

Legal, Health, and Tech Discussions on Sharing Channel: Insightful conversation threads from the sharing channel touched on Apple's legal actions against Epic, life expectancy concerns, the merits of a specific Super Bowl halftime show, Google's payments to publishers, and nootropics, with a recommended caffeine, L-theanine, and creatine stack.

Comparative Analysis and Learning: The community exchanged thoughts on diverse AI services, comparing Perplexity to others like Copilot Pro and ChatGPT Pro, with Perplexity drawing praise specifically for its image generation capabilities.


Nous Research AI Discord Summary


LlamaIndex Discord Summary


LAION Discord Summary


HuggingFace Discord Summary


Eleuther Discord Summary


Latent Space Discord Summary


Interconnects (Nathan Lambert) Discord Summary


OpenRouter (Alex Atallah) Discord Summary


CUDA MODE Discord Summary


LangChain AI Discord Summary


DiscoResearch Discord Summary


Alignment Lab AI Discord Summary


LLM Perf Enthusiasts AI Discord Summary


Skunkworks AI Discord Summary


Datasette - LLM (@SimonW) Discord Summary


AI Engineer Foundation Discord Summary

Mysterious Mention of InterconnectAI: A user named .zhipeng appears to have referenced a blog post from Nathan's InterconnectAI, but no specific details or context were provided.

AI Video Deep Dive Incoming: An event has been announced focusing on Gen AI Video and the 'World Model', featuring speakers such as Lijun Yu from Google and Ethan He from Nvidia, set for March 16, 2024, in San Francisco and available on Zoom. Those interested can RSVP here.


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (368 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #welcome (4 messages):


Unsloth AI (Daniel Han) ▷ #random (19 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (514 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (8 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (5 messages):

Links mentioned:

GitHub - opengear-project/GEAR: GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM - opengear-project/GEAR


OpenAI ▷ #ai-discussions (611 messages🔥🔥🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (78 messages🔥🔥):

Links mentioned:

OpenAI Status: no description found


OpenAI ▷ #prompt-engineering (90 messages🔥🔥):


OpenAI ▷ #api-discussions (90 messages🔥🔥):


LM Studio ▷ #💬-general (407 messages🔥🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (110 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (7 messages):

Links mentioned:

Models - Hugging Face: no description found


LM Studio ▷ #🎛-hardware-discussion (147 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (7 messages):


LM Studio ▷ #autogen (1 messages):


LM Studio ▷ #memgpt (3 messages):


LM Studio ▷ #amd-rocm-tech-preview (91 messages🔥🔥):

Links mentioned:


LM Studio ▷ #crew-ai (4 messages):


Perplexity AI ▷ #general (595 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (38 messages🔥):


Perplexity AI ▷ #pplx-api (10 messages🔥):


Nous Research AI ▷ #ctx-length-research (4 messages):


Nous Research AI ▷ #off-topic (39 messages🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (13 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (395 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (175 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #collective-cognition (3 messages):


Nous Research AI ▷ #project-obsidian (3 messages):


Nous Research AI ▷ #bittensor-finetune-subnet (3 messages):


LlamaIndex ▷ #blog (10 messages🔥):


LlamaIndex ▷ #general (376 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (4 messages):

Links mentioned:


LAION ▷ #general (302 messages🔥🔥):

Links mentioned:


LAION ▷ #research (75 messages🔥🔥):

Links mentioned:


LAION ▷ #learning-ml (2 messages):


HuggingFace ▷ #general (168 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (8 messages🔥):

Links mentioned:

wav2vec2-codebook-indices/scripts/helpers/w2v2_codebook.py at master · fauxneticien/wav2vec2-codebook-indices: Contribute to fauxneticien/wav2vec2-codebook-indices development by creating an account on GitHub.


HuggingFace ▷ #cool-finds (15 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (18 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (35 messages🔥):

Links mentioned:


HuggingFace ▷ #diffusion-discussions (7 messages):

Links mentioned:

wav2vec2-codebook-indices/scripts/helpers/w2v2_codebook.py at master · fauxneticien/wav2vec2-codebook-indices: Contribute to fauxneticien/wav2vec2-codebook-indices development by creating an account on GitHub.


HuggingFace ▷ #computer-vision (41 messages🔥):

Links mentioned:


HuggingFace ▷ #NLP (38 messages🔥):

Links mentioned:


Eleuther ▷ #general (101 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (75 messages🔥🔥):

Links mentioned:


Eleuther ▷ #interpretability-general (3 messages):

Links mentioned:

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team


Eleuther ▷ #lm-thunderdome (12 messages🔥):

Links mentioned:

lm-evaluation-harness/lm_eval/models/huggingface.py at main · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness


Eleuther ▷ #multimodal-general (2 messages):


Eleuther ▷ #gpt-neox-dev (75 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-general-chat (39 messages🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (5 messages):

Links mentioned:


Latent Space ▷ #llm-paper-club-west (30 messages🔥):

Links mentioned:


Latent Space ▷ #ai-in-action-club (162 messages🔥🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #announcements (1 messages):


Interconnects (Nathan Lambert) ▷ #news (51 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #other-papers (2 messages):

Links mentioned:

Will GPT-4 Run DOOM?: We show that GPT-4's reasoning and planning capabilities extend to the 1993 first-person shooter Doom. This large language model (LLM) is able to run and play the game with only a few instructions...


Interconnects (Nathan Lambert) ▷ #ml-questions (36 messages🔥):


Interconnects (Nathan Lambert) ▷ #ml-drama (12 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (19 messages🔥):

Links mentioned:

Mycorrhizal network - Wikipedia: no description found


Interconnects (Nathan Lambert) ▷ #memes (15 messages🔥):


Interconnects (Nathan Lambert) ▷ #rl (8 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rlhf (7 messages):

Links mentioned:

Teaching Large Language Models to Reason with Reinforcement Learning: Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance...


OpenRouter (Alex Atallah) ▷ #announcements (4 messages):

Links mentioned:

Google: Gemma 7B (nitro) by google | OpenRouter: Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks ...


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

Links mentioned:

GitHub - vatsalsaglani/claudetools: Claudetools is a Python library that enables function calling with the Claude 3 family of language models from Anthropic.: Claudetools is a Python library that enables function calling with the Claude 3 family of language models from Anthropic. - vatsalsaglani/claudetools


OpenRouter (Alex Atallah) ▷ #general (120 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #general (30 messages🔥):

Links mentioned:


CUDA MODE ▷ #triton (1 messages):

iron_bound: early tho sounds cool https://github.com/Deep-Learning-Profiling-Tools/triton-viz


CUDA MODE ▷ #cuda (21 messages🔥):

Links mentioned:

NVIDIA Magnum IO: IO Subsystem for Modern, GPU-Accelerated Data Centers


CUDA MODE ▷ #torch (5 messages):


CUDA MODE ▷ #announcements (1 messages):


CUDA MODE ▷ #algorithms (2 messages):

Links mentioned:


CUDA MODE ▷ #suggestions (1 messages):

Links mentioned:


CUDA MODE ▷ #jobs (3 messages):


CUDA MODE ▷ #beginner (9 messages🔥):


CUDA MODE ▷ #pmpp-book (12 messages🔥):


CUDA MODE ▷ #youtube-recordings (3 messages):

Links mentioned:

Lecture 9 Reductions: Slides: https://docs.google.com/presentation/d/1s8lRU8xuDn-R05p1aSP6P7T5kk9VYnDOCyN5bWKeg3U/edit?usp=sharing · Code: https://github.com/cuda-mode/lectures/tree/ma...


CUDA MODE ▷ #ring-attention (27 messages🔥):


CUDA MODE ▷ #off-topic (5 messages):

Links mentioned:


LangChain AI ▷ #general (68 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #langserve (2 messages):


LangChain AI ▷ #share-your-work (14 messages🔥):

Links mentioned:


LangChain AI ▷ #tutorials (2 messages):

Links mentioned:


DiscoResearch ▷ #disco_judge (12 messages🔥):


DiscoResearch ▷ #general (4 messages):

Links mentioned:


DiscoResearch ▷ #benchmark_dev (3 messages):

Links mentioned:

tinyBenchmarks (tinyBenchmarks): no description found


DiscoResearch ▷ #discolm_german (14 messages🔥):

Links mentioned:


Alignment Lab AI ▷ #general-chat (7 messages):


Alignment Lab AI ▷ #oo (8 messages🔥):

Links mentioned:

GitHub - mermaid-js/mermaid: Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown: Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown - mermaid-js/mermaid


Alignment Lab AI ▷ #alignment-lab-announcements (1 messages):

Links mentioned:


Alignment Lab AI ▷ #general-chat (5 messages):


Alignment Lab AI ▷ #oo2 (5 messages):


LLM Perf Enthusiasts AI ▷ #general (1 messages):

Links mentioned:

Vercel AI SDK: Build AI-powered applications with the latest AI language models


LLM Perf Enthusiasts AI ▷ #gpt4 (1 messages):


LLM Perf Enthusiasts AI ▷ #claude (15 messages🔥):


LLM Perf Enthusiasts AI ▷ #opensource (1 messages):

res6969: https://x.com/elonmusk/status/1767108624038449405?s=20


LLM Perf Enthusiasts AI ▷ #offtopic (1 messages):


Skunkworks AI ▷ #general (1 messages):


Skunkworks AI ▷ #finetuning (1 messages):

henkdevries_starbound: math quuestions are hard


Skunkworks AI ▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=H6xon8K4Ius


Datasette - LLM (@SimonW) ▷ #ai (1 messages):

dbreunig: Earhart


Datasette - LLM (@SimonW) ▷ #llm (2 messages):


AI Engineer Foundation ▷ #general (1 messages):

.zhipeng: from nathan's interconnectai blogpost right ?


AI Engineer Foundation ▷ #events (1 messages):

Links mentioned:

Gen AI Video Breakout and World Model by EntreConnect - #Sora #Genie #VideoPoet #V-JEPA #LTXStudio #AnimateDiff · Luma: Join us for a groundbreaking event that dives deep into the heart of Gen AI Video! This isn't just another tech talk; it's a journey into the future. We will also provide dial-in options, wh...