Frozen AI News archive

Claude 3 is officially America's Next Top Model

**Claude 3 Opus** outperforms **GPT4T** and **Mistral Large** in blind Elo rankings, with **Claude 3 Haiku** marking a new cost-performance frontier. Fine-tuning techniques like **QLoRA** on **Mistral 7B** and evolutionary model merging on HuggingFace models are highlighted. Public opinion shows strong opposition to ASI development. Research supervision opportunities in AI alignment are announced. The **Stable Diffusion 3 (SD3)** release raises workflow concerns for tools like **ComfyUI** and **automatic1111**. **Opus** shows a 5% performance dip on **OpenRouter** compared to the **Anthropic API**. A new benchmark stresses LLM recall at long contexts, with **Mistral 7B** struggling and **Qwen 72B** performing well.
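The long-context recall benchmark mentioned above follows the usual "needle in a haystack" recipe: bury one fact deep in filler text and ask the model to retrieve it. A minimal sketch of how such a probe is assembled; the filler, needle, and question wording are illustrative assumptions, not the benchmark's actual data:

```python
import random

def build_needle_prompt(needle: str, filler_sentences: list[str],
                        total_sentences: int, depth: float, seed: int = 0) -> str:
    """Bury `needle` at a relative `depth` (0.0 = start, 1.0 = end)
    inside `total_sentences` of filler, then ask for it back."""
    rng = random.Random(seed)
    haystack = [rng.choice(filler_sentences) for _ in range(total_sentences)]
    haystack.insert(int(depth * total_sentences), needle)
    context = " ".join(haystack)
    return f"{context}\n\nQuestion: what was the magic fact stated above?"

prompt = build_needle_prompt(
    needle="The magic fact is: the password is 7421.",
    filler_sentences=["The sky was a flat grey.", "Traffic moved slowly."],
    total_sentences=50,
    depth=0.5,
)
```

Sweeping `total_sentences` and `depth` is what produces the familiar recall heatmaps on which models like Mistral 7B degrade and Qwen 72B holds up.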

Canonical issue URL

The blind Elo rankings for Claude 3 are in: Claude 3 Opus ($15/$75 per mtok) now slightly edges out GPT4T ($10/$30 per mtok), and Claude 3 Sonnet ($3/$15 per mtok) and Haiku ($0.25/$1.25 per mtok) beat the weakest version of GPT-4 ($30/$60 per mtok) as well as the relatively new Mistral Large ($8/$25 per mtok).

[Image: blind Elo rankings leaderboard]

Haiku may mark a new point on the Pareto frontier of cost vs performance:

[Image: cost vs. performance Pareto frontier chart]
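"Pareto frontier" here has a precise meaning: a model sits on the frontier if no other model is both cheaper and stronger. A small sketch using the input prices quoted above; the Elo numbers are illustrative placeholders, not the leaderboard's actual values:

```python
def pareto_frontier(models):
    """Return models not dominated by any other model.
    A model is dominated if another is both cheaper and higher-Elo."""
    frontier = []
    for name, cost, elo in models:
        dominated = any(c < cost and e > elo for _, c, e in models)
        if not dominated:
            frontier.append(name)
    return frontier

# Input price per mtok from the rankings above; Elo values are illustrative.
models = [
    ("Claude 3 Opus",   15.00, 1253),
    ("GPT4T",           10.00, 1251),
    ("Claude 3 Sonnet",  3.00, 1198),
    ("Claude 3 Haiku",   0.25, 1179),
    ("Mistral Large",    8.00, 1157),
]
print(pareto_frontier(models))
```

Under these toy numbers, Mistral Large is the only model dominated (Claude 3 Sonnet is both cheaper and stronger), matching the chart's framing of Haiku as a new frontier point.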


Table of Contents

[TOC]


PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

AI Models & Architectures

AI Ethics & Societal Impact

AI Alignment & Safety

Memes & Humor


PART 0: Summary of Summaries of Summaries


PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord


Unsloth AI (Daniel Han) Discord

Langchain Lacks Elegance: In a heated discussion, Langchain was called out for poor code quality despite strong marketing, with members advising against taking it on as a production dependency because of its technical debt.

Emergent AI Skills Under Microscope: An article from Quantamagazine sparked debate on the growth of "breakthrough" behaviors in AI, which could have implications for AI safety and capability discussions.

Fine-tuning on the Edge: Users grappled with switching Unsloth's FastLanguageModel from 4-bit to 8-bit quantization, concluding that it isn't feasible after fine-tuning because the model is pre-quantized. Elsewhere, tips were shared for managing VRAM by reducing batch size and sequence length during fine-tuning.
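The VRAM tips above follow from activation memory scaling roughly linearly in both batch size and sequence length. A back-of-the-envelope estimator; the per-layer tensor count and model dimensions are rough assumptions, not Unsloth's actual accounting:

```python
def activation_vram_gb(batch_size, seq_len, hidden_size, n_layers,
                       bytes_per_elem=2):
    """Rough activation-memory estimate: assume each transformer layer
    keeps ~4 tensors of shape (batch, seq, hidden) alive for backprop."""
    tensors_per_layer = 4  # assumed constant; real counts vary by architecture
    elems = batch_size * seq_len * hidden_size * n_layers * tensors_per_layer
    return elems * bytes_per_elem / 1024**3

# Mistral-7B-like dimensions: hidden size 4096, 32 layers, fp16 activations.
print(activation_vram_gb(2, 2048, 4096, 32))  # 4.0 (GB)
print(activation_vram_gb(1, 2048, 4096, 32))  # 2.0 (halving batch halves it)
```

The same linearity applies to `seq_len`, which is why trimming either knob is the first resort when a fine-tune runs out of VRAM.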

Showcasing Masher AI v6-7B: Someone showcased their Masher AI v6-7B model, using the OpenLLM Leaderboard for performance evaluation, with a demo available on Hugging Face.

Directives for Transformer Toolkit: Excitement was conveyed for a GitHub repository providing a toolkit to work with new heads on transformer models, potentially easing engineering tasks related to model customization.


OpenAI Discord


Nous Research AI Discord


Perplexity AI Discord


OpenInterpreter Discord

YouTube Learning: A Double-Edged Sword: Members debated the efficacy of learning through YouTube, with some concerned about distractions and privacy and others advocating for video tutorials despite worries about the platform's data-mining practices.

Local Is the New Global for LLMs: The integration of local LLMs (like Ollama, kobold, oobabooga) with Open Interpreter sparked interest, with discussions focused on the benefits of avoiding external API costs and achieving independence from services like ClosedAI.

Demand for Diverse Open Interpreter Docs: A call for varied documentation for Open Interpreter is on the rise. Proposals include a Wiki-style resource complemented by videos, and interactive "labs" or "guided setups" to cater to different learning preferences.

Growing the Open Interpreter Ecosystem: Community members are keen on extending Open Interpreter, exploring additional tools and models for applications on offline handheld devices and as research assistants. They're also sharing feedback for the project's development to improve usability and accessibility.

Technical Troubles: Members discussed issues with setting up the '01' environment in PyCharm, geographic limitations on the '01' device's pre-orders, multilingual support, system requirements, and Windows and Raspberry Pi compatibility, alongside vibrant community collaboration and DIY case-design discussions. Separately, the new Windows launcher for Ollama was reported to leave the app unusable after installation, with no clear solution yet.


HuggingFace Discord

Web Wrestling with HuggingFace's New Chat Feature: HuggingFace introduces a feature enabling chat assistants to interact with websites; a demonstration is available via a Twitter post.

Libraries Galore in Open Source Updates: Open source updates include enhancements to transformers.js, diffusers, transformers, and more. The updates are detailed by osanseviero on Twitter and further documentation can be found in the HuggingFace blog post.

Community Efforts in Model Implementation: Efforts to convert the GLiNER model from PyTorch to Rust using the Candle library were discussed, with insights into the performance advantages of Rust implementations and Candle's GPU acceleration.

Bonanza of Bot and Library Creations: The Command-R chatbot by Cohere was put on display for community contributions on HuggingFace Spaces. Meanwhile, the new Python library loadimg, for loading various image types, is available on GitHub.

Focus on Fusing Image and Text: The BLIP-2 documentation on HuggingFace was highlighted for its potential in bridging visual and linguistic modalities. Discussions also centered around preprocessing normalization for medical images, referencing nnUNet's strategy.
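nnUNet's strategy, roughly, is per-image z-score normalization for MRI and intensity/percentile clipping before normalization for CT. A simplified stdlib-only sketch of the idea, not nnUNet's actual implementation:

```python
import statistics

def zscore_normalize(pixels, clip_percentiles=None):
    """Per-image z-score normalization in the spirit of nnUNet's MRI
    scheme; the optional percentile clipping is a simplified stand-in
    for the intensity clipping nnUNet applies to CT."""
    if clip_percentiles:
        vals = sorted(pixels)
        lo = vals[int(clip_percentiles[0] * (len(vals) - 1))]
        hi = vals[int(clip_percentiles[1] * (len(vals) - 1))]
        pixels = [min(max(p, lo), hi) for p in pixels]
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels) or 1.0  # guard against constant images
    return [(p - mean) / std for p in pixels]

# An outlier-heavy "scan": clipping tames the 1000.0 spike before scaling.
normalized = zscore_normalize([0.0, 10.0, 20.0, 30.0, 1000.0],
                              clip_percentiles=(0.0, 0.75))
```

After normalization the pixel distribution has zero mean and unit variance, which is the property downstream segmentation networks rely on.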

Innovations in NLP and AI Efficiency: A member delved into model compression with the Mistral-7B-v0.1-half-naive-A model and its impact on performance. The possibility of summarizing gaming leaderboards with multi-shot inferences and fine-tuning was also brainstormed.
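If the "half-naive" in the model name means naively round-tripping weights through IEEE 754 half precision (an assumption; the discussion doesn't spell it out), the resulting rounding error can be shown with the stdlib alone:

```python
import struct

def to_half_and_back(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision (~11 bits of
    mantissa), the naive 'halving' the model name hints at."""
    return struct.unpack('e', struct.pack('e', x))[0]

weights = [0.1234567, -1.5, 3.0009765625]
compressed = [to_half_and_back(w) for w in weights]
errors = [abs(w - c) for w, c in zip(weights, compressed)]
# -1.5 is exactly representable; the others pick up small rounding error.
```

Whether errors of this size hurt depends on where they land; that sensitivity is exactly what the compression experiments probe.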

Diverse Discourses in Diffusion: Inquiry into the structure of regularization images for training diffusion models sought advice on creating an effective regularization set, with a focus on image quality, variety, and the use of negative prompts.


LM Studio Discord


LAION Discord


LlamaIndex Discord


Eleuther Discord

AMD Driver Dilemma Sparks Debate: Technical discussions reveal concerns over AMD's Radeon driver strategy, suggesting that poor performance could hinder confidence in the multi-million dollar ML infrastructure sector. An idea to open-source AMD drivers was discussed as a strategy to compete with Nvidia’s dominance.

Seeds of Change for Weight Storage: A new approach was proposed in which model weights are stored as a seed plus a delta, potentially increasing precision and obviating the need for mixed-precision training. The conceptual shift toward "L2-SP" (weight decay toward the pretrained weights instead of toward zero) was also a hot topic, with references to L2-SP research on arXiv.
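The seed-plus-delta idea is easy to sketch: regenerate the initialization deterministically from the seed and store only the difference accumulated during training. A toy illustration under an assumed Gaussian init, not the proposal's actual scheme:

```python
import random

def expand_seed(seed: int, n: int) -> list[float]:
    """Deterministically regenerate the initial weights from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.02) for _ in range(n)]

def compress(weights, seed):
    """Store only (seed, delta), where delta = trained - seeded init."""
    base = expand_seed(seed, len(weights))
    return seed, [w - b for w, b in zip(weights, base)]

def decompress(seed, delta):
    base = expand_seed(seed, len(delta))
    return [b + d for b, d in zip(base, delta)]

seed = 42
trained = [w + 0.001 for w in expand_seed(seed, 5)]  # tiny drift from init
stored_seed, delta = compress(trained, seed)
restored = decompress(stored_seed, delta)
```

The win, if any, comes from the deltas staying small and therefore cheap to store at high precision; the sketch says nothing about whether that holds for real training runs.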

Chess-GPT Moves onto the Board: The Chess-GPT model, capable of playing at an approximate 1500 Elo rating, was introduced along with discussions about its ability to predict chess moves and assess players' skill levels. The community also explored limitations of N-Gram models and Kubernetes version compatibility issues for scaling tokengrams; GCP was mentioned as a solution for high-resource computing needs.
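For context on what "~1500 Elo" buys, the standard Elo expected-score formula converts rating gaps into win probabilities:

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of player A vs. player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A ~1500-rated engine against a 1300-rated club player:
print(round(elo_expected(1500, 1300), 3))  # 0.76
```

So a 200-point gap translates to roughly a 76% expected score, which is what makes 1500 a respectable club-level rating for a language model.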

Retrieval Research and Tokenization Tricks: Participants requested advice on optimizing retrieval-pipeline quality, mentioning tools such as Evals and RAGAS. Tokenizers' influence on model performance also sparked discussion, with links to studies like MaLA-500 on arXiv and work on Japanese tokenizers.
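One standard way to quantify a tokenizer's influence is "fertility", the average number of tokens produced per word; higher fertility means fewer words fit in a fixed context window. A toy comparison with deliberately trivial tokenizers (real comparisons would use actual subword vocabularies):

```python
def fertility(tokenize, text: str) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    return len(tokenize(text)) / len(text.split())

whitespace = lambda t: t.split()
chars = lambda t: [c for c in t if not c.isspace()]

text = "tokenizers shape model performance"
print(fertility(whitespace, text))  # 1.0 by construction
print(fertility(chars, text))       # 7.75: far more tokens per word
```

Languages poorly covered by a vocabulary behave like the character tokenizer here, which is the effect the MaLA-500 and Japanese-tokenizer studies examine.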

Harnessing lm-eval with Inverse Scaling: Focus was on integrating inverse scaling into the lm-evaluation-harness, as detailed in this implementation. Questions were also raised about BBQ Lite scoring methodology, and the harness itself was lauded for its functionality.


Latent Space Discord


OpenRouter (Alex Atallah) Discord


OpenAccess AI Collective (axolotl) Discord


LangChain AI Discord


CUDA MODE Discord


Datasette - LLM (@SimonW) Discord


Interconnects (Nathan Lambert) Discord


DiscoResearch Discord


LLM Perf Enthusiasts AI Discord


PART 2: Detailed by-Channel summaries and links

Stability.ai (Stable Diffusion) ▷ #general-chat (1071 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (485 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (2 messages):

Link mentioned: GitHub - center-for-humans-and-machines/transformer-heads: Toolkit for attaching, training, saving and loading of new heads for transformer models: Toolkit for attaching, training, saving and loading of new heads for transformer models - center-for-humans-and-machines/transformer-heads


Unsloth AI (Daniel Han) ▷ #help (102 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (6 messages):

Link mentioned: mahiatlinux/MasherAI-v6-7B · Hugging Face: no description found


Unsloth AI (Daniel Han) ▷ #suggestions (48 messages🔥):

Links mentioned:


OpenAI ▷ #annnouncements (1 message):

Link mentioned: Sora: First Impressions: We have gained valuable feedback from the creative community, helping us to improve our model.


OpenAI ▷ #ai-discussions (375 messages🔥🔥):

Link mentioned: Sora: First Impressions: We have gained valuable feedback from the creative community, helping us to improve our model.


OpenAI ▷ #gpt-4-discussions (22 messages🔥):


OpenAI ▷ #prompt-engineering (59 messages🔥🔥):


OpenAI ▷ #api-discussions (59 messages🔥🔥):


Nous Research AI ▷ #ctx-length-research (4 messages):

Links mentioned:


Nous Research AI ▷ #off-topic (16 messages🔥):

Link mentioned: Voice Chat with Deepgram & Mistral AI: We make a voice chat with deepgram and mistral aihttps://github.com/githubpradeep/notebooks/blob/main/deepgram.ipynb#python #pythonprogramming #llm #ml #ai #...


Nous Research AI ▷ #interesting-links (9 messages🔥):

Link mentioned: Tweet from Hamel Husain (@HamelHusain): There are a growing number of voices expressing disillusionment with fine-tuning. I'm curious about the sentiment more generally. (I am withholding sharing my opinion rn). Tweets below are f...


Nous Research AI ▷ #announcements (2 messages):


Nous Research AI ▷ #general (225 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (20 messages🔥):

Links mentioned:


Nous Research AI ▷ #project-obsidian (2 messages):


Nous Research AI ▷ #rag-dataset (19 messages🔥):

Links mentioned:


Nous Research AI ▷ #world-sim (168 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #general (430 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (19 messages🔥):


Perplexity AI ▷ #pplx-api (10 messages🔥):

Links mentioned:


OpenInterpreter ▷ #general (167 messages🔥🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (110 messages🔥🔥):

Links mentioned:


OpenInterpreter ▷ #ai-content (3 messages):


HuggingFace ▷ #announcements (5 messages):

Links mentioned:


HuggingFace ▷ #general (131 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (7 messages):


HuggingFace ▷ #cool-finds (2 messages):


HuggingFace ▷ #i-made-this (14 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (12 messages🔥):

Link mentioned: Hugging Face Reading Group 16: HyperZ⋅Z⋅W Operator Terminator: Presenter: Harvie Zhang who is also the author of this work. For this meeting unfortunately there was a bit of moderation issue


HuggingFace ▷ #core-announcements (1 message):

Link mentioned: feat: support DoRA LoRA from community by sayakpaul · Pull Request #7371 · huggingface/diffusers: What does this PR do? Fixes: #7366. Fixes: #7422. @SlZeroth I tested the PR with the code below: from diffusers import DiffusionPipeline import torch pipe = DiffusionPipeline.from_pretrained( ...


HuggingFace ▷ #computer-vision (21 messages🔥):

Links mentioned:


HuggingFace ▷ #NLP (22 messages🔥):

Links mentioned:


HuggingFace ▷ #diffusion-discussions (2 messages):


LM Studio ▷ #💬-general (139 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (24 messages🔥):


LM Studio ▷ #🧠-feedback (3 messages):


LM Studio ▷ #🎛-hardware-discussion (22 messages🔥):


LM Studio ▷ #🧪-beta-releases-chat (10 messages🔥):


LM Studio ▷ #amd-rocm-tech-preview (1 message):


LM Studio ▷ #crew-ai (3 messages):


LM Studio ▷ #open-interpreter (2 messages):

Link mentioned: bug: markdown disabled or not supported. · Issue #1124 · OpenInterpreter/open-interpreter: Describe the bug When prompting a local model, https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF, using LM Studio, I kept getting what should have been valid python output, but the code bl...


LAION ▷ #general (82 messages🔥🔥):

Links mentioned:


LAION ▷ #research (109 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #blog (5 messages):

Link mentioned: LLM Meetup with Predibase, LlamaIndex, Guardrails and Tryolabs | San Francisco · Luma: LLMOps: From Prototype To Production | Developer Meetup Join Predibase, LlamaIndex, Guardrails AI, and Tryolabs for an evening of food, drinks, and discussions on all things LLMOps while...


LlamaIndex ▷ #general (153 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (1 message):


Eleuther ▷ #general (23 messages🔥):

Link mentioned: GitHub - openai/evals: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks. - openai/evals


Eleuther ▷ #research (57 messages🔥🔥):

Links mentioned:


Eleuther ▷ #interpretability-general (36 messages🔥):

Links mentioned:


Eleuther ▷ #lm-thunderdome (12 messages🔥):

Links mentioned:


Eleuther ▷ #multimodal-general (2 messages):


Latent Space ▷ #ai-general-chat (82 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (4 messages):

Link mentioned: Tweet from swyx (@swyx): 🆕 The Unbundling of ChatGPT https://latent.space/p/feb-2024 A whole year has passed with ~0 growth in ChatGPT user numbers. Instead, users are exploring a whole host of verticalized players for ...


Latent Space ▷ #llm-paper-club-west (4 messages):


OpenRouter (Alex Atallah) ▷ #general (70 messages🔥🔥):


OpenAccess AI Collective (axolotl) ▷ #general (37 messages🔥):

Links mentioned:

Some highlights:

  1. FSDP+QLoRA and DeepSpeed…

Fully Sharded Data Parallel: no description found
DeepSpeed: no description found

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (5 messages):


OpenAccess AI Collective (axolotl) ▷ #general-help (6 messages):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #community-showcase (7 messages):

Link mentioned: Introducing Olier – an Integral Yoga AI initiative – La Grace: no description found


LangChain AI ▷ #general (42 messages🔥):

Links mentioned:


LangChain AI ▷ #share-your-work (3 messages):

Link mentioned: What is Index Network | Index Network Documentation: no description found


LangChain AI ▷ #tutorials (3 messages):

Links mentioned:


CUDA MODE ▷ #general (1 message):


CUDA MODE ▷ #triton (8 messages🔥):

Link mentioned: GitHub - pytorch-labs/ao: torchao: PyTorch Architecture Optimization (AO). A repository to host AO techniques and performant kernels that work with PyTorch.: torchao: PyTorch Architecture Optimization (AO). A repository to host AO techniques and performant kernels that work with PyTorch. - pytorch-labs/ao


CUDA MODE ▷ #cuda (2 messages):

Links mentioned:


CUDA MODE ▷ #beginner (5 messages):

Link mentioned: Issues · pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - Issues · pytorch/pytorch


CUDA MODE ▷ #pmpp-book (6 messages):


CUDA MODE ▷ #youtube-recordings (2 messages):

Link mentioned: Lecture 11: Sparsity: Speaker: Jesse Cai


CUDA MODE ▷ #torchao (1 message):

marksaroufim: new RFC https://github.com/pytorch-labs/ao/issues/86


CUDA MODE ▷ #ring-attention (6 messages):

Links mentioned:


CUDA MODE ▷ #off-topic (5 messages):


CUDA MODE ▷ #gtc-meetup (1 message):

vim410: oops. i missed this! i was at GTC and now i am back to middle of nowhere


CUDA MODE ▷ #triton-puzzles (3 messages):


Datasette - LLM (@SimonW) ▷ #llm (25 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rlhf (2 messages):


Interconnects (Nathan Lambert) ▷ #reads (21 messages🔥):

Links mentioned:


DiscoResearch ▷ #general (3 messages):


DiscoResearch ▷ #embedding_dev (2 messages):


DiscoResearch ▷ #discolm_german (11 messages🔥):


LLM Perf Enthusiasts AI ▷ #claude (2 messages):


Skunkworks AI ▷ #off-topic (1 message):

pradeep1148: https://www.youtube.com/watch?v=Kan7GofHSwg