Frozen AI News archive

Not much happened today

**Anthropic** released **Claude 3**, replacing Claude 2.1 as the default model on Perplexity AI, with **Claude 3 Opus** reportedly surpassing **GPT-4** in capability. Debate continues over whether Claude 3's performance stems from emergent properties or pattern matching. **LangChain** and **LlamaIndex** added support for Claude 3, enabling multimodal and tool-augmented applications. Despite this progress, current models still struggle with out-of-distribution reasoning and robustness. **Cohere** partnered with **Accenture** on enterprise AI search, while **Mistral AI** and **Snowflake** are collaborating to offer LLMs on Snowflake's platform. **Together AI Research** is integrating **DeepSpeed** innovations to accelerate generative AI infrastructure. **Hugging Face** and the **European Space Agency** released a large earth-observation dataset, and **Google** open-sourced **Gemma 2B**, optimized for smartphones via the MLC-LLM project. **GPT4All** improved discoverability for open models. The AI community balances excitement over new models with concerns about limitations and robustness, alongside growing enterprise adoption and open-source contributions. Memes and humor continue to provide social commentary.


No big news or releases today. Perplexity is rumored to be the latest AI unicorn, Yi Tay's post on the hard parts of training LLMs outside Google got picked up on Twitter and HN, and we released Soumith's episode on the Latent Space pod.


Table of Contents

[TOC]


PART X: AI Twitter Recap

Only one Claude Opus run today, as we are retooling our pipelines for more functionality and didn't get the new version working in time for today's issue. Sorry!

Anthropic Claude 3 Release

AI Progress and Limitations

Enterprise AI Adoption

Open Source Datasets and Models

Memes and Humor

In summary, the AI community is abuzz with the release of powerful new models like Anthropic's Claude 3, while also grappling with the limitations and robustness challenges of current approaches. Enterprises are rapidly adopting AI technologies through partnerships with leading AI and cloud vendors. Meanwhile, open source datasets and models continue to grow and democratize access to cutting-edge AI. Throughout it all, humor and memes provide levity and social commentary on the fast-moving AI landscape.


PART 0: Summary of Summaries of Summaries

Claude 3 Sonnet (14B?)

  1. Exploring AI Model Capabilities and Comparisons:

    • Claude 3 is generating excitement for its reported superior performance across various cognitive tasks, surpassing GPT-4 according to some users. Discussions revolve around its capabilities in coding, function calling, and self-moderation in group chats, as showcased in a Twitter story.
    • Opus, the largest Claude 3 variant, is praised for its coding prowess, particularly in function calling. It reportedly scored 800 on the SAT Reading section, sparking conversations about how to avoid memorization effects in large models.
    • Skepticism arises regarding the reliability of published benchmarks in capturing the full potential of newer models like GPT-4.
  2. Advancements in Multimodal and Retrieval-Augmented Models:

    • The release of Stable Diffusion 3 and its fusion of diffusion and transformer models is discussed, highlighting progress in multimodal approaches.
    • An arXiv paper suggests retrieval-augmented language models could be a promising alternative to parametric LMs, though research in this area is still developing.
    • InfiMM-HD, highlighted by @_akhaliq, claims significant advancements in high-resolution multimodal understanding, potentially outperforming CogVLM while leveraging Vicuna 13B. (Tweet)
  3. Techniques for Efficient Model Serving and Inference:

    • A Fireworks AI blog post discusses FireAttention, a quantization method for serving open-source models up to 4x faster than vLLM with minimal trade-offs.
    • The Aphrodite Engine by PygmalionAI is humorously attributed to the "Waifu-Driven Performance Theory," showcasing community-driven research efforts for performance gains.
    • Discussions explore speculative decoding on GPUs to improve throughput when memory bandwidth is the bottleneck (a toy sketch follows this list), and the inefficiency of generic masking in compute, which led to a PyTorch pull request adding a sliding-window attention bias.
  4. Advancements in Hardware and Quantization:

    • Details emerge about the NVIDIA H100 GPU, whose L2 cache boasts a 5.5 TB/s read bandwidth, with speculation that its overall bandwidth could match the impressive 40 TB/s L1 bandwidth of the RTX 4090.
    • The bitsandbytes package is recommended for k-bit quantization in PyTorch, enabling low-precision linear algebra on GPUs, with a claimed 5700x speedup for int8 versus bf16 matrix multiplication (a loading sketch follows this list).
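
To ground the last bullet, here is a minimal sketch of 8-bit loading through the transformers integration of bitsandbytes. The model id, prompt, and generation settings are illustrative assumptions, not details from the discussion:

```python
# Hedged sketch: load a causal LM with int8 linear layers via bitsandbytes.
# Assumes transformers, accelerate, and bitsandbytes are installed and a
# CUDA GPU is available; the model id is an arbitrary example.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder: any HF causal LM works the same way
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # int8 matmuls
    device_map="auto",  # let accelerate place layers on available devices
)

inputs = tok("Quantization trades precision for", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

And for the speculative-decoding thread, a toy illustration of the draft/verify loop. Real engines verify all draft positions in a single batched target forward pass, which is where the speedup comes from when decoding is memory-bandwidth bound; the function and stand-in models below are hypothetical, for intuition only:

```python
# Hedged sketch of greedy speculative decoding. `draft_next` and
# `target_next` are hypothetical stand-ins: fn(tokens) -> next token id.
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=16):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx, proposal = list(tokens), []
        for _ in range(k):
            proposal.append(draft_next(ctx))
            ctx.append(proposal[-1])
        # 2) The target verifies the proposal (one batched pass in a real
        #    engine). Accept the longest agreeing prefix; at the first
        #    mismatch, keep the target's own token instead, so every
        #    iteration yields at least one target-quality token.
        ctx = list(tokens)
        for t in proposal:
            want = target_next(ctx)
            ctx.append(want)
            if want != t:
                break
        tokens = ctx
    return tokens[: len(prompt) + max_new]

# Toy demo: both "models" just count upward, so every draft is accepted.
print(speculative_decode(lambda c: c[-1] + 1, lambda c: c[-1] + 1, [0]))
```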

Claude 3 Opus (8x220B?)

ChatGPT (GPT4T)


PART 1: High level Discord summaries

TheBloke Discord Summary


Mistral Discord Summary

Augmentoolkit Gains Traction: Engineers discussed Augmentoolkit, a tool that converts datasets into instruct-tuning format, vital for those moving from factual corpus data to multiturn interactions.
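
For readers unfamiliar with the target format, the snippet below shows the generic shape of a multiturn instruct-tuning sample that corpus-to-dataset converters produce. The field names follow the common ShareGPT-style convention and are an assumption here, not Augmentoolkit's documented schema:

```python
# Illustrative only: one multiturn sample distilled from a factual corpus.
# "conversations"/"from"/"value" are the widespread ShareGPT-style keys,
# not necessarily Augmentoolkit's exact output format.
sample = {
    "conversations": [
        {"from": "human", "value": "What does the passage say causes X?"},
        {"from": "gpt", "value": "According to the source text, X is caused by ..."},
        {"from": "human", "value": "And how does that relate to Y?"},
        {"from": "gpt", "value": "The passage links X to Y through ..."},
    ]
}
```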

Mistral Model Token Boundaries and Hardware Talk: A debate unfolded over the ideal token length for Mistral models, with the sweet spot reported to be between 8k and 10k tokens. Separately, a correction was made on VRAM requirements, noting that the RTX 4090 carries 24 GB of VRAM, a crucial spec for modelers considering hardware purchases.

Mistral Finetuning Frustrations and Fixes: Users shared struggles and successes around finetuning Mistral models, with one user hitting problems converting a lora_fused_model to fp16.gguf, as discussed in this GitHub issue. Some advocated that finetuning Mistral 7B may be done more efficiently full-model rather than via LoRA, as advised in this guide, a potential blueprint for those trekking through the finetuning forest.
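
As a rough illustration of the merge step that typically precedes GGUF conversion, here is a hedged sketch using PEFT's merge_and_unload(). The paths and model id are placeholders, and the llama.cpp converter call is indicative only, not a confirmed fix for the linked issue:

```python
# Hedged sketch: fold LoRA deltas into the base weights, save fp16, then
# convert with llama.cpp. All paths and ids below are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-v0.1"  # placeholder base model
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# merge_and_unload() returns a plain transformers model with the adapter folded in
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()

merged.save_pretrained("merged-fp16")
AutoTokenizer.from_pretrained(base_id).save_pretrained("merged-fp16")

# Then, from a llama.cpp checkout (script name varies by version):
#   python convert.py merged-fp16 --outtype f16
```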

Community Questioning Mistral's Commitment and Pricing: The Mistral community voiced concerns over the platform's commitment to open models and over its pricing structure, especially how Mistral Large is priced relative to OpenAI's GPT-4 Turbo (a roughly 20% difference).

Model Properties, Downloads, and Legal Provisos in Focus: The models currently available for download are Mistral 7B and Mixtral 8x7B, with larger models to be announced. Meanwhile, dialogue on the legal implications of using AI models without clear licensing raised potential risks, with suggestions of hidden watermarks as identifiers for illicit use.

Technical Tripping Points in Mistral Usage: Engineers exchanged issues and solutions, from API errors triggered by assigning null to max_tokens in the JSON body, to challenges with parsing JSON tables in API calls and setting up webhooks. Response accuracy, especially in multilingual contexts and mathematical calculations, also raised concerns about variability and prompted discussion of how to improve reliability.
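
On the max_tokens point, a minimal sketch of the usual workaround pattern: omit optional fields entirely rather than serializing them as JSON null. The endpoint and model name are Mistral's published ones, but reading "omit, don't null" as the fix is our inference from the summary above:

```python
# Hedged sketch of a La Plateforme chat call. Leaving "max_tokens" out of
# the payload avoids sending "max_tokens": null, which triggers the API
# errors described above (assumption, not a confirmed root cause).
import os
import requests

payload = {
    "model": "mistral-large-latest",
    "messages": [{"role": "user", "content": "Parse this into a JSON table: ..."}],
    # note: no "max_tokens" key; payload["max_tokens"] = None would
    # serialize to null
}
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```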


Perplexity AI Discord Summary


Nous Research AI Discord Summary


OpenAI Discord Summary


HuggingFace Discord Summary


LlamaIndex Discord Summary


Latent Space Discord Summary



Eleuther Discord Summary


LM Studio Discord Summary


LAION Discord Summary


OpenAccess AI Collective (axolotl) Discord Summary

Mix and Merge: Model Integration Techniques Explored:

Claude-3 Ethical Safeguards Scrutiny:

A Gearhead's Guide to AI Hardware:

Fine-Tuning Deep Dive and Data Enrichment Strategies:

Towards Better Model Parameter Efficiency:


OpenRouter (Alex Atallah) Discord Summary


LangChain AI Discord Summary


CUDA MODE Discord Summary


LLM Perf Enthusiasts AI Discord Summary


Datasette - LLM (@SimonW) Discord Summary


DiscoResearch Discord Summary


Interconnects (Nathan Lambert) Discord Summary


PART 2: Detailed by-Channel summaries and links

TheBloke ā–· #general (967 messagesšŸ”„šŸ”„šŸ”„):

Links mentioned:


TheBloke ā–· #characters-roleplay-stories (115 messagesšŸ”„šŸ”„):

Links mentioned:


TheBloke ā–· #model-merging (1 message):

pablo.ce: https://huggingface.co/pabloce/Dolphin-2.8-slerp


TheBloke ā–· #coding (8 messagesšŸ”„):


Mistral ā–· #general (475 messagesšŸ”„šŸ”„šŸ”„):

Links mentioned:


Mistral ā–· #models (3 messages):


Mistral ā–· #deployment (2 messages):


Mistral ā–· #finetuning (40 messagesšŸ”„):

Links mentioned:


Mistral ā–· #announcements (1 message):

sophiamyang: https://twitter.com/MistralAILabs/status/1765434559993123184


Mistral ā–· #showcase (7 messages):

Links mentioned:


Mistral ā–· #random (35 messagesšŸ”„):

Links mentioned:


Mistral ā–· #la-plateforme (33 messagesšŸ”„):

Links mentioned:



Mistral ā–· #office-hour (400 messagesšŸ”„šŸ”„):

Links mentioned:


Mistral ā–· #le-chat (114 messagesšŸ”„šŸ”„):

Links mentioned:

GitHub - huggingface/chat-ui: Open source codebase powering the HuggingChat app.


Mistral ā–· #failed-prompts (11 messagesšŸ”„):


Perplexity AI ā–· #announcements (2 messages):

Links mentioned:

Nothing Perplexity: Here at Nothing, we're building a world where tech is fun again. Remember a time where every new product made you excited? We're bringing that back.


Perplexity AI ā–· #general (755 messagesšŸ”„šŸ”„šŸ”„):

Links mentioned:


Perplexity AI ā–· #sharing (24 messagesšŸ”„):


Perplexity AI ā–· #pplx-api (29 messagesšŸ”„):

Links mentioned:

pplx-api


Nous Research AI ā–· #off-topic (11 messagesšŸ”„):

Links mentioned:


Nous Research AI ā–· #interesting-links (22 messagesšŸ”„):

Links mentioned:


Nous Research AI ā–· #general (327 messagesšŸ”„šŸ”„):

Links mentioned:


Nous Research AI ā–· #ask-about-llms (47 messagesšŸ”„):


Nous Research AI ā–· #project-obsidian (2 messages):


OpenAI ā–· #ai-discussions (158 messagesšŸ”„šŸ”„):

Links mentioned:

EvalPlus Leaderboard


OpenAI ā–· #gpt-4-discussions (24 messagesšŸ”„):

Links mentioned:


OpenAI ā–· #prompt-engineering (24 messagesšŸ”„):


OpenAI ā–· #api-discussions (24 messagesšŸ”„):


HuggingFace ā–· #announcements (1 message):

Links mentioned:


HuggingFace ā–· #general (132 messagesšŸ”„šŸ”„):

Links mentioned:


HuggingFace ā–· #today-im-learning (7 messages):


HuggingFace ā–· #cool-finds (12 messagesšŸ”„):

Links mentioned:


HuggingFace ā–· #i-made-this (21 messagesšŸ”„):

Links mentioned:


HuggingFace ā–· #reading-group (13 messagesšŸ”„):

Links mentioned:


HuggingFace ā–· #diffusion-discussions (5 messages):

Links mentioned:


HuggingFace ā–· #computer-vision (6 messages):


HuggingFace ā–· #NLP (26 messagesšŸ”„):

Links mentioned:

Release v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM Ā· huggingface/transformers: New model additions šŸ’Ž Gemma šŸ’Ž Gemma is a new open-source Language Model series from Google AI that comes with a 2B and 7B variant. The release comes with the pre-trained and instruction fine-tuned v....


HuggingFace ā–· #gradio-announcements (1 message):

Links mentioned:

Gradio DownloadButton Docs


LlamaIndex ā–· #announcements (1 message):

Links mentioned:

LlamaIndex Webinar: Tree-Structured Indexing and Retrieval with RAPTOR Ā· Zoom Ā· Luma: RAPTOR is a recent paper that introduces a new tree-structured technique, which hierarchically clusters/summarizes chunks into a tree structure containing both high-level and...


LlamaIndex ā–· #blog (6 messages):

Links mentioned:


LlamaIndex ā–· #general (200 messagesšŸ”„šŸ”„):

Links mentioned:


LlamaIndex ā–· #ai-discussion (1 message):

Links mentioned:

GitHub - mominabbass/LinC: Code for "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration".


Latent Space ā–· #ai-general-chat (69 messagesšŸ”„šŸ”„):

Links mentioned:


Latent Space ā–· #ai-announcements (4 messages):


Latent Space ā–· #llm-paper-club-west (82 messagesšŸ”„šŸ”„):

Links mentioned:


Eleuther ā–· #general (85 messagesšŸ”„šŸ”„):

Links mentioned:


Eleuther ā–· #research (41 messagesšŸ”„):

Links mentioned:


Eleuther ā–· #lm-thunderdome (17 messagesšŸ”„):

Links mentioned:


Eleuther ā–· #multimodal-general (1 message):


Eleuther ā–· #gpt-neox-dev (10 messagesšŸ”„):

Links mentioned:


LM Studio ā–· #šŸ’¬-general (126 messagesšŸ”„šŸ”„):

Links mentioned:


LM Studio ā–· #šŸ¤–-models-discussion-chat (7 messages):

Links mentioned:



LM Studio ā–· #šŸŽ›-hardware-discussion (17 messagesšŸ”„):

Links mentioned:



LM Studio ā–· #open-interpreter (2 messages):


LAION ā–· #general (142 messagesšŸ”„šŸ”„):

Links mentioned:


LAION ā–· #research (4 messages):


OpenAccess AI Collective (axolotl) ā–· #general (53 messagesšŸ”„):

Links mentioned:


OpenAccess AI Collective (axolotl) ā–· #axolotl-dev (16 messagesšŸ”„):

Links mentioned:


OpenAccess AI Collective (axolotl) ā–· #general-help (12 messagesšŸ”„):

Links mentioned:

axolotl/deepspeed_configs/zero3_bf16.json at main Ā· OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions.


OpenRouter (Alex Atallah) ā–· #announcements (1 message):


OpenRouter (Alex Atallah) ā–· #general (78 messagesšŸ”„šŸ”„):

Links mentioned:


LangChain AI ā–· #general (61 messagesšŸ”„šŸ”„):

Links mentioned:


LangChain AI ā–· #langserve (1 message):


LangChain AI ā–· #share-your-work (6 messages):

Links mentioned:


LangChain AI ā–· #tutorials (1 message):

pradeep1148: https://www.youtube.com/watch?v=QPZpOBxUd1U


CUDA MODE ā–· #cuda (8 messagesšŸ”„):

Links mentioned:


CUDA MODE ā–· #torch (9 messagesšŸ”„):

Links mentioned:

GitHub - TimDettmers/bitsandbytes: Accessible large language models via k-bit quantization for PyTorch.


CUDA MODE ā–· #algorithms (3 messages):

Links mentioned:

Add sliding window attention bias by drisspg Ā· Pull Request #120143 Ā· pytorch/pytorch: Summary This PR adds a new attention-bias torch_function designed to interact with SDPA. This implements sliding window and updates "aten.sdpa_flash" to expose the window_size_left and wind...


CUDA MODE ā–· #jobs (1 message):

bowtiedlark: Remote?


CUDA MODE ā–· #beginner (2 messages):


CUDA MODE ā–· #ring-attention (28 messagesšŸ”„):

Links mentioned:


LLM Perf Enthusiasts AI ā–· #claude (7 messages):


LLM Perf Enthusiasts AI ā–· #prompting (2 messages):


Datasette - LLM (@SimonW) ā–· #ai (8 messagesšŸ”„):

Links mentioned:


Datasette - LLM (@SimonW) ā–· #llm (1 message):


DiscoResearch ā–· #general (9 messagesšŸ”„):

Links mentioned:

Reliable, Adaptable, and Attributable Language Models with Retrieval: Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, di...


Alignment Lab AI ā–· #general-chat (1 message):


Alignment Lab AI ā–· #oo2 (3 messages):


Skunkworks AI ā–· #off-topic (2 messages):

Links mentioned:


Interconnects (Nathan Lambert) ā–· #ideas-and-feedback (1 message):

Links mentioned:

Intel's Humbling | Stratechery by Ben Thompson: Read the Article: https://stratechery.com/2024/intels-humbling/ Links: Stratechery: https://stratechery.com Sign up for Stratechery Plus: https://stratechery.c...


Interconnects (Nathan Lambert) ā–· #reads (1 message):

Links mentioned:

Things I Don't Know About AI: The more I learn about AI markets, the less I think I know. I list questions and some thoughts.