Frozen AI News archive

Stable Diffusion 3 — Rombach & Esser did it again!

**Over 2500 new community members joined following Soumith Chintala's shoutout**, highlighting growing interest in SOTA LLM-based summarization. The major highlight is the detailed paper release of **Stable Diffusion 3 (SD3)**, showcasing advanced text-in-image control and complex prompt handling, with the model outperforming other SOTA image generation models in human-evaluated benchmarks. SD3 is based on an enhanced Diffusion Transformer architecture called **MMDiT**. Meanwhile, **Anthropic** released the **Claude 3** models, noted for human-like responses and emotional depth, scoring 79.88% on HumanEval but costing over twice as much as GPT-4. Microsoft launched new Orca-based models and datasets, and Latitude released **DolphinCoder-StarCoder2-15b** with strong coding capabilities. Integration of image models by **Perplexity AI** and 3D CAD generation by **PolySpectra**, powered by **LlamaIndex**, were also highlighted. *"SD3's win rate beats all other SOTA image gen models (except perhaps Ideogram)"* and *"Claude 3 models are very good at generating d3 visualizations from text descriptions."*

Canonical issue URL

Warm welcome to the >2500 people who joined from Soumith's shoutout last night! It's kinda like having a crowd of visitors over when the house isn't clean yet - we're still very much building the plane while jumping off a cliff. But we're increasingly happy with our prompts, pipeline, and exploration of what useful, SOTA LLM-based summarization can and should do.

Lots of people are still processing Claude 3 but we're moving on. Today's big news is the Stable Diffusion 3 paper. SD3 was announced (not released) a few days ago, but the paper provides much more detail.

Obligatory images because really who reads the text I'm writing here when you can see pretty pictures:

We are even more impressed by the incredible level of text-in-image control and handling of complex prompts (see the progress over the last 2 years):

Paper highlights here but in short they have modified Bill Peebles' Diffusion Transformer (yes the one used in Sora) to be even more multimodal, hence "MMDiT":


DiT variants have been the subject of intense research this year, e.g. for Hourglass Diffusion and Emo.
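For intuition (this is a toy sketch, not Stability's actual implementation), the "multimodal" twist in MMDiT is that text and image tokens keep separate projection weights but attend to each other in a single joint attention operation. All weights and dimensions below are made up for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 16  # toy model width

# Separate Q/K/V/output weights per modality (the "MM" in MMDiT).
W_text = {k: rng.standard_normal((d, d)) / np.sqrt(d) for k in ("q", "k", "v", "o")}
W_img  = {k: rng.standard_normal((d, d)) / np.sqrt(d) for k in ("q", "k", "v", "o")}

def mmdit_joint_attention(text_tokens, image_tokens):
    """One joint-attention step over concatenated text + image token streams."""
    n_text = text_tokens.shape[0]
    # Project each stream with its own weights...
    q = np.concatenate([text_tokens @ W_text["q"], image_tokens @ W_img["q"]])
    k = np.concatenate([text_tokens @ W_text["k"], image_tokens @ W_img["k"]])
    v = np.concatenate([text_tokens @ W_text["v"], image_tokens @ W_img["v"]])
    # ...but run a single attention over the combined sequence, so each
    # modality can attend to the other.
    attn = softmax(q @ k.T / np.sqrt(d)) @ v
    # Split back out and apply per-modality output projections.
    return attn[:n_text] @ W_text["o"], attn[n_text:] @ W_img["o"]

text_out, img_out = mmdit_joint_attention(rng.standard_normal((8, d)),
                                          rng.standard_normal((64, d)))
print(text_out.shape, img_out.shape)  # (8, 16) (64, 16)
```

The real architecture adds timestep conditioning, AdaLN modulation, and MLP blocks on top; the paper has the full diagram.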

Stability's messaging around its benchmarks has been all over the place recently (e.g. for SD2 and SDXL), making it unclear whether the main benefit is image quality, open source customizability, or something else. But SD3 is pretty unambiguous: when evaluated on PartiPrompts questions via REAL HUMANS ($$$), SD3's win rate beats all other SOTA image gen models (except perhaps Ideogram).
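For anyone unfamiliar with how these human-eval leaderboards are scored, the core computation is simple: collect pairwise preferences from raters and compute each model's fraction of comparisons won. A minimal sketch (the model names and votes below are invented):

```python
from collections import Counter

def win_rates(votes):
    """votes: list of (model_a, model_b, winner) triples from human raters.
    Returns each model's fraction of pairwise comparisons won (ties excluded
    from the win count but still counted as comparisons)."""
    wins, totals = Counter(), Counter()
    for a, b, winner in votes:
        totals[a] += 1
        totals[b] += 1
        if winner in (a, b):   # ignore ties / skipped ratings
            wins[winner] += 1
    return {m: wins[m] / totals[m] for m in totals}

votes = [
    ("SD3", "DALLE-3", "SD3"),
    ("SD3", "Midjourney", "SD3"),
    ("SD3", "DALLE-3", "DALLE-3"),
    ("Midjourney", "DALLE-3", "Midjourney"),
]
rates = win_rates(votes)
print(rates)  # SD3 wins 2 of 3, Midjourney 1 of 2, DALLE-3 1 of 3
```

The expensive part is the `votes` list, hence the ($$$).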


It's currently unclear whether the 8B SD3 model will ever be released beyond Stability's API wall. But surely a new SOTA model, from the people that launched the new imagegen summer, is to be celebrated regardless.


Table of Contents

We are experimenting with removing Table of Contents as many people reported it wasn't as helpful as hoped. Let us know if you miss the TOCs, or they'll be gone permanently.


PART X: AI Twitter Recap

Claude 3 Sonnet (14B?)

Anthropic Claude 3 Release

AI Model Releases & Datasets

AI Capabilities & Use Cases

AI Development & Evaluation

Memes & Humor

In summary, the release of Anthropic's Claude 3 models has generated significant discussion, with comparisons being made to GPT-4 in terms of performance, cost, and capabilities. Claude 3 demonstrates strong language understanding and generation, but lags behind GPT-4 on some coding tests.

Alongside the Claude 3 release, there have been other notable AI model and dataset releases from Microsoft, Stability AI, Latitude, and others. These span a range of applications including coding, 3D model generation, and image-to-text.

Researchers continue to advance techniques for fine-tuning and evaluating large language models, such as using fine-grained RLHF and being cautious with metrics like validation loss. There are also observations about the reasoning capabilities and potential self-awareness of LLMs.

Amidst the technical discussions, there is still room for humor, as evidenced by jokes and memes shared alongside the AI news and analysis. Overall, the tweets paint a picture of an AI field that is rapidly advancing in terms of model scale and capabilities, but also grappling with important questions around evaluation, safety, and potential impacts.

ChatGPT (GPT4T)

AI Humor & Memes

This summary illuminates the multifaceted discussions within the AI tech community, from deep dives into model performance and its real-world applicability to societal reflections observed through technological lenses. The emphasis on Claude 3's capabilities versus GPT-4, alongside methodological considerations in AI model development and deployment, underscores the ongoing efforts toward more nuanced, human-like AI. Furthermore, the exploration of Korea's cultural and economic landscapes through the tech lens highlights the complex interplay between societal structures and technological development, offering invaluable insights for tech professionals navigating global AI applications.


PART 0: Summary of Summaries of Summaries

Operator notes: Prompt we use for Claude, and our summarizer GPT used for ChatGPT. What is shown is subjective best of 3 runs each.

Claude 3 Sonnet (14B?)

Interestingly, Sonnet failed to understand the task the 2nd time we ran it (not understanding that we want it to summarize across ALL the summaries and raw text - which today total 20k words).

Claude 3 Opus (8x220B?)

ChatGPT (GPT4T)


PART 1: High level Discord summaries

TheBloke Discord Summary


Mistral Discord Summary


Perplexity AI Discord Summary


OpenAI Discord Summary


Nous Research AI Discord Summary


LAION Discord Summary


HuggingFace Discord Summary

AI Breakthroughs and Hiccups: Discussions spanned the performance of Hermes 2.5 over Hermes 2 and the limitations of expanding Mistral beyond 8k. There's also a focus on calculating gradients in novel ways, along with repeated, still-unresolved requests for help with dataset creation.

Diffusion Model Guidance on HuggingFace: Members discussed a potential NSFW model, AstraliteHeart/pony-diffusion-v6, on HuggingFace, with suggestions to tag it appropriately or report it. Additionally, guidance was provided for image prompting in diffusion models, directing users to an IP-Adapter tutorial.

CV and NLP Cross-Talk: The community engaged in topics ranging from the introduction of the Terminator network and its integration of past technologies to the quest for the SOTA in bidirectional NLP language models, touching on options like Deberta V3 and the monarch-mixer. Problems shared included difficulties with enhancing Mistral with GBNF grammar, variable inference times with Mistral and BLOOM models, and implementing Mistral in Windows apps.

Kubeflow Gets a Terraform Boost: In the realm of tools and platforms, Kubeflow can now be deployed using a Terraform module, effectively transforming Kubernetes clusters into AI-ready environments. MEGA's performance on short-context GLUE benchmarks and a Gemma speed boost via Unsloth were also introduced, showcasing various community-driven advancements.

Video-Related Innovations and Problems: The release of Pika signals the growing trend in text-to-video generation. Meanwhile, a user experienced visual issues with a Gradio-embedded OpenAI API chatbot and is looking for assistance to fix the layout.

Reading Group Revival: Concern was expressed over the scheduling of reading group sessions, with debate over the merits of Discord versus Zoom for hosting. Recordings are available on Isamu Isozaki's YouTube profile for those unable to attend live sessions.


OpenRouter (Alex Atallah) Discord Summary


LlamaIndex Discord Summary


CUDA MODE Discord Summary


LLM Perf Enthusiasts AI Discord Summary


Interconnects (Nathan Lambert) Discord Summary


LangChain AI Discord Summary


Latent Space Discord Summary


OpenAccess AI Collective (axolotl) Discord Summary


Datasette - LLM (@SimonW) Discord Summary


DiscoResearch Discord Summary


Skunkworks AI Discord Summary

No relevant technical discussions or important topics to summarize were provided in the given messages.


Alignment Lab AI Discord Summary


AI Engineer Foundation Discord Summary


PART 2: Detailed by-Channel summaries and links

TheBloke ▷ #general (1100 messages🔥🔥🔥):

Links mentioned:


TheBloke ▷ #characters-roleplay-stories (73 messages🔥🔥):

Links mentioned:


TheBloke ▷ #coding (9 messages🔥):

Links mentioned:


Mistral ▷ #general (412 messages🔥🔥🔥):

Links mentioned:


Mistral ▷ #models (75 messages🔥🔥):

Links mentioned:

LLM Visualization: no description found


Mistral ▷ #ref-implem (1 messages):


Mistral ▷ #showcase (7 messages):

Links mentioned:


Mistral ▷ #random (24 messages🔥):

Links mentioned:


Mistral ▷ #la-plateforme (32 messages🔥):

Links mentioned:


Mistral ▷ #office-hour (387 messages🔥🔥):

Links mentioned:


Mistral ▷ #le-chat (151 messages🔥🔥):

Links mentioned:

Client code | Mistral AI Large Language Models: We provide client codes in both Python and Javascript.


Mistral ▷ #failed-prompts (12 messages🔥):


Perplexity AI ▷ #announcements (2 messages):

Links mentioned:

no title found: no description found


Perplexity AI ▷ #general (831 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (18 messages🔥):


Perplexity AI ▷ #pplx-api (18 messages🔥):


OpenAI ▷ #ai-discussions (314 messages🔥🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (11 messages🔥):


OpenAI ▷ #prompt-engineering (60 messages🔥🔥):


OpenAI ▷ #api-discussions (60 messages🔥🔥):


Nous Research AI ▷ #off-topic (6 messages):

Links mentioned:

Introducing Claude 3 LLM which surpasses GPT-4: Today, we're looking at the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three state-of...


Nous Research AI ▷ #interesting-links (42 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (227 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (14 messages🔥):


Nous Research AI ▷ #project-obsidian (4 messages):

Links mentioned:

GitHub - vikhyat/moondream: tiny vision language model: tiny vision language model. Contribute to vikhyat/moondream development by creating an account on GitHub.


Nous Research AI ▷ #bittensor-finetune-subnet (1 messages):

Links mentioned:

Release v0.2.2 · NousResearch/finetuning-subnet: v0.2.2


LAION ▷ #general (229 messages🔥🔥):

Links mentioned:


LAION ▷ #research (20 messages🔥):

Links mentioned:


HuggingFace ▷ #general (112 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):

pacozaa: Transformer js


HuggingFace ▷ #cool-finds (7 messages):

Links mentioned:


HuggingFace ▷ #i-made-this (16 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (42 messages🔥):

Links mentioned:


HuggingFace ▷ #diffusion-discussions (6 messages):

Links mentioned:


HuggingFace ▷ #computer-vision (7 messages):


HuggingFace ▷ #NLP (14 messages🔥):




OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (180 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #announcements (1 messages):

RAPTOR Webinar

Links mentioned:

LlamaIndex Webinar: Tree-Structured Indexing and Retrieval with RAPTOR · Zoom · Luma: RAPTOR is a recent paper that introduces a new tree-structured technique, which hierarchically clusters/summarizes chunks into a tree structure containing both high-level and...
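To make the RAPTOR idea concrete: a toy sketch of the tree-building step is below. Real RAPTOR clusters chunks by embedding similarity (soft GMM clustering) and summarizes each cluster with an LLM; here we just group adjacent chunks and use a stand-in summarizer, purely to show the recursive structure:

```python
def build_raptor_tree(chunks, summarize, fanout=2):
    """Toy RAPTOR-style tree: repeatedly group `fanout` nodes and summarize
    each group into a parent node, until a single root remains. Returns every
    node (leaves + all summary levels), since RAPTOR retrieves across the
    whole tree rather than just the leaves."""
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([summarize(prev[i:i + fanout])
                       for i in range(0, len(prev), fanout)])
    return [node for level in levels for node in level]

# Stand-in summarizer; a real pipeline would call an LLM here.
toy_summarize = lambda group: "summary(" + " + ".join(group) + ")"

nodes = build_raptor_tree(["c1", "c2", "c3", "c4"], toy_summarize)
print(nodes)  # 4 leaves, 2 mid-level summaries, 1 root summary
```

At query time you then embed all 7 nodes and retrieve against the lot, which is what gives RAPTOR both high-level and detailed context.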


LlamaIndex ▷ #blog (6 messages):

Links mentioned:


LlamaIndex ▷ #general (97 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (3 messages):

Links mentioned:

Empowering Long Context RAG: The Integration of LlamaIndex with LongContext: Ankush k Singal


CUDA MODE ▷ #general (7 messages):

Links mentioned:

Nvidia bans using translation layers for CUDA software — previously the prohibition was only listed in the online EULA, now included in installed files [Updated]: Translators in the crosshairs.


CUDA MODE ▷ #cuda (11 messages🔥):

Links mentioned:


CUDA MODE ▷ #torch (4 messages):


CUDA MODE ▷ #suggestions (1 messages):

iron_bound: https://www.youtube.com/watch?v=kCc8FmEb1nY


CUDA MODE ▷ #jobs (1 messages):

bowtiedlark: Remote?


CUDA MODE ▷ #beginner (4 messages):

Links mentioned:

GitHub - NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines: CUDA Templates for Linear Algebra Subroutines. Contribute to NVIDIA/cutlass development by creating an account on GitHub.


CUDA MODE ▷ #youtube-recordings (10 messages🔥):

Links mentioned:

Lecture 8: CUDA Performance Checklist: Code https://github.com/cuda-mode/lectures/tree/main/lecture8Slides https://docs.google.com/presentation/d/1cvVpf3ChFFiY4Kf25S4e4sPY6Y5uRUO-X-A4nJ7IhFE/edit


CUDA MODE ▷ #ring-attention (49 messages🔥):

Links mentioned:


LLM Perf Enthusiasts AI ▷ #general (1 messages):


LLM Perf Enthusiasts AI ▷ #claude (76 messages🔥🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (1 messages):

Links mentioned:

Intel's Humbling | Stratechery by Ben Thompson: Read the Article: https://stratechery.com/2024/intels-humbling/Links: Stratechery: https://stratechery.comSign up for Stratechery Plus: https://stratechery.c...


Interconnects (Nathan Lambert) ▷ #news (43 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (6 messages):


Interconnects (Nathan Lambert) ▷ #random (3 messages):


Interconnects (Nathan Lambert) ▷ #rl (7 messages):


LangChain AI ▷ #general (51 messages🔥):

Links mentioned:


LangChain AI ▷ #langserve (3 messages):


LangChain AI ▷ #share-your-work (5 messages):

Links mentioned:


Latent Space ▷ #ai-general-chat (58 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (23 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (3 messages):


OpenAccess AI Collective (axolotl) ▷ #datasets (1 messages):

drewskidang_82747: what is this nerf bs


Datasette - LLM (@SimonW) ▷ #ai (8 messages🔥):

Links mentioned:


Datasette - LLM (@SimonW) ▷ #llm (5 messages):

Links mentioned:

GitHub - simonw/llm-claude-3: LLM plugin for interacting with the Claude 3 family of models: LLM plugin for interacting with the Claude 3 family of models - simonw/llm-claude-3


DiscoResearch ▷ #general (9 messages🔥):


Skunkworks AI ▷ #general (2 messages):


Skunkworks AI ▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=Zt73ka2Y8a8


Alignment Lab AI ▷ #looking-for-collabs (2 messages):


AI Engineer Foundation ▷ #general (1 messages):

Links mentioned:

Drafts of the Open Source AI Definition: The drafts of the Open Source AI Definition. We’re publishing the draft documents as they’re released. Check the individual drafts below for instructions on how to leave your comments. …