Frozen AI News archive

RIP Latent Diffusion, Hello Hourglass Diffusion

**Katherine Crowson**, of **Stable Diffusion** fame, introduces a hierarchical pure-transformer backbone for diffusion-based image generation that scales efficiently to megapixel resolutions with under 600 million parameters (the original Stable Diffusion was ~900M). The architecture processes local and global image phenomena separately, improving efficiency and resolution without any latent stage. Meanwhile, Meta's Self-Rewarding LM paper has inspired **lucidrains** to begin an implementation. Discord summaries highlight GPT-4's robustness against quantification tricks, discussions of open-source GPT-0 alternatives, challenges of DPO training on limited VRAM (with suggestions such as QLoRA and rmsprop), and efforts to improve roleplay-model consistency through fine-tuning and merging. Philosophical debates on AI sentience and GPT-4 customization for markdown and translation tasks were also noted.


Katherine Crowson, of Stable Diffusion fame, is back with a monster: [Direct pixel-space megapixel image generation with diffusion models](https://arxiv.org/abs/2401.11605):

a hierarchical pure transformer backbone for image generation with diffusion models that scales to high resolutions more efficiently than previous transformer-based backbones. Instead of treating images the same regardless of resolution, this architecture adapts to the target resolution, processing local phenomena locally at high resolutions and separately processing global phenomena in low-resolution parts of the hierarchy.

This updates the Latent Diffusion architecture (which Stable Diffusion is based on) with a fundamentally redesigned UNet that is less like a CNN and more Transformer-like. She also uses a bunch of SOTA inference tricks, because why not.


The net result of all this is more efficiency: a hierarchical transformer architecture with O(n) complexity that scales well to higher resolutions, generating megapixel-scale images without any latent steps from a <600M parameter model (the original SD was ~900M).
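To make that concrete, here is a minimal PyTorch sketch of the hourglass pattern: attention restricted to local windows at full resolution (the O(n) part), full global attention only at the downsampled bottleneck, and a skip connection bridging the two. This illustrates the general idea only, not Crowson's implementation; the module names, the 2x2 token merging, and the window size are all assumptions.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows: cost stays O(n) in image size."""
    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, H, W, C) grid of tokens
        B, H, W, C = x.shape
        w = self.window
        # Split the grid into (H//w * W//w) independent windows of w*w tokens.
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)
        x = self.attn(x, x, x)[0]
        return x.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

class GlobalAttention(nn.Module):
    """Full self-attention over every token: quadratic, so only used at low resolution."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        B, H, W, C = x.shape
        t = x.reshape(B, H * W, C)
        return self.attn(t, t, t)[0].view(B, H, W, C)

class Hourglass(nn.Module):
    """Local attention at high res -> 2x2 token merge -> global attention -> unmerge, with a skip."""
    def __init__(self, dim=64):
        super().__init__()
        self.local_in = WindowAttention(dim)
        self.down = nn.Linear(4 * dim, dim)  # merge each 2x2 patch of tokens into one
        self.mid = GlobalAttention(dim)
        self.up = nn.Linear(dim, 4 * dim)    # expand each token back into a 2x2 patch
        self.local_out = WindowAttention(dim)

    def forward(self, x):  # x: (B, H, W, C)
        skip = x = self.local_in(x)
        B, H, W, C = x.shape
        x = x.view(B, H // 2, 2, W // 2, 2, C).permute(0, 1, 3, 2, 4, 5).reshape(B, H // 2, W // 2, 4 * C)
        x = self.mid(self.down(x))
        x = self.up(x).view(B, H // 2, W // 2, 2, 2, C).permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return self.local_out(x + skip)

out = Hourglass()(torch.randn(1, 32, 32, 64))  # token grid in -> same-shape token grid out
```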

In other news, the Self-Rewarding LM paper from Meta has gathered enough attention for lucidrains to start work on an implementation.
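For the unfamiliar: the paper's core loop has one model act as both generator and judge, then trains on its own preference judgments with DPO. Below is a heavily simplified sketch of one iteration; the injected callables are hypothetical placeholders standing in for the real machinery, not lucidrains' API.

```python
def self_rewarding_iteration(model, prompts, generate, judge_score, dpo_update, n_candidates=4):
    """One round of the Self-Rewarding LM loop: M_t -> M_{t+1} (hypothetical helpers injected)."""
    pairs = []
    for prompt in prompts:
        # 1. The current model samples several candidate responses per prompt.
        candidates = [generate(model, prompt) for _ in range(n_candidates)]
        # 2. The *same* model scores each candidate via an LLM-as-a-judge prompt (0-5 in the paper).
        ranked = sorted(candidates, key=lambda c: judge_score(model, prompt, c))
        # 3. The best and worst candidates form a DPO preference pair.
        pairs.append((prompt, ranked[-1], ranked[0]))
    # 4. DPO on these self-generated preferences produces the next iteration's model.
    return dpo_update(model, pairs)
```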

---

Table of Contents

[TOC]


PART 1: High level Discord summaries

TheBloke Discord Summary


OpenAI Discord Summary

AI Sentience: More Philosophy Than Reality?: In a stimulating back-and-forth, @lugui and @.pythagoras debated the concept of AI sentience, discussing human biases in perceiving intelligent behavior in non-sentient entities. The conversation touched on the dangers of future AI surpassing human control, drawing parallels with the Roko's Basilisk thought experiment and questioning the implications of our current actions on the future behavior of powerful AIs.

GPT-4: The Finer Points of Customization and Markdown: Users exchanged insights on GPT-4's customization for specific tasks like creating markdown documents and precision translations using custom dictionaries. Despite challenges and reported performance issues, the manipulation of contexts and structured prompting stood out as keys to improving output quality.

Prompt Engineering: Tackling Language and Logic: Focusing on nuanced use cases such as professional-level Latin-to-English translation and the reduction of repetitive language, @novumclassicum, @stealth2077, and others experimented with attaching text files and iteratively refining prompts. The cumulative knowledge highlighted the power of well-crafted instructions for guiding GPT-4 towards desired outcomes.

API Quandaries and Contextual Concerns: API-related discussions revealed complexities of custom dictionary translations, the management of long lists, and continuity in extended AI conversations. @darthgustav and @eskcanta provided key advice on overcoming repetitive outputs and context window limitations, pointing towards structured instructions and understanding of GPT-4's internal mechanisms.

Practical Advice for Knowledge and Action Management: The community shared strategies for handling knowledge files and for keeping GPT performance consistent across varied applications, from educational models to storytelling, emphasizing that explicit instructions produce better AI behavior.


LM Studio Discord Summary


Nous Research AI Discord Summary


Mistral Discord Summary


Eleuther Discord Summary


HuggingFace Discord Discord Summary


Perplexity AI Discord Summary


LAION Discord Summary


Latent Space Discord Summary


DiscoResearch Discord Summary


LangChain AI Discord Summary

JavaScript Calls LangServe More Easily: A new method for calling LangServe chains from JavaScript frontends was highlighted, aiming to simplify the integration of LangServe with JS applications. This update, shared by @jacoblee93 in a Tweet, could streamline frontend and AI interactions.

Open Source RAG Models Elevate Multilingual Tech: The release of a new EmbeddingModel and RerankerModel by @maidalun on Hugging Face enhances RAG's capabilities with support for multiple languages and domain-specific adaptations. These models, shared in the general and share-your-work channels, can be found on Hugging Face and in the GitHub repo; see the retrieve-and-rerank sketch after this list.

Write-Ahead Log Intrigues: In the #langserve channel, @veryboldbagel initiated a conversation about the complexities introduced by a write-ahead log, questioning its impact on feedback mutation.

Langchain Embarks on Biblical Interpretations: Users collaborated on a Bible study application, where @ilguappo shared his vector database project that prompts AI to provide priest-like responses; his work is available on GitHub.

Artistry Through AI's Eyes: In a blend of AI and art, @dwb7737 used LangChain with various vision models to analyze artworks and shared the results from OpenAI Vision and VertexAI Vision, noting OpenAI Vision as the top performer. Summaries from their research are accessible via the VertexAI Vision Gist and the OpenAI Vision Gist.

Tutorials Enlighten Custom Tool Creation and Systems Theory: Users provided resources for skill-building, such as @business24.ai's video tutorial on using crewAI to store notes in Obsidian, visible at this YouTube link, and @jasonzhou1993's video exploring System 2 thinking in LLMs and its future in GPT-5, found here on YouTube.
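As a follow-up to the RAG models above, a two-stage retrieve-then-rerank pass might look like this minimal sketch. It assumes the released models load through the standard sentence-transformers bi-encoder and cross-encoder interfaces and that the Hugging Face IDs below match the release; both are assumptions, so check @maidalun's repo for the canonical usage.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

# Assumed model IDs for the release; verify against the GitHub repo.
embedder = SentenceTransformer("maidalun1020/bce-embedding-base_v1")
reranker = CrossEncoder("maidalun1020/bce-reranker-base_v1")

docs = [
    "LangServe exposes LangChain chains as REST endpoints.",
    "RAG retrieves relevant documents before generating an answer.",
    "Obsidian stores notes as plain markdown files.",
]
query = "How does retrieval-augmented generation work?"

# Stage 1: bi-encoder retrieval -- embed everything once, rank by cosine similarity.
doc_emb = embedder.encode(docs, convert_to_tensor=True)
hits = util.semantic_search(embedder.encode(query, convert_to_tensor=True), doc_emb, top_k=2)[0]

# Stage 2: cross-encoder reranking of the shortlist for the final ordering.
shortlist = [docs[h["corpus_id"]] for h in hits]
scores = reranker.predict([(query, d) for d in shortlist])
print(shortlist[max(range(len(scores)), key=scores.__getitem__)])
```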


OpenAccess AI Collective (axolotl) Discord Summary


LlamaIndex Discord Discord Summary


Skunkworks AI Discord Summary


LLM Perf Enthusiasts AI Discord Summary


Alignment Lab AI Discord Summary

No relevant technical discussions to summarize.


YAIG (a16z Infra) Discord Summary


The Datasette - LLM (@SimonW) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

TheBloke ▷ #general (1250 messages🔥🔥🔥):



TheBloke ▷ #characters-roleplay-stories (541 messages🔥🔥🔥):



TheBloke ▷ #training-and-fine-tuning (13 messages🔥):

Links mentioned:

DPO Trainer: no description found


OpenAI ▷ #ai-discussions (154 messages🔥🔥):

Links mentioned:

Roko's basilisk - Wikipedia: no description found


OpenAI ▷ #gpt-4-discussions (47 messages🔥):


OpenAI ▷ #prompt-engineering (141 messages🔥🔥):


OpenAI ▷ #api-discussions (141 messages🔥🔥):

LM Studio ▷ #💬-general (235 messages🔥🔥):



LM Studio ▷ #🤖-models-discussion-chat (11 messages🔥):


LM Studio ▷ #🧠-feedback (8 messages🔥):


LM Studio ▷ #🎛-hardware-discussion (24 messages🔥):



LM Studio ▷ #🧪-beta-releases-chat (48 messages🔥):



LM Studio ▷ #autogen (2 messages):


LM Studio ▷ #crew-ai (21 messages🔥):


LM Studio ▷ #open-interpreter (5 messages):

Nous Research AI ▷ #off-topic (42 messages🔥):



Nous Research AI ▷ #interesting-links (6 messages):



Nous Research AI ▷ #general (118 messages🔥🔥):



Nous Research AI ▷ #ask-about-llms (52 messages🔥):

Links mentioned:

CUDA Pro Tip: Control GPU Visibility with CUDA_VISIBLE_DEVICES | NVIDIA Technical Blog: As a CUDA developer, you will often need to control which devices your application uses. In a short-but-sweet post on the Acceleware blog, Chris Mason writes: As Chris points out…
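The trick in that post amounts to setting the environment variable before any CUDA context is created. For example, from Python (a generic illustration of the technique, not code from the post):

```python
import os

# Must happen before torch (or any CUDA library) initializes a context;
# physical GPU 2 will then appear to the process as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

import torch

print(torch.cuda.device_count())  # 1 -- only the selected device is visible
```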


Mistral ▷ #general (148 messages🔥🔥):

Links mentioned:

Mistral 7B foundation models from Mistral AI are now available in Amazon SageMaker JumpStart | Amazon Web Services: Today, we are excited to announce that the Mistral 7B foundation models, developed by Mistral AI, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running in...


Mistral ▷ #models (15 messages🔥):


Mistral ▷ #deployment (2 messages):

Links mentioned:

Prompt Engineering Guide for Open LLM: Take your Open LLM application to the next level: Introduction: Why do we need another guide?


Mistral ▷ #ref-implem (8 messages🔥):

Links mentioned:

<s> [INST] Could you help me to an - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.


Mistral ▷ #finetuning (34 messages🔥):



Mistral ▷ #showcase (1 messages):


Mistral ▷ #la-plateforme (7 messages):


Eleuther ▷ #general (33 messages🔥):


Eleuther ▷ #research (38 messages🔥):



Eleuther ▷ #scaling-laws (5 messages):


Eleuther ▷ #interpretability-general (13 messages🔥):

Links mentioned:

Tweet from Stella Biderman (@BlancheMinerva): @ghandeharioun This is a very interesting paper! I'm having trouble figuring out how I should interpret some of the results. For example, you discuss outperforming the Logit Lens and Tuned Lens, b...


Eleuther ▷ #lm-thunderdome (46 messages🔥):


HuggingFace Discord ▷ #general (41 messages🔥):

Links mentioned:

GPT5 unlocks LLM System 2 Thinking?: Humans think fast & slow, but how about LLMs? How would GPT5 resolve this? 101 guide on how to unlock your LLM system 2 thinking to tackle bigger problems 🔗 Lin...


HuggingFace Discord ▷ #today-im-learning (7 messages):

Links mentioned:

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining: The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimiz...
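The core update the abstract alludes to is small enough to sketch: a proxy model's domain weights move by exponentiated gradient toward domains where its loss most exceeds a small reference model's. This is a loose paraphrase of the paper's Group-DRO-style step with toy numbers, not the authors' code:

```python
import math

def doremi_step(weights, proxy_loss, ref_loss, lr=1.0, smoothing=1e-3):
    """One exponentiated-gradient update on the domain mixture weights."""
    # Excess loss: how much each domain is still "learnable" relative to the reference.
    excess = [max(p - r, 0.0) for p, r in zip(proxy_loss, ref_loss)]
    unnorm = [w * math.exp(lr * e) for w, e in zip(weights, excess)]
    total = sum(unnorm)
    k = len(weights)
    # Renormalize and smooth toward uniform so no domain's weight collapses to zero.
    return [(1 - smoothing) * u / total + smoothing / k for u in unnorm]

# Toy example with three domains; the proxy lags the reference most on domain 0,
# so weight shifts there. Averaged over training, these weights become the new mixture.
w = [1 / 3] * 3
print(doremi_step(w, proxy_loss=[2.9, 2.1, 2.4], ref_loss=[2.2, 2.0, 2.3]))
```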


HuggingFace Discord ▷ #cool-finds (6 messages):



HuggingFace Discord ▷ #i-made-this (9 messages🔥):





HuggingFace Discord ▷ #computer-vision (4 messages):

Links mentioned:

my_notes/Deep Learning, Deep RL/CNNs 2 (Advantages).pdf at main · merveenoyan/my_notes: My small cheatsheets for data science, ML, computer science and more. - merveenoyan/my_notes


HuggingFace Discord ▷ #NLP (8 messages🔥):



HuggingFace Discord ▷ #diffusion-discussions (2 messages):

```python
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

# Raw string avoids backslash-escape issues in the Windows path; fp16 pipeline moved to the GPU.
pipe = StableDiffusionXLPipeline.from_single_file(r".\models\Stable-diffusion\sdxl\sd_xl_base_1.0_0.9vae.safetensors", torch_dtype=torch.float16).to("cuda")
# use_karras_sigmas is a scheduler config option, not a pipeline-call argument.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
prompt = "concept art Amber Temple, snow, frigid air, snow-covered peaks of the mountains, dungeons and dragons style, dark atmosphere . digital artwork, illustrative, painterly, matte painting, highly detailed"
negative_prompt = "photo, photorealistic, realism, ugly"
image = pipe(prompt, negative_prompt=negative_prompt, guidance_scale=8, num_inference_steps=20, width=1024, height=1024, generator=torch.Generator(device="cuda").manual_seed(1337)).images[0]
```


Perplexity AI ▷ #general (48 messages🔥):



Perplexity AI ▷ #sharing (14 messages🔥):



Perplexity AI ▷ #pplx-api (13 messages🔥):

LAION ▷ #general (59 messages🔥🔥):



LAION ▷ #research (11 messages🔥):


Latent Space ▷ #ai-general-chat (59 messages🔥🔥):



DiscoResearch ▷ #disco_judge (2 messages):

Links mentioned:

self-rewarding-lm-pytorch/self_rewarding_lm_pytorch/self_rewarding_lm_pytorch.py at 1cc1e1d27ff5e120efcd677c1b0691cf3cdd0402 · lucidrains/self-rewarding-lm-pytorch: Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI - lucidrains/self-rewarding-lm-pytorch


DiscoResearch ▷ #mixtral_implementation (10 messages🔥):



DiscoResearch ▷ #general (16 messages🔥):



DiscoResearch ▷ #embedding_dev (14 messages🔥):



DiscoResearch ▷ #discolm_german (11 messages🔥):

LangChain AI ▷ #announcements (1 messages):


LangChain AI ▷ #general (34 messages🔥):



LangChain AI ▷ #langserve (1 messages):


LangChain AI ▷ #share-your-work (8 messages🔥):



LangChain AI ▷ #tutorials (2 messages):


OpenAccess AI Collective (axolotl) ▷ #general (11 messages🔥):



OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (25 messages🔥):



OpenAccess AI Collective (axolotl) ▷ #general-help (2 messages):


OpenAccess AI Collective (axolotl) ▷ #datasets (2 messages):


OpenAccess AI Collective (axolotl) ▷ #rlhf (2 messages):

LlamaIndex Discord ▷ #blog (3 messages):

Links mentioned:

JSONalyze Query Engine - LlamaIndex 🦙 0.9.36: no description found


LlamaIndex Discord ▷ #general (25 messages🔥):



LlamaIndex Discord ▷ #ai-discussion (1 messages):

Links mentioned:

Advanced RAG with LlamaIndex & Together.ai’s Embedding: Ankush k Singal


Skunkworks AI ▷ #general (18 messages🔥):


LLM Perf Enthusiasts AI ▷ #embeddings (1 messages):


LLM Perf Enthusiasts AI ▷ #feedback-meta (2 messages):

Alignment Lab AI ▷ #general-chat (1 messages):

indietechie: Anyone with experience using token monster for training a tokenizer?


Alignment Lab AI ▷ #oo (1 messages):

bumingqiu: I have

YAIG (a16z Infra) ▷ #ai-ml (1 messages):