Frozen AI News archive

Somebody give Andrej some H100s already

**OpenAI**'s GPT-2 sparked controversy five years ago for being "too dangerous to release." Now, with **FineWeb** and **llm.c**, a tiny GPT-2 model can be trained in **90 minutes** for **$20** using **8xA100** GPUs, with the full 1.6B model estimated to take **1 week** and **$2.5k**. The project is notable for its heavy use of **CUDA** (75.8%) aiming to simplify the training stack. Meanwhile, a Twitter debate between **Yann LeCun** and **Elon Musk** highlighted the importance of **convolutional neural networks (CNNs)** in real-time image processing for autonomous driving, with LeCun emphasizing scientific research's role in technological progress. LeCun also criticized AI doomsday scenarios, arguing for cautious optimism about AI safety and regulation.

Canonical issue URL

AI News for 5/27/2024-5/28/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (382 channels, and 4432 messages) for you. Estimated reading time saved (at 200wpm): 521 minutes.

Five years ago, OpenAI spawned its first controversy with GPT-2 being called "too dangerous to release".

Today, with help from FineWeb (released last month), you can train a tiny GPT-2 in 90 minutes and $20 in 8xA100 server time. It is already working (kinda) for the 350M version, and Andrej estimates that the full 1.6B model will take 1 week and $2.5k.
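The quoted figures are consistent with typical on-demand A100 pricing; a rough back-of-envelope check (the ~$1.67/GPU-hour rate is an assumption for illustration, not from Andrej's post):

```python
# Rough cost check for the quoted GPT-2 training runs.
# The on-demand rate (~$1.67 per A100-hour) is an assumed cloud price.
gpus = 8
hours = 1.5                      # 90 minutes
rate_per_gpu_hour = 1.67         # assumed, USD

cost = gpus * hours * rate_per_gpu_hour
print(f"~${cost:.0f} for the 90-minute tiny run")    # ~$20

# Andrej's 1.6B estimate: about 1 week on the same 8xA100 box.
week_cost = gpus * 24 * 7 * rate_per_gpu_hour
print(f"~${week_cost:.0f} for a 1-week run")         # in the $2.5k ballpark
```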


An incredible accomplishment in 7 weeks of work from scratch, though at this point the repo is 75.8% CUDA, stretching the name "llm.c".

Andrej also answered some questions on HN and on Twitter. One of the most interesting replies:

Q: How large is the set of binaries needed to do this training job? The current pytorch + CUDA ecosystem is so incredibly gigantic and manipulating those container images is painful because they are so large. I was hopeful that this would be the beginnings of a much smaller training/fine-tuning stack?

A: That is 100% my intention and hope and I think we are very close to deleting all of that.

It would be cheaper and faster if more H100s were available. Somebody help the newly GPU-poor out?


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

All recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Yann LeCun and Elon Musk Twitter Debate

AI Safety and Regulation Discussions

AI Research and Engineering Discussions

Memes and Humor


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Models and Architectures

AI Applications and Tools

AI Ethics and Safety

AI Industry and Competition


AI Discord Recap

A summary of Summaries of Summaries

LLM Advancements and Benchmarking:

Optimizing LLM Inference and Training:

Open-Source AI Frameworks and Community Efforts:

Multimodal AI and Generative Modeling Innovations:


{% if medium == 'web' %}

PART 1: High level Discord summaries

LLM Finetuning (Hamel + Dan) Discord

OCR Showdown: Google Vision vs. Microsoft Azure: AI engineers debated the merits and pitfalls of Google Vision OCR, acknowledging its precision but criticizing the developer experience. Suggestions for using Microsoft Azure OCR and Mindee Doctr, potentially offering better ease of use, surfaced here.

Curated Data: The Key to LLM Success: Workshop discussions underscored the importance of fine-tuning LLMs with high-quality, curated datasets, ranging from pharma applications to technical support chatbots. Expert opinion highlighted the need for precision in data choice to maximize LLM effectiveness, spotlighting domains like drug discovery, law, sales, and interdisciplinary work.

Axolotl Angst and Optimization: Users faced hurdles running Axolotl's 70B model on M3 Macs, with overwhelming latency during local inference, pointing to deployment on Modal as a possible solution. Cost concerns with Weights & Biases (WandB) prompted considerations of alternatives like Aim and MLflow for economically-minded solo developers Axolotl examples.

LLM Evaluation Deep Dive: A session on evaluating LLMs offered a treasure trove of insights, covering product metrics, traditional and dynamic performance metrics, and tools like LangFuse and EvalGen. Recommending resources by Eugene Yan and practical examples to visualize fine-tuning, participants noted the necessity of nuanced evaluations for LLM development.

Transcription Tangles and the Path to Summaries: Communication around transcripts from large meetings illuminated needs for efficient summaries, exposing potential roles for LLMs. While Zoom transcripts are on the horizon, Hamel encouraged using LLMs to generate more digestible summaries, echoing wider community involvement.


Perplexity AI Discord


Stability.ai (Stable Diffusion) Discord

New AI Features to Tinker With: Stability AI announces the launch of Stable Assistant sporting editing features built on Stable Diffusion 3, boasting of improved text-to-image quality available for a free trial here, and a beta chatbot with Stable LM 2 12B, heralding future enhancements for text generation tasks.

Education Merges with AI Innovation: An upcoming 4-week course by Innovation Laboratory, a collaboration between Stability AI and HUG, intends to guide participants on training AI models utilizing Stability AI's framework in tandem with HUG's educational approach; sign-ups are open until June 25, 2024, accessible here.

GPU Sharing in the Spotlight: AI engineers discuss a community-based GPU sharing proposal to decrease compute costs, with options ranging from a custom node to a potential blockchain setup designed to validate model training operations.

SD3 Accessibility Stirs Controversy: Discordance surfaces as members air grievances regarding Stable Diffusion's SD3 weights not being available for local use — slating Stability AI's cloud-only approach and stirring debate over cloud-dependency and data privacy concerns.

User Interfaces Under Comparison: A technical discourse unfolds on the pros and cons of various interfaces for Stable Diffusion, with ComfyUI pitted against more user-friendly alternatives like Forge; discussions also include community tips, inpainting methods, and ways to enhance artificial intelligence workflows.


OpenAI Discord

OpenAI Forms Safety Shield: OpenAI has established a Safety and Security Committee that will take charge of critical safety and security decisions across all its projects; full details can be found in their official announcement.

AI Muscle Flexes in Hardware Arena: Discussions about hardware costs arose, speculating on a $200-$1000 increase due to NPUs (Neural Processing Units), with focus on their economic impact for high-end models.

Plotting the Prompt Landscape: AI engineers debated the merits of meta-prompting versus Chain of Thought (CoT), examining the potential of using mermaid diagrams to conserve tokens and enhance output quality. There was also a sharing of improved prompts like here, showcasing practical applications of advanced prompt engineering tactics.

Rubber Meets The Code: Practical discussions included how AI handles YAML, XML, and JSON formats natively, with suggestions on using these structures for prompts to improve AI understanding and performance, and shared resources pointing to real-life prompt application for generating code and planning.
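The structured-prompt idea above can be sketched with stdlib tools alone; the task fields below are hypothetical, and JSON stands in for YAML or XML, which work the same way:

```python
import json

# Hypothetical task spec: express the prompt as structured data
# instead of free-form prose scattered across sentences.
task = {
    "role": "code reviewer",
    "constraints": ["respond in bullet points", "cite line numbers"],
    "input": "def add(a, b): return a - b",
}

# The model receives one unambiguous, machine-parseable block.
prompt = "Follow this task specification exactly:\n" + json.dumps(task, indent=2)
print(prompt)
```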

Interactive Inconsistencies Ignite Inquiry: Users reported issues with ChatGPT ranging from its refusal to draw tarot cards to context drops and unresponsiveness, spotlighting the need for improved and more predictable AI behavior.


HuggingFace Discord

Voice Commands Meet Robotics: A demo video titled "Open Source Voice-Controlled Robotic Arm" exhibits a voice-activated AI robotic arm. The idea of democratizing robotics technology via community collaboration was put forward.

Bridging Modalities: Contributions on creating early multi-modal spaces point to the use of single models and possibly stacked models with routing functionalities. For insights on such implementation, a source link was shared, providing a model example with practical applications.

Deep Learning Consult on the Fly: A user consulted the community about overcoming common pain points in training a model using Stanford Cars Dataset, managing only a 60% accuracy using ViT-B_16, with struggles involving overfitting. Meanwhile, another member is looking for help on how to better their deep learning model, indicating an environment that supports knowledge exchange for newcomers.

Diffusers Update for Not-Just-Generation: Hugging Face announced its Diffusers library now supports tasks beyond generation, such as depth estimation and surface normal prediction through Marigold. The update suggests an escalating trend in the versatility of diffusion models and their applications.

Model Choices for Cyber Security Assessments: Analysis from researchers examines the aptitude of various large language models in cyber security contexts. This provides AI engineers an angle to consider the security ramifications inherent in the deployment of LLMs.

Robust SDXL Space Realignment: SDXL embed space discussions underscore that newly aligned spaces default to zeroes instead of an encoded space. Such insights reflect the underlying complexity and time demands associated with realigning models to new unconditioned spaces, revealing the intricate process behind the science.

Gradio Piques Curiosity with Upgraded Clients: The Gradio team announced a forthcoming live event to dive into the latest features of Gradio Python and JavaScript clients. The engagement invitation emphasizes Gradio's continuous push to streamline AI integration into diverse applications through enhanced interfaces.

Ambiguity in Finding an SFW Dataset: Community chatter touches on the difficulty of locating the Nomos8k_sfw dataset, which is tied to the 4x-Nomos8kDAT model, suggesting the dataset’s limited availability or obscure placement. This highlights the occasional challenges inherent to dataset procurement.

Launching Latest Tools for AI Storytelling: Typeface Arc emerges as a comprehensive platform for seamlessness in creating AI-driven content. It features a tool, appropriately dubbed "Copilot", designed to amplify content creation via an interactive experience pivotal for brand narratives.


LM Studio Discord

Visualize This: LLaVA Lands in LM Studio: Engineers can now leverage LLaVA for visual capabilities in LM Studio by deploying it on a server and making use of the Python vision template provided.

Speedy Model Loading on M1 Max: Models in MLX and EXL2 formats load swiftly on Apple's M1 Max, taking a mere 5 seconds for L3 8-bit, indicating superior performance compared to GGUF Q8, which takes 29 seconds.

LM Studio Finetuning Frustrations: Despite being a robust environment, LM Studio currently lacks the ability to directly fine-tune models, with enthusiasts being pointed to alternative solutions like MLX designed for Apple Silicon.

Budget or Bust: AI practitioners debated the value proposition of various Nvidia GPUs, considering alternatives like the Tesla P40/P100 and eagerly discussed rumored GPUs like the 5090 with anticipation.

Beta Testing Blues: As they navigate the waters of new releases, users reported problems such as Windows CPU affinity issues with large models and errors on AVX2 laptops, hinting at the complexities of configuring modern hardware for AI tasks.


Unsloth AI (Daniel Han) Discord


CUDA MODE Discord


Eleuther Discord


OpenRouter (Alex Atallah) Discord


Nous Research AI Discord


LangChain AI Discord

Loop-the-Loop in LangChain: Engineers are troubleshooting a LangChain agent entering continuous loops when calling tools; one solution debate involves refining the agent's trigger conditions to prevent infinite tool invocation loops.
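One common fix for runaway tool loops is a hard iteration cap. A framework-agnostic sketch of the idea — the agent and tool here are invented stand-ins, though LangChain's own AgentExecutor exposes a similar max_iterations setting:

```python
# Minimal agent loop with a hard cap on tool invocations.
# fake_agent/fake_tool are illustrative stand-ins, not a real framework.
def fake_agent(observation):
    # A buggy agent that always decides to call the tool again.
    return ("call_tool", observation)

def fake_tool(arg):
    return f"result({arg})"

def run_agent(query, max_iterations=5):
    observation, steps = query, 0
    while steps < max_iterations:
        action, arg = fake_agent(observation)
        if action != "call_tool":
            return observation
        observation = fake_tool(arg)
        steps += 1
    # Cap reached: bail out instead of looping forever.
    return f"stopped after {max_iterations} tool calls"

print(run_agent("what's the weather?"))  # stopped after 5 tool calls
```

Refining the agent's trigger conditions attacks the root cause; the cap is the safety net when those conditions still misfire.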

Details, Please! 16385-token Error in LangChain 0.2.2: Users report a token limit error in LangChain version 0.2.2, where a 16385-token limit is incorrectly applied despite models supporting up to 128k tokens, prompting a community-led investigation into this discrepancy.

SQL Prompt Crafting Consultation: Requests for SQL agent prompt templates with few-shot examples have been answered, providing engineers with the resources to craft queries in LangChain more effectively.

Disappearing Act: Custom kwargs in Langserve: Some users experience a problem where custom "kwargs" sent through Langserve for logging in Langsmith are missing upon arrival, a concern currently seeking resolution.

Showcasing Applications: Diverse applications developed using LangChain were shared, including frameworks for drug discovery, cost-saving measures for logging, enhancements for flight simulators, and tutorials about routing logic in agent flows.


Modular (Mojo 🔥) Discord


Latent Space Discord


LlamaIndex Discord


LAION Discord

AI Reads Between the Lines: Members shared a laugh over SOTA AGI models' odd claims with one model's self-training assertion, "it has trained a model for us," tickling the collective funny bone. Musk's jab at CNNs—quipping "We don’t use CNNs much these days"—set off a chain of ironical replies and a nod towards vision transformer models as the new industry darlings.

Artificial Artist's Watermark Woes: Corcelio's Mobius Art Model is pushing boundaries with diverse prompts, yet leaves a watermark even though it's overtaking past models in creativity. Ethical dilemmas arose from the capability of image generation systems to produce 'inappropriate' content, sparking debate on community guidelines and systems' control settings.

Synthetic Sight Seeks Improvement: In an effort to grapple with SDXL's inability to generate images of "reading eyes," a member asked for collaborative help to build a synthetic database using DALLE, hoping to hone SDXL's capabilities in this nuanced visual task.

Patterns and Puzzles in Generative Watermarks: Observations within the guild pointed out a recurring theme of generative models producing watermarks, indicating possible undertraining, which was found both amusing and noteworthy among the engineers.

Elon's Eyeroll at CNNs Stokes AI Banter: Elon Musk's tweet sent a ripple through the community, sparking jests about the obsolete nature of CNNs in today's transformative AI methodologies and the potential pivot towards transformer models.


tinygrad (George Hotz) Discord

GPU Latency Predictions Without Benchmarks?: Engineers discussed the potential for symbolically modeling GPU latencies without running kernels by considering data movement and operation times, though complexities such as occupancy and async operations were recognized as potential confounders. There's also anticipation for AMD's open-source release of MES and speculation about quant firms using cycle-accurate GPU simulators for in-depth kernel optimization.
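A first-order version of that symbolic model is the classic roofline bound: predicted kernel time is the larger of compute time and memory time. A sketch with assumed hardware peaks (loosely A100-like, not measured), which deliberately ignores exactly the occupancy and async effects the discussion flagged:

```python
def roofline_time(flops, bytes_moved, peak_flops=312e12, peak_bw=1.55e12):
    """Lower-bound kernel latency: limited by compute or memory bandwidth.
    Default peaks are assumed, A100-like numbers for illustration."""
    compute_s = flops / peak_flops
    memory_s = bytes_moved / peak_bw
    return max(compute_s, memory_s)

# Example: a 4096x4096 fp16 matmul.
n = 4096
flops = 2 * n**3              # one multiply-add per inner-loop element
bytes_moved = 3 * n * n * 2   # read A and B, write C (2 bytes each, fp16)
t = roofline_time(flops, bytes_moved)
bound = "compute" if flops / 312e12 > bytes_moved / 1.55e12 else "memory"
print(f"{t * 1e6:.0f} us lower bound ({bound}-bound)")
```

Real latencies sit above this bound; closing the gap is where cache emulation and cycle-accurate simulation come in.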

Optimizing with Autotuners: The community explored kernel optimization tools like AutoTVM and Halide, noting their different approaches to performance improvement; George Hotz highlighted TVM's use of XGBoost and stressed the importance of cache emulation for accurate modeling.

Latency Hiding Mechanics in GPUs: It was noted that GPUs employ a variety of latency-hiding strategies with their ability to run concurrent wavefronts/blocks, thus making latency modeling more complex and nuanced.

Buffer Creation Discussions in Tinygrad: The #learn-tinygrad channel had members inquiring about using post dominator analysis in scheduling for graph fusion efficiency and the creation of LazyBuffer from arrays, with a suggestion to use Load.EMPTY -> Load.COPY for such scenarios.

Code Clarity and Assistance: Detailed discussions were had regarding buffer allocation and LazyBuffer creation in Tinygrad, with one member offering to provide code pointers for further clarification and understanding.


AI Stack Devs (Yoko Li) Discord


Cohere Discord


OpenAccess AI Collective (axolotl) Discord


Interconnects (Nathan Lambert) Discord


Datasette - LLM (@SimonW) Discord


Mozilla AI Discord

Link mentioned: granite-34b-code-instruct.llamafile


OpenInterpreter Discord


AI21 Labs (Jamba) Discord


MLOps @Chipro Discord


The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

LLM Finetuning (Hamel + Dan) ▷ #general (91 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #workshop-1 (10 messages🔥):


LLM Finetuning (Hamel + Dan) ▷ #asia-tz (9 messages🔥):


LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (87 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #learning-resources (11 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #jarvis-labs (26 messages🔥):

Link mentioned: Create custom environment | Jarvislabs: You may want to create and maintain separate virtual environments as your project gets more complicated.


LLM Finetuning (Hamel + Dan) ▷ #hugging-face (19 messages🔥):

Link mentioned: GitHub - rasbt/LLMs-from-scratch: Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step: Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step - rasbt/LLMs-from-scratch


LLM Finetuning (Hamel + Dan) ▷ #replicate (6 messages):


LLM Finetuning (Hamel + Dan) ▷ #langsmith (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #kylecorbitt_prompt_to_model (3 messages):


LLM Finetuning (Hamel + Dan) ▷ #workshop-2 (64 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #workshop-3 (461 messages🔥🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #clavie_beyond_ragbasics (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #axolotl (31 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #wing-axolotl (22 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #freddy-gradio (78 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #charles-modal (5 messages):


LLM Finetuning (Hamel + Dan) ▷ #langchain-langsmith (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #credits-questions (12 messages🔥):


Perplexity AI ▷ #general (659 messages🔥🔥🔥):

- **Anticipation for imfo alpha launch**: An exciting new development is incoming, with a teaser link shared: [spectate_or on X](https://x.com/spectate_or/status/1795077451195830661?s=46). This generated enthusiasm and comparisons to similar tools in the community.
- **Detailed discussion on AI task implementation**: Members discussed categorizing tasks into retrieval and mutation types, with queries like "Get the weight of the iPhone 15" exemplifying this structure. One member emphasized, *"all the steps just happen at the same time,"* needing adjustments for tasks requiring sequential execution.
- **Frustrations around scraping accuracy**: Members faced challenges with HTML parsing for accurate data retrieval, particularly from complex sources like Apple and Docker's release notes. Cloudflare issues and suggestions like using Playwright for JavaScript-heavy sites were also discussed.
- **Cost-effective AI model usage insights**: Detailed calculations were shared on the cost efficiency of using various AI models, with a combined system using Llama3 and Claude models showing significant potential savings.
- **Claude 3 model's performance concerns**: A member shared frustrations about Claude 3 not improving prompts as effectively as before. This triggered a broader discussion on prompt engineering and model performance across different tasks.

Links mentioned:


Perplexity AI ▷ #sharing (6 messages):


Perplexity AI ▷ #pplx-api (6 messages):


Stability.ai (Stable Diffusion) ▷ #announcements (2 messages):

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (495 messages🔥🔥🔥):

Links mentioned:


OpenAI ▷ #annnouncements (1 messages):


OpenAI ▷ #ai-discussions (321 messages🔥🔥):

Link mentioned: MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training: In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and c...


OpenAI ▷ #gpt-4-discussions (21 messages🔥):


OpenAI ▷ #prompt-engineering (76 messages🔥🔥):


OpenAI ▷ #api-discussions (76 messages🔥🔥):


HuggingFace ▷ #announcements (1 messages):

Links mentioned:


HuggingFace ▷ #general (333 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):


HuggingFace ▷ #cool-finds (1 messages):

Link mentioned: What I learned from looking at 900 most popular open source AI tools: [Hacker News discussion, LinkedIn discussion, Twitter thread]


HuggingFace ▷ #i-made-this (6 messages):

Links mentioned:


HuggingFace ▷ #reading-group (1 messages):

pr0x7: okay I will try and prepare.update you accordingly. thanks


HuggingFace ▷ #computer-vision (4 messages):

Link mentioned: Hugging Face Computer Vision Hangout: Tabellenblatt1 Topic (Fine-Tuning/Cool Project/etc.),Style (Short Presentation/Discussion/etc.),Proposed by (discord name)


HuggingFace ▷ #diffusion-discussions (2 messages):

Link mentioned: Typeface | Personalized AI Storytelling for Work: Typeface, the generative AI application for enterprise content creation, empowers all businesses to create exceptional, on-brand content at supercharged speeds.


HuggingFace ▷ #gradio-announcements (1 messages):

Links mentioned:


LM Studio ▷ #💬-general (61 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (45 messages🔥):

Link mentioned: microsoft/Phi-3-vision-128k-instruct · Hugging Face: no description found


LM Studio ▷ #📝-prompts-discussion-chat (12 messages🔥):


LM Studio ▷ #🎛-hardware-discussion (135 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (13 messages🔥):


Unsloth AI (Daniel Han) ▷ #general (169 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (48 messages🔥):

- **Fix GDrive Save Error by Correcting Argument Order**: A member struggled with an error while saving a model to GDrive due to incorrect argument order in `save_pretrained_merged`. Another member suggested fixing the argument order which solved the issue (*"Welp, that was dumb of me, thanks!"*).
- **Batch Size and Steps During Training**: Members discussed how to set epochs and steps for a model with 500 examples using batch size 8 and 62 steps. It was suggested to use `num_train_epochs = 3` and remove `max_steps = 500` to potentially avoid repetitive outputs and overfitting.
- **Repeating Sentences in Model Training**: A member encountered an issue with the model repeating the same sentence after training, possibly due to missing EOS tokens. Ensuring an EOS token is appended to each training example helps the model learn when to stop generating.
- **Exporting Models to ONNX**: A member sought help converting a fine-tuned model to ONNX format. They were directed to Hugging Face's [ONNX export guide](https://huggingface.co/docs/transformers/en/serialization) and clarified that VLLM format works for the conversion.
- **Support for 8-bit and OpenAI-compatible Servers**: Discussions covered future support for 8-bit models and OpenAI-compatible servers. It's indicated that 8-bit support is coming soon, and there's a pathway for running Unsloth models in environments similar to LM Studio, Jan AI, or Ollama.
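The epoch/step arithmetic in that thread checks out; a quick sanity calculation, assuming the trainer drops the last partial batch:

```python
examples = 500
batch_size = 8  # effective batch size from the thread

# With the last partial batch dropped, one epoch is 62 optimizer steps.
steps_per_epoch = examples // batch_size
print(steps_per_epoch)                  # 62

# max_steps = 500 would mean ~8 passes over the data — a recipe for
# the repetitive, overfit outputs described above.
print(round(500 / steps_per_epoch, 1))  # 8.1

# The suggested num_train_epochs = 3 instead gives:
print(3 * steps_per_epoch)              # 186 total steps
```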

Links mentioned:


CUDA MODE ▷ #general (2 messages):


CUDA MODE ▷ #triton (2 messages):

Links mentioned:


CUDA MODE ▷ #torch (12 messages🔥):

Links mentioned:


CUDA MODE ▷ #algorithms (1 messages):

For more details, you can read the full blog post.

Link mentioned: Near-Instant Full-File Edits: no description found


CUDA MODE ▷ #beginner (15 messages🔥):

Link mentioned: CUDA Toolkit 12.1 Downloads: Get the latest feature updates to NVIDIA's proprietary compute stack.


CUDA MODE ▷ #torchao (3 messages):

Links mentioned:


CUDA MODE ▷ #off-topic (27 messages🔥):


CUDA MODE ▷ #llmdotc (131 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #oneapi (1 messages):

orion160: What are tools to debug SYCL code? In general stepping into kernel code....


CUDA MODE ▷ #bitnet (9 messages🔥):

Link mentioned: Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets: Training activation quantized neural networks involves minimizing a piecewise constant function whose gradient vanishes almost everywhere, which is undesirable for the standard back-propagation or cha...


Eleuther ▷ #general (14 messages🔥):


Eleuther ▷ #research (122 messages🔥🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):


OpenRouter (Alex Atallah) ▷ #general (122 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #off-topic (6 messages):

Link mentioned: CrewAI Introduction to creating AI Agents: We will take a look at how to create ai agents using crew aihttps://docs.crewai.com/how-to/Creating-a-Crew-and-kick-it-off/#python #pythonprogramming #llm #m...


Nous Research AI ▷ #general (63 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (28 messages🔥):

Links mentioned:


Nous Research AI ▷ #rag-dataset (2 messages):

Link mentioned: GitHub - EveryOneIsGross/densefeelsCHAT: sentiment and semantic density smoothing agent. w/ tts: sentiment and semantic density smoothing agent. w/ tts - EveryOneIsGross/densefeelsCHAT


Nous Research AI ▷ #world-sim (1 messages):

jakekies: hi


LangChain AI ▷ #general (76 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #langserve (4 messages):


LangChain AI ▷ #share-your-work (4 messages):

Links mentioned:


Modular (Mojo 🔥) ▷ #general (20 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #tech-news (6 messages):


Modular (Mojo 🔥) ▷ #🔥mojo (14 messages🔥):


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (21 messages🔥):

Link mentioned: fnands.com/blog/2024/mojo-crc-calc/crcn.mojo at main · fnands/fnands.com: My personal blog. Contribute to fnands/fnands.com development by creating an account on GitHub.


Modular (Mojo 🔥) ▷ #nightly (13 messages🔥):


Latent Space ▷ #ai-general-chat (68 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (3 messages):

- **New podcast on ICLR 2024 papers**: A new episode covering highlights from ICLR 2024 has been released, featuring various groundbreaking papers and talks. [Listen here](https://x.com/latentspacepod/status/1795196817044594817) for insights on ImageGen, Compression, Adversarial Attacks, Vision Learning, and more.
- **Spotlight on ImageGen and Compression**: Topics discussed include "Auto-encoding Variational Bayes" and "Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models". Notable mentions are detailed insights from Ilya Sutskever and Christian Szegedy.
- **Vision Learning advancements**: The podcast delves into papers like "Vision Transformers Need Registers" and "Think before you speak: Training Language Models With Pause Tokens". It also investigates the statistical theory of data selection under weak supervision.
- **Enhancing Transformer models**: Discussion on efficient fine-tuning and context window extension of large language models with papers like "LongLoRA" and "YaRN". Topics like adaptive KV cache compression and efficient communication for giant model training also featured.
- **State Space Models vs Transformers**: The importance of data-driven priors in long-sequence models is highlighted in the paper "Never Train from Scratch". Stay tuned for more content on LLM Reasoning and Agents in Part 2.

Link mentioned: Tweet from Latent Space Podcast (@latentspacepod): 🆕 ICLR 2024: Best Papers (Part 1) We present our selections of outstanding papers and talks thematically introducing topics for AI Engineers to track: Section A: ImageGen, Compression, Adversarial ...


LlamaIndex ▷ #blog (1 messages):


LlamaIndex ▷ #general (59 messages🔥🔥):

Links mentioned:


LAION ▷ #general (40 messages🔥):

Links mentioned:


LAION ▷ #research (2 messages):


tinygrad (George Hotz) ▷ #general (25 messages🔥):

Link mentioned: GPUs Go Brrr: how make gpu fast?


tinygrad (George Hotz) ▷ #learn-tinygrad (5 messages):


AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (25 messages🔥):

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #ai-town-dev (1 messages):

gomiez: hi. how do i stop conversations from closing? i cant read that fast


AI Stack Devs (Yoko Li) ▷ #late-night-lounge (1 messages):

angry.penguin: LMK if you have any luck with inference


Cohere ▷ #general (12 messages🔥):

Link mentioned: GitGud: no description found


Cohere ▷ #project-sharing (3 messages):


OpenAccess AI Collective (axolotl) ▷ #general (4 messages):


OpenAccess AI Collective (axolotl) ▷ #general-help (1 messages):


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (4 messages):

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.


Interconnects (Nathan Lambert) ▷ #news (2 messages):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (6 messages):

Links mentioned:


Datasette - LLM (@SimonW) ▷ #llm (3 messages):


Mozilla AI ▷ #llamafile (3 messages):



OpenInterpreter ▷ #O1 (3 messages):


AI21 Labs (Jamba) ▷ #general-chat (2 messages):

- **Server seems unmoderated**: A member pointed out that "it looks like the server is unmoderated..." highlighting an apparent lack of moderation.
- **Attempted @everyone ping fails**: The same member tried to use the @everyone tag but noted it "doesn't ping" as intended.

MLOps @Chipro ▷ #events (1 messages):




{% else %}

LLM Finetuning (Hamel + Dan) Discord

OCR Showdown: Google Vision vs. Microsoft Azure: AI engineers debated the merits and pitfalls of Google Vision OCR, acknowledging its precision but criticizing the developer experience. Suggestions for using Microsoft Azure OCR and Mindee Doctr, potentially offering better ease of use, surfaced here.

Curated Data: The Key to LLM Success: Workshop discussions underscored the importance of fine-tuning LLMs with high-quality, curated datasets, ranging from pharma applications to technical support chatbots. Expert opinion highlighted the need for precision in data choice to maximize LLM effectiveness, spotlighting domains like drug discovery, law, sales, and interdisciplinary work.

Axolotl Angst and Optimization: Users faced hurdles running Axolotl's 70B model on M3 Macs, with overwhelming latency during local inference, pointing to deployment on Modal as a possible solution. Cost concerns with Weights & Biases (WandB) prompted considerations of alternatives like Aim and MLflow for economically-minded solo developers Axolotl examples.

LLM Evaluation Deep Dive: A session on evaluating LLMs offered a treasure trove of insights, covering product metrics, traditional and dynamic performance metrics, and tools like LangFuse and EvalGen. Recommending resources by Eugene Yan and practical examples to visualize fine-tuning, participants noted the necessity of nuanced evaluations for LLM development.

Transcription Tangles and the Path to Summaries: Communication around transcripts from large meetings illuminated needs for efficient summaries, exposing potential roles for LLMs. While Zoom transcripts are on the horizon, Hamel encouraged using LLMs to generate more digestible summaries, echoing wider community involvement.


Perplexity AI Discord


Stability.ai (Stable Diffusion) Discord

New AI Features to Tinker With: Stability AI announces the launch of Stable Assistant, sporting editing features built on Stable Diffusion 3 and boasting improved text-to-image quality (free trial here), plus a beta chatbot with Stable LM 2 12B, heralding future enhancements for text-generation tasks.

Education Merges with AI Innovation: An upcoming 4-week course by Innovation Laboratory, a collaboration between Stability AI and HUG, intends to guide participants on training AI models utilizing Stability AI's framework in tandem with HUG's educational approach; sign-ups are open until June 25, 2024, accessible here.

GPU Sharing in the Spotlight: AI engineers discuss a community-based GPU sharing proposal to decrease compute costs, with options ranging from a custom node to a potential blockchain setup designed to validate model training operations.

SD3 Accessibility Stirs Controversy: Discordance surfaces as members air grievances regarding Stable Diffusion's SD3 weights not being available for local use — slating Stability AI's cloud-only approach and stirring debate over cloud-dependency and data privacy concerns.

User Interfaces Under Comparison: A technical discourse unfolds on the pros and cons of various interfaces for Stable Diffusion, with ComfyUI pitted against more user-friendly alternatives like Forge; discussions also include community tips, inpainting methods, and ways to enhance artificial intelligence workflows.


OpenAI Discord

OpenAI Forms Safety Shield: OpenAI has established a Safety and Security Committee that will take charge of critical safety and security decisions across all its projects; full details can be found in their official announcement.

AI Muscle Flexes in Hardware Arena: Discussions about hardware costs arose, speculating on a $200-$1000 increase due to NPUs (Neural Processing Units), with focus on their economic impact for high-end models.

Plotting the Prompt Landscape: AI engineers debated the merits of meta-prompting versus Chain of Thought (CoT), examining the potential of using mermaid diagrams to conserve tokens and enhance output quality. Improved prompts were also shared (example here), showcasing practical applications of advanced prompt-engineering tactics.

Rubber Meets The Code: Practical discussions covered how AI handles YAML, XML, and JSON formats natively, with suggestions to use these structures in prompts to improve AI understanding and performance, plus shared resources pointing to real-life prompt applications for generating code and planning.
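A minimal sketch of the structured-prompt idea, packing a task spec as JSON via the standard library; the field names are invented for illustration:

```python
import json

# Sketch of serializing a prompt's structured parts as JSON, per the idea
# that models parse JSON/YAML/XML natively. Field names are illustrative.
task = {
    "role": "senior Python reviewer",
    "goal": "review the diff for bugs",
    "constraints": ["be concise", "cite line numbers"],
    "output_format": {"verdict": "approve|request_changes", "notes": "list"},
}
prompt = "Follow this task spec exactly:\n" + json.dumps(task, indent=2)
print(prompt.splitlines()[0])  # Follow this task spec exactly:
```

The same spec could be emitted as YAML or XML; the point raised in the discussion is that an explicit schema tends to be followed more reliably than the equivalent free-form paragraph.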

Interactive Inconsistencies Ignite Inquiry: Users reported issues with ChatGPT ranging from its refusal to draw tarot cards to context drops and unresponsiveness, spotlighting the need for improved and more predictable AI behavior.


HuggingFace Discord

Voice Commands Meet Robotics: A demo video titled "Open Source Voice-Controlled Robotic Arm" exhibits a voice-activated AI robotic arm, advancing the case for democratizing robotics technology via community collaboration.

Bridging Modalities: Contributions on creating early multi-modal spaces point to the use of single models and possibly stacked models with routing functionalities. For insights on such implementation, a source link was shared, providing a model example with practical applications.

Deep Learning Consult on the Fly: A user consulted the community about common pain points in training a model on the Stanford Cars Dataset, managing only 60% accuracy with ViT-B_16 amid struggles with overfitting. Meanwhile, another member sought help improving their deep learning model, indicating an environment that supports knowledge exchange for newcomers.

Diffusers Update for Not-Just-Generation: Hugging Face announced its Diffusers library now supports tasks beyond generative models, such as depth estimation and surface-normals prediction through Marigold. The update suggests an escalating trend in the versatility of diffusion models and their applications.

Model Choices for Cyber Security Assessments: Analysis from researchers examines the aptitude of various large language models in cyber security contexts. This provides AI engineers an angle to consider the security ramifications inherent in the deployment of LLMs.

Robust SDXL Space Realignment: SDXL embed-space discussions underscored that newly aligned spaces default to zeros instead of an encoded space, reflecting the complexity and time demands of realigning models to new unconditioned spaces.

Gradio Piques Curiosity with Upgraded Clients: The Gradio team announced a forthcoming live event to dive into the latest features of Gradio Python and JavaScript clients. The engagement invitation emphasizes Gradio's continuous push to streamline AI integration into diverse applications through enhanced interfaces.

Ambiguity in Finding an SFW Dataset: Community chatter touches on the difficulty of locating the Nomos8k_sfw dataset, which is tied to the 4x-Nomos8kDAT model, suggesting the dataset’s limited availability or obscure placement. This highlights the occasional challenges inherent to dataset procurement.

Launching Latest Tools for AI Storytelling: Typeface Arc emerges as a comprehensive platform for seamless creation of AI-driven content, featuring a tool, appropriately dubbed "Copilot", designed to amplify content creation via an interactive experience pivotal for brand narratives.


LM Studio Discord

Visualize This: LLaVA Lands in LM Studio!: Engineers can now leverage LLaVA for visual capabilities in LM Studio by deploying it on a server and making use of the Python vision template provided.

Speedy Model Loading on M1 Max: Model formats like MLX and EXL2 load swiftly on Apple's M1 Max, taking a mere 5 seconds for L3 8-bit versus 29 seconds for GGUF Q8.

LM Studio Finetuning Frustrations: Despite being a robust environment, LM Studio currently lacks the ability to directly fine-tune models, with enthusiasts being pointed to alternative solutions like MLX designed for Apple Silicon.

Budget or Bust: AI practitioners debated the value proposition of various Nvidia GPUs, considering alternatives like the Tesla P40/P100 and eagerly discussed rumored GPUs like the 5090 with anticipation.

Beta Testing Blues: As they navigate the waters of new releases, users reported problems such as Windows CPU affinity issues with large models and errors on AVX2 laptops, hinting at the complexities of configuring modern hardware for AI tasks.


Unsloth AI (Daniel Han) Discord


CUDA MODE Discord


Eleuther Discord


OpenRouter (Alex Atallah) Discord


Nous Research AI Discord


LangChain AI Discord

Loop-the-Loop in LangChain: Engineers are troubleshooting a LangChain agent entering continuous loops when calling tools; one solution debate involves refining the agent's trigger conditions to prevent infinite tool invocation loops.
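Beyond refining trigger conditions, a blunt safety net is an iteration cap, which LangChain's AgentExecutor exposes as its max_iterations setting. A minimal sketch of the pattern, with step_fn standing in for the agent's reasoning step (all names illustrative):

```python
# Sketch of an iteration cap on a tool-calling loop, the same guard
# LangChain's AgentExecutor applies via max_iterations.
# step_fn is an illustrative stand-in for one agent reasoning step.

def run_agent(step_fn, max_iterations=5):
    """Call step_fn until it returns a final answer or the cap is hit."""
    for _ in range(max_iterations):
        action, payload = step_fn()
        if action == "final":
            return payload
    return "Agent stopped: iteration limit reached"

# A stub agent that keeps invoking a tool forever is safely cut off:
result = run_agent(lambda: ("tool", "search(...)"), max_iterations=3)
print(result)  # Agent stopped: iteration limit reached
```

The cap does not fix the underlying prompt issue, but it bounds cost while the trigger conditions are debugged.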

Details, Please! 16385-token Error in LangChain 0.2.2: Users report a token limit error in LangChain version 0.2.2, where a 16385-token limit is incorrectly applied despite models supporting up to 128k tokens, prompting a community-led investigation into this discrepancy.

SQL Prompt Crafting Consultation: Requests for SQL agent prompt templates with few-shot examples have been answered, providing engineers with the resources to craft queries in LangChain more effectively.
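The few-shot pattern behind those templates can be assembled by hand in a few lines; LangChain's FewShotPromptTemplate wraps the same idea. The example questions and SQL below are invented for illustration:

```python
# Sketch of a few-shot SQL prompt builder; examples are illustrative,
# not from any real schema. LangChain's FewShotPromptTemplate
# formalizes this prefix/examples/suffix structure.

examples = [
    ("How many users signed up in May?",
     "SELECT COUNT(*) FROM users WHERE signup_month = 5;"),
    ("List the top 3 products by revenue.",
     "SELECT name FROM products ORDER BY revenue DESC LIMIT 3;"),
]

def build_prompt(question):
    shots = "\n\n".join(f"Question: {q}\nSQL: {sql}" for q, sql in examples)
    return f"Translate questions to SQL.\n\n{shots}\n\nQuestion: {question}\nSQL:"

prompt = build_prompt("Which users have no orders?")
print(prompt.endswith("SQL:"))  # True
```

Trailing the prompt with a bare "SQL:" nudges the model to complete with a query rather than prose.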

Disappearing Act: Custom kwargs in Langserve: Some users experience a problem where custom "kwargs" sent through Langserve for logging in Langsmith are missing upon arrival, a concern currently seeking resolution.

Showcasing Applications: Diverse applications developed using LangChain were shared, including frameworks for drug discovery, cost-saving measures for logging, enhancements for flight simulators, and tutorials about routing logic in agent flows.


Modular (Mojo 🔥) Discord


Latent Space Discord


LlamaIndex Discord


LAION Discord

AI Reads Between the Lines: Members shared a laugh over SOTA AGI models' odd claims, with one model's self-training assertion, "it has trained a model for us," tickling the collective funny bone. Musk's jab at CNNs, quipping "We don't use CNNs much these days," set off a chain of ironic replies and a nod toward vision transformer models as the new industry darlings.

Artificial Artist's Watermark Woes: Corcelio's Mobius Art Model is pushing boundaries with diverse prompts, yet leaves a watermark even though it's overtaking past models in creativity. Ethical dilemmas arose from the capability of image generation systems to produce 'inappropriate' content, sparking debate on community guidelines and systems' control settings.

Synthetic Sight Seeks Improvement: In an effort to grapple with SDXL's inability to generate images of "reading eyes," a member asked for collaborative help to build a synthetic database using DALLE, hoping to hone SDXL's capabilities in this nuanced visual task.

Patterns and Puzzles in Generative Watermarks: Observations within the guild pointed out a recurring theme of generative models producing watermarks, indicating possible undertraining, which was found both amusing and noteworthy among the engineers.

Elon's Eyeroll at CNNs Stokes AI Banter: Elon Musk's tweet sent a ripple through the community, sparking jests about the obsolete nature of CNNs in today's transformative AI methodologies and the potential pivot towards transformer models.


tinygrad (George Hotz) Discord

GPU Latency Predictions Without Benchmarks?: Engineers discussed the potential for symbolically modeling GPU latencies without running kernels by considering data movement and operation times, though complexities such as occupancy and async operations were recognized as potential confounders. There's also anticipation for AMD's open-source release of MES and speculation about quant firms using cycle-accurate GPU simulators for in-depth kernel optimization.

Optimizing with Autotuners: The community explored kernel optimization tools like AutoTVM and Halide, noting their different approaches to performance improvement; George Hotz highlighted TVM's use of XGBoost and stressed the importance of cache emulation for accurate modeling.

Latency Hiding Mechanics in GPUs: It was noted that GPUs employ a variety of latency-hiding strategies with their ability to run concurrent wavefronts/blocks, thus making latency modeling more complex and nuanced.

Buffer Creation Discussions in Tinygrad: The #learn-tinygrad channel had members inquiring about using post-dominator analysis in scheduling for graph-fusion efficiency and about creating a LazyBuffer from arrays, with a suggestion to use Load.EMPTY -> Load.COPY for such scenarios.

Code Clarity and Assistance: Members discussed buffer allocation and LazyBuffer creation in Tinygrad in detail, with one member offering code pointers for further clarification and understanding.


AI Stack Devs (Yoko Li) Discord


Cohere Discord


OpenAccess AI Collective (axolotl) Discord


Interconnects (Nathan Lambert) Discord


Datasette - LLM (@SimonW) Discord


Mozilla AI Discord

Link mentioned: granite-34b-code-instruct.llamafile


OpenInterpreter Discord


AI21 Labs (Jamba) Discord


MLOps @Chipro Discord


The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}