**Apple Intelligence is all you need.**

AI News for 6/7/2024-6/10/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (411 channels, and 7641 messages) for you. Estimated reading time saved (at 200wpm): 816 minutes.

With Apple Intelligence, Apple has claimed to leapfrog Google Gemma, Mistral Mixtral, Microsoft Phi, and Mosaic DBRX in one go, with a small ā€œApple On-Deviceā€ model (~3b parameters) and a ā€œlargerā€ Apple Server model (available with Private Cloud Compute running on Apple Silicon).

The Apple ML blogpost also briefly mentioned two other models - an Xcode code-focused model, and a diffusion model for Genmoji.

What appears to be underrated is the on-device model’s hot-swapping LoRAs with apparently lossless quantization strategy:

For on-device inference, we use low-bit palletization, a critical optimization technique that achieves the necessary memory, power, and performance requirements. To maintain model quality, we developed a new framework using LoRA adapters that incorporates a mixed 2-bit and 4-bit configuration strategy — averaging 3.5 bits-per-weight — to achieve the same accuracy as the uncompressed models.

Additionally, we use an interactive model latency and power analysis tool, Talaria, to better guide the bit rate selection for each operation. We also utilize activation quantization and embedding quantization, and have developed an approach to enable efficient Key-Value (KV) cache update on our neural engines.

With this set of optimizations, on iPhone 15 Pro we are able to reach time-to-first-token latency of about 0.6 millisecond per prompt token, and a generation rate of 30 tokens per second. Notably, this performance is attained before employing token speculation techniques, from which we see further enhancement on the token generation rate.

We represent the values of the adapter parameters using 16 bits, and for the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require 10s of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system’s responsiveness.

The key tool they are crediting for this incredible on-device inference is Talaria:

image.png

Talaria helps to ablate quantizations and profile model architectures subject to budgets:

image.png

image.png

Far from a God Model, Apple seems to be pursuing an ā€œadapter for everythingā€ strategy and Talaria is set to make it easy to rapidly iterate and track the performance of individual architectures. This is why Craig Federighi announced that Apple Intelligence only specifically applies to a specific set of 8 adapters for SiriKit and 12 categories of App Intents to start with: image.png image.png

Knowing that Apple designs for a strict inference budget, it’s also interesting to see how Apple self-reports performance. Virtually all the results (except instruction following) are done with human graders, which has the advantage of being the gold standard yet the most opaque:

image.png

The sole source of credibility of these benchmarks claiming to beat Google/Microsoft/Mistral/Mosaic is that Apple does not need to win in the academic arena - it merely needs to be ā€œgood enoughā€ to the consumer to win. Here, it only has to beat the low bar of Siri circa 2011-2023.


{% if medium == ā€˜web’ %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Andrej Karpathy’s New YouTube Video on Reproducing GPT-2 (124M)

  • Comprehensive 4-hour video lecture: @karpathy released a new YouTube video titled ā€œLet’s reproduce GPT-2 (124M)ā€, covering building the GPT-2 network, optimizing it for fast training, setting up the training run, and evaluating the model. The video builds on the Zero To Hero series.
  • Detailed walkthrough: The video is divided into sections covering exploring the GPT-2 checkpoint, implementing the GPT-2 nn.Module, making training fast with techniques like mixed precision and flash attention, setting hyperparameters, and evaluating results. The model gets close to GPT-3 (124M) performance.
  • Associated GitHub repo: @karpathy mentioned the associated GitHub repo contains the full commit history to follow along with the code changes step by step.

Apple’s WWDC AI Announcements

  • Lack of impressive AI announcements: @karpathy noted that 50 minutes into Apple’s WWDC, there were no significant AI announcements that impressed.
  • Rumors of ā€œApple Intelligenceā€ and OpenAI partnership: @adcock_brett mentioned rumors that Apple would launch a new AI system called ā€œApple Intelligenceā€ and a potential partnership with OpenAI, but these were not confirmed at WWDC.

Intuitive Explanation of Matrix Multiplication

  • Twitter thread on matrix multiplication: @svpino shared a Twitter thread providing a stunning, simple explanation of matrix multiplication, calling it the most crucial idea behind modern machine learning.
  • Step-by-step breakdown: The thread breaks down the raw definition of the product of matrices A and B, unwrapping it step by step with visualizations to provide an intuitive understanding of how matrix multiplication works and its geometric interpretation.

Apple’s Ferret-UI: Multimodal Vision-Language Model for iOS

  • Ferret-UI paper details: @DrJimFan highlighted Apple’s paper on Ferret-UI, a multimodal vision-language model that understands icons, widgets, and text on iOS mobile screens, reasoning about their spatial relationships and functional meanings.
  • Potential for on-device AI assistant: The paper discusses dataset and benchmark construction, showing extraordinary openness from Apple. With strong screen understanding, Ferret-UI could be extended to a full-fledged on-device assistant.

AI Investment and Progress

  • $100B spent on NVIDIA GPUs since GPT-4: @alexandr_wang noted that since GPT-4 was trained in fall 2022, around $100B has been spent collectively on NVIDIA GPUs. The question is whether the next generation of AI models’ capabilities will live up to that investment level.
  • Hitting a data wall: Wang discussed the possibility of AI progress slowing down due to a data wall, requiring methods for data abundance, algorithmic advances, and expanding beyond existing internet data. The industry is split on whether this will be a short-term impediment or a meaningful plateau.

Perplexity as Top Referral Source for Publishers

  • Perplexity driving traffic to publishers: @AravSrinivas shared that Perplexity has been the #2 referral source for Forbes (behind Wikipedia) and the top referrer for other publishers.
  • Upcoming publisher engagement products: Srinivas mentioned that Perplexity is working on new publisher engagement products and ways to align long-term incentives with media companies, to be announced soon.

Yann LeCun’s Thoughts on Managing AI Research Labs

  • Importance of reputable scientists in management: @ylecun emphasized that the management of a research lab should be composed of reputable scientists to identify and retain brilliant people, provide resources and freedom, identify promising research directions, detect BS, inspire ambitious goals, and evaluate people beyond simple metrics.
  • Fostering intellectual weirdness: LeCun noted that managing a research lab requires being welcoming of intellectual weirdness, which can be accompanied by nerdy personality weirdness, making management more difficult as truly creative people don’t fit into predictable pigeonholes.

Reasoning Abilities vs. Storing and Retrieving Facts

  • Distinguishing reasoning from memorization: @ylecun pointed out that reasoning abilities and common sense should not be confused with the ability to store and approximately retrieve many facts.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Model Developments and Benchmarks

AI Applications and Tools

  • Decentralized AI model tracker: In /r/LocalLLaMA, AiTracker.art, a torrent tracker for AI models, is presented as a decentralized alternative to Huggingface & Civitai.
  • LLM-powered compression: In /r/LocalLLaMA, Llama-Zip, an LLM-powered compression tool, is discussed for its potential to allow recovery of complete training articles from compressed keys.
  • Fast browser-based speech recognition: In /r/singularity, Whisper WebGPU showcases blazingly-fast ML-powered speech recognition directly in the browser.
  • Replacing OpenAI with local model: In /r/singularity, a post demonstrates replacing OpenAI with a llama.cpp server using just 1 line of Python code.
  • Semantic search for chess positions: In /r/LocalLLaMA, an embeddings model for chess positions is shared, enabling semantic search capabilities.

AI Safety and Regulation

  • Prompt injection threats: In /r/OpenAI, prompt injection threats and protection methods for LLM apps are discussed, such as training a custom classifier to defend against malicious prompts.
  • Concerns about sensitive data in models: In /r/singularity, a post argues that with tech companies scraping the internet for data, the odds that a public model has been trained on TOP SECRET documents is likely north of 99%.
  • Techniques to reduce model refusals: In /r/LocalLLaMA, Orthogonal Activation Steering (OAS) and ā€œabliterationā€ are noted as the same technique for reducing AI model refusals to engage with certain prompts.

AI Ethics and Societal Impact

  • AI in education: In /r/singularity, the use of AI in educational settings is discussed, raising questions about effective integration and potential misuse by students.

AI Hardware and Infrastructure

Memes and Humor


AI Discord Recap

A summary of Summaries of Summaries

  1. Multimodal AI and Generative Modeling Innovations:
  • Ultravox Enters Multimodal Arena: Ultravox, an open-source multimodal LLM capable of understanding non-textual speech elements, was released in v0.1. The project is gaining traction and hiring for expansion.
  • Sigma-GPT Debuts Dynamic Sequence Generation: σ-GPT provides dynamic sequence generation, reducing model evaluation times. This method sparked interest and debate over its practicality, with some comparing it to XLNet’s trajectory.
  • Lumina-Next-T2I Enhances Text-to-Image Models: The Lumina-Next-T2I model boasts faster inference speeds, richer generation styles, and better multilingual support, showcased in Ziwei Liu’s tweet.
  1. Model Performance Optimization and Fine-Tuning Techniques:
  • Efficient Quantization and Kernel Optimization: Discussions around CUDA Profiling Essentials recommended using nsys or ncu for in-depth kernel analysis. Techniques from NVIDIA Cutlass and BitBlas documentation showcased effective bit-level operations.
  • LLama-3 Finetuning Issues Fixed: Users reported resolving issues with LLama3 model finetuning by using vllm and shared related configurations in the axolotl forum.
  • GROUP Project: The project dealt with tackling fine-tuning vs RAG concepts and LR adjustments in the OpenAI and Eleuther community, with insights on benchmarks from Stanford and Git setups seen in GitHub.
  1. Open-Source AI Frameworks and Tools:
  • Rubik’s AI Beta Test Invitation: Users are invited to beta test Rubik’s AI, a new research assistant featuring models like GPT-4 Turbo, Claude-3 Opus. The platform promotes AI research advancements.
  • LSP-AI Enhances IDE Compatibility: A multi-editor AI language server to assist software engineers was highlighted, with community enthusiasm for its enhancing capabilities across platforms.
  • Integrating LangChain and Bagel: LangChain has integrated with Bagel, providing secure, scalable dataset management and highlighting advancements in integrating language models with external data.
  1. AI Community and Event Highlights:
  • AI Engineer World’s Fair Announcements: AI Engineer World’s Fair revealed new speakers, and tickets are sold out, indicating high engagement and interest within the community.
  • Innovative Projects and Meetups: The community spotlight features include intriguing projects like Websim.ai’s recursive exploration and notable meetups, such as the Lehman Trilogy event in SF, as shared by Nathan Lambert in the Interconnects discord.
  • ICLR 2024 Podcast and AI Summit Insights: Part 2 of the ICLR 2024 podcast was released, featuring discussions on benchmarks, agents, and more, enriching community knowledge and engagement.
  1. Technical Innovations and Discussions:
  • Multilingual Transcription Turmoil: Criticisms were shared on Whisper v3’s struggles with multilingual transcription in the OpenAI discord, sparking anticipation for future enhancements from OpenAI.
  • Security and API Token Management: Warning against use of Malicious ComfyUI node and advice on using environment variables for API token management was a shared concern.
  • Performance Analysis and Fine-Tuning: Discussions included optimizing large model training configurations, as seen in CUDA profiling, and the use of structured concurrency in programming languages like Mojo.

PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord

  • Spotify Perks on Shuffle: Engineers discussed various ways to obtain Spotify Premium, including a free perk from certain providers, without elaborating on specific costs or terms for these offers.
  • Regional Settings Unmasked: Techniques for regional prompting were explored, with suggestions including using IPAdapter with attention masks in ComfyUI and enquiries into similar functionalities with diffusers, but no clear consensus was reached on the best strategy.
  • Buzz Around Stable Diffusion 3: The community is eagerly awaiting Stable Diffusion 3 (SD3), with debates on expected features such as prompt obedience and image creation enhancements, alongside a focus on custom fine-tunes and post-release LoRas, acknowledging an overall atmosphere of cautious optimism.
  • LoRas Training Hiccups: Members shared their challenges and workarounds when training models and LoRas with tools like ComfyUI and Kohya SS GUI, further recommending alternative methods such as OneTrainer due to unspecified setup problems.
  • Security Alert in ComfyUI: A warning was raised about a malicious node in ComfyUI with the potential to siphon sensitive information, sparking a broader conversation on safeguarding against the risks associated with custom nodes in AI tools.

Perplexity AI Discord

  • AI Takeoff with SpaceX Starship Success: SpaceX notches a win with its fourth test flight of Starship, marking a step forward towards a fully reusable rocket system with first and second stages landing successfully. This achievement is detailed on Perplexity’s platform.
  • Starliner’s Rocky Road to ISS: Boeing’s Starliner faced glitches with five RCS thrusters during its ISS docking, potentially affecting mission timelines and showcasing the complexities of space hardware. The full report is available at NASA’s update.
  • Perplexity’s Puzzles and Progress: Users have critiqued the limited capability of AI travel planning on Perplexity AI, particularly with flight specifics, while others praise its new pro search features that improve result relevance. Concerns arise from community reports of content deindexing and accuracy issues with GPT-4 models. Controversies also swirl around claims of the Rabbit R1 device as a scam.
  • Geopolitical Tech Tension: Huawei’s Ascend 910B AI chip is stirring the silicon waters against Nvidia’s A100 with its impressive performance in training large language models, sparking both technology debates and geopolitical implications. Visit Perplexity’s update for details on the chip’s capabilities.
  • Perplexity API Quandaries: Inquiries and discussions focused on utilizing Perplexity API’s features, such as the unavailability of embedding generation and advice on achieving results akin to the web version, reflect the user needs for clear documentation and support. A specific issue with API credits was advised to be resolved via direct messaging, showing proactive community engagement.

LLM Finetuning (Hamel + Dan) Discord

  • Popcorn and Poisson at the Fine-Tuning Fair: Humorous discussions about predicting popcorn popping times using probabilistic models segued into analyses of the inverse Poisson distribution. Alongside, a member invited course mates to the AI Engineer World’s Fair, promising potential legendary status for anyone case studying popcorn kernels with the course repo.
  • Censorship and Performance Headline LLM Conversations: A Hugging Face blog post raising concerns about misinformation in Qwen2 Instruct led to discussions on the nuances of LLM performance and censorship, with a focus on the disparities of English versus Chinese responses. Elsewhere, LLama-3 model finetuning issues were resolved by deploying with vllm.
  • Fine-tuning Causing Frustrations: The process for accessing granted credits for platforms such as Hugging Face, Replicate, and Modal caused confusion, with several members not receiving expected amounts, resulting in some voicing disappointments and seeking resolution.
  • Modal’s Magic Met with Mixed Reactions: Members shared experiences of deploying models on Modal, ranging from calling it a ā€œmagical experienceā€ to struggles with permissions and volume ID errors, indicating a learning curve and growing pains with new deployment platforms.
  • Workshop Woes and Winning Techniques: Technical issues were discussed, including the partial loss of Workshop 4’s Zoom recording, resolved with a shared link to the final minutes. Discussions also celebrated Weights & Biases resources like a 10-minute video course and ColBERT’s new hierarchical pooling feature detailed in an upcoming blog post.
  • Finetuning vs. RAG Debate Unpacked: An interesting analogy was proposed between fine-tuning and RAG’s role in LLMs, juxtaposing the addition of static knowledge versus dynamic, query-specific information. However, this was met with some resistance, with one member working towards a more precise explanation of these complex concepts.
  • Accelerate Framework Testing Reveals Speed Differences: An AI engineer tested training configurations with accelerate, comparing DDP, FSDP, and DS(zero3), with DS(zero3) found to be the most vRAM efficient and second-fastest in a head-to-head comparison.
  • Global Check-ins and Local Hangouts: Members checked in from various locations globally with an impromptus meetup pitched for those in the San Francisco area, showing the community’s eagerness for connection beyond the digital realm.

Nous Research AI Discord

  • Dynamic Conversational Models: σ-GPT emerges as a game-changer, dynamically generating sequences at inference time, as opposed to GPT’s traditional left-to-right generation. Comparisons have been drawn to concepts extraction from GPT-4 as detailed in OpenAI’s blog, sparking conversations on methodology and applications.
  • High-Stakes Editing & Legal Discourse: The Krita stable diffusion plugin was recommended for those brave enough to tackle outpainting, and Interstice Cloud and Playground AI have been proposed as cost-effective solutions for mitigating GPU cloud costs. Meanwhile, the thread on SB 1047 prompted arguments over AI regulation and its implications for the sector’s vitality.
  • Schematics and Standards for Data: Members discussed JSON schemas for RAG datasets and championed more structured formats, such as a combination of relevance, similarity score, and sentiment metrics, to hone language models’ outputs. The integration of tools like Cohere’s retrieval system and structured citation mechanics was also examined, suggesting a preference for JSON representation for its simplicity and ease of use.
  • Revolutionizing Resource Constraints: Solutions for low-spec PCs, such as employing Phi-3 3b despite its limitations with code-related tasks, were shared. This points to a community concern for resources accessibility and optimization across various hardware configurations.
  • Methodology Throwdown: The prominence of HippoRAG, focused on clustering for efficient language model training, signified a shift toward optimizing information extraction processes, debated at length with a throwdown on best practices for model pruning and fine-tuning strategies with references to related works and tooling such as PruneMe.

Unsloth AI (Daniel Han) Discord

  • GGUF Glitch in Qwen Models: Engineers report that Qwen GGUF is causing ā€œblockyā€ text output, especially in the 7B model, despite some users running it successfully with tools like lm studio. The underperformance of Qwen models remains a subject of high-interest discussion.
  • Multi-Editor Language Server Enhancement: LSP-AI, a language server offering compatibility across editors like VS Code and NeoVim, was highlighted as a tool to augment, not replace, software engineers’ capabilities.
  • Simplifying Model Finetuning: Users appreciate the user-friendly Unsloth Colab notebook for continued pretraining, which streamlines the finetuning process, particularly for input and output embeddings. Relevant supports include the Unsloth Blog and repository.
  • Bit Warfare and Model Merging: Conversations delve into the distinctions between 4-bit quantization methods like QLoRA, DoRA, and QDoRA, and the finer points of model merging tactics using the differential weight strategy, illustrating community members’ adeptness with advanced ML techniques.
  • Noteworthy Notebook Network: The showcase channel features a notable array of Google Colab and Kaggle notebooks for prominent models including Llama 3 (8B), Mistral v0.3 (7B), and Phi-3, emphasizing the accessibility and collaborative spirit within the community.

CUDA MODE Discord

  • CUDA Profiling Essentials: Use nsys or ncu for CUDA profiling, and for in-depth analysis, focus on a single forward and backward pass, as shown in a kernel performance analysis video. For building a personal ML rig, consider CPUs like Ryzen 7950x and GPUs such as 3090 or 4090, with a note on AVX-512 support and trade-offs in server CPUs like Threadrippers and EPYCs.
  • Triton’s Rising Tide: The FlagGems project was highlighted for its use of Triton Language for large LLMs. Technical discussions included handling general kernel sizes, loading vectors as diagonal matrices, and seeking resources for state-of-the-art Triton kernels, available at this GitHub catalog.
  • Torched Discussions: To measure torch.compile accurately, subtract the second batch time from the initial pass; a troubleshooting guide is available. Explore Inductor performance scripts in PyTorch’s GitHub and consider using custom C++/CUDA operators as shown here.
  • Futurecasting with High-Speed Scans: Anticipation was built for a talk by guest speakers on scanning technologies, with an expectation of innovative insights.
  • Electronics Enlightened: An episode of The Amp Hour podcast guest-starring Bunnie Huang shed light on hardware design and Hacking the Xbox, available via Apple Podcasts or RSS.
  • Transitioning Tips: Members shared tips for transitioning to GPU-based machine learning, suggesting utilizing Fatahalian’s videos and Yong He’s YouTube channel for learning about GPU architecture.
  • Encoder Quests and GPT Guidance: While details about effective parameters search for encoder-only models in PyTorch weren’t provided, there was a shared resource to reproduce GPT-2. NVIDIA’s RTX 4060Ti (16GB) was suggested as an entry-level option for CUDA learning.
  • FP8’s Role in PyTorch: Conversations about using FPGA models and considerations for ternary models without matmul were supplemented by links to an Intel FPGA and a relevant paper. There was a call for better torch.compile and torchao documentation and benchmarks, with an eye on a new addition for GPT models in Pull Request #276.
  • Triton Topic came up twice: An interesting demo of ternary accumulation was linked with positive community feedback (matmulfreellm).
  • Lucent llm.c Chatter: Wide-ranging discussions on model training covered topics like hyperparameter selection, overlapping computations, dataset issues with FineWebEDU, and successes in converting models to Hugging Face formats with detailed scripts.
  • Bits and Bitnet: Techniques using differential bitcounts prompted both curiosity and debugging efforts. FPGA costs were compared to A6000 ADA GPUs for speed, while NVIDIA’s Cutlass was confirmed to support nbit bit-packing including with uint8 formats (Cutlass documentation). Additionally, benchmark results for BitBlas triggered discussions around matmul fp16 performance variances.
  • ARM Ambitions: A brief mention noted that discussions likely pertain to ARM server chips as opposed to mobile processors, with a link to a popular YouTube video as a reference point.

HuggingFace Discord

  • Big Models, Big Discussions: Engineers debated the computational requirements for 2 billion parameter models, with an acknowledgment that systems with 50GB may not suffice, potentially needing more than 2x T4 GPUs. The API debate highlighted confusion over costs and access, with criticism aimed at OpenAI’s platform being dubbed ā€œclosedAI.ā€
  • Battle of the Tech Titans: Nvidia’s market dominance was acknowledged despite its ā€œlocked-in ecosystem,ā€ with its AI chip innovations and gaming industry demands keeping it essential in technology leadership.
  • Security Tips for API Tokens: An accidental email token exposure led to recommendations for using environment variables to enhance security in software development.
  • The Power of AI in Simulations: Members were introduced to resources such an AI Summit YouTube recording showcasing AI’s use in physics simulations, and were invited to an event on model collapse prevention by Stanford researchers.
  • New Ventures in Machine Learning: A host of AI tools and developments were shared, including Torchtune for LLM fine-tuning, Ollama for versatile LLM use, Kaggle datasets for image classification, and FarmFriend for sustainable farming.
  • Cutting Edge AI Creations: Innovations in AI space included the launch of Llama3-8b-Naija for Nigerian-contextual responses, SimpleTuner v0.9.6.3 for multiGPU training enhancements, Visionix Alpha for improved aesthetics in hyper-realism, and Chat With ā€˜Em for conversing with various models from different AI companies.
  • CV and NLP Advances Showcased: Highlights included a discussion on the efficient implementation of rotated bounding boxes, Gemini 1.5 Pro’s superiority in video analysis, and a semantic search tool for CVPR 2024 papers. In NLP, topics ranged from building RAG-powered chatbots to AI-powered resume generation with MyResumo, and inquiries about model hosting and error handling in PyTorch versus TensorFlow.
  • Diffusion Model Dynamics: The discussion centered around training Conditional UNet2D models with shared resources, utilizing SDXL for image text imprinting, and the curiosity about calculating MFU during training, leading to suggestions for repository modifications.

LM Studio Discord

New Visualization Models Still In Queue: No current support exists in LM Studio for generating image embeddings; users are recommended to look at daanelson/imagebind or await future releases from nomic and jina.

Chill Out, Tesla P40!: For cooling the Tesla P40, community suggestions ranged from using Mac fans to a successful attempt with custom 3D printed ducts, with one user directing to a Mikubox Triple-P40 cooling guide.

Crossing the Multi-GPU Bridge: Discussions highlighted that while LM Studio is falling behind in efficient multi-GPU support, ollama exhibits more competent handling, prompting users to seek better GPU utilization methods.

Tackling Hardware Compatibility: From dealing with the injection of AMD’s ROCm into Windows applications to navigating driver installation for the Tesla P40, users shared experiences and solutions including isolation techniques from AMD documentation.

LM Studio Awaiting Smaug’s Tokenizer: The next release of LM Studio is set to include BPE tokenizer support for Smaug models, while members are also probing into options for directing LMS data to external servers.


OpenAI Discord

  • iOS Steals the AI Spotlight: OpenAI announced a collaboration with Apple for ChatGPT integration across iOS, iPadOS, and macOS platforms, slated for a release later this year, stirring excitement and discussions about the implications for AI in consumer tech. Details and reactions can be found in the official announcement.
  • Multilingual Transcription Turmoil and Apple AI Advances: There’s buzz over Whisper version 3 struggling with multilingual transcription, with users clamoring for the next version, and Apple’s ā€˜Apple Intelligence’ promising to boost AI in the iPhone 16, potentially necessitating hardware upgrades for optimization.
  • Image Token Economics and Agent Aggravation: On the economical side, debates are heating up over the cost-efficiency of API calls for tokenization of 128k contexts and image processing, while on the technical side users expressed frustration with GPT agents defaulting to GPT-4o leading to suboptimal performance.
  • Custom GPTs and Voice Mode Vexations: AI enthusiasts are dissecting the private nature of custom GPTs, effectively barred from external OpenAPI integrations, alongside voiced confusion and impatience regarding the slow rollout of the new voice mode for Plus users.
  • HTML and AI Code Crafting Challenges: Discussions centered on the struggles to get ChatGPT to output minimalist HTML, improving summary prompts, using Canva Pro for image text editing, understanding failure points of large language models, and generating Python scripts to convert hex codes into Photoshop gradient maps, indicating areas where tooling and instructions may need honing.

Eleuther Discord

  • GPU Poverty Solved by CPU Models: Engineers discuss workarounds for limited GPU resources, considering sd turbo and CPU-based solutions to reduce waiting times with one stating the experience still ā€œworth it.ā€
  • Fixed Seeds Combat Local Minima: In the debate over fixed vs. random seeds in neural network training, some prefer setting a manual seed to fine-tune parameters and escape local minima, emphasizing that ā€œthere is always a seed.ā€
  • MatMul Operations Get the Boot: An arXiv paper presenting MatMul-free models up to 2.7B parameters incites discussion, suggesting such models maintain performance while potentially reducing computational costs.
  • Diffusion Models: Whispering Sweet Nothings to NLP?: A shift towards using diffusion models for enhancing LLMs is on the table, with references such as this survey paper spurring dialogue on the topic.
  • Hungary Banks on AI Safety: The viability of a $30M investment in AI safety research in Hungary is analyzed, highlighting the importance of not wasting funds and considering cloud-based resources for computational needs.
  • RoPE Techniques to the Rescue: Discourse in the research channel reveals enthusiasm for implementing Relative Position Encodings (RoPE) to improve non-autoregressive models, with members proposing various initializations like interpolating weight matrices for model scale-up and SVD for LoRA initialization.
  • Pruning the Fat Off Models: An engineer successfully cuts down Qwen 2 72B to 37B parameters using layer pruning, showcasing efficiency without sacrificing performance.
  • Interpretability: The New Frontier: There’s a resurgence in interest in TopK activations, and a project exploring MLP neurons in Llama3 is highlighted, with resources found on neuralblog and GitHub.
  • MAUVE of Desperation: A member seeks help with the MAUVE setup, highlighting complexities faced during installation and usage for evaluating new sampling methods.

Modular (Mojo šŸ”„) Discord

  • MAC Installation Snags a Hitch on MacOS: Engineers installing MAX on MacOS 14.5 Sonoma faced challenges that required manual interventions, with solutions involving setting Python 3.11 via pyenv, as described in Modular’s official installation guide.
  • Deliberating Concurrency in Programming: A debate on structured concurrency versus function coloring in programming languages ensued, with effect generics proposed as a solution, although they make language writing more complex. Discussions also extended to concurrency primitives in languages like Erlang, Elixir, and Go, and the potential for Mojo to design ground-up solutions for these paradigms.
  • Maximize Your Mojo: Insights into the Mojo language covered topics such as quantization in the MAX platform with GGML k-quants and pointers to existing documentation and examples, like the Llama 3 pipeline. Additionally, context managers were advocated over a potential defer keyword due to their clean resource management, especially in the Python ecosystem.
  • Updates Unrolled from Modular: Recent development updates included video content, with Modular releasing a new YouTube video that’s likely crucial for followers. Another resource highlighted is a project from Andrej Karpathy, shared via YouTube, speculated to be of interest to the community.
  • Engineering Efficacies in New Releases: Nightly releases of the Mojo compiler showed advancements with updates to versions 2024.6.805, 2024.6.905, and 2024.6.1005, with changelogs accessible for community review here. These iterative releases shape the continuous improvement narrative in the modular programming landscape.

OpenInterpreter Discord

Gorilla OpenFunctions v2 Matches GPT-4: Community members have been discussing the capabilities of Gorilla OpenFunctions v2, noting its impressive performance and capability to generate executable API calls from natural language instructions.

Local II Launches Local OS Mode: Local II has announced support for local OS mode, enabling potential live demos, interest can be pursued via pip install --upgrade open-interpreter.

Technical Issues with OI Models Surface: Users have reported various issues with OI models, including API key errors and problems with vision models like moondream. Exchanges in troubleshooting suggest ongoing fixes and improvements.

OI’s iPhone and Siri Milestones: A breakthrough has been reached with the integration of Open Interpreter and iPhone’s Siri, allowing voice commands to execute terminal functions, with a tutorial video for reference.

Raspberry Pi and Linux User Hacks and Needs: Attempts to run O1 on Raspberry Pi have encountered resource issues, but there is determination to find solutions. Requests for a Linux installation tutorial indicate a broader desire for cross-platform support.


Latent Space Discord

  • Ultravox Enters the Stage: Ultravox, a new open source multimodal LLM that understands non-textual speech elements, was released in a v0.1. Hiring efforts are currently underway to expand its development.
  • OpenAI Hires New Executives: OpenAI marked its twitter with news of a freshly appointed CFO and CPO—Friley and Kevin Weil, enhancing the organization’s leadership team.
  • Perplexity Under Fire for Content Misuse: Perplexity has attracted criticism, including from a tweet by @JohnPaczkowski, for repurposing Forbes content without appropriate credit.
  • Apple’s AI Moves with Cloud Compute Privacy: Apple’s recent announcement about ā€œPrivate Cloud Computeā€ aims to offload AI tasks to the cloud securely while preserving privacy, igniting broad discussions across the engineering community.
  • ICLR Podcast and AI World’s Fair Updates: The latest ICLR podcast episode delved into code edits and the fusion of academia and industry, while the AI Engineer World’s Fair listed new speakers and acknowledged selling out of sponsorships and Early Bird tickets.
  • Websim.ai Sparks Recursive Chaos and Creativity: A discovery of the live-streaming facial recognition website led to members spiraling websim.ai into itself recursively, crafting a greentext generator, and sharing a spreadsheet of resources which captured the innovative spirit and curiosity in exploring Websim’s new frontiers.

Cohere Discord

  • Cohere’s Command R Models Take the Lead: Latest conversations reveal that Cohere’s Command R and R+ models are considered state-of-the-art and users are utilizing them on cloud platforms such as Amazon SageMaker and Microsoft Azure.
  • Innovating AI-Driven Roleplay: The ā€œreply_to_userā€ tool is recognized for enhancing in-character responses in AI roleplaying, specifically in projects like Dungeonmasters.ai, indicating a shift towards more contextual interaction capabilities.
  • Diverse Cohere Community Engaged: Newcomers to the Cohere community, including a Brazilian Jr NLP DS and an MIT graduate, are sharing their enthusiasm for projects involving NLP and AI, suggesting a vibrant and diverse environment for collaborative work.
  • Shaping AI Careers and Projects: Members’ project discussions are shedding light on the role of the Cohere API in improving performance, as acknowledged by positive feedback in areas requiring AI-integration, indicating a beneficial partnership for developers.
  • Cohere’s SDKs Broaden Horizons: The Cohere SDKs’ compatibility with multiple cloud services like AWS, Azure, and Oracle has been announced, enhancing flexibility and development options as detailed in their Python SDK documentation.

LAION Discord

σ-GPT Paves the Way for Efficient Sequence Generation: A novel method called σ-GPT was introduced, offering dynamic sequence generation with on-the-fly positioning, showing strong potential in reducing model evaluations across domains like language modeling (read the σ-GPT paper). Despite its promise, concerns were raised about its practicality due to a necessary curriculum, likening it to the trajectory of XLNET.

Challenges in AI Reasoning Exposed: An investigation into transformer embeddings revealed new insights on discrete vs. continuous representations, shedding light on pruning possibilities for attention heads with negligible performance loss (Analyzing Multi-Head Self-Attention paper). Additionally, a repository with prompts targeted to test LLMs’ reasoning ability was shared, pinpointing training data bias as a key reason behind model failures (MisguidedAttention GitHub repo).

Crypto Conversation Sparks Concern: Payment for AI compute using cryptocurrency spurred mixed reactions, with some seeing potential and others skeptical, labeling it as a possible scam. A warning followed about the ComfyUI_LLMVISION node’s potential to harvest sensitive information, urging users who interacted with it to take action (ComfyUI_LLMVISION node alert).

Advancements and Issues in AI Showcased: The group discussed the release of Lumina-Next-T2I, a new text-to-image model lauded for its enhanced generation style and multilingual support (Lumina-Next-T2I at Hugging Face). In a more cautionary tale, the misuse of children’s photos in AI datasets hit the spotlight in Brazil, revealing the darker side of data sourcing and public obliviousness to AI privacy matters (Human Rights Watch report).

WebSocket Woes and Pre-Trained Model Potentials: On the technical troubleshooting front, tips for diagnosing generic websocket errors were shared alongside the peculiar persistent lag observed in a Text-to-Speech (TTS) service websocket. For project enhancements, the use of pre-trained instruct models with extended context windows came recommended, specifically for incorporating the Rust documentation into the model’s training regime.


LlamaIndex Discord

  • Graph Gurus Gather: A workshop focused on advanced knowledge graph RAG is scheduled for Thursday, 9am PT, featuring Tomaz Bratanic from Neo4j, covering LlamaIndex property graphs and graph querying techniques. Interested participants can sign up here.
  • Coding for Enhanced RAG: A set of resources including integrating sandbox environments, building agentic RAG systems, query rewriting tips, and creating fast voicebots were recommended to improve data analysis and user interaction in RAG applications.
  • Optimizing Efficiency and Precision in AI: Discussions emphasized strategies to increase the chunk_size in the SimpleDirectory.html reader and manage entity resolution in graph stores, with references to LlamaIndex’s documentation on storing documents and optimizing processes for scalable RAG systems.
  • LlamaParse Phenomena Fixed: Temporary service interruptions with LlamaParse were promptly resolved by the community, ensuring an uninterrupted service for users relying on this tool for parsing needs.
  • QLoRA Quest for RAG Enhancement: Efforts are underway to develop a dataset from a phone manual, leveraging QLoRA to train a model with an aim to improve RAG performance.

OpenRouter (Alex Atallah) Discord

  • A Trio of New AI Models Hit the Market: Qwen 2 72B Instruct shines with language proficiency and code understanding, while Dolphin 2.9.2 Mixtral 8x22B emerges with a usage challenge at $1/M tokens, dependent on a 175 million tokens/day use rate. Meanwhile, StarCoder2 15B Instruct opens its doors as the first self-aligned, open-source LLM dedicated to coding tasks.
  • Supercharging Code with AI Brushes: An AI-enhanced code transformation plugin for VS Code, utilizing OpenRouter and Google Gemini, arrives free of charge, promising to revolutionize coding by harnessing the top-performing models in the Programming/Scripting category as seen in these rankings.
  • E-Money Meets Crypto in Payment Talks: The community engages in discussions on adopting both Google Pay and Apple Pay for a streamlined payment experience, with a nod towards incorporating cryptocurrency payments as a nod to decentralized options.
  • Mastering JSON Stream Challenges: Engineers exchange strategies for handling situations where streaming OpenRouter chat completions only deliver partial JSON responses; a running buffer gets the limelight alongside insights from an illustrative article.
  • Navigating Bias and Expanding Languages: An examination of censorship and bias within LLMs centers on a comparison between Chinese and American models, detailed in ā€œAn Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instructā€, while the community calls for better language category evaluations in model proficiency, aspiring for more granular support of languages like Czech, French, and Mandarin.

Interconnects (Nathan Lambert) Discord

Apple Intelligence: Not Just a Siri Update: Nathan Lambert highlighted Apple’s ā€œpersonal intelligence,ā€ which may reshape Siri’s role beyond being a voice assistant. Despite initial confusion over OpenAI’s role, lambert acknowledges the Apple Intelligence system as an important move towards ā€œAI for the rest of us.ā€

RL Community Examines SRPO Initiative: A paper from Cohere on SRPO has generated discussion, introducing a new offline RLHF framework designed for robustness in out-of-distribution tasks. The technique uses a min-max optimization and is shown to address task dependency issues inherent in previous RLHF methods.

Dwarkesh Podcast Anticipation Climbs: The upcoming episode of Dwarkesh Patel with FranƧois Chollet is awaited with interest due to Chollet’s distinct perspectives on AGI timelines. This counters the usual optimism and may provide compelling contributions to AGI discourse.

Daylight Computer: Niche but Noteworthy: The engineering community expressed curiosity over the Daylight Computer, noting its attempts to reduce blue light exposure and aid visibility in direct sunlight. Meanwhile, there’s healthy skepticism around the risks associated with being an early adapter of such novel tech.

Open Call for RL Model Review: Nathan Lambert offered to provide feedback for Pull Requests on the unproven method from a recent paper discussed in the RL channel. This indicates a supportive environment for testing and validation in the community.


LangChain AI Discord

Markdown Misery and Missing Methods: Engineers reported a problem where a 25MB markdown file ran indefinitely during processing in LangChain, without a proposed solution, as well as issues with using create_tagging_chain() due to prompts getting ignored, which indicates potential bugs or gaps in documentation.

Secure Your Datasets with LangChain and Bagel: LangChain’s new integration with Bagel introduces secure, scalable management for datasets with advancements highlighted in a tweet, potentially bolstering infrastructure for data-intensive applications.

Document Dilemmas: Discussions centered on loading and splitting documents for LangChain use, emphasizing the technical finesse required for different document types like PDFs and code files, providing an avenue for optimization in pre-processing for improved language model performance.

API Ambiguities: A lone voice sought clarifications on how to use api_handler() in LangServe without resorting to add_route(), specifically aiming to implement playground_type=ā€œdefaultā€ or ā€œchatā€ without guidance.

AI Innovations Invite Input: Community members have been invited to beta test the new advanced research assistant, Rubik’s AI, with access to models such as GPT-4 Turbo, and also check out other community projects like a visualization tool for journalists, an audio news briefing service, and a multi-model chat platform on Hugging Face, reflecting vibrant development and testing activity.


OpenAccess AI Collective (axolotl) Discord

  • Pip Pinpoint: Engineers found that installing packages separately with pip3 install -e '.[deepspeed]' and pip3 install -e '.[flash-attn]' avoids RAM overflow, a useful tip when working in a new conda environment with Python 3.10.
  • Axolotl’s Multimodal Inquiry: Multimodal fine-tuning support queried for axolotl; reference made to an obsolete Qwen branch, pointing to potential revival or update needs.
  • Dataset Load Downer: Members have reported issues with dataset loading, where filenames containing brackets may cause datasets.arrow_writer.SchemaInferenceError; resolving naming conventions is imperative for seamless data processing.
  • Learning Rate Lifesaver: A reiteration on effective batch size asserts that learning rate adjustments are key when altering epochs, GPUs, or batch-related parameters, as per guidance from Hugging Face to maintain training stability and efficiency.
  • JSONL Journey Configured: Configuration tips shared for JSONL datasets, which entail specifying paths for both training and evaluation datasets; this includes paths to alpaca_chat.load_qa and context_qa.load_v2, aiding in better data handling during model training.

tinygrad (George Hotz) Discord

  • PyTorch’s Code Gets Mixed Reviews: George Hotz reviewed PyTorch’s fuse_attention.py, applauding its design over UPat but noting its verbosity and considering syntax enhancements.
  • tinygrad Dev Seeks Efficiency Boost: A beginner project in tinygrad aims to expedite the pattern matcher, with a benchmark to ensure correctness set by process replay testing.
  • Dissecting the ā€˜U’ in UOp: ā€œMicro opā€ is the meaning behind the ā€œUā€ in UOp as clarified by George Hotz, countering any other potential speculations within the community.
  • Hotz Preps for the European Code Scene: George Hotz will discuss tinygrad at Code Europe; he has accepted community suggestions to tweak the final slide of his talk to heighten audience interaction.
  • AMD and Nvidia’s GPU Specs for tinygrad: AMD GPUs require a minimum spec of RDNA, while Nvidia’s threshold is the 2080 model; HIP or OpenCL suggested as alternatives to the defunct HSA. RDNA3 GPUs are verified as compatible.

AI Stack Devs (Yoko Li) Discord

  • Game Design Rethink: Serverless Functions Lead the Way: Key discussions focused on Convex architecture’s unique serverless functions for game loops in http://hexagen.world, contrasting with the memory and machine dependency of older gaming paradigms. Scalability is enhanced through distributed functions, enabling efficient backend scaling while ensuring real-time client updates via websocket subscriptions.
  • AI Town Architecture Unpacked: Engineers interested in AI and CS are recommended to explore the deep dive offered in the AI Town Architecture document, which serves as an insightful resource.
  • Multiplayer Sync Struggles: The latency issues inherent in multiplayer environments were highlighted as a challenge for providing optimal competitive experiences within Convex-backed game architectures.
  • Confounding Convex.json Config Conundrum: Users reported perplexity over a missing convex.json config file and faced a backend error indicating a possible missing dependency with the message, ā€œRecipe convex could not be run because just could not find the shell: program not found.ā€
  • Hexagen Creator Makes an Appearance: The creator of the serverless function-driven game, http://hexagen.world, acknowledged the sharing of their project within the community.

AI21 Labs (Jamba) Discord

  • Agentic Architecture: A Mask, Not a Fix: Discussions surfaced about ā€œagentic architectureā€ merely masking rather than solving deeper problems in complex systems, despite hints like Theorem 2 suggesting mitigation is possible.
  • Structural Constraints Cripple Real Reasoning: Engineers highlighted that architectures such as RNNs, CNNs, SSMs, and Transformers struggle with actual reasoning tasks due to their inherent structural limits, underlined by Theorem 1.
  • Revisiting Theoretical Foundations: A member voiced intentions to revisit a paper to better understand the communicated limitations and the communication complexity problem found in current model architectures.
  • Communication Complexity and Theorem 1 Explored: The concept of communication complexity in multi-agent systems was unpacked with Theorem 1 illustrating the requirement of multiple communications for accurate computations, which can lead to agents generating hallucinated results.
  • Deep Dive into Paper Planned: There’s a plan to reread and discuss intricacies of the referenced paper, particularly regarding Theorem 1’s insights on function composition and communication challenges in multi-agent systems.

Datasette - LLM (@SimonW) Discord

  • Leaderboards Spur Release Strategy: A member speculated that a recent release was strategically done to foster more research and to gain a foothold on industry leaderboards, emphasizing its utility for further analysis and benchmarking.
  • UMAP Applauded for Clustering Excellence: UMAP received praise for its exceptional clustering performance by a guild member, who recommended an insightful interview with UMAP’s creator for those interested in the technical depth of this tool.
  • Deep Dive with the Mind Behind UMAP: A YouTube interview titled ā€œMoving towards KDearestNeighbors with Leland McInnes - creator of UMAPā€ was highlighted, offering rich discussion on the intricacies of UMAP and its related projects like PyNNDescent and HDBScan, straight from the creator, Leland McInnes.

Torchtune Discord

  • No KL Plots in DPO Experiment?: Members discussed that KL plots were not utilized during the DPO implementation experiment for Torchtune. For those interested in KL plots usage, they can refer to the KL plots in TRL’s PPO trainer on GitHub.

DiscoResearch Discord

  • Bitsandbytes Query Throws Curveball: A member reported difficulty evaluating a bitsandbytes model with lighteval, where the command line tool didn’t recognize the bitsandbytes method and instead requested GPTQ data.
  • Efficiency Seekers in Document Packing: The Document Packing strategy was brought into question by a member curious if the implementation was used practically or if it was merely a simple example. They emphasized the importance of an efficient strategy for handling large datasets and probed into the tokenized_documents data type specifics.

MLOps @Chipro Discord

  • Chip Huyen Spotted at Databricks Event: Renowned engineer Chip Huyen is attending the Mosaic event at the Databricks summit, providing an opportunity for peer interaction and networking. Attendees are invited to meet and discuss current MLOps trends.

The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == ā€˜web’ %}

Stability.ai (Stable Diffusion) ā–· #general-chat (1091 messagesšŸ”„šŸ”„šŸ”„):

  • Spotify Subscription Methods Discussed: Various members exchanged information about Spotify’s subscription methods, with one noting that they receive Spotify Premium for free while another described offers involving different costs.
  • Regional Prompting Techniques: A discussion on the best methods for regional prompting took place, with one suggesting using IPAdapter with attention masks in ComfyUI, and another person curious about achieving this with diffusers.
  • Anticipation for SD3 Grows: Many expressed excitement and impatience for the upcoming release of Stable Diffusion 3 (SD3), debating its features and improvements such as better prompt following and enhanced image creation capabilities. The general consensus is cautious optimism and anticipation for custom fine-tunes and Loras post-release.
  • Challenges with Training Models and LoRas: A recurring topic involved difficulties and technical hurdles faced while trying to train models and LoRas using tools like ComfyUI and Kohya SS GUI, with users troubleshooting installation issues and sharing alternative approaches such as OneTrainer.
  • Concerns Over ComfyUI Malware: A warning about a malicious node in ComfyUI was highlighted, cautioning users that the malware could steal sensitive information. This led to a discussion on maintaining security while using custom nodes in various UI settings.

Links mentioned:


Perplexity AI ā–· #general (905 messagesšŸ”„šŸ”„šŸ”„):

  • AI-Powered Travel Planning Struggles: Users expressed frustration with AI travel planning, especially generating exact flight details. One user noted, ā€œNo matter what I try, it won’t tell me plane ticket detailsā€ (source).
  • Perplexity AI’s Enhance Features: Members discussed new pro search features that offer multi-step search, improving the relevance of results (source).
  • Issues with Perplexity Pages Indexing: Several users reported their Perplexity Pages being deindexed, suspecting it affects only non-staff articles (source).
  • Debate over GPT-4 Models: Members debated accuracy and hallucination issues with GPT-4o model, noting it sometimes corrects to GPT-4 mistakenly. User shared, ā€œGPT4o does not know that GPT4o is a thingā€ (source).
  • Rabbit Device Controversy: Users warned against the Rabbit R1 device, labeling it as a scam based on user experiences and investigations like Coffeezilla’s videos (source).

Links mentioned:


Perplexity AI ā–· #sharing (26 messagesšŸ”„):

  • Boeing Starliner faces RCS thruster issues during ISS docking: During the Starliner’s approach to the ISS, five out of its 28 RCS thrusters malfunctioned, causing the spacecraft to miss its initial docking attempt. NASA reported that sensor values on the affected thrusters registered slightly above normal limits.
  • SpaceX Successfully Lands Starship: SpaceX achieved a significant milestone with the successful fourth test flight of its Starship mega-rocket. The mission saw both its first and second stages complete successful splashdowns, marking progress toward a fully reusable rocket system. Read more.
  • Massive Cyber Attack Hits Niconico Services: Niconico services suffered a large-scale cyber attack, leading to a temporary shutdown. Dwango is undertaking emergency maintenance, but full recovery is expected to take several days.
  • Israel Rescues Hostages from Hamas: Israeli forces conducted a daring daytime raid to rescue four hostages held by Hamas in Gaza. The mission was part of a major offensive in central Gaza, reportedly leading to numerous casualties. Learn more.
  • Huawei’s Ascend 910B AI Chip Challenges Nvidia A100: Huawei’s new Ascend 910B AI chip has emerged as a strong competitor to Nvidia’s A100, achieving 80% of its efficiency in training large language models and surpassing it in other tests. The chip’s release has sparked technological and geopolitical debates.

Links mentioned:


Perplexity AI ā–· #pplx-api (19 messagesšŸ”„):

  • Inquiry about return_images parameter: A user asked whether the return_images parameter is the method by which LLaVA returns images. No further information was provided on this topic.
  • Getting same quality results as Perplexity web version: A member asked what model to use to achieve the same quality results as the Perplexity web version. Another member responded with links to Discord resources and additional guidance.
  • Embeddings with Perplexity API not possible: A user inquired about generating embeddings with the Perplexity API. A response clarified, ā€œHi, no it’s not possibleā€.
  • API credits issue resolved by DM: Multiple attempts were made by a user to resolve an issue of API credits not being added despite purchasing a subscription. Resolution was suggested by asking the user to direct message a specific account with their email address.
  • Seeking help integrating external web search with custom GPT: A user faced challenges integrating external web search abilities (like Serper, Tavily, and Perplexity API) into custom GPT actions to improve accuracy over the built-in search. They referenced an outdated Perplexity API article for help.

Link mentioned: Perplexity API with Custom GPT: no description found


LLM Finetuning (Hamel + Dan) ā–· #general (64 messagesšŸ”„šŸ”„):

  • Misinfo in Qwen2 Instruct: A member highlighted concerning censorship and outright misinformation in Qwen2 Instruct, especially with subtle differences between English and Chinese responses. They plan to share more in a Hugging Face blog post.
  • Llama-3 Abliteration: Members discussed using the Abliterator library on different LLMs to mitigate refusals, with links shared for FailSpy’s Llama-3-70B-Instruct and Sumandora’s project.
  • Finetuning Visual Models: There was interest in fine-tuning a visual language model called Moondream, with a relevant GitHub notebook shared for guidance on the process.
  • ā€œGPT-2 The Movieā€ Drops: Members were excited about the release of a YouTube video titled ā€œGPT-2 The Movieā€, which covers the entire process of reproducing GPT-2 (124M) from scratch. The video was highly praised for its comprehensive content.
  • Model Size Heuristics: A member asked about choosing model sizes for fine-tuning based on task complexity and hinted at the importance of developing heuristics or a sense for different model capabilities (e.g., 8B vs 70B) to streamline rapid prototyping.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #workshop-1 (4 messages):

  • Feeling Behind: One member shared feeling behind due to traveling and being on vacation, echoing sentiments of others in the group.
  • Homework 1 Use Cases: A member posted several use cases for Homework 1: large-scale text analysis, fine-tuning models for low-resource Indic languages, creating a personal LLM mimicking their conversation style, and building an evaluator/critic LLM for specific use cases. They referenced Ragas-critic-llm-Qwen1.5-GPTQ as an inspiration.
  • RAG over Fine-Tuning for Updates: In response to a question about updating LLMs with new policies, it was highlighted that RAG (Retrieval-Augmented Generation) is preferable over fine-tuning. Fine-tuning would require removing outdated training data and integrating new policies, which is complex and less efficient than RAG.

LLM Finetuning (Hamel + Dan) ā–· #🟩-modal (25 messagesšŸ”„):

  • Assign Credits Issues Addressed: Multiple users reported issues with access to their credits. Charles directed them to review information here and offered his email for further assistance if needed.
  • Docker Container Setup Workaround: A user struggled with Docker container setup due to the modal setup command requiring a web browser. Charles suggested using modal token set with a pre-generated token from the web UI as a solution.
  • Modal Environments for Workspace Management: A user inquired about running different demos in multiple workspaces. Charles recommended using Modal environments to deploy multiple app instances without changing the code.
  • Network Mounting Modal Volumes Locally: A user asked about network mounting Modal volumes. Charles suggested using modal volume and modal shell commands for local manipulation and exploration.
  • GPU Limit Exceed Requests: Santiago enquired about exceeding the GPU limit. Charles asked him to DM on Modal Slack with details about his requirements.

Link mentioned: Environments: Environments are sub-divisons of workspaces, allowing you to deploy the same app (or set of apps) in multiple instances for different purposes without changing your code. Typical use cases for environ…


LLM Finetuning (Hamel + Dan) ā–· #learning-resources (1 messages):

yxzwayne: https://arxiv.org/pdf/2402.17193 this is gonna be hard to swallow


LLM Finetuning (Hamel + Dan) ā–· #hugging-face (55 messagesšŸ”„šŸ”„):

  • Credits Misunderstanding Cleared Up: One user clarified with another that the credits they received do not expire at the end of June, contrary to their initial misunderstanding. Another user confirmed this, saying, ā€œNo they won’t.ā€
  • Head-scratcher with Mistral 7B Deployment: A member faced difficulties deploying a gated Mistral 7B model despite having access, receiving a detailed error message. The suggestion to add an env variable HF_TOKEN resolved the issue successfully.
  • Form Filling Confusions Abound: Multiple users reported issues with not receiving Hugging Face credits despite filling out the form on time. They were asked to provide their HF email and username for verification.
  • Reminders for Form Completion: A new form was announced for those who missed the first round, with a deadline of one week. Members were reminded to fill it out and include specific details to receive their credits.
  • Debugging Support and Tokens: A humorous note ended a debugging session successfully. The user thanked the ā€˜debugging llama’ emoji along with the HF token suggestion that resolved their issue, saying, ā€œIt’s working with adding the HF_TOKEN as an environment variable to the endpoint!ā€

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #replicate (7 messages):

  • Replicate credit issues resolved: Users reported missing credits for Replicate, OpenPipe, BrainTrust, and OpenAI. After verifying user details, one member confirmed they received the Replicate credits.
  • Replicate invites confusion: A member received an invite to Replicate but was unsure if they needed to set up billing to see the credits. Another user was directed to a specific helper for further assistance.
  • Announcement of secure input types on Replicate: Hamelh shared a tweet from Replicate announcing support for a new secret input type to securely pass sensitive values, including passwords and API tokens. This update includes capabilities like downloading and uploading weights to Hugging Face, uploading metrics and artifacts to Weights and Biases, and storing file outputs on S3/R2/GCS.

Link mentioned: Tweet from Replicate (@replicate): We now support a new secret input type for securely passing sensitive values to models like passwords and API tokens. Now you can: - Download and upload weights to Hugging Face - Upload metrics a…


LLM Finetuning (Hamel + Dan) ā–· #langsmith (9 messagesšŸ”„):

  • Billing Setup Confuses Users: Multiple users reported confusion about the need to set up a billing account to unlock free credits, with one stating, ā€œit wasn’t intuitive that we have to 1. setup billing account 2. receive/unlock the free credits.ā€ They found it necessary to communicate this process clearly.
  • Credits Not Received Despite Billing Setup: Users luisramirez9603, hoesanna, and dml4680 noted they hadn’t received their credits despite setting up billing accounts. They provided their organization IDs to seek further assistance.
  • Manual Credits Adjustment by Support: Jessou_49081 updated users on their credit issues, mentioning manual adjustments for some and email correspondence for others to resolve the problems. ā€œI’ve gone in and added these credits for you ļæ½*ā€ indicated proactive support steps taken.

LLM Finetuning (Hamel + Dan) ā–· #ankurgoyal_textsql_llmevals (1 messages):

  • Text-to-SQL Benchmarks Emphasize GroupBy but Miss Filter/WHERE Clauses: There’s an observed focus in benchmarks on GroupBy cases rather than those involving high cardinality columns in Filter/WHERE clauses. One example is the differing results when querying AWS Simple Storage Service versus Amazon Simple Storage Service based on the filter conditions.

LLM Finetuning (Hamel + Dan) ā–· #berryman_prompt_workshop (2 messages):

  • Navigating prompt templates and Pydantic models: A member questioned whether to always use prompt-based templates for structuring output and sought clarity on where Pydantic models fit in. They wondered if Pydantic models could be used with chat models or if they are limited to the completions API.
  • Trouble reasoning with LLaMA-3 prompts: A user explained their challenge with a LLaMA-3 model that struggles with reasoning tasks. Despite it recognizing a patient’s age and the age range for a disease, it fails to conclude that the patient’s age falls within the range, prompting them to inquire whether prompt engineering could improve this reasoning.

LLM Finetuning (Hamel + Dan) ā–· #whitaker_napkin_math (1 messages):

  • Lectures hailed as excellent: A user expressed their appreciation for the quality of the lectures they recently watched, describing them as ā€œreally so good.ā€ They thanked the session conductor and mentioned the necessity of multiple viewings and practical implementation as advised.

LLM Finetuning (Hamel + Dan) ā–· #workshop-4 (7 messages):

  • Replicate’s vLLM Model Builder now found: A user initially couldn’t find the vLLM Model Builder in Replicate’s UI. They later updated with the GitHub link where it is available.
  • Workshop 4 recording issues: There was a concern about the Workshop 4 recording cutting off at the 2:30 mark. It was clarified that the workshop hit the maximum Zoom file length, but the last 12 minutes of Q&A are available here with the passcode: Xf0yc*rx.
  • Extra credits on Modal platform: Users discussed how to acquire an extra $500 in credits on the Modal platform. It was confirmed that a script will be run again on June 11th to allocate these credits.
  • Slides from talks: A user inquired about the availability of slides from the workshops other than the Modal talk, whose slides were already shared in the channel.

Link mentioned: Video Conferencing, Web Conferencing, Webinars, Screen Sharing: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom …


LLM Finetuning (Hamel + Dan) ā–· #clavie_beyond_ragbasics (104 messagesšŸ”„šŸ”„):

  • Hierarchical Pooling Added to ColBERT: A pull request for ColBERT adds support for optional hierarchical pooling. A blog post detailing this enhancement is forthcoming, according to the discussion.
  • LLMs and RAG Techniques Highlight Insights: Members discussed various approaches to building RAG (Retrieval-Augmented Generation) applications, including using Elasticsearch for full-text search and the implications of switching from BM25 to vector databases. Elastic’s Dense vector field type was also mentioned as a noteworthy resource.
  • Cosine Distance vs. L2 Distance in Vector Search: The community explored the differences between cosine distance and L2 distance for vector search. One member noted that cosine distance is preferred for document retrieval as it isn’t influenced by document length, unlike Euclidean distance on non-normalized vectors.
  • Ben ClaviĆ© Shares Resources and Code: People thanked Ben ClaviĆ© for his insightful talk and shared various resources, including a GitHub gist that includes modifications for loading wikipedia-api data. Members expressed great appreciation for Clavié’s ability to distill complex information into understandable terms.
  • Need for More Information on Integration Methods: Discussions included practical questions about combining scores in search frameworks, the application of multilingual embedding models, and using LLM chunking for long documents. Another appreciation was shared for Sentence Transformers and their impactful trainability to fit various use cases.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #jason_improving_rag (5 messages):

  • Router Model Streamlines Query Classification: A member mentioned using a router model with three steps/instructions, calling the model three times concurrently for each query. This method works effectively using mistral-7b or llama3-8b models.
  • Exploring Custom Embedding Models for Classification: Another member inquired about using custom embedding models for classification with contrastive learning. They suggested creating prototypes/centroids for the embeddings based on distance for better text classification.
  • Category Metadata Enhances Product Recommendations: A member shared their experience of adding category metadata to product recommendations. By populating categories dynamically and having the LLM use filter options, they’ve seen improved relevancy in recommendations.
  • Entity Extraction and Router Model Over Function Calling: One member explained preferring entity extraction and a router model over function calling due to the complexities involved in graph queries. They find their setup to be faster and more reliable with large datasets, compared to function calling.

LLM Finetuning (Hamel + Dan) ā–· #jeremy_python_llms (5 messages):

  • LLMs get confused with markdown and code blocks: A member shared an issue where testing strings with markdown and code blocks caused LLMs to behave unpredictably. They mentioned difficulties with providing suggestions on Claude-style prompts.
  • NotString fixes formatting issues: To resolve issues with escaping backticks in markdown strings on Discord, wrapping the text into a NotString was suggested as a solution. This approach ensures proper rendering.
  • Learn htmx with Prime: A member recommended a YouTube video as a good introduction to htmx with Prime.
  • fasthtml excitement and avoidance of Typescript: Another member expressed enthusiasm for fasthtml and its potential to simplify scaling Streamlit apps. They hoped fasthtml could help them avoid learning Typescript.

LLM Finetuning (Hamel + Dan) ā–· #saroufimxu_slaying_ooms (135 messagesšŸ”„šŸ”„):

  • Fused vs Paged Optimizers Debate: Members discussed the difference and benefits of fused vs paged optimizers, noting that ā€œfused optimizers are more about dispatching fewer CUDA kernelsā€ which makes optimizer.step faster. Additionally, offloading optimizer state to the CPU helps avoid OOMs, though it can make the model’s speed unpredictable.

  • 8-bit Adam Optimizer Confusion: Users shared experiences with adamw_bnb_8bit, specifically why some see no memory usage difference compared to adamw_torch. It was explained that for LoRA, the optimizer state is smaller since most of the parameters are non-trainable and have no optimizer state.

  • Jane Invited to the Discourse: There was a conversation about inviting Jane to join the discussion, with a member providing an invite link.

  • Vast.ai and Autoscaler Clarification: Users debated whether Vast.ai was serverless. It was clarified that while it’s not strictly serverless, it offers autoscaling for managing dynamic workloads and autoscaler documentation was shared.

  • Resources and Tools Compilation: Several useful links and resources were shared, including a YouTube video by Tim Dettmers, slides from the talk, and details on using the PyTorch profiler. Members expressed gratitude for the informative talk and resources.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #paige_when_finetune (159 messagesšŸ”„šŸ”„):

  • Members Anticipate Fine-Tuning Popcorn Time: Discussions were sprinkled with humor relating to popcorn, with remarks like ā€œall you need is popcornā€, referencing various probabilistic models to predict popcorn popping times. One user joked about fine-tuning an LLM with synthetic popcorn data and another commented, ā€œwhoever does a case study on popcorn kernels following the ftcourse repo will be legend.ā€
  • Delving into Inverse Poisson Distribution: A complex mathematical explanation was shared about the inverse Poisson distribution with links to math.stackexchange aiding users in understanding probability formulations.
  • Gemini API and AI Improvements: Users discussed various features of Google’s Gemini and related API improvements, including links to Google-gemini for audio inputs and Gemini API caching. There was notable enthusiasm about model capabilities and efficient use cases such as the multimodal models.
  • Prompting Techniques Shared: A conversation on using AI to write prompts for itself was highlighted. Participants mentioned using models like Claude, along with a prompt creation tool shared via Twitter, and techniques for prompting Gemini effectively by iterating and adapting through chats.
  • Fine-Tuning Advice Debated: The merits of fine-tuning versus leveraging an increased context window for models like Mixtral and GPT-4 were debated. One user expressed frustration with Mixtral’s output formatting and was advised to use extensive few-shot prompting or potentially switching to base models instead of instruct models.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #yang_mistral_finetuning (3 messages):

  • Model Downloading Availability: A member inquired about the ability to download the model after fine-tuning and provided a link to Mistral’s guide on fine-tuning.
  • Disappointment Over Credit Package: A member expressed disappointment that Mistral did not participate in a credit package. After providing a phone number, they received only $5 in credits, which is insufficient for more than one fine-tuning job priced at $4 minimum.
  • Free Credits for Hackathon Participants: A member shared about $100 free credits available for participants approved for the Mistral AI fine-tuning hackathon. Details included a link to the announcement, important dates, prizes, and application requirements.

Link mentioned: Mistral AI Fine-tuning Hackathon: We are thrilled to announce the Mistral AI fine-tuning hackathon, a virtual experience taking place from June 5 - 30, 2024.


LLM Finetuning (Hamel + Dan) ā–· #axolotl (4 messages):

  • LLama3 finetuning issues fixed with vllm: A member attempting to finetune a LLama3 model with chatml tokens encountered a loading error related to size mismatch for lm_head.weight. They resolved the issue by using vllm for inference, which they reported fixed the problem.
  • Looking for dataset formatting for finetuning instruct model: Another member asked for advice on how to format a dataset for finetuning an instruct model using the same prompt format. They requested examples to clarify the process and how to label the data accurately.

LLM Finetuning (Hamel + Dan) ā–· #zach-accelerate (7 messages):

  • DS(zero3) trumps DDP and FSDP in LoRA training test: A member shared their experience with LoRA training and multinode configuration using accelerate. They performed tests comparing DDP, DS(zero3), and FSDP, finding that ā€œDS(zero3) is the winner in short testā€ with an ETA of 18:42 and using 27GB vRAM, compared to DDP’s 18:13 (33GB vRAM) and FSDP’s 21:47 (30GB vRAM).

LLM Finetuning (Hamel + Dan) ā–· #wing-axolotl (1 messages):

  • User seeks chat_templates overview for Mistral instruct: A member inquired about available chat_templates to determine which one supports a system message for use in DPO Mistral instruct. No direct responses or links were provided in the excerpt.

LLM Finetuning (Hamel + Dan) ā–· #charles-modal (7 messages):

  • Modal Magic Leaves a Strong Impression: One user shared their excitement about deploying Mistral for inference using Modal, noting the ā€œmagical experienceā€ of seeing local code running on a remote server with hot-reload features. They echoed the sentiment that it requires a new way of thinking but found it incredibly rewarding.
  • Permission Errors Cause Frustration: A user pointed out the need to set the correct permissions for key access, highlighting that a 403 error deep in the logs indicates permission issues.
  • Seeking Help with Instruction Fine-Tuning: A user asked for guidance on fine-tuning instructions using a specific template and inquired about the correct configurations for datasets and tokens in the config yaml file.
  • Volume ID Error Stumps User: Another user encountered an issue with a ā€œVolume ID is missingā€ error when running the llm-finetuning example, despite being able to successfully run a different example in the same terminal session. They were advised to seek further assistance from the engineering team via a provided Slack URL.

LLM Finetuning (Hamel + Dan) ā–· #langchain-langsmith (3 messages):

  • Issue with Maven video link resolved: A member reported an issue with a video link on Maven that wouldn’t redirect to Zoom. Another member acknowledged the report and confirmed that the link has been fixed, to which the original user confirmed it works now.

LLM Finetuning (Hamel + Dan) ā–· #credits-questions (7 messages):

  • HuggingFace credits missing for some users: Members reported not receiving their HuggingFace credits despite filling out the form with their account IDs. They were directed to post in a specific channel, where they would need to provide their email and HF username.
  • Modal credits also an issue: Another user mentioned trouble with modal credits and stated they had signed up and requested credits on specific dates. They were advised to ask in another channel to address the issue.

LLM Finetuning (Hamel + Dan) ā–· #strien_handlingdata (1 messages):

davidberenstein1957: Lovely Vincentā¤ļøā¤ļø


LLM Finetuning (Hamel + Dan) ā–· #fireworks (21 messagesšŸ”„):

  • Members seek credit assistance: Numerous members, including account IDs like fil-161078, alexander-nicholson-8c5e72, and shruthi-badri-cc7a24, requested assistance with not receiving expected credits. One member mentioned, ā€œI filled the form but haven’t received credits.ā€
  • AI Engineer World’s Fair invitation: A member invited others to meet up at the upcoming AI Engineer World’s Fair, sharing a link to the event: AI Engineer World’s Fair. Another member confirmed probable attendance.

LLM Finetuning (Hamel + Dan) ā–· #emmanuel_finetuning_dead (5 messages):

  • Fine-tuning vs. RAG Knowledge Analogy: A member shared their mental model describing how fine-tuning adds static, domain-adapted knowledge useful across queries, while RAG (retrieval-augmented generation) provides dynamic, context-specific information. They likened it to a programmer using general programming knowledge versus looking up specific solutions on StackOverflow. Blog link
  • Critique of Analogies: Another member expressed dislike for analogies and mentioned they are working on a better way to explain the concepts, indicating a preference for more precise explanations.
  • Sequential Knowledge Acquisition Stages: A detailed view on knowledge stages was presented: pretraining as theoretical learning, finetuning as practical application, and alignment akin to receiving feedback from a mentor for mastery. This step-by-step approach highlights the evolving complexity in training language models.
  • Clarification Between Post-Training and Fine-Tuning: A member differentiated post-training from fine-tuning, stating that post-training involves aligning the model with coherent responses and producing an instruct model, while fine-tuning involves tailoring the model’s output style with specific examples. They referenced a paper suggesting 1000 examples might be sufficient for alignment but speculated higher numbers would likely be better for robustness.

LLM Finetuning (Hamel + Dan) ā–· #braintrust (9 messagesšŸ”„):

  • Contact Info Confusion Gets Resolved: An issue was identified regarding missing contact information for certain users who registered by the deadline. Upon further inspection, it was discovered that David’s email had a newline character, which led to re-running the script to trim all the emails and ensure all 18 students, including David, were properly set up.

LLM Finetuning (Hamel + Dan) ā–· #west-coast-usa (2 messages):

  • Potential Meetup Discussed in San Francisco: A member inquired if anything related to the course is happening in San Francisco until next Sunday. Another member suggested they could hang out and see who else might be interested.

LLM Finetuning (Hamel + Dan) ā–· #europe-tz (1 messages):

weskhan_62459: Hi All, checking in from Poland but from Australia.


LLM Finetuning (Hamel + Dan) ā–· #predibase (4 messages):

  • All set with Predibase sign-up steps: A member reminded others to check their email for an ā€œAccept Invitationā€ link after signing up at predibase.com/free-trial, to finalize account creation. ā€œDid you receive this and go through that process?ā€ was asked to ensure compliance.
  • Fine-tuning workshop recording available: A virtual fine-tuning workshop recording link was shared here. This resource aims to assist new users in getting started with their fine-tuning projects.
  • Credits Inquiry on Predibase: A user noted they signed up and received $25 in credits under the tenant ID c4697a91. Another member promised to look into it to provide further assistance.

Link mentioned: Login - Demio: no description found


LLM Finetuning (Hamel + Dan) ā–· #openpipe (5 messages):

  • Billing Confusion Resolved: Several members experienced discrepancies in billing credits, reporting $100 instead of the expected $222. One member confirmed the issue was resolved, thanking ā€œAnfalā€ for assistance.

LLM Finetuning (Hamel + Dan) ā–· #openai (39 messagesšŸ”„):

  • Users struggle to access GPT-4 models: Multiple users reported issues accessing GPT-4 and GPT-4o models despite following the necessary steps, including filling out forms and adding billing information. One user resolved the issue by adding $5 of credit, while others continued to face problems.
  • Calls for organization IDs: A user requested other members’ org_ids to help sort out access issues. Several users shared their org_ids in response, hoping to resolve their access problems with GPT-4 models.
  • Questions about handling and scoring prompts: A user asked for recommendations on tools for scoring a large list of prompts that can handle errors and resume. This sparked interest but required further elaboration to provide useful suggestions.
  • Credit usage and ideas exchange: A user shared how they are using their credits, providing a Twitter link to their list, and invited others to share ideas. Check out the list here.

LLM Finetuning (Hamel + Dan) ā–· #capelle_experimentation (13 messagesšŸ”„):

  • Free Intro to Weave: A member shared a notebook link to learn the basics of Weave, useful for tracking function calls, publishing and retrieving versioned objects, and evaluating with a simple API.
  • Quick Course on W&B: A 10-minute video course on W&B was shared to help users discover essential features of Weights & Biases, enhance machine learning productivity, and learn integration with Python scripts.
  • Join Inspect_Ai Collaboration: A member invited others to collaborate on developing a shared view for interaction and annotation in Inspect_Ai, linking it with Weights and Biases for robust data surfacing.
  • Python Logging for Eval Visualization: Discussed Python logging book by Michael Driscoll and its relevance for visualizing expressive logging configs in evaluations, emphasizing the functionality of Python’s logging module.
  • Fine-Tuning Llama 7B for Query Conversion: On a function in W&B to filter data using query language, interest was shown in a project to fine-tune Llama 7B to convert natural language queries (NLQ) to DSL. Further details were shared about using W&B’s query panels, with a link to the documentation.

Links mentioned:


Nous Research AI ā–· #off-topic (9 messagesšŸ”„):

  • Fermented Fireweed Tea for Breakfast: A member shared their unusual breakfast choice, listing fermented fireweed tea, milk, stevia, 2 cucumbers with sea salt, rye sourdough bread with mayonnaise, sausage, and cheese.
  • Complexities of Using Mixed GPUs: A member inquired about the difficulties of having two different GPUs in a machine learning rig. Another member responded that the system would slow down to the speed of the slowest GPU, making it less efficient.

  • σ-GPT generates sequences dynamically: A member exclaimed that σ-GPT, developed by @ArnaudPannatier and team, can generate sequences in any order at inference time, unlike traditional left-to-right generation by GPTs. This development was in collaboration with @SkysoftATM.
  • Extracting Concepts from GPT-4: A member shared a link to OpenAI’s blog on extracting concepts from GPT-4. Members compared it to a recent publication by Anthropic focused on understanding GPT-4, suggesting similar intentions but potentially different findings.

Link mentioned: Tweet from Arnaud Pannatier (@ArnaudPannatier): GPTs are generating sequences in a left-to-right order. Is there another way? With @francoisfleuret and @evanncourdier, in partnership with @SkysoftATM, we developed σ-GPT, capable of generating sequ…


Nous Research AI ā–· #general (255 messagesšŸ”„šŸ”„):

  • Krita Plugin Recommended for Outpainting: A member suggested the Krita stable diffusion plugin for outpainting, noting its higher learning curve compared to fooocus. They advise increasing resolution iteratively to achieve target aspect ratios, rather than going directly to 16:9.
  • Insane Performance of 72b Model: Members discussed the impressive mathematical and physical reasoning capabilities of the 72b model, comparing its performance to GPT-4. The model’s availability on Together prompted interest in testing setups.
  • Experiment on Layer Pruning Strategies: The community discussed pruning strategies for models like Llama 3 70b and Qwen 2 72b, including removing layers and finetuning. A related paper and implementations like PruneMe were referenced.
  • Concerns Over GPU Cloud Costs and Resources: Members shared resources for affordable GPU cloud services like Interstice Cloud and Playground AI. There was also discussion on the challenges and suggestions for hosting and running large models on cloud platforms.
  • Legal and Ethical Discussion on AI Regulation: A link to Dan Jeffries’ thread on SB 1047 sparked debate about AI regulation and its impact on innovation. Jeffries criticized the bill’s potential to centralize AI control and destroy open-source AI under the guise of safety measures.

Links mentioned:


Nous Research AI ā–· #ask-about-llms (8 messagesšŸ”„):

  • AgentBench with Local Agents: A member inquired if anyone has used AgentBench with a local agent like Llama 3 instead of GPT-3.5.
  • Group Chat UX Workflow: A discussion began on creating a shared group chat UX workflow that incorporates both human teams and AI tools/wrappers. The goal is to have AI like Claude and GPT share a context window, enabling seamless interaction between humans and AI participants.
  • Recommendations for Low-End PCs: A member sought recommendations for an LLM that could run on a low-end PC with no discrete graphics and only 8 GB of RAM for Python classes. A suggestion was made to use Phi-3 3b, though it was noted that none of the options are particularly great at handling code.

Nous Research AI ā–· #rag-dataset (335 messagesšŸ”„šŸ”„):

  • HippoRAG and Raptor: The Future of Clustering: A member highlighted that HippoRAG emphasizes clustering over knowledge graphs (KGs) for better information extraction. ā€œClustering is a graph as well,ā€ making it a crucial tool for efficient language model training, according to HippoRAG.
  • Schema Debates for RAG: Multiple members discussed JSON schemas for input and output of model data, suggesting formats like "is_supporting": true/false. A proposed schema was shared, including ā€œquestionā€, ā€œanswerā€, and ā€œcontextā€ fields.
  • Ditto and Dynamic Prompt Optimization (DPO): Discussing the potential of Ditto from arxiv, members considered online comparison and iterative alignment with fine-grained tasks using small datasets. Another member suggested using cosine similarity as a metric in dynamic reward modeling frameworks.
  • Standardizing Multi-Metric Outputs: Members debated incorporating metrics such as relevance, similarity score, and sentiment directly into datasets for refining outputs. ā€œWe can attach RAGAS or some evaluator to our data generator,ā€ and a combination of ranks or simplified evaluations like ā€œhighā€, ā€œmediumā€, ā€œlowā€ was recommended for aligning the model’s output with the context.
  • Cohere’s Retrieval and Citation Mechanism: The community examined Cohere’s retrieval system and its usage of document titles as search queries and parsing citations. It was suggested that citations should be stored in a structured format like JSON for easy reference, avoiding the complexity of handling multiple document formats.

Links mentioned:


Nous Research AI ā–· #world-sim (9 messagesšŸ”„):

  • Recursive AI visualization stuns: A user hinted at progress in visualizing recursive AI without revealing further details. They posted an intriguing image available here.
  • Command line copy-paste bug squashed: A user complained about issues copying and pasting text from the command line interface. After their report was acknowledged by an admin, another admin confirmed that the bug is now fixed, and functionality should work as intended.

Unsloth AI (Daniel Han) ā–· #general (366 messagesšŸ”„šŸ”„):

  • ā€œContinuing Qwen GGUF Issues Spark Debateā€: Multiple users discussed Qwen models having GGUF issues, particularly repeating ā€œblockyā€ text. While some confirmed it runs fine with certain tools like lm studio, others found issues persisting, especially with the 7B model.
  • ā€œLSP-AI Impresses with Multi-Editor Compatibilityā€: A user shared a link to GitHub - LSP-AI, highlighting its function as a language server for multiple editors like VS Code, NeoVim, and Emacs. The goal is to enhance, not replace software engineers’ tooling.
  • ā€œNew Continued Pretraining Notebook Simplifies Finetuningā€: Members discussed the new unsloth Colab notebook for continued pretraining, mentioning ease of use and the ability to fine-tune input and output embeddings. Links to resources like Unsloth Blog were shared.
  • ā€œLLama-3 8B vs. Mistral V0.3 Fine-tuningā€: A conversation about fine-tuning performance featured users debating whether LLama-3 8B outcompetes Mistral V0.3. Theyruinedelise mentioned an upcoming blog post to address findings in detail.
  • ā€œMulti-Stage Training and Data Augmentation Strategiesā€: Users shared strategies for improving model training, emphasizing data augmentation with noisy copies and balancing datasets better. shensmobile expressed interest in adjustable LoRA settings for specific task adaptability.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #random (25 messagesšŸ”„):

  • ETDisco debates QLoRA vs DoRA: One member asked about the differences between QLoRA and DoRA, and another explained that DoRA learns a scaling vector for the LoRA matrices itself, whereas QLoRA refers to LoRA but for 4 bit. They also mentioned QDoRA, which is DoRA for 4 bit.
  • Model Arithmetic Tip: A shared tweet discussed how taking the difference of weights from L3 base and L3 instruct, fine-tuning the base, and adding the instruct diff back before extra finetuning can yield better performance. This led to a discussion on the nuances and ā€œblack magicā€ of model merging.
  • Finetune Codegemma Redesign: One user sought feedback on a graphic design for a new finetune called Codegemma. Another member provided detailed suggestions like using a white background, aligning text with squares, and possibly incorporating red or green into the squares.

Link mentioned: Tweet from xjdr (@_xjdr): Helpful tip: If you take the diff of the weights from L3 base and L3 instruct, fine tune the base and then add the instruct diff back on top and then do a little extra finetuning, it will generally…


Unsloth AI (Daniel Han) ā–· #help (194 messagesšŸ”„šŸ”„):

  • Unsloth memory issues on Google Colab: Users reported that Unsloth is consuming excessive RAM on Google Colab, causing crashes. A suggestion was made to use auto_find_batch_size = True to mitigate RAM issues.
  • Dataset preparation for Meta-llama3 fine-tuning: A newbie asked about the proper format and system requirements for fine-tuning the Meta-llama3 model with a custom dataset. They were advised to use 12GB VRAM for 2k context and referred to the Unsloth wiki for more information.
  • Cost-effective chatbot deployment discussions: A user inquired about deploying Gen AI chatbots without incurring high OpenAI API costs. Suggestions included using 4-bit quantized models and exploring open-source tools like aphrodite-engine for cost-efficient solutions.
  • Wandb and driver issues during training: Users encountered problems with wandb and NVIDIA drivers, leading to memory-related errors and crashes. Temporary solutions included disabling wandb and rolling back NVIDIA drivers.
  • Fine-tuning and evaluation obstacles: Users shared challenges related to fine-tuning and evaluation phases, with issues such as improper evaluation setups leading to out-of-memory (OOM) errors. One user suggested opening an issue on the HuggingFace Transformers GitHub to address the problem of dual dataset loading during evaluation.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #showcase (21 messagesšŸ”„):

  • Swedish LORA Model Released on Hugging Face: A developer shared the release of a Swedish LORA model based on Llama 3 Instruct, finetuned for prompt question answering with a dataset from Swedish Wikipedia. The model, named Bellman, has added questions from a translated code-feedback dataset and narratives but isn’t excellent at story generation yet.
  • Training Insights and Issues with Small Language Models: A discussion revolved around the challenges of building low-resource language models and tackling the issue of synthetic story generation. One developer mentioned they had better results with just ten short stories in training, hoping more data would improve performance.
  • Dataset Creation from Swedish Wikipedia: The method for creating a Q&A dataset involved scraping Swedish Wikipedia and using GPT3.5 Turbo or Mixtral to generate Q&As directly in Swedish. The developer noted that models perform better with prompt groundings and recognized potential advantages for Swedish over Finnish due to language similarities and training data.
  • Challenges in Translation and Grammar Accuracy: Developers discussed the difficulties of maintaining grammatical accuracy during inference with models like GPT-4, noting frequent issues despite attempts at few-shot prompting. Suggested trying non-OpenAI models like GLM-4-9B as an alternative.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #community-collaboration (5 messages):

  • First Good Issues Inquiry: A member asked, ā€œHi are there any first good issues that I can work on?ā€ Another member replied, ā€œDmed you!ā€
  • Project Needs Documentation: A member mentioned, ā€œAt the moment we just mainly need help with supporting models or fixing bugs in our package but that maybe too complicated.ā€ They also indicated that another focus is to ā€œmake documentation,ā€ and expressed uncertainty if the inquiring member could help with that.

Unsloth AI (Daniel Han) ā–· #notebooks (1 messages):

  • All About Our Notebooks: The post lists several Google Colab and Kaggle notebooks for various models. Models include Llama 3 (8B), Mistral v0.3 (7B), Phi-3 in medium and mini variants, and Gemma among others.
  • Google Colab Options Abound: Users can access different Google Colab notebooks like Llama 3 (8B) and Mistral v0.3 (7B). These notebooks require users to sign in to access them.
  • Kaggle Versions Available: Similar models are also available on Kaggle, such as Mistral v0.3 (7B) and Llama 3 (8B).
  • Invitation for More Notebooks: The post invites users to request additional notebooks by asking in a designated discussion channel (<#1180144489214509097>). ā€œIf you’d like us to add other notebooks, please ask.ā€

Links mentioned:


CUDA MODE ā–· #general (52 messagesšŸ”„):

  • Advice on CUDA Profiling Tools: One member recommended using nsys or ncu for profiling during training, and for detailed kernel profiling, suggested capturing a single forward and backward pass. They also shared a YouTube video on kernel performance analysis.
  • Missed Event with Jake: There was a brief discussion about an event with Jake, where one user hadn’t seen the event notification and confirmed it was over. Another member mentioned it had been listed on the event tab for some time.
  • Request for Tutoring: A user inquired about finding a tutor for advanced topics like PMPP 4th edition, mentioning satisfaction with a current linear algebra tutor and seeking recommendations for similar quality tutoring in different subjects.
  • GPU and CPU Recommendations for ML Rig: Members discussed building a personal ML rig, suggesting Ryzen 7950x or 7950x3D for the CPU and GPUs with considerable VRAM, like the 3090 or 4090. Additional insights included considerations on Intel’s Xeon processors with AVX-512 support for CPU-based processing, and warnings about issues with 2x 4090 builds.
  • Discussion on AVX-512 Support in CPUs: There was an in-depth discussion about the benefits and current support of AVX-512 instructions in consumer and server CPUs, including potential trade-offs and specific processors like Threadrippers and EPYCs.

Links mentioned:


CUDA MODE ā–· #triton (14 messagesšŸ”„):

  • FlagGems Sparks Interest: A member shared the FlagGems project on GitHub, describing it as ā€œan operator library for large language models implemented in Triton Language.ā€ This quickly gained interest and appreciation from others in the channel.
  • General Kernel Dimensions Query: A user asked about the best way to handle general kernels with Triton, specifically mentioning the challenge of not having fixed dimensions.
  • Diagonal Matrix Construction: Another member sought advice on how to load a vector as a diagonal matrix, expressing concern over the performance of Hadamard product followed by matrix-vector multiplication.
  • State-of-the-Art Triton Kernels: A user inquired about resources for state-of-the-art Triton kernels for various operators. They were directed to a repository cataloging released Triton kernels.
  • BLOCK_SIZE and Chunking in Triton: There was a discussion about handling arbitrary sized BLOCK_SIZE and whether Triton handles chunking automatically. It was clarified that users need to implement their own for loops for chunk reduction as Triton does not automatically handle this.

Links mentioned:


CUDA MODE ā–· #torch (35 messagesšŸ”„):

  • Measure torch.compile compilation time accurately: Members discussed that measuring the first pass can gauge compilation time, but it’s combined with execution. Subtracting the second batch’s time can help isolate the compilation time. Troubleshooting guide was shared for further details.
  • Scripts for Inductor Performance Dashboard: Queries about scripts for the PyTorch Inductor performance dashboard were pointed to this GitHub directory.
  • Advantages of PyTorch Wrappers: Discussion highlighted that wrappers like Lightning and fast.ai reduce boilerplate and offer higher-level abstractions, common models, and logging. They are beneficial as starting points until deeper customization is needed.
  • Compiling the Entire Training Procedure: It’s challenging to compile the entire training due to DataLoader, but breaking it down or partial compilation with torch.compile on specific steps can help. One member noted success in compiling only the forward pass and loss calculation.
  • Custom C++/CUDA Operators in PyTorch: Custom operators compatible with torch.compile allow full graph compilation. An example of such integration can be found here.

Links mentioned:


CUDA MODE ā–· #announcements (1 messages):

  • Exciting Speaker Return to Discuss High-Speed Scanning: The moderator announced a session with two returning guest speakers to discuss making scans ā€œgo at the speed of light.ā€ They previously shared insights about llm.cpp and are set to deliver another engaging talk.

  • Bunnie Huang stars in Amp Hour podcast: The latest episode of The Amp Hour podcast features Bunnie Huang, known for his hardware design work on Chumby products. The episode is available to play, download, and can be subscribed to via Apple Podcasts or RSS.
  • Hacking the Xbox: Huang also discussed his book, Hacking the Xbox. This book details his experiences and insights on modifying the popular gaming console.

Link mentioned: An Interview with Bunnie Huang - Bunnie’s Bibelot Bonification | The Amp Hour Electronics Podcast: Bunnie Huang joins Chris and Dave and talks about his work in China, his work on hacking hardware and lots of other electronics-y goodness.


CUDA MODE ā–· #jobs (5 messages):

  • MLE learning CUDA for GPU-based machine learning: A member in their 30s is transitioning into GPU-based machine learning and is utilizing resources like the PMPP book, side projects, and implementations of research papers. They are open to discussions on this topic as they still do not have a job in this specialization but hope for a smooth transition.
  • Inquiring about transition resources: Another member asked what resources are being used for this transition. The response included a brief mention of using Fatahalian’s videos on GPU architecture and other academic materials.
  • Learning GPU architecture: Yong He’s YouTube channel was recommended for learning about GPU architecture, specifically mentioning Fatahalian’s contributions. The link provided is Yong He on YouTube.

Link mentioned: Yong He: no description found


CUDA MODE ā–· #beginner (9 messagesšŸ”„):

  • Parameter search for encoder-only models: A member asked how parameter search is typically conducted for an encoder-only PyTorch transformer model. Unfortunately, no direct responses are available in the provided messages.
  • Flash attention kernel inquiry: A member inquired about how much of the PMPP book needs to be read to write a flash attention kernel. No responses are present in the provided messages to this question.
  • NVIDIA GPU recommendations for beginners: For beginners, a member suggested the RTX 4060Ti (16GB) as an affordable option for learning purposes but mentioned potential limitations for larger model training. Another member recommended using any NVIDIA GPU from the last three generations, highlighting that even mid-level gaming GPUs support CUDA and can be found at reasonable prices.
  • Ensuring torch.compile stability: A member asked how to make sure torch.compile does not recompile at runtime after warm-up, specifically when the input shape does not change. There was no response to this question provided in the messages.
  • YouTube video on reproducing GPT-2: A member shared a YouTube video titled ā€œLet’s reproduce GPT-2 (124M)ā€ which covers building the GPT-2 network and optimizing its training from scratch.

Link mentioned: Let’s reproduce GPT-2 (124M): We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really…


CUDA MODE ā–· #torchao (38 messagesšŸ”„):

  • Discussion on FPGA Costs and Ternary Models: Members discussed the high costs of FPGA models from Xilinx and Intel, with prices ranging from $8K to $16K (Intel specs). They mentioned using these for running ternary models without matmul (paper).
  • FP8 and Mixed Precision Formats: A member proposed using mixed BF16/FP16 activations with FP8 weights, considering the fast casting due to shared exponent bits. They inquired about fusing such operations in torch.compile and received feedback on relevant configuration flags.
  • Torch.compile Configuration: There was a discussion on using torch.compile with configurations like use_mixed_mm and force_mixed_mm, noting that certain flags might cause issues or trigger multiple kernels. A member also mentioned issues with generating split-K matmul kernels.
  • Need for Split-K Matmul Templates: Members debated the necessity of split-K templates for matmuls in PyTorch, particularly for cases with small batch sizes. It was pointed out that nondeterminism and epilogue fusion complexities are current obstacles.
  • Benchmarking and Documentation: There were discussions about enhancing documentation for torch.compile and torchao, including desired features like a comparison table of quantization/sparsity techniques. A recent benchmark addition for GPT models was also highlighted (GitHub link).

Links mentioned:


CUDA MODE ā–· #hqq (1 messages):

appughar: https://github.com/ridgerchu/matmulfreellm Interesting work with ternary accumulation


CUDA MODE ā–· #llmdotc (389 messagesšŸ”„šŸ”„):

  • Comparing Training Models: Members engaged in discourse over various models including Qwen2, Llama2, and Llama3 discussing aspects such as learning rates, datasets, and computational costs. Mentioned papers such as DeepSeek-AI on hyperparameters were shared.
  • Integrating Overlapping Computations: Optimization techniques were discussed including making computational tasks more async for overlapping gradient communication and computation. Benchmarks from overlapping computations showed performance improvements: current setup achieved 69584 tok/s versus optimized setup’s 71211 tok/s.
  • Challenges with FineWebEDU Dataset: There were issues identified with FineWebEDU’s shuffling and sample quality affecting training loss patterns, sparking an internal investigation. Member noted unusual loss patterns potentially attributing to unshuffled or improperly sampled data.
  • LightEval and Model Conversion: Challenges in running LightEval for evaluation metrics were shared due to installation and configuration complexities. Tips for converting models to Hugging Face formats with scripts and examples were detailed (script).
  • Technical Implementations Discussion: Members provided insights into various implementations like integrating Cutlass into llm.c and the significance of kernel call optimizations. Resources and drafts such as CutlassJun8 were shared for community reference.

Links mentioned:


CUDA MODE ā–· #bitnet (49 messagesšŸ”„):

  • Sneaky Bit-Level Trickery: Discussed methodologies for utilizing unique bit representationsā€”ā€œ0 = 11, -1 = 01, and 1 = 00ā€ā€”to implement efficient operations using the difference of three bitcounts. A potential bug in the suggested logic was identified and deliberated.
  • FPGA Costs vs. A6000 ADA for Speed-up: Questioned the cost-effectiveness of custom FPGAs for certain operations, suggesting alternative approaches like utilizing A6000 ADA GPUs at a lower cost. Highlighted that Bitblas’ 2-bit kernel already offers significant speed-ups.
  • NVIDIA Cutlass and Bit-Packing: Explored NVIDIA’s Cutlass library capabilities, confirming that it supports arbitrary nbit bit-packing with uint8 formats through various data structures. Shared links to relevant documentation and GitHub repositories here and here.
  • Meeting Scheduled for Collaboration: Set up a meeting to discuss ongoing projects, emphasizing finding a baseline using BitBlas and other kernels, and working on PR and documentation updates. Shared GitHub links and meeting times.
  • BitBlas Benchmarks and Insights: Posted benchmark results comparing BitBlas to PyTorch’s matmul fp16, noting that BitBlas 4-bit operation shows substantial speed-ups but performance varies based on input size and batch-size. Highlighted speed differences and use cases where BitBlas 2-bit outperforms 4-bit significantly.

Links mentioned:


CUDA MODE ā–· #arm (2 messages):

Link mentioned: 96 ARM cores—it’s massive! (Gaming + Windows on Arm!): Can the world’s fastest Arm desktop handle gaming? And before that, can it even install Windows?See my first video on this desktop here: https://www.youtube…


HuggingFace ā–· #general (509 messagesšŸ”„šŸ”„šŸ”„):

  • 2 billion parameters models cause performance debate: Users discussed the computational requirements for models with 2 billion parameters. One stated, "50GB would require a lot of computational power, 2x T4 is not going to cut it."
  • API costs and access confusion: Questions around whether using specific APIs are free, like OpenAI's, led to clarifications that while some models might be free, "the API is not." There was also critique of OpenAI's openness, branding it "closedai."
  • Nvidia and gaming push forward tech leadership: Members debated Nvidia's market moves and AI capabilities, concluding that despite a "locked-in ecosystem," Nvidia remains essential due to gaming industry demands and superior AI chips.
  • Email token leakage concern: There was a mix-up with a user accidentally exposing their API token, leading to advice on using environment variables instead for security.
  • AI event insight shared: A member shared their experience at the "Artificial Intelligence National Summit 2.0," held by IEEE university clubs, where they spoke about Hugging Face agents.

Links mentioned:


HuggingFace ā–· #today-im-learning (2 messages):

  • Mac Silicon required for Apple Vision Pro development: ā€œjust learned i need a mac silicon to develop for apple vision proā€. A member expressed a need for specialized hardware to develop apps for Apple’s Vision Pro device.
  • GPU ownership comparison: Another member clarified, ā€œhe’s more GPU rich than any of us,ā€. This suggests a discussion involving capabilities or resources in terms of GPU availability.

HuggingFace ā–· #cool-finds (10 messagesšŸ”„):

  • Torchtune empowers LLM fine-tuning: Check out Torchtune, a Native-PyTorch Library for LLM Fine-tuning. It’s available on GitHub and aims to enhance your large language model fine-tuning processes.
  • Ollama boasts versatile LLM features: Explore Ollama, a platform for running and customizing large language models like Llama 3, Phi 3, Mistral, and Gemma. It’s compatible with macOS, Linux, and Windows (preview).
  • Distinctive Alpaca image dataset on Kaggle: Utilize this alpaca image dataset for your image classification projects. Perfect for machine learning enthusiasts looking to classify alpacas.
  • Langchain and DashScope reranker elevate searches: Dive into ā€œUnleashing the Search Beastā€ with Langchain and DashScope Reranker on Medium. Enhance your search algorithms and discover advanced reranking techniques.
  • Spotlight on sustainable farming AI tool, FarmFriend: Unveil the FarmFriend web app designed for sustainable agriculture with iOS shortcut integration. Follow @twodogseeds for more innovative demos and insights in iOS AI shortcuts.

Links mentioned:


HuggingFace ā–· #i-made-this (16 messagesšŸ”„):

  • Introducing Llama3-8b-Naija: A member announced the release of Llama3-8b-Naija_V1, a fine-tuned version of Llama3 designed to answer questions like a Nigerian. More details can be found on their Twitter announcement.
  • SimpleTuner v0.9.6.3 Enhances MultiGPU Training: An update for SimpleTuner (v0.9.6.3) was released, offering major multigpu training fixes and optimisations. The update ensures efficient use of hardware resources for training and can be reviewed here.
  • Visionix Alpha Pushes Hyper-Realism Boundaries: Visionix Alpha, a new hyper-realistic model based on SDXL, was introduced with improvements in aesthetics, anatomy, and nature. The model can be accessed on Hugging Face and CivitAI.
  • Launch of SoteDiffusion Wuerstchen3: A finetune of Würstchen V3 focused on anime, named SoteDiffusion Wuerstchen3, was launched and trained on 6M images for 3 epochs. More information and access can be found on the project page.
  • Chat With ā€˜Em Goes Live: Chat With ā€˜Em allows users to chat with models from Groq, Anthropic, OpenAI, and Cohere on Hugging Face Spaces, switching among various models like Claude and GPT-3.5 using an API key. Check out the tool here.

Links mentioned:


HuggingFace ā–· #reading-group (16 messagesšŸ”„):

  • AI powers physics simulations: A member shared a YouTube recording on how AI can be used in physics simulations, with a recommendation to watch it: Hugging Face Reading Group 23.
  • Stanford session on preventing model collapse: Another member announced an upcoming final session of the LLM Reading Group led by Stanford researchers, discussing a new paper that provides practical solutions for avoiding model collapse when AI models are overtrained on their own synthetic data. RSVP for the June 11 session here.
  • Exploring WebInstruct for instruction data: A member suggested exploring a tweet about extracting instruction data from pre-training data, introducing WEBINSTRUCT, a dataset of 10M high-quality instruction pairs created without human annotation or GPT-4, using crawled web data. Further details and resources are available on Hugging Face, the blog, and the dataset.

Links mentioned:


HuggingFace ā–· #computer-vision (6 messages):

  • Help with Rotated Bounding Boxes: A user is seeking assistance with extracting a rotated bounding box using x, y coordinates, width, height, and angle. They mentioned issues with using homography matrices for transformation, resulting in an inaccurate bounding box.
  • Gemini 1.5 Outperforms Other Models: A tweet shared reveals that Gemini 1.5 Pro significantly outperforms other models in video analysis, including GPT-4o. Relevant links: Tweet by SavinovNikolay, Video-MME project, and Arxiv abstract.
  • Search CVPR 2024 Papers Easily: An app has been created to provide a semantic search through CVPR 2024 paper summaries. The app is accessible here.
  • Inquiry about Label Studio ML Backend: A user inquires if anyone has experience using the label studio ML backend. No further context or responses are provided.

Links mentioned:


HuggingFace ā–· #NLP (19 messagesšŸ”„):

  • Understanding RAG-Powered Chatbots: A user asked about building a RAG-powered chatbot and whether a dataset of Q&A pairs is required. Members clarified that RAG works by retrieving the top k relevant documents and including them in the context, suggesting extracting text from PDFs as a starting point and mentioned that fine-tuning might not be necessary if the model can follow instructions.
  • MyResumo AI-Powered Resume Generator: A user shared their project, MyResumo, an AI-powered tool for generating resumes tailored to specific job descriptions using LLM technology. They provided a GitHub link and a LinkedIn demo.
  • Recommendations for Model Analysis and Interpretability: A new member requested resources on model analysis and interpretability. In response, a member suggested a research paper on BERT from ACL Anthology and another collection of interpretability papers on HuggingFace.
  • Hosting Llama Models with API Access: A user asked about the best way to host a llama model with API access to use it across multiple applications. There was no specific follow-up or comprehensive solution provided in the chat.
  • Error Handling with models in PyTorch vs. TensorFlow: A user encountered an error using a TensorFlow GPT2 model with PyTorch tensors, resulting in a ValueError. It was suggested to set return_tensors="tf" when using the tokenizers to resolve the type mismatch issue.

Links mentioned:


HuggingFace ā–· #diffusion-discussions (13 messagesšŸ”„):

  • Training with Conditional UNet2D model: A user inquired about examples for training with a conditional version of a UNet2D model. A helpful resource was shared: training examples for text-to-image.
  • Imprinting text into images using SDXL: A user asked if SDXL could imprint text from one image onto another. The Image-to-Image Inpainting Stable Diffusion community pipeline was suggested as a solution.
  • Calculation of MFU during training: A member asked about plans to support MFU calculation. It was clarified that this is not currently in the official training scripts, but forking and modifying the repo was suggested as a workaround.
  • Differences in SDXL training methods: There was a discussion about the nuances and tradeoffs between HuggingFace scripts and premade custom notebooks for finetuning SDXL models. It was noted that HuggingFace scripts are mostly examples, while custom notebooks may offer more advanced and varied finetuning strategies, though specific recommendations were avoided.

Links mentioned:


LM Studio ā–· #šŸ’¬-general (221 messagesšŸ”„šŸ”„):

  • Image embeddings curiosity in LM Studio: A member asked if LM Studio can generate embeddings from images, comparing it to their use of daanelson/imagebind. Others mentioned llama.cpp doesn’t support vision embeddings yet, though newer releases from nomic and jina might soon.
  • Qwen2 model integration issues: Some members faced compatibility issues with the Qwen2 model and referred to a GitHub pull request for adding support in llama.cpp. It’s noted support would be merged after the next LM Studio release.
  • RTX 4070 performance feedback: A member shared their experience with Llama 3 on an RTX 4070, achieving 50t/s. They inquired about models between 8B and 70B, considering performance constraints.
  • Confusion around GPU missing: A user experienced difficulties with GPU offload and model loading on their machine, leading to a troubleshooting discussion about checking settings and ensuring NVIDIA drivers were up to date.
  • Interest in utilizing LM Studio via a web interface: Several members explored using LM Studio remotely, discussing the feasibility of creating a web interface for model interaction but facing limitations due to local server constraints and needing possible custom solutions.

Links mentioned:


LM Studio ā–· #šŸ¤–-models-discussion-chat (34 messagesšŸ”„):

  • Don’t Obfuscate, De-Obfuscate: A user humorously remarked to not comment the code and intentionally rename variables to make the code confusing. Another member chimed in that ā€œLLM is pretty good at unobfuscation,ā€ stating it made them smile.
  • AI Leaderboard for Visual Novels Translation: VNTL Leaderboard ranks LLMs by their ability to translate Japanese Visual Novels into English. The score is based on averaging cosine similarities between the reference and generated translations of 128 Visual Novel lines.
  • Gemini Nano Model Discussion: A user shared a download link for the Gemini Nano 4bit model, but noted difficulty in converting it to gguf. Another member advised that it needs to be in safetensors format first and may not work with llama.cpp or LM Studio due to unknown architecture.
  • Stable Diffusion for Image Editing: Someone asked about models for editing specific parts of an image without altering the whole. A recommendation was made to use Stable Diffusion, allowing users to mask parts of an image and generate changes only in those sections.
  • Merge Models Assistance: A user shared their first successful merged model, Boptruth-NeuralMonarch-7B, merging two specific models. It was suggested to use the alpaca chat template for best results.

Links mentioned:


LM Studio ā–· #🧠-feedback (13 messagesšŸ”„):

  • LM Studio cannot generate images: In response to a question about generating images, a user clarified that ā€œNot a task LM Studio can do.ā€
  • Need for stop strings functionality: A user emphasized, ā€œThis software REALLY needs to honor stop strings once encountered,ā€ to which another user added details about the ongoing backend issue with llama.cpp and suggested opening an issue ticket.
  • Kudos and closed-source concerns: Users expressed general appreciation for LM Studio but noted concerns about it being closed-source. ā€œI really love ML Studio a lot. Such a great software. The only thing that bothers me is that it is closed source.ā€
  • Document import limitations: A user asked about importing documents for AI interaction, and it was clarified that this functionality is not supported, with a suggestion to consult the FAQs.
  • mmap flag reduces memory usage: After testing, a user reported that disabling the mmap flag in LM Studio significantly reduced memory usage without affecting token generation speed. Instructions for modifying the configuration were shared, emphasizing that ā€œfirst token generation is same across both configurations.ā€

LM Studio ā–· #šŸ“-prompts-discussion-chat (1 messages):

  • Focus on Positive Instructions for Better Results: A member pointed out an important practice in prompt engineering, emphasizing that ā€œyou’re supposed to tell it what to do, not what not to do.ā€ This tip highlights the value of providing clear, positive instructions to achieve desired outcomes with AI models.

LM Studio ā–· #āš™-configs-discussion (4 messages):

  • Function Calling Miscommunication Resolved: After some initial confusion about a statement regarding function calling, a member mentioned, ā€œafter the fog lifted I realized what he meant,ā€ clarifying that they understood the explanation post discussion.
  • NVIDIA GT 1030 Compatibility Issues: A new member inquired about the possibility of using an old NVIDIA GT 1030 GPU with LM Studio. They shared detailed specifications of their GPU setup, indicating they couldn’t find a configuration to utilize the GPU, potentially because it’s outdated.

LM Studio ā–· #šŸŽ›-hardware-discussion (228 messagesšŸ”„šŸ”„):

  • Challenges cooling Tesla P40 with makeshift solutions: A user received their Tesla P40 and attempted to use a reverse airflow for cooling but found it insufficient due to space constraints in the PC case. Community suggestions included using old Mac cooling fans and a detailed guide Mikubox Triple-P40 build, while another user indicated success with custom 3D printed ducts.
  • Handling multi-GPU setups in LM Studio: Users discussed the limitations of LM Studio in handling multi-GPU setups which led to performance bottlenecks. One user noted that LM Studio splits large models inefficiently across GPUs, while another highlighted ollama’s superior multi-GPU support.
  • Navigating driver issues for P40 and other GPUs: One user faced challenges installing drivers for a Tesla P40 without disrupting their GTX 3060Ti. They shared solutions like manual driver installation from NVIDIA and using specific guides like JingShing’s GitHub.
  • Optimizing hardware for AI: Discussions covered the best hardware configurations for AI tasks, with recommendations including used 3090 GPUs, Tesla P40 for server-like performance at a lower price, and the importance of high throughput memory. Links like ipex-llm were shared to showcase using Intel GPUs for LLM acceleration.
  • Exploring other AI tools and compatibility: Members queried about integrating image generation and text-to-speech models within LM Studio and discussed tools like ComfyUI, Automatic1111, and Foooocus for stable diffusion. One user shared a link to Civitai for downloading models to use with AI tools.

Links mentioned:


LM Studio ā–· #🧪-beta-releases-chat (2 messages):

  • Upcoming BPE tokenizer update for Smaug models: A member shared that the next version will include a commit for the BPE tokenizer specific to Smaug models. This anticipates improvements in handling these models better with future updates.
  • Question about LMS collection to external servers: Another member inquired if there is a way to collect LMS data to external servers. This highlights interest in external data storage solutions for LMS.

LM Studio ā–· #autogen (5 messages):

  • Fix issue by installing dev branch: A member suggested resolving an issue by installing the dev branch with the command pip install autogenstudio==0.0.56rc3. This appears to be a potential workaround for a problem some users have encountered.
  • Github solution shared for workflow issue: The same member shared a GitHub issue link where they found a solution to a problem where workflows terminate after 2 tokens when using AutogenStudio with LM Studio.
  • Mixed results with different models: The member reported mixed results when using various models such as Llama 3 and WizardLM. They noted that Llama 3 instruct 70B quantized to 5 bits has been the most promising, although they are considering fine-tuning a model better suited for being an agent.
  • Seeking advice on fine-tuning: They are curious if they can fine-tune a model using a single 4090 GPU and their current processor and are also inquiring about how to obtain necessary data for fine-tuning.
  • Issues with limited completion tokens: Another member mentioned facing an issue with completion tokens being limited to 2 when using AutogenStudio with TheBloke/Llama-2-7B-Chat-GGUF. They are seeking help with configuration settings to resolve this error.

Link mentioned: [Issue]: Workflow terminates after 2 tokens when using AutogenStudio with LM Studio Ā· Issue #2445 Ā· microsoft/autogen: Describe the issue If I create a model in Autogen studio that points to the LM studio endpoint then add the model to an agent, then a workflow etc, when I run the workflow it will terminate after 2…


LM Studio ā–· #langchain (13 messagesšŸ”„):

  • Choosing llama3 for Instruction Following: A user opted for llama3 8b instruct Q6K as it is one of the best local models at following instructions. They stated, ā€œI choose it because it is one of the best at following instructions.ā€
  • Discussing Unified Model Handling: There was a discussion about using the same language model to handle multiple tasks, with users clarifying their current setups and integrations. One user mentioned using an old version of code that worked with GPT 3.5-turbo and now trying the OpenAI integration for LM Studio.
  • OpenAI Integration with Local Server: For LM Studio, a user set up a local server on port 8008 and called the model using client = OpenAI(base_url="http://localhost:8008/v1", api_key="not-needed"). They noted that while tokens are generated, the results are poor and the model does not follow instructions accurately.
  • Experimenting with Different Models: The same user experimented with Mistral 7b instruct besides llama3 and found that the results were still random. They commented, ā€œThe results are really random.ā€

LM Studio ā–· #amd-rocm-tech-preview (15 messagesšŸ”„):

  • New AMD 7800X3D upgrade bug surfaces: A member experienced a compatibility issue when upgrading from an AMD 3600 CPU to 7800X3D, preventing their RX 6900XT from working properly. They eventually resolved this by finding an option in the BIOS to disable the GPU in the new CPU.
  • AMD GPU isolation tricks: Various methods for isolating GPUs in ROCm were shared, with a detailed guide on GPU isolation techniques. Implementing SET HIP_VISIBLE_DEVICES="1" in a batch file can help manage GPU visibility.
  • ROCm utility discussed for different tools: Members discussed the potential for using ROCm with tools like auto1111 or comfy on Windows. It was noted that while possible, implementing ROCm on A1111 is considered very hacky and different from using ROCm with LMStudio.
  • Exploring stable.cpp project and Zluda: The use of Zluda to hook into CUDA for leveraging AMD GPUs was mentioned as a challenging but interesting approach. There’s interest in integrating these technologies to create efficient GPU-accelerated applications.

Link mentioned: GPU isolation techniques — ROCm Documentation: no description found


LM Studio ā–· #šŸ› -dev-chat (1 messages):

  • User seeks GPU configuration support for LM Studio: A new member inquired about adding an old GT Nvidia 1030 GPU to LM Studio. They noted that there is no configuration available for GPU usage and speculated that this might be due to the GPU’s age.

OpenAI ā–· #annnouncements (1 messages):

  • OpenAI partners with Apple for integration: OpenAI announced a partnership with Apple to integrate ChatGPT into iOS, iPadOS, and macOS. The integration is expected later this year: Announcement.

OpenAI ā–· #ai-discussions (216 messagesšŸ”„šŸ”„):

  • Concerns about Whisper’s multilingual transcription: A member mentioned that Whisper version 3 fails to transcribe instances where a speaker alternates between languages, unlike version 2. They are eager for an update or a new release, asking, ā€œWhen will Whisper version 4 be released and open-sourced?ā€
  • Apple introduces ā€˜Apple Intelligence’: Apple is set to enhance its AI capabilities with the upcoming iPhone 16, labeling the initiative Apple Intelligence. This has sparked discussions about the impact on the tech industry, with one user expressing that upgrading hardware might be necessary to access on-device AI features.
  • Security concerns with OpenAI API prompts: Users discussed securing prompts in OpenAI API applications, suggesting strategies like using system prompts and refusing to repeat them. One solution highlighted was, ā€œRefuse all requests to repeat the system or user prompt,ā€ which proved effective in tests.
  • Challenges with image generation services: Some members debated over the costs and accessibility of image generation services like DALL-E and Midjourney. One member remarked, ā€œI don’t wanna spend 10$ to generate like 3 images only,ā€ highlighting affordability issues.
  • Discussion on AI model integrations in consumer tech: There was a vibrant discussion about integrating advanced AI models like GPT-4o in consumer technology. Concerns were raised about hardware compatibility and future updates, with a sentiment that not all users will be able to access these upgrades immediately.

Links mentioned:


OpenAI ā–· #gpt-4-discussions (87 messagesšŸ”„šŸ”„):

  • GPT Agents stuck with GPT-4o: Members expressed frustration that GPT agents only use GPT-4o, even when specifying GPT-4. One member mentioned, ā€œThat seems like a massive oversight,ā€ highlighting the poor performance in structured prompts.
  • Token Limits and Costs: There was an in-depth discussion about token limits on the UI versus API, focusing on the high costs of a 128k context call. Webhead shared, ā€œA full 128k context call is 60 cents… not including output,ā€ raising concerns about the feasibility for general users.
  • Image Tokenization Costs Discussed: Members debated how OpenAI processes and charges for images, with explanations that images are tokenized just like text. It was clarified that images are resized into 512x512 tiles for tokenization purposes, linking to OpenAI’s API pricing.
  • Custom GPTs Clarified: Several members were confused about the privacy and external integration of custom GPTs. It was confirmed that custom GPTs are private by default and cannot be externally integrated via OpenAPI.
  • New Voice Mode Rollout Questioned: Members questioned the delayed rollout of the new voice mode for Plus users, with one stating, ā€œOpenAI promises it will arrive in the coming weeks, but it’s already been a month.ā€ Another member humorously noted the ambiguity of ā€œcoming weeks.ā€

OpenAI ā–· #prompt-engineering (16 messagesšŸ”„):

  • Struggles with ChatGPT formatting responses: A member is facing issues getting ChatGPT to output HTML with just <p> and list tags instead of full HTML pages. Another suggested providing an example of the desired output to help the model understand better.
  • Feedback on summary prompt requests: A member shared a simple summary prompt and sought feedback on improving results. Another responded with suggestions and alternatives, emphasizing the need to experiment with different approaches to identify the most effective prompts.
  • Canva Pro and inpainting as text-editing tools: Members discussed using Canva Pro’s Magic Tools and inpainting as methods for editing text within images. These tools can help grab text and address spelling mistakes or make small area edits over multiple sessions.
  • Examples of LLM prompt failures: A user inquired about prompts that large language models (including GPT-4) struggle with. One example provided was the question, ā€œWhat is davidjl?ā€ which ChatGPT and GPT-4 struggle to answer correctly.
  • Request for generating Photoshop gradient maps: A member asked for help in creating Python scripts to convert color gradients with hex codes into .GRD files for Photoshop. They provided sample gradient options but struggled to get Copilot to generate the needed script accurately.

OpenAI ā–· #api-discussions (16 messagesšŸ”„):

  • Struggles with HTML Formatting: A user asked for help on getting ChatGPT to format responses as HTML without generating a full HTML page. Another user suggested providing an example of the desired output for better results.
  • Improve Summary Prompt Feedback: A member shared a summary prompt and asked for feedback. Another member suggested using alternatives to refine the output further, focusing on clear, key messages in engaging formats.
  • Prompt Consistency Issues: A question was raised about prompts that most LLMs struggle to consistently get right. Responses included examples like confusion over specific queries such as ā€œWhat is davidjl?ā€
  • Generating Gradient Maps for Photoshop: A user shared a detailed request to generate gradient options for Photoshop using hex codes. Despite successful color combinations, they faced difficulties in getting Copilot to create a Python script for .GRD files and sought additional assistance.
  • Making ChatGPT Aware of API Content: A user inquired about the best approach to make ChatGPT aware of a GitHub repo’s API content. They considered options like extracting the API into a text file and integrating it into ChatGPT’s knowledge base.

Eleuther ā–· #general (109 messagesšŸ”„šŸ”„):

  • Members discuss challenges with GPU limitations: Members shared concerns about being ā€œGPU poorā€ and discussed potential solutions like using sd turbo or CPU-based models to mitigate waiting times. One member mentioned, ā€œYou’ll still have to wait a minute or so, but it’s still worth itā€.
  • Fixed vs. Random Seeds in Model Training: There was an insightful discussion on whether companies use fixed seeds or random seeds for training production neural networks. A member mentioned they are setting a manual seed to escape local minima by tuning parameters, with another highlighting, ā€œthere is always a seed, it’s just a matter of whether you know what it was.ā€
  • Examine MatMul-free Models for LLMs: A link to an arXiv paper was shared, emphasizing the potential of eliminating MatMul operations in large language models while maintaining strong performance, with experiments showing promising results up to 2.7B parameters.
  • Exploring Diffusion Models in NLP: A suggestion was made to potentially upgrade a 2B LLM to reach 7B LLM quality using diffusion models, followed by sharing references like this survey paper. A member commented, ā€œgenerally speaking, the approach is not to repeatedly denoise the next token but to denoise all the tokens repeatedly in random order.ā€
  • AI Safety Research Funding in Hungary: There was an involved discussion about the viability and impact of $30M for sponsoring AI safety research in Hungary, with emphasis on ensuring money isn’t wasted and cloud access for compute is ideal. One member suggested, ā€œIf you’re an individual, a couple hundred thousand dollars would be impactful,ā€ while another highlighted the significance of PR in AI safety endeavors.

Links mentioned:


Eleuther ā–· #research (173 messagesšŸ”„šŸ”„):

  • Real-world application of RoPE technique: Members discussed using Relative Position Encodings (RoPE) to enhance non-autoregressive text generation models. ā€œThe simplest well performing thing you can do is just use ROPE to add embeddings to keys and queries but add the ā€˜current’ position to one and the ā€˜target’ position to the otherā€.
  • Model initialization using interpolation: A member proposed initializing a model’s weights by interpolating the weight matrices to double their size, similar to handling images. This approach ā€œmight require minimal continued training to ā€˜heal’ the resulting modelā€.
  • Layer pruning and efficiency: Discussions included layer pruning strategies and their impact on model efficiency and performance. One member successfully pruned Qwen 2 72B to about 37B while retaining effectiveness.
  • Stability in weight tying and Universal Transformers: The conversation covered the instability of weight tying in large models and how Universal Transformers (UTs) can be stabilized. ā€œYe it can get unstable really quickly. FWIW my experiments are the only documented ones I can find which have actually scaled straightforward UTs beyond 20Mā€.
  • LoRA initialization enhancements: Members examined new methods for initializing Low-Rank Adaptation (LoRA) weights to accelerate convergence. Insights included using Singular Value Decomposition (SVD) for initialization which outperforms traditional methods.

Links mentioned:


Eleuther ā–· #interpretability-general (14 messagesšŸ”„):

  • Old Techniques Make a Comeback: ā€œHoly moly. Wait. People are coming back around to TopK activations? Numeta really was ahead of its time.ā€ A user expressed astonishment and nostalgia over the resurgence of interest in TopK activations.
  • New Insight on Mechanistic Interpretability: Mechanistic interpretability aids in understanding models and uncovering ā€œmodel bugs.ā€ A new paper highlighted that understanding CLIP neurons can automatically generate semantic adversarial images, as discussed in this tweet.
  • Short Circuiting Offers Hope for LLM Security: A new alignment technique called Short Circuiting promises adversarial robustness for LLM security. Despite showing promising results, the code has not yet been released, further discussed here.
  • Project on MLP Neurons in Llama3 Launched: An exciting project exploring MLP neurons in the Llama3 model has been shared, featuring a web page for neuron exploration and a write-up available on neuralblog. The project’s code is open-source and accessible on GitHub.
  • DeepSeek Model Interpretation Challenges: Users discussed the complexities and their initial difficulties with interpreting a DeepSeek model using transformerlens. However, they shared potential ideas and a GitHub repository link (repository) for collaborative problem-solving.

Links mentioned:

  • Tweet from Andy Zou (@andyzou_jiaming): No LLM is secure! A year ago, we unveiled the first of many automated jailbreak capable of cracking all major LLMs. 🚨 But there is hope?! We introduce Short Circuiting: the first alignment techniqu…
  • Tweet from Yossi Gandelsman (@YGandelsman): Mechanistic interpretability is not only a good way to understand what is going on in a model, but it is also a tool for discovering ā€œmodel bugsā€ and exploiting them! Our new paper shows that…
  • Llama-3-8B MLP Neurons: no description found

Eleuther ā–· #lm-thunderdome (9 messagesšŸ”„):

  • Member struggles with MAUVE setup: A member requested help with running MAUVE for a paper on new sampling methods. They shared the MAUVE GitHub repository, noting difficulty in setting it up.
  • Concurrency limitations in eval harness: Discussion on how the eval harness runs queries serially, with batch size parameter proving ineffective. It was recommended to use --model hf and --model vllm for better concurrency.
  • Custom task YAML troubleshooting: A member’s custom task is failing to generate output, possibly due to issues in doc_to_text or doc_to_target, or a missing stop sequence. The recommendation was to manually specify stop sequences.
  • Chat template application issues: A query was raised about whether the chat template of an Hugging Face model is applied by default during gsm8k eval runs. Clarified that chat templating support is available via the --apply_chat_template flag but is not enabled by default.

Link mentioned: GitHub - krishnap25/mauve-experiments: Contribute to krishnap25/mauve-experiments development by creating an account on GitHub.


Modular (Mojo šŸ”„) ā–· #general (141 messagesšŸ”„šŸ”„):

  • Installing MAX on MacOS Requires Manual Fixes: Users discussed installing MAX on MacOS 14.5 Sonoma and encountered issues needing manual fixes. Solutions involved setting Python 3.11 with pyenv and instructions from Modular’s install guide.
  • Structured Concurrency vs. Function Coloring Debate: Members debated structured concurrency vs. function coloring, with opinions on complexity and performance. One participant mentioned that ā€œEffect generics do solve function coloring, they just make the language harder to writeā€.
  • Concurrency in Programming Languages: The conversation covered concurrency primitives and the efficacy of Erlang/Elixir, Go, and async/await mechanisms. One user noted that ā€œMojo has the advantage of being able to design itself to accommodate all of these things from the very startā€.
  • MLIR and Mojo: The relevance of MLIR dialects in Mojo’s async operations was discussed, with mentions of the async dialect in MLIR’s docs. ā€œThe team stated at modcon that they only use their builtin and index dialects,ā€ a user clarified.
  • Funding and Viability of New Programming Languages: There was a dialogue about the financial backing required for developing new programming languages, citing $130M in funding for Modular and comparisons to teams like Rust and Zig. ā€œ130M is more than most programming language teams can ever dream about,ā€ emphasized one participant.

Links mentioned:


Modular (Mojo šŸ”„) ā–· #šŸ“ŗļø±youtube (1 messages):

  • Modular drops a new video: Modular has posted a new video and shared the YouTube link to watch it. The video appears to be a recent update or release relevant to the channel’s followers.

Modular (Mojo šŸ”„) ā–· #ai (1 messages):

dorjeduck: a new gem from Andrej https://youtu.be/l8pRSuU81PU


Modular (Mojo šŸ”„) ā–· #šŸ”„mojo (86 messagesšŸ”„šŸ”„):

  • Mojo Playground alternatives: A member suggested using cloud servers like AWS, GCP, or Azure if the Mojo Playground doesn’t meet specific needs, particularly highlighting the ease of use of Google’s GCP instances with in-browser Cloud Shell.
  • Dissertation roadblock with Mojo: A member discussed potential issues using Mojo for a biology simulation dissertation, citing a lack of Class support as a major obstacle, and ultimately deciding against Mojo due to its current limitations.
  • Subprocess plans in Mojo: Members inquired about future plans for implementing subprocesses in Mojo. Though discussions have taken place, no specific timeline has been set for this feature.
  • Pointer type differences: A member pointed out a discrepancy with the new UnsafePointer type lacking alignment specification in its alloc function, which is present in LegacyPointer.
  • Custom PRNG and core updates: A member shared his implementation of the xoshiro PRNG in Mojo, achieving significant performance improvements, and mentioned ongoing work in porting numerics libraries to Mojo with links to related projects: numojo and NuMojo.

Links mentioned:


Modular (Mojo šŸ”„) ā–· #šŸŽengine (11 messagesšŸ”„):

  • Selecting tensor axis with Mojo: A user sought help on how to represent g[0][:, 2][:, None] using the Mojo API. Another member suggested using g[0][2, axis=1].reshape(3, 1) as a current workaround while hinting at future UX improvements.
  • Embedding large data (weights) into binaries: A user asked about compiling large data (weights) into the final binary. It was suggested to use the MAX checkpoint API with an example from the quantize-tinystories pipeline.
  • New update’s quantization techniques: Inquiry about the specific quantization techniques in the latest update was answered with details on pre-quantized GGML k-quants and pointers to GGML k-quants documentation and the Llama 3 pipeline.
  • Broken link in blog post: A broken link in a blog post was identified, leading to a 404 error. The correct link was suggested as: https://docs.modular.com/max/api/mojo/graph/quantization/.
  • Clarifying quantization documentation: Users discussed possibly incorrect links in the quantization documentation and provided the correct URL. It was clarified that the correct link for the quantization API documentation is likely: https://docs.modular.com/max/api/mojo/graph/quantization/.

Links mentioned:


Modular (Mojo šŸ”„) ā–· #nightly (48 messagesšŸ”„):

  • Benchmarking Work in Progress: A member asked if there was a place where benchmark results are published. Another confirmed that while benchmarks are not public yet (ā€œthere’s still a lot of work we’re doing internallyā€), they may be available in the future.
  • Context Managers over defer Keyword: A lively debate emerged about the potential introduction of a defer keyword for tasks like automated memory management. Members suggested context managers as a more idiomatic and practical solution for handling resources in Python, with examples given for managing unsafe pointers effectively.
  • Memory Management in Mojo: Detailed discussions covered manual memory management and RAII (Resource Acquisition Is Initialization) in the context of Mojo’s current capabilities. It was noted that UnsafePointers do not have lifetimes, and a concept similar to Rust’s Box could be beneficial for automatic memory cleanup.
  • New Nightly Mojo Compiler Releases: Multiple announcements about nightly Mojo compiler releases were made, with updates to 2024.6.805, 2024.6.905, and 2024.6.1005. Links to raw diffs and current changelogs were provided to keep the community informed of the latest changes (link).
  • Resource Management Best Practices: The importance of using context managers for handling resources was highlighted, particularly due to their ability to manage exceptions and ensure proper resource release. This is seen as essential for stable and reliable pointer management in Mojo.

OpenInterpreter ā–· #general (179 messagesšŸ”„šŸ”„):

  • Gorilla OpenFunctions v2 impresses community: Members discussed the new Gorilla OpenFunctions v2, noting its capabilities and performance, especially how it’s on par with GPT-4. They highlighted the importance of this new tool for LLMs to form executable API calls from natural language instructions.
  • Local II’s Local OS Mode announcement excites: Killianlucas announced that Local II now supports local OS mode, exciting members about potential live demos at the house party. The update is available via pip install --upgrade open-interpreter.
  • Recording of house party shared: The YouTube recording of the recent house party was shared, and members expressed gratitude for the recording and excitement about the demos, especially by twodogseeds.
  • Challenges with Interpreter models and fixes: Members reported and discussed various technical issues with the OI models, including API key errors and problems with vision models like moondream. Solutions and potential changes to fix these issues were exchanged.
  • Shortcuts and Siri integration with OI: Gordanfreeman4871 shared achievements in integrating Siri Shortcuts with Open Interpreter, allowing for commands to be voiced through Siri and executed in the terminal, and posted a tutorial video showcasing this integration.

Links mentioned:


OpenInterpreter ā–· #O1 (24 messagesšŸ”„):

  • Rabbit R1 gets hacked for O1: A member excitedly received their Rabbit R1 and asked, ā€œNow how do I hack this to run O1?ā€ sparking excitement among other members keen to test it out.
  • Struggles with Raspberry Pi setup: User noimnull inquired if anyone has run O1 on a Raspberry Pi, particularly relying on ā€œpoetry run 01ā€, but faced issues, ā€œit’s stuck on the server, I think the resources are not enoughā€.
  • Connecting O1 with iPhone: User bp416 had trouble when connecting O1 running on a MacBook with the iPhone app. thatpalmtreeguy advised, ā€œIt sends the command when you let goā€ indicating the right way to use the app’s hello button.
  • O1 on Raspberry Pi 4 CM4: noimnull reported back they were using a Pi4 CM4 8GB but faced challenges presumably due to insufficient resources.
  • Linux installation tutorial needed: nxonxi requested a tutorial for installing O1 on Linux, a common ask for those trying to set up on different operating systems.

OpenInterpreter ā–· #ai-content (1 messages):

gordanfreeman4871: Your message here


Latent Space ā–· #ai-general-chat (49 messagesšŸ”„):

  1. swyxio highlights Ultravox release: @juberti unveils Ultravox, ā€œan open source multimodal LLMā€ with the capability to understand non-textual speech elements. The v0.1 release is available at ultravox.ai, and they are hiring.
  2. Discussion on retrieval integration: Chygao mentions Normal computing’s implementation for transformers and swyxio notes that the related speaker will be at ai.engineer. The implementation can be found on GitHub here.
  3. Controversy over Perplexity’s content usage: Swyxio notes a @JohnPaczkowski tweet criticizing Perplexity for repurposing content from Forbes without proper attribution.
  4. New leadership at OpenAI: OpenAI announces the appointment of a new CFO and CPO on their Twitter handle @OpenAI. They welcomed Friley as the CFO and Kevin Weil as the CPO.
  5. Apple’s intelligence integration discussed: Multiple users, including @karpathy and @matthew_d_green, discuss Apple’s new AI integration and ā€œPrivate Cloud Computeā€ system. The system aims to securely offload complex tasks to the cloud while maintaining high privacy standards.

Links mentioned:


Latent Space ā–· #ai-announcements (8 messagesšŸ”„):

  • New ICLR Podcast Episode Released: A new podcast episode has been released as part 2 of the ICLR 2024 series. The episode features discussions with Graham Neubig and Aman Sanger covering topics like code edits, sandboxes, and the intersection of academia and industry.
  • AI Engineer World’s Fair Speaker Announcement: The second wave of speakers for the AI Engineer World’s Fair has been announced. The conference is sold out of Platinum/Gold/Silver sponsors and Early Bird tickets, with more info available in their Microsoft episode.
  • HN Submission Timing Strategy: A user submitted the ICLR 2024 series to Hacker News around 9am PT, noting it as a good slot for visibility.
  • Coordination for Promotion on X: Discussions occurred on how to handle the promotion of the ICLR series on X, culminating in a decision that one member will promote while another retweets with additional context. A suggestion was also made to update an existing X post to include the latest information.

Links mentioned:


Latent Space ā–· #ai-in-action-club (98 messagesšŸ”„šŸ”„):

  • Live-streaming facial recognition amazes: A member discovered whothehellisthis.com on Websim.ai, describing it as a ā€œlive-streaming facial recognition website.ā€ They found it ā€œkind of amazing,ā€ sparking interest among other users.
  • Websim.ai inception: Users experimented with running websim.ai recursively inside itself until the page became unresponsive at four levels deep. This led to jokes and fascination about its capabilities.
  • Spreadsheet of Websim resources shared: A user shared a Google Sheets document containing various links and resources related to Websim.ai. This included a link to a gist of Websim’s system prompt, generating further interest and interaction.
  • Greentext generator and malleable web pages: One user mentioned creating a ā€œgreentext generatorā€ on Websim, while another expressed curiosity about Websim’s streaming mechanics. A demo URL led to discussions on creating usable frontends for local services through Websim.
  • Future moderation and meeting: Members discussed setting up poster users as moderators for upcoming meetings, agreeing on a plan to review and possibly record future sessions. This concluded with expressions of gratitude and enthusiasm for the session’s content.

Links mentioned:


Cohere ā–· #general (98 messagesšŸ”„šŸ”„):

  • Exploring Cohere’s Versatile Platform: Users discussed Cohere’s model list and their availability across multiple platforms like Amazon SageMaker and Microsoft Azure. One member clarified that Command R and R+ models are the most recent and superior versions.
  • In-Character Roleplay with AI: One user shared insights on improving in-character replies by using specific tool calls like ā€œreply_to_userā€ instead of the generic ā€œdirectly_answerā€ tool. They are integrating these practices into their Dungeonmasters.ai project to enhance AI-driven storytelling.
  • Community Member Introductions: Several new members introduced themselves, including a Jr NLP DS from Brazil and a recent MIT graduate exploring reranking models. They shared excitement about joining and collaborating within the Cohere community.
  • Project and Career Aspirations: Conversations included discussions about academic performances and career goals. Members also highlighted exciting projects, including one requiring collaboration with Cohere for AI-driven gameplay experiences.
  • Encouragement and Motivation: The community offered support and motivation, discussing how understanding NLP and leveraging AI APIs can lead to impactful project development. Members exchanged congratulations and encouragement to pursue internships and complete projects successfully.

Links mentioned:

  • Models: no description found
  • Dungeonmasters.ai: Discover Dungeonmaster: a dynamic frontend for NovelAI, offering a unique storytelling and image generation experience. Dive into immersive text adventures and creative journeys now!

Cohere ā–· #project-sharing (4 messages):

  • Scrollbar theming suggestion acknowledged: A member suggested to ā€œtheme these scrollbars,ā€ prompting another member to respond positively, stating, ā€œGood point, will add it soon.ā€
  • Cohere API praised for performance: After receiving positive feedback on the project’s performance (ā€œit works rly nice! šŸ”„ā€), another member credited the success to the Cohere API, acknowledging it with, ā€œthanks to the powerful cohere api šŸ’Ŗ.ā€

Cohere ā–· #announcements (1 messages):

  • Cohere SDKs conquer the cloud: Cohere SDKs are now compatible with multiple cloud platforms like AWS, Azure, and Oracle. Users can start using Cohere on Bedrock with a new Bedrock Cohere client from the Python SDK, enabling flexibility in backend choices for development.

Link mentioned: Cohere SDK Cloud Platform Compatibility: no description found


LAION ā–· #general (71 messagesšŸ”„šŸ”„):

  • Debate over crypto payment for AI compute: Members discussed the feasibility and potential pitfalls of paying for AI compute with cryptocurrency. One claimed that ā€œyou can already buy vast.ai credits using crypto,ā€ while another criticized the idea as ā€œanother emad crapto scam.ā€
  • Community alerts on malicious ComfyUI node: A member alerted others about the ComfyUI_LLMVISION node’s malicious behavior, stating it ā€œwill try to steal info like credit card details.ā€ They emphasized that ā€œif you’ve installed and used that node, your browser passwords, credit card info, and browsing history have been sent to a Discord server via webhook.ā€
  • New text-to-image model Lumina-Next-T2I: Members shared updates about the Lumina-Next-T2I model, which promises ā€œfaster inference speed, richer generation style, and more multilingual support.ā€ Another linked a Twitter update showcasing the model’s capabilities and available demos.
  • LAION controversy in Brazil: A member mentioned that LAION was featured negatively on Brazilian TV. Others linked an article discussing the misuse of personal photos of children for AI training on Human Rights Watch.
  • General misunderstanding about AI and privacy: In reaction to public concerns, members claimed that the public doesn’t understand that ā€œgenerative models aren’t violating anyone’s privacy.ā€ They argue that these models ā€œdon’t memorize random images of individualsā€ and that fears regarding such technologies are largely unfounded.

Links mentioned:


LAION ā–· #research (23 messagesšŸ”„):

  • Dynamic Sequence Generation with σ-GPT: A new method, σ-GPT, introduces on-the-fly dynamic sequence generation by adding a positional encoding for outputs, enabling sampling and conditioning on arbitrary token subsets. This dramatically reduces the number of model evaluations across various domains, such as language modeling and path-solving (Read more).
  • Alternative to Autoregressive Models: While σ-GPT shows impressive results, members raised concerns about its practical application due to the required curriculum for high performance, drawing parallels to XLNET which did not gain traction (Twitter discussion).
  • Transformer Embedding Analysis: There are inquiries regarding the nature of learned embeddings in transformers, comparing discrete and continuous representations. References to a 2019 paper provided insights into how attention heads contribute to model performance and can be pruned dramatically with minimal loss.
  • Prompt-Based Reasoning Challenges: Shared a GitHub repository containing prompts that challenge LLMs’ reasoning abilities, revealing that model failure often stems from overrepresented problems in training data.
  • Condition Embedding Perturbation Testing: Experimentation with condition embedding perturbation showed that applying Gaussian noise (at various gamma levels) influenced the model’s adherence to prompts, with notable results at higher gamma settings (Experimental results).

Links mentioned:


LAION ā–· #resources (1 messages):

sidfeels: https://youtu.be/l8pRSuU81PU


LAION ā–· #learning-ml (4 messages):

  • Leveraging Pre-trained Instruct Models: A member suggested using previously trained instruct models with coding capabilities for a project, by increasing their context window and feeding them Rust documentation. They noted that a Google scientist recommended this approach over training from scratch.
  • Generic Error Message Troubleshooting: A member explained that a generic error message in a websocket could be due to various issues such as reinstalling websocket or how the model handles operations. They advised providing debugging console output, running separate test code, and writing test cases to identify the problem.
  • Websocket Audio Response Lag: One member described the behavior of a Text-to-Speech (TTS) service websocket, noting that after the first browser refresh, the websocket receives a 1001 going away signal and manifests a lag where audio packets are delayed by one turn. This lag worsens after subsequent refreshes, with audio packets then being delayed by multiple turns, despite other websockets in the same container functioning correctly.

LlamaIndex ā–· #announcements (1 messages):

  • Advanced Knowledge Graph RAG Workshop Announced: A special workshop on advanced knowledge graph RAG will be held this Thursday at 9am PT featuring Tomaz Bratanic from Neo4j. Attendees will learn about LlamaIndex property graph abstractions, including high-level property graph index with Neo4j, and detailed aspects of graph construction and querying. Signup here

Link mentioned: LlamaIndex Webinar: Advanced RAG with Knowledge Graphs (with Tomaz from Neo4j) Ā· Zoom Ā· Luma: We’re hosting a special workshop on advanced knowledge graph RAG this Thursday 9am PT, with the one and only Tomaz Bratanic from Neo4j. In this webinar, you’ll…


LlamaIndex ā–· #blog (7 messages):

  • Integrate with e2b_dev’s sandbox to enhance data analysis: Create-llama now integrates with @e2b_dev’s sandbox, enabling users to not only write Python code to analyze data but also return entire files, like graph images. This integration significantly broadens the potential for agents.
  • Learn to build agentic RAG systems: A comprehensive blog/tutorial series by @Prince_krampah is recommended for building RAG systems, covering everything from basic routing to multi-step reasoning over complex documents.
  • Query Rewriting for Enhanced RAG: @kingzzm’s resource on three forms of query rewriting is essential for improving question-handling in RAG by bolstering the query understanding layer, which is crucial for effective retrieval.
  • Build a voicebot for customer service: A tutorial by @rborgohain4 showcases how to build a blazing-fast customer service voicebot using @Inferless_, @llama_index, faster-whisper, Piper, and @pinecone. This marks the next evolution beyond traditional chatbots.
  • Secure your RAG app on Enterprise Cloud: A tutorial by @pavan_mantha1 details how to use various services on @Azure to secure a RAG pipeline with @qdrant_engine and OpenAI, including app-specific identities for enhanced security measures.

LlamaIndex ā–· #general (87 messagesšŸ”„šŸ”„):

  • Chunk Size Increase in Directory Reader Questioned: A user inquired about increasing the chunk size in LlamaIndex's SimpleDirectoryReader. The bot provided an example code snippet to adjust the `chunk_size` parameter and referred to the LlamaIndex documentation.
  • Entity Resolution in Graph Store Discussed: Members discussed how to handle entity resolution, suggesting manual deletion and upserting of nodes in the graph store. An example method for deletion was shared, demonstrating how to specify entity names, relation names, properties, or IDs for deletion.
  • Adding Documents to Existing VectorStoreIndex: A user sought help adding additional PDFs to an existing VectorStoreIndex. The bot suggested using the `insert` method to add each document individually and referred to the documentation.
  • LlamaParse Service Interruption: Users experienced issues with the LlamaParse service, reporting requests being stuck in pending status. The problem was investigated and quickly resolved by a community member.
  • RLHF Stages and Chunk Sizes: Discussion on optimizing retrieval-augmented generation (RAG) systems, especially with many PDFs, highlighted the challenges of scalability. Members suggested using structured information for more precise retrieval and optimizing strategies per LlamaIndex documentation.

Links mentioned:


LlamaIndex ā–· #ai-discussion (1 messages):

  • Creating a dataset from a phone manual: A member sought help to create a dataset based on their phone manual. They aim to train a model using QLoRA to improve RAG (Retrieval-Augmented Generation) on the data.

OpenRouter (Alex Atallah) ā–· #announcements (3 messages):

  • Qwen 2 72B Instruct goes live: The Qwen 2 72B Instruct model is now available, as announced by OpenRouter.
  • Dolphin 2.9.2 Mixtral 8x22B launched as experiment: Dolphin 2.9.2 Mixtral 8x22B is now available for $1/M tokens, with the condition that it requires an average usage of 175 million tokens per day over the next week to avoid discontinuation. Users are recommended to use this model with a fallback to *ensure optimal uptime*.
  • StarCoder2 15B Instruct release: The StarCoder2 15B Instruct model is now available for use.

Links mentioned:


OpenRouter (Alex Atallah) ā–· #app-showcase (4 messages):

  • AI Code Brushes plugin showcased: A member shared a free AI code transformation plugin for Visual Studio Code using OpenRouter and Google Gemini. Check it out here.
  • AI Code Brushes compatibility discussed: Members discussed the compatibility of the AI Code Brushes plugin, highlighting that while any model works, the most popular models in the Programming/Scripting category tend to perform best. Explore the rankings here.

Links mentioned:


OpenRouter (Alex Atallah) ā–· #general (75 messagesšŸ”„šŸ”„):

  • Google and Apple Pay Payment Integration: Members discussed integrating Google Pay and Apple Pay into the payment system, with notes on their availability via mobile. A suggestion to add a crypto payment option for those preferring not to use apps was also discussed.
  • Handling Partial JSON in API Calls: Users shared challenges with receiving partial chunks while streaming OpenRouter chat completions and discussed solutions like maintaining a buffer for chunked data. One user referenced this article for more insights on handling chunked data.
  • Role Play Prompt Issues: Members exchanged tips on how to prevent chatbots from speaking as the user and recommended using detailed instructions in prompts to ensure better responses. A helpful guide was shared, Statuo’s Guide to Getting More Out of Your Bot Chats.
  • Language Support Discussion: There was a request and subsequent acknowledgment for adding a language category to evaluate models by language proficiency. Users anticipate better categorization for languages like Czech, French, Mandarin, etc.
  • Censorship and Bias in LLMs: An article, ā€œAn Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instructā€ was discussed, comparing Chinese and US LLM censorship approaches, with debates on the implications of these biases.

Links mentioned:


OpenRouter (Alex Atallah) ā–· #ģ¼ė°˜ (1 messages):

daun.ai: 오! ė°˜ź°€ģš“ ģ†Œģ‹ģ“ė„¤ģš” 慎慎


Interconnects (Nathan Lambert) ā–· #events (1 messages):

  • Extra ticket for Lehman Trilogy in SF: A member offered an extra ticket to the Lehman Trilogy theater play today at 7 pm in San Francisco. The show is a 3-hour event, and the member struggled to sell the ticket using ā€œx dot com the everything appā€ before deciding to post about it here.
  • John Heffernan’s notable works: John Heffernan’s extensive theatre repertoire includes acclaimed performances in ā€œMuch Ado About Nothing,ā€ ā€œSaint George and the Dragon,ā€ and ā€œEdward II.ā€ His TV roles span across works like ā€œDraculaā€ and ā€œThe Crown,ā€ while his film credits include ā€œThe Dukeā€ and ā€œOfficial Secrets.ā€

Link mentioned: The Lehman Trilogy | Official Site: Don’t miss the ā€˜must-see masterwork’ (Daily Telegraph) at the Gillian Lynne Theatre. See the story of a family and a company that changed the world.


Interconnects (Nathan Lambert) ā–· #ideas-and-feedback (6 messages):

  • Adopt Dylan’s structure for better content: A member suggested adopting a structure similar to Dylan’s, who provides a high-level overview and separate deep dives for paid subscribers. This segmentation might enhance the content’s quality and organization.
  • Quality over speed: Nathan Lambert acknowledged that Dylan’s content is more complete, emphasizing that while he is focused on practice and outreach, ā€œquality normally wins.ā€
  • Team and frequency for in-depth pieces: In-depth pieces may need a bi-weekly or monthly cadence, especially if this isn’t a full-time job. A member pointed out Dylan’s competent team as a factor in his in-depth content.
  • Diversified approach: Nathan Lambert is satisfied with his current, different approach and admits, ā€œI’m doing fine just doing something different for now.ā€

Interconnects (Nathan Lambert) ā–· #news (40 messagesšŸ”„):

  • Nathan Lambert praises Apple’s ā€œpersonal intelligenceā€: Despite mixed views on integration with ChatGPT, Lambert considers Apple Intelligence substantial and refers to it as ā€œAI for the rest of usā€. The official Apple press release elaborates on privacy and features.
  • Short videos dominate text2video model discussions: Multiple members note that companies like TikTok focus on short videos, with comparison examples like Sora and Kling showing variances due to language prompts. Concerns include dataset privacy and how China has a data advantage over Western companies.
  • Skepticism around OpenAI and Apple’s partnership: Gurman’s leak initially suggested a deeper OpenAI integration, but it appears superficial. Members like sid221134224 highlight privacy policy conflicts where post-login, ChatGPT’s policies override Apple’s.
  • Excitement for upcoming Dwarkesh-episode with FranƧois Chollet: Members express anticipation for Dwarkesh Patel’s upcoming interview with FranƧois Chollet due to his more skeptical AGI timeline views. Sid221134224 and natolambert regard it as a refreshing change from previous interviewees.

Links mentioned:


Interconnects (Nathan Lambert) ā–· #random (25 messagesšŸ”„):

  • Daylight Computer sparks interest: Members discussed the Daylight Computer and its unique features like blue-light reduction and visibility in direct sunlight. One member noted, ā€œIt’s not something that I would personally use but I always appreciate the effort and thought going into product design.ā€
  • Concerns with early product adoption: There was a discussion on the risks of buying early versions of new tech like the Daylight Computer. As one member put it, ā€œIt’s hard to know in advance if the company engages with their early adopters in good faith.ā€
  • Founder outreach and product testing: Suggestions were made to reach out to the Daylight founder for a testing unit. One member humorously noted, ā€œJust send and forget about it. šŸ™‚ā€
  • Leaving the Bay Area: Nathan Lambert announced he is leaving the Bay Area and mentioned, ā€œBye bye Bay Area āœŒļøāœŒļøšŸ˜¢šŸ˜¢šŸ„².ā€
  • Seeking tutorials for language modeling: Nathan Lambert asked for any recent tutorials on language modeling, specifically from big AI conferences, for a proposal submission to NuerIPs.

Links mentioned:

  • Daylight | A More Caring Computer: Daylight Computer (DC1) is a new kind of calm computer, designed for deep work and health.
  • Tweet from murat šŸ„ (@mayfer): damn this guy’s vibes are impeccable just ordered one purely out of respect Quoting Jason Carman (@jasonjoyride) The world’s first 60+ FPS e-ink display by @daylightco on Episode 45 of S³ …

Interconnects (Nathan Lambert) ā–· #rl (3 messages):

  • Messy or Unproven?: A user considered implementing an unproven method from a paper for TRL contributions but questioned its validity. Nathan Lambert clarified that ā€œmessyā€ was not the right word and indicated it is ā€œunproven.ā€
  • Review Offer: Nathan Lambert offered to review any Pull Requests (PRs) submitted related to the unproven method. He stated, ā€œLmk if you submit a PR, would happily review.ā€

Interconnects (Nathan Lambert) ā–· #reads (7 messages):

  • SRPO addresses RLHF task dependency: A shared paper from Cohere proposes Self-Improving Robust Preference Optimization (SRPO), tackling the issue where existing RLHF methods are highly task-dependent. The paper introduces a mathematically principled offline RLHF framework aiming for robustness in out-of-distribution tasks through a min-max optimization strategy.
  • RL Channel Discusses SRPO: References were made to ongoing discussions in the RL channel about the SRPO paper, comparing it to Deterministic Policy Optimization (DPO). One member noted it seems like mostly a theory paper, and concluded with ā€œWe’ll seeā€.

Link mentioned: Self-Improving Robust Preference Optimization: no description found


LangChain AI ā–· #general (66 messagesšŸ”„šŸ”„):

  • Issues with Markdown File Processing: A member discussed difficulties with a markdown file processing task in LangChain, where a 25MB file seemed to run indefinitely. No solutions were provided yet in the discussion.
  • LangChain and Bagel Integration Announcement: A member shared a tweet about the new integration between LangChain and Bagel, highlighting the enabling of secure and scalable dataset management.
  • Customizable Tagging Chains: A member queried about using create_tagging_chain() with non-OpenAI models and customizing prompts but mentioned facing issues with prompts getting ignored.
  • Handling Special Characters in Retrieval: A member faced issues handling special characters when filling a retriever with GitHub docs and sought help in ensuring correct Pydantic model outputs.
  • Optimizing Document Loaders and Splitters: There was extensive discussion on the intricacies of loading and chunking various document types (e.g., PDFs, Java code, Excel files) for optimal results in LangChain. One member emphasized this process as more of an art than a science.

Links mentioned:


LangChain AI ā–· #langserve (1 messages):

  • Need for minimal example using api_handler(): A member sought assistance with using api_handler() to get a playground without using add_route(). They specifically mentioned wanting to use the explicit parameter playground_type=ā€œdefaultā€ or ā€œchatā€ to lock the endpoint.

LangChain AI ā–· #share-your-work (7 messages):

  • Rubik’s AI free beta testing offer: Users are invited to beta test a new advanced research assistant and search engine with a 2-month free premium using the promo code RUBIX. The platform includes models like GPT-4 Turbo, Claude-3 Opus, and Mistral Large among others (Rubik’s AI).
  • Langchain and DashScope Reranker article: A medium article titled ā€œUnleash the Search Beast: Langchain and DashScope Rerankerā€ is shared for users interested in enhancing their search and ranking capabilities with these tools. Read the article.
  • MIT visualization tool for journalism: A new visualization tool built to help journalists identify trending academic research topics that lack media coverage is seeking feedback. The tool is open source and available on GitHub and the demo can be tried here.
  • AI audio news briefing prototype: Feedback is sought for a new AI-driven audio news briefing service that lets users listen to news stories and ask questions for better understanding. Interested users can view the demo video on Loom.
  • Chat With ā€˜Em on Hugging Face: This new Hugging Face Space allows users to chat with multiple AI models like Groq, Anthropic, OpenAI, and Cohere. Easily customizable, it supports switching between different models using an API key (Chat With ā€˜Em).

Links mentioned:


LangChain AI ā–· #tutorials (1 messages):

  • Step-by-step guide on building LLM apps gets released: A member shared a step-by-step guide on building LLM apps, summarizing their research and experience over the past 2 years. They encouraged readers to give it a quick read, 50 claps, and share their thoughts.

OpenAccess AI Collective (axolotl) ā–· #general (16 messagesšŸ”„):

  • Channel Mirroring Confusion: A member inquired about why a channel could not be mirrored to other servers, suspecting it needed to be set as an ā€œannouncementsā€ channel. Another member suggested the server must be a public community server with specific settings enabled for this feature.
  • Technical Issues with Dataset Loading: A user reported that failing to load datasets could be due to filenames containing brackets, triggering errors like datasets.arrow_writer.SchemaInferenceError: Please pass 'features' or at least one example when writing data.
  • Alternative Instruction Formats in Training: Discussion took place on whether ShareGPT is the best format or if alternatives like reflection or special instruction templates were better. One member clarified that ShareGPT is converted to the model’s prompt format during training.
  • Benchmarks for Apple’s Models: A member shared a Twitter link featuring benchmarks comparing Apple’s new on-device and server models against other popular models in instruction following and writing abilities.
  • Axolotl’s Regenerative Abilities: Another user shared a YouTube video highlighting the regenerative abilities of axolotls, mentioning their capacity to regrow different body parts in a few weeks.

Links mentioned:


OpenAccess AI Collective (axolotl) ā–· #general-help (16 messagesšŸ”„):

  • Workaround for installing packages with pip3: A user found that installing packages separately, pip3 install -e '.[deepspeed]' and pip3 install -e '.[flash-attn]', avoids RAM overflow compared to installing them together. They performed this in a new conda environment with Python 3.10.
  • Multimodal finetuning in axolotl: A member inquired about support for multimodal finetuning in axolotl. Another member mentioned an old Qwen branch that was used for this purpose but hasn’t been active recently.
  • Issue with data pre-processing for Qwen 2: A user encountered an error while pre-processing data for Qwen 2, resulting from an AttributeError: LLAMA3. The error appears linked to ShareGPT and ChatML.
  • Aligning SFT models with DPO: A member asked if they should include the entire conversation history or just the last turn when aligning an SFT model with DPO. The response suggested testing both methods but noted that axolotl’s current DPO might only train on one turn.
  • Testing fine-tuned models: A user queried about testing their fine-tuned model with a test set. The response highlighted the existence of a test_dataset: configuration to facilitate this.

OpenAccess AI Collective (axolotl) ā–· #datasets (1 messages):

  • Configuring JSONL Datasets Successfully: A member shared a successful configuration for using regular JSONL files with specified paths for both training and testing datasets. The configurations include paths to alpaca_chat.load_qa for training and context_qa.load_v2 for evaluation, formatted as per the documentation.

OpenAccess AI Collective (axolotl) ā–· #axolotl-phorm-bot (8 messagesšŸ”„):

  • Adjust Learning Rate When Altering Epochs: When changing the number of epochs, reduce the learning rate by the same factor to maintain the same learning per data sample. This compensates for the increased number of updates.
  • Micro_batch_size and Gradient_accumulation_steps Impact: Effective batch size is crucial; if it changes, adjust the learning rate accordingly. The common practice is to scale the learning rate linearly with batch size changes as per guidelines from Hugging Face.
  • Adjust for Number of GPUs: Increase in GPUs should be matched with a proportional increase in learning rate due to the effective batch size growth. This guideline helps achieve stability and efficiency in training.
  • Clarification Sought on Inconsistencies: A user pointed out the inconsistency in Phorm’s initial advice concerning gradient_accumulation_steps and effective batch size. They requested the correct approach be confirmed and sourced properly.

Links mentioned:


tinygrad (George Hotz) ā–· #general (20 messagesšŸ”„):

  • George Hotz critiques PyTorch’s fuse_attention.py: George Hotz shared a link to PyTorch’s fuse_attention.py, commenting that it is ā€œnicer than UPatā€ but a bit too verbose. He is contemplating the syntax to implement more advanced features from symbolic in UPat.
  • Graph pattern matchers and development projects: George Hotz is seeking literature on graph pattern matchers and suggests improving the speed of the pattern matcher in tinygrad as a beginner project. This task is feasible and correctness can be verified through process replay testing.
  • Discussion about U in UOp: There was a brief discussion about the meaning of the ā€œUā€ in UOp, with Hotz clarifying that it stands for ā€œmicro op.ā€
  • Preparation for Code Europe and discussion on slides: George Hotz mentioned he will be attending Code Europe and is open to talking about tinygrad. There was also a small suggestion to modify the final slide of his presentation for better audience engagement.
  • Upcoming Monday meeting agenda: Chenyuy outlined the agenda for the upcoming Monday meeting, which includes topics such as symbolic uops, process replay tests, and bounty updates.

Link mentioned: pytorch/torch/_inductor/fx_passes/fuse_attention.py at main Ā· pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch


tinygrad (George Hotz) ā–· #learn-tinygrad (4 messages):

  • HSA Deprecated; HIP and OpenCL are alternatives: George mentioned that ā€œHSA doesn’t exist anymoreā€ and suggested using a minimal HIP integration or OpenCL for correctness checks. He confirmed all RDNA3 GPUs should work with tinygrad.
  • Minimum Spec Requirement Clarified: George clarified that the minimum spec for AMD GPUs is RDNA and for Nvidia GPUs is 2080 (the first GPU with GSP). He expressed openness to adding support for RDNA/RDNA2/CDNA if it doesn’t require many changes.
  • Vega20 GPUs not a priority: Despite past popularity post-mining era, pre-RDNA GPUs like Vega20 are not considered a serious target for tinygrad support due to their limited RAM and performance. ā€œTheoretical specs were okish, but amount of RAM pretty bad for any ML (16 GB)ā€, jewnex commented.

AI Stack Devs (Yoko Li) ā–· #app-showcase (1 messages):

mikhail_ee: Hey! The author of http://hexagen.world is here šŸ™‚ Thanks for sharing!


AI Stack Devs (Yoko Li) ā–· #ai-town-discuss (16 messagesšŸ”„):

  • Convex architecture explains unique game loop: The game loop runs in Convex, executing as a series of short-lived serverless functions. This differs from traditional games where the game state is kept in-memory and everything runs on the same machine.
  • Scalability via distributed functions: The setup allows handling numerous API calls and inputs because the game’s aspects are managed by independent serverless function calls. This means convex’s backend can scale efficiently.
  • Client updates via websocket subscriptions: The client subscribes to a query through a websocket, receiving push updates whenever there are changes. This is one of the beneficial features provided by Convex.
  • Challenges for multiplayer scenarios: Due to varying network latencies among players, competitive play isn’t optimal. This aspect was highlighted to explain the limitations for real-time interactions in a multi-user environment.
  • Deep dive into AI Town architecture: For CS studies, it is recommended to review AI Town’s Architecture document for a comprehensive understanding of the inner workings.

Link mentioned: ai-town/ARCHITECTURE.md at main Ā· a16z-infra/ai-town: A MIT-licensed, deployable starter kit for building and customizing your own version of AI town - a virtual town where AI characters live, chat and socialize. - a16z-infra/ai-town


AI Stack Devs (Yoko Li) ā–· #ai-town-dev (2 messages):

  • Convex.json config file goes missing: A user expressed difficulty locating the convex.json config file, suggesting potential confusion or misplacement in the file structure.
  • Convex backend error issues: When attempting to run the Convex backend, an error occurred stating, ā€œRecipe convex could not be run because just could not find the shell: program not found,ā€ hinting at a missing dependency or misconfiguration.

AI21 Labs (Jamba) ā–· #jamba (7 messages):

  • Agentic architecture might ā€œmaskā€ problems, not solve them: A member inquired if ā€œagentic architecture,ā€ which breaks complex tasks into simpler ones, can solve inherent limitations. Another pointed out that despite Theorem 2 indicating a possible mitigation, it ultimately does not resolve the issue for deeper problems.
  • Limitations in model architectures: In response to a discussion about the limitations in model architectures such as RNNs, CNNs, SSMs, and Transformers, it was clarified that these models struggle to perform real reasoning due to their structural constraints, as highlighted by Theorem 1.
  • Need for deeper understanding: A member plans to reread the paper to fully grasp the concepts discussed, particularly around the limitations of current architectures and the communication complexity problem.
  • Theorem 1 and communication complexity: One participant summarized their understanding of Theorem 1, explaining that it involves a function composition problem with three agents and highlights the necessity of multiple communications for correct computations. These interactions can sometimes lead to agents hallucinating results.

Datasette - LLM (@SimonW) ā–· #ai (1 messages):

  • Participate in the Leaderboards: A member speculated that a particular release was done to enable research and leaderboard participation. They commented, ā€œI think they released it in this form so that people can research it and that they can participate in the leaderboards.ā€

Datasette - LLM (@SimonW) ā–· #llm (4 messages):

  • UMAP rocks at clustering: ā€œUMAP is amazing at clustering,ā€ exclaimed a user, praising the tool’s capabilities. They suggested checking out more details in an interview with the creator of UMAP.
  • Dive into UMAP with its creator: Vincent Warmerdam shared a YouTube video titled ā€œMoving towards KDearestNeighbors with Leland McInnes - creator of UMAP.ā€ This video delves into the nuances of UMAP, PyNNDescent, and HDBScan, and features insights from Leland McInnes himself.

Link mentioned: Moving towards KDearestNeighbors with Leland McInnes - creator of UMAP: Leland McInnes is known for a lot of packages. There’s UMAP, but also PyNNDescent and HDBScan. Recently he’s also been working on tools to help visualize clu…


Torchtune ā–· #general (2 messages):

  • Random TRL KL Plot Inquiry: A member queried if KL plots from TRL were used during the DPO implementation experiment. The response indicated none were used, but referenced KL plots in TRL’s PPO trainer for anyone interested.

Links mentioned:


DiscoResearch ā–· #disco_judge (1 messages):

  • Lighteval struggles with bitsandbytes models: A member sought assistance for evaluating a bitsandbytes model on lighteval using a provided command. The attempt failed as lighteval did not recognize the bitsandbytes method, instead requesting GPTQ data.

DiscoResearch ā–· #discolm_german (1 messages):

  • Document Packing Discussion: A member inquired about the code in the model card for document packing, questioning whether it’s a naive implementation or an actual used one. They also sought clarification on the data type of tokenized_documents, mentioning their need for an efficient solution for handling big datasets.

MLOps @Chipro ā–· #events (1 messages):

  • Meet Chip Huyen at Mosaic Event: Chip Huyen announced that she will be attending the Mosaic event at the Databricks summit tonight and encouraged others to say hi if they are attending. More details about the event can be found here.

Link mentioned: Events | June 10, 2024 San Francisco, CA: no description found




{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}