Frozen AI News archive

Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70B, and a SOTA OSS 405B model

**Llama 3.1** leaks reveal a **405B dense model** with **128k context length**, trained on **39.3M GPU hours** using H100-80GB GPUs, and fine-tuned with **over 25M synthetic examples**. The model shows significant benchmark improvements, especially for the 8B and 70B variants, with some evals suggesting the 70B outperforms **GPT-4o**. **GPT-4o Mini** launched as a cost-efficient variant with strong performance but some reasoning weaknesses. Synthetic datasets like **NuminaMath** enable models such as **Alibaba Qwen 2** to surpass GPT-4o and Claude 3.5 in math competitions. Discussions include reasoning task benchmarks and dataset building for improved reasoning.
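For scale, the leaked 39.3M GPU-hour figure can be turned into a rough wall-clock estimate. A minimal back-of-envelope sketch; the cluster sizes below are illustrative assumptions, since the leak does not state one:

```python
# Back-of-envelope: wall-clock training time implied by 39.3M H100 GPU-hours.
# Cluster sizes are illustrative assumptions, not from the leaked model card.
GPU_HOURS = 39.3e6

for num_gpus in (8_192, 16_384, 24_576):
    days = GPU_HOURS / num_gpus / 24  # GPU-hours -> hours of wall clock -> days
    print(f"{num_gpus:>6} H100s -> ~{days:.0f} days")
```

At 16,384 GPUs, for example, this works out to roughly 100 days of continuous training.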

Canonical issue URL

AI News for 7/19/2024-7/22/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (474 channels and 7039 messages) for you. Estimated reading time saved (at 200wpm): 765 minutes. You can now tag @smol_ai for AINews discussions!
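A rough reconstruction of how that reading-time figure is derived (the words-per-message average is implied by the published numbers, not stated):

```python
# Reading time saved = total words skimmed / reading speed.
messages = 7039      # published message count
wpm = 200            # reading speed, as stated
minutes_saved = 765  # published estimate

total_words = minutes_saved * wpm           # 153,000 words
words_per_message = total_words / messages  # implied average
print(f"{total_words} words -> ~{words_per_message:.1f} words/message")
```

In other words, the estimate implies an average of about 22 words per message.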

We know it's coming tomorrow (with Soumith's ICML keynote), so we really tried to avoid discussing the leaks until our coverage then, but Llama 3.1 is leaking like a sieve (weights, evals, model card), so unfortunately it is all the community is talking about today, despite a lot of it being repeats of the first Llama 3 release in April.

Apart from the well-telegraphed 405B dense model release, here are the diffs in Llama 3.1 as far as we can tell, mostly from the model card, which spells out the various priorities they had:

We made a diff spreadsheet to visualize the changes. TLDR: a HUGE bump for the 8B across the board, the instruct 70B is mildly better, and the 405B is still behind flagship models:

(screenshot: eval diff spreadsheet)

However, some independently run evals have Llama 3.1 70B doing better than GPT-4o; the jury is still out.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

GPT-4o Mini Release and Performance

Synthetic Data and Model Performance

Reasoning and Robustness Benchmarks

Memes and Humor in the AI Community


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. AI-Powered Mathematics Training

Theme 2. Local LLM Resource Optimization

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity

Theme 1. LLaMA 3 405B Model Release and Implications

Theme 2. AI in Healthcare: Improving Cancer Detection

Theme 3. LocalLLaMA Advancements and Applications


AI Discord Recap

A summary of Summaries of Summaries

1. LLM Model Releases and Benchmarks

2. AI Infrastructure and Optimization

3. AI Model Performance and Efficiency

4. Knowledge Graphs and Retrieval-Augmented Generation (RAG)

5. Community Contributions and Open-Source Projects


PART 1: High-level Discord summaries

Nous Research AI Discord


HuggingFace Discord


Modular (Mojo 🔥) Discord


Stability.ai (Stable Diffusion) Discord


CUDA MODE Discord


LM Studio Discord


Perplexity AI Discord


OpenAI Discord


OpenRouter (Alex Atallah) Discord


Cohere Discord


Eleuther Discord


Interconnects (Nathan Lambert) Discord


Latent Space Discord


OpenAccess AI Collective (axolotl) Discord


LangChain AI Discord


LAION Discord


LlamaIndex Discord


DSPy Discord


tinygrad (George Hotz) Discord


OpenInterpreter Discord


LLM Finetuning (Hamel + Dan) Discord


LLM Perf Enthusiasts AI Discord


Alignment Lab AI Discord


AI Stack Devs (Yoko Li) Discord


MLOps @Chipro Discord


DiscoResearch Discord


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Nous Research AI ▷ #research-papers (3 messages):

  • Implicit Chain-of-Thought (CoT)
  • UltraChat and Multi-turn Interactions
  • Multiagent Debate Models

Links mentioned:


Nous Research AI ▷ #datasets (1 message):

fedorovist: https://huggingface.co/datasets/jdpressman/retroinstruct-mix-v0.2


Nous Research AI ▷ #off-topic (5 messages):

  • PPLX Pro Search AI
  • ReFT paper discussion
  • Greg Schoeninger's Reddit post on ReFT
  • YouTube video on ReFT
  • Oxen.ai community and Paper Club

Links mentioned:


Nous Research AI ▷ #interesting-links (1 message):

alexanderlong_84476: https://pluralisresearch.substack.com/p/decentralized-ai-looms


Nous Research AI ▷ #general (434 messages🔥🔥🔥):

  • 1 bit quant results for DeepSeek-V2-Chat-0628
  • New AI projects and model updates
  • Llama 3.1 benchmarking
  • AI model legality concerns
  • Tech tools and deployment experiences

Links mentioned:


Nous Research AI ▷ #ask-about-llms (20 messages🔥):

  • Captcha Solving with LLMs
  • VRAM Estimates for LLama-3 405B
  • Consumer Level Hardware for AI Models

Nous Research AI ▷ #rag-dataset (40 messages🔥):

  • Triplex for knowledge graph construction
  • R2R platform for local LLM experimentation
  • Application of knowledge graphs in RAG/LLM
  • Microsoft's Graph RAG
  • Deeper adjacency matrix and symbolic reasoning

Links mentioned:


Nous Research AI ▷ #world-sim (3 messages):

  • World Sim
  • World Client
  • Nous Research

Link mentioned: worldsim: no description found


Nous Research AI ▷ #reasoning-tasks-master-list (691 messages🔥🔥🔥):

  • QuietStar
  • Auto-generate prompts
  • Type systems for LLMs
  • Intermediate representations
  • Task structure in Open-Reasoning-Tasks

Links mentioned:


HuggingFace ▷ #general (959 messages🔥🔥🔥):

  • LLM fine-tuning
  • GPU and hardware capabilities
  • Model deployment issues
  • Whisper Model for transcriptions
  • LLM code and architecture troubleshooting

Links mentioned:


HuggingFace ▷ #today-im-learning (4 messages):

  • Knowledge Graphs
  • News Reading Experience
  • Hugging Face Model Kwargs
  • Speaker Diarization & Whisper Transcription

HuggingFace ▷ #cool-finds (8 messages🔥):

  • Apple Intelligence
  • LoRA fine-tuning
  • AI paper on arXiv
  • AI's world transformation
  • Free online courses

Links mentioned:


HuggingFace ▷ #i-made-this (49 messages🔥):

  • Hermes 2.5
  • Model Merging
  • Open Empathic
  • Gary4live
  • SmolLM

Links mentioned:


HuggingFace ▷ #reading-group (2 messages):

  • Event Creation
  • Diagram Feedback

HuggingFace ▷ #core-announcements (1 message):

  • SD3 training bugs
  • Diffusers repository

Link mentioned: [Training] SD3 training fixes by sayakpaul · Pull Request #8917 · huggingface/diffusers: What does this PR do? Fixes #8887 and #8708. Additionally, it adds an option to control the pre-conditioning behavior on the model outputs. Multiple folks have reported that for rectified-flows we ...


HuggingFace ▷ #computer-vision (19 messages🔥):

  • Hybrid Model with Inception and ViT
  • Scrabble Board Tile Detection
  • Binary Segmentation Projects

Links mentioned:


HuggingFace ▷ #NLP (10 messages🔥):

  • SQL RAG
  • Date Extraction
  • Open-source Text-to-HTML/CSS Model
  • Fine-tuning Language Models
  • Metrics in Transformer Fine-tuning

HuggingFace ▷ #diffusion-discussions (4 messages):

  • Open-source text-to-html/css generation
  • Artistic styles in SDv1.4
  • Evaluating diffusion models
  • Stable diffusion paper encoder methods

Modular (Mojo 🔥) ▷ #general (213 messages🔥🔥):

  • Mojo Socket Implementation
  • Dual Stack Sockets Issue
  • Mojo Integration with Existing Libraries
  • Interoperability between Mojo and Python
  • Production Readiness of Mojo

Links mentioned:


Modular (Mojo 🔥) ▷ #💬︱twitter (1 message):

ModularBot: From Modular: https://twitter.com/Modular/status/1815463417391837596


Modular (Mojo 🔥) ▷ #📺︱youtube (1 message):

  • Mojo 🔥 Community Meeting #4
  • Flat Buffers
  • Forge Tools
  • Mojo 🔥 Standard Library
  • Mojo 🔥 Gen

Link mentioned: Mojo 🔥 Community Meeting #4: Recording of the Mojo Community Meeting #4🫓 Flat Buffers: memory efficient serialization⚒️ Forge Tools: extending the Mojo 🔥 standard library🔄 Mojo 🔥 Gen...


Modular (Mojo 🔥) ▷ #mojo (185 messages🔥🔥):

  • Anti-Pattern Discussions
  • OpenSSL and Mojo Projects
  • Newton's Method for Float Literals
  • Future of Mojo's Async Scheduler
  • CPU Performance Comparisons

Links mentioned:


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (1 message):

  • Matrix Multiplication in Mojo
  • Comparing Mojo to Numpy Performance

Link mentioned: Matrix multiplication in Mojo | Modular Docs: Learn how to leverage Mojo's various functions to write a high-performance matmul.


Modular (Mojo 🔥) ▷ #max (5 messages):

  • nightly/max feed reliability
  • getting started with MAX
  • open source contributions to MAX
  • guidance for new contributors

Links mentioned:


Modular (Mojo 🔥) ▷ #max-gpu (3 messages):

  • XLA Involvement
  • Mojo on GPU Availability

Modular (Mojo 🔥) ▷ #nightly (83 messages🔥🔥):

  • Nightly Mojo Compiler Updates
  • LegacyPointer and DTypePointer Discussion
  • Issues with memcpy Function
  • Changes in Mojo API
  • Community Interaction and Documentation

Links mentioned:


Modular (Mojo 🔥) ▷ #mojo-marathons (43 messages🔥):

  • NumPy Performance Testing
  • Understanding CPU FLops
  • Matrix Multiplication Benchmarks
  • Architecture-Specific Optimizations
  • Mojo's Generics Limitations

Link mentioned: GitHub - m-j-w/CpuId.jl: Ask the CPU for cache sizes, SIMD feature support, a running hypervisor, and more.: Ask the CPU for cache sizes, SIMD feature support, a running hypervisor, and more. - m-j-w/CpuId.jl


Stability.ai (Stable Diffusion) ▷ #general-chat (424 messages🔥🔥🔥):

  • ComfyUI recommendations
  • Forge vs. Easy Diffusion experience
  • Using Latent mode for Regional Prompter
  • Issues with VRAM and GPU compatibility
  • Upscaling errors in Forge

Links mentioned:


CUDA MODE ▷ #general (28 messages🔥):

  • Google Meet Troubleshooting
  • CUDA and SVD
  • Building with LLMs
  • ECC Memory in Workstations
  • Register Allocation in Flash Attention

Links mentioned:


CUDA MODE ▷ #triton (4 messages):

  • Profiling Triton Kernels
  • Memory Usage
  • CUDA tools for Profiling
  • Nsight Compute
  • Nsight Systems

CUDA MODE ▷ #torch (7 messages):

  • at::Tensor.mutable_data_ptr
  • torch.cond
  • torch control flow operators

Link mentioned: torch.cond — PyTorch 2.3 documentation: no description found


CUDA MODE ▷ #announcements (1 message):

  • AMD ROCm
  • Composable Kernel library
  • AMD tech stack

CUDA MODE ▷ #algorithms (3 messages):

  • Similarity Search Algorithm
  • Stochastic Rounding in Quantization

CUDA MODE ▷ #cool-links (27 messages🔥):

  • CubeCL
  • FlashAttention2 Custom Mask
  • FLUTE Kernel

Links mentioned:


CUDA MODE ▷ #beginner (17 messages🔥):

  • Triton Block Size
  • Non-GPU part of NVCC Compiler and VLAs
  • FP16 vs FP32 Performance
  • Triton Multi-Stage Pipelining

Links mentioned:


CUDA MODE ▷ #pmpp-book (3 messages):

  • Model Weights in Shared Memory
  • CUDA Register Capacity

CUDA MODE ▷ #youtube-recordings (1 message):

andreaskoepf: https://youtu.be/-732zELVbpU?si=HBXEE8t2fxCKhC5v


CUDA MODE ▷ #off-topic (8 messages🔥):

  • Request for Cuda/C++ Channel
  • Discussion on existing channel usage
  • LLM Deployment Cost Inquiry

CUDA MODE ▷ #irl-meetup (6 messages):

  • ICML Meetup at Conference
  • CUDA Learners in Berkeley

CUDA MODE ▷ #llmdotc (297 messages🔥🔥):

  • CUDA MODE IRL Event
  • Train GPT-2 to GPT-3 Progress
  • Multi-GPU and ZeRO Implementation
  • MuP Branch Progress
  • FP8 Training Optimizations

Links mentioned:


CUDA MODE ▷ #rocm (10 messages🔥):

  • ROCm hardware entry options
  • Differences between RDNA and CDNA
  • MI300 capabilities
  • Challenges with AMD GPUs in ROCm
  • FP8 MMA acceleration on MI300

CUDA MODE ▷ #lecture-qa (1 message):

  • Arithmetic Intensity
  • GPU Performance Metrics

LM Studio ▷ #💬-general (168 messages🔥🔥):

  • GGUF file metadata in C#
  • Quantization process (q5, q6, f16)
  • Issues with LM Studio & Hugging Face
  • Running large models locally
  • Using local LLMs for profit

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (126 messages🔥🔥):

  • Open Model Test Results
  • Converting Models to GGUF
  • New Jail-Breaking Technique
  • DeepSeek-Coder Issues
  • Memory Usage with Qwen2 72B

Links mentioned:


LM Studio ▷ #announcements (2 messages):

  • Hugging Face API Networking Errors
  • Issue Resolution

LM Studio ▷ #🧠-feedback (11 messages🔥):

  • Failed model load
  • Flash Attention troubleshooting
  • HuggingFace API issues
  • Alternative model repositories

Link mentioned: Hugging Face – The AI community building the future.: no description found


LM Studio ▷ #📝-prompts-discussion-chat (2 messages):

  • Renaming Presets
  • Finding Models

LM Studio ▷ #🎛-hardware-discussion (40 messages🔥):

  • NVidia Tesla P40 usage
  • AMD GPU compatibility
  • Home NAS recommendations
  • Choosing GPUs for AI/ML workloads
  • Finetuning Large Language Models

Link mentioned: Getting started with LLM fine-tuning: Large Language Model (LLM) Fine-tuning is the process of adapting the pre-trained model to specific tasks. This process is done by updating its parameters on a new dataset. Specifically, the LLM is pa...


LM Studio ▷ #🧪-beta-releases-chat (18 messages🔥):

  • Nemo issues
  • Search functionality problems
  • Huggingface API

LM Studio ▷ #amd-rocm-tech-preview (1 message):

captainpumpkinhead: "natively"


LM Studio ▷ #model-announcements (1 message):

  • Athene by Nexusflow
  • Model benchmarks
  • Multilingual performance

LM Studio ▷ #🛠-dev-chat (2 messages):

  • LM Studio Discord bot
  • Private responses with LM Studio
  • GitHub tutorial link

Link mentioned: GitHub - mrdjohnson/lmstudio-discord-bot: A tutorial for creating a Discord bot that responds using LM Studio! This code is based on a blogpost found here: https://dev.to/mrdjohnson/i-made-a-discord-bot-with-lmstudiojs-4fd6: A tutorial for creating a Discord bot that responds using LM Studio! This code is based on a blogpost found here: https://dev.to/mrdjohnson/i-made-a-discord-bot-with-lmstudiojs-4fd6 - mrdjohnson/lm...


Perplexity AI ▷ #general (338 messages🔥🔥):

  • Image generation with Pro subscription
  • GPTs Agents
  • Issues with Pro search
  • Profile and collection prompts
  • Perplexity's context window

Links mentioned:


Perplexity AI ▷ #sharing (23 messages🔥):

  • YouTube tests AI conversational capabilities
  • OpenAI drops GPT-4o mini
  • Unraveling Chaos project
  • CrowdStrike global IT outage
  • Possibilities of developing software

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (9 messages🔥):

  • Feature Roadmap
  • Perplexity API Token Charges
  • Online Models Usage

OpenAI ▷ #ai-discussions (170 messages🔥🔥):

  • Sonnet 3.5 vs 4o Mini
  • Finetuning multimodal models
  • Cost-effective TTS solutions
  • Voice assistant apps
  • GPT-4o mini vs GPT-5

Link mentioned: System Prompt in Markdown Code Block | Arc Search: Arc Search read websites across the internet to make you this perfect tab.


OpenAI ▷ #gpt-4-discussions (32 messages🔥):

  • GPT-4o mini replacing GPT-3.5
  • Differences between GPT-4o and GPT-4o mini
  • New features for GPT-4o
  • API vs. ChatGPT features
  • GPT-4o mini for longform content

OpenAI ▷ #prompt-engineering (9 messages🔥):

  • Solving Mathematical or Logical Problems Accurately
  • Custom Instructions in ChatGPT
  • Using Custom Instructions Effectively
  • Guidance AI for Prompt Engineering

OpenAI ▷ #api-discussions (9 messages🔥):

  • Improving ChatGPT Accuracy
  • Modification of ChatGPT Responses
  • Custom Instructions for ChatGPT
  • Prompt Engineering Tools

OpenRouter (Alex Atallah) ▷ #announcements (1 message):

  • Rankings page update
  • Infrastructure migration

OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

  • OpenRouter provider for GPTScript
  • gptscript on command line
  • gptscript demo video

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (202 messages🔥🔥):

  • Hermes 2.5
  • GPTs Agents
  • OpenRouter Feature Requests
  • Model Merging
  • Dolphin Llama 70B

Links mentioned:


Cohere ▷ #general (156 messages🔥🔥):

  • Open source contributions
  • LLM use in RPG games
  • Nextjs app router update
  • Developer office hours format
  • Integrating Cohere models

Links mentioned:


Cohere ▷ #project-sharing (4 messages):

  • chat GUI with local LLMs
  • multiplayer text games Discord app

Link mentioned: Create 'n' Play - Create AI multiplayer text games for your Discord server! | Product Hunt: AI-powered Discord bot turns your server into a text game paradise! Craft ANY game with our /search-games command. Endless possibilities—it's the ultimate community engagement tool. Let AI fuel y...


Cohere ▷ #announcements (1 message):

  • Developer Office Hours
  • Structured Generations in the API
  • Cohere Toolkit features
  • Cohere For AI research papers
  • Community events

Links mentioned:


Eleuther ▷ #general (13 messages🔥):

  • Softmax invariance and z loss
  • Understanding large language models (LLM)
  • GPT model example code
  • Scaling diarization pipelines
  • Vector quantization models

Link mentioned: easydist/benchmark/torch/model/gpt.py at 3dbb146812fddf6259590a0b4611a251f3e7cbe5 · alibaba/easydist: Automated Parallelization System and Infrastructure for Multiple Ecosystems - alibaba/easydist


Eleuther ▷ #research (39 messages🔥):

  • Self-Modeling in AI
  • Hybrid Post-Quantum Encryption
  • Feature Contamination in Neural Networks
  • Human-like Response Time in CNNs
  • Efficient Dictionary Learning with Switch SAE

Links mentioned:


Eleuther ▷ #scaling-laws (3 messages):

  • Scaling laws and hypernetworks
  • Brains and backprop
  • Scaling Exponents Across Parameterizations and Optimizers

Link mentioned: Tweet from main (@main_horse): Scaling Exponents Across Parameterizations and Optimizers [GDM] [nocode/weights] https://arxiv.org/abs/2407.05872 trains 10,000+ (!) models, varying * optim (SGD/Adam/Adafactor) * model size (1.1B ~...


Eleuther ▷ #interpretability-general (9 messages🔥):

  • MATS 7.0 Applications
  • **nnsight** Paper Release
  • Apollo's Mech Interp Projects List
  • Tokengrams Project
  • Suffix Arrays for Pile

Links mentioned:


Eleuther ▷ #lm-thunderdome (56 messages🔥🔥):

  • Zeno Upload Feature
  • Commit Branch Queries
  • Logging Issues
  • Multinode Inference Support
  • PSA: ICML Conference

Link mentioned: Refactor API models by baberabb · Pull Request #2008 · EleutherAI/lm-evaluation-harness: This PR introduces a new superclass for API request models, providing: Modularity for downstream classes Overloadable methods for request transformation, API requests and response parsing Tokeniza...


Eleuther ▷ #multimodal-general (4 messages):

  • Latent downscaling
  • Image classification performance
  • Generated latents

Eleuther ▷ #gpt-neox-dev (19 messages🔥):

  • Nemotron-340B specifics
  • Nathan's bounty for Nemotron-340B conversion
  • vLLM multinode inference
  • Multi-node performance for Nemotron-340B
  • Evaluation harness discussion

Link mentioned: Tweet from Nathan Lambert (@natolambert): I'm offering a paid bounty to successfully convert nvidia/Nemotron-4-340B-Instruct to HuggingFace / related libraries. Starting reward $75 We really need this to unlock synthetic permissive data...


Interconnects (Nathan Lambert) ▷ #news (61 messages🔥🔥):

  • Nemotron-4-340B conversion to HuggingFace
  • Llama-3 and 3.1 leaks
  • Meta AI's potential premium offerings
  • Distillation techniques for large models
  • SOC2 compliance for HuggingFace

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (4 messages):

  • WizardMath paper
  • Instruction Reward Model (IRM)
  • PRM by Uesato et al 2022 paper
  • Step-by-step reward labeling
  • UltraChat vs Zephyr paper

Interconnects (Nathan Lambert) ▷ #ml-drama (3 messages):

  • ICML 2024 Spotlight
  • Lit Review Concerns
  • Harvey Legal AI Criticism

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (52 messages🔥):

  • Interview Invitations
  • MosaicML Sword Tradition
  • Claude AI Text Restrictions
  • Reward Model Innovations
  • Stripe Settings Issue

Links mentioned:


Interconnects (Nathan Lambert) ▷ #nlp (2 messages):

  • Blog post on distillation
  • Lilian Weng
  • Surprise at lack of resources

Interconnects (Nathan Lambert) ▷ #reads (2 messages):

  • Yitay blog post on model architectures
  • Encoder vs. Encoder-Decoder models
  • Evolution of LLMs
  • @srush_nlp tweet

Link mentioned: What happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives — Yi Tay: A Blogpost series about Model Architectures Part 1: What happened to BERT and T5? Thoughts on Transformer Encoders, PrefixLM and Denoising objectives


Latent Space ▷ #ai-general-chat (74 messages🔥🔥):

  • Langfuse vs Langsmith
  • GPT-4o mini and AI-generated content
  • Rumors about Harvey AI
  • Elon Musk's Memphis Supercluster
  • LLaMA 3.1 leaks and evaluations

Links mentioned:


Latent Space ▷ #ai-announcements (1 message):

swyxio: big monthly recap is up: https://x.com/latentspacepod/status/1815411709085143197


Latent Space ▷ #ai-in-action-club (29 messages🔥):

  • Audio Issues
  • Layout Detection
  • Texify vs Mathpix
  • Presentation Feedback
  • Model Training Dataset

Link mentioned: VikParuchuri - Overview: VikParuchuri has 90 repositories available. Follow their code on GitHub.


OpenAccess AI Collective (axolotl) ▷ #general (64 messages🔥🔥):

  • Training Qwen2-7b
  • Triplex for Knowledge Graphs
  • Mistral 12b Issues
  • LLaMA 3 Inconsistencies
  • LLaMA 3.1 Benchmarks

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (22 messages🔥):

  • Knowledge Distillation Support
  • DPO Improvements
  • DeepSpeed Zero-3 Compatibility

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (5 messages):

  • Axolotl training error
  • GPU memory issues
  • Batch size adjustment
  • Mixed precision training
  • NCCL timeouts

Links mentioned:


LangChain AI ▷ #general (57 messages🔥🔥):

  • LangChain.js Token issues
  • Video and blog post on LLM lessons
  • Beginners guide on LangChain
  • Vector store filtering
  • Discussion on deploying RAG app to production

Links mentioned:


LangChain AI ▷ #share-your-work (6 messages):

  • Triplex model
  • Vector visualization for embeddings
  • Semantic search with LangChain
  • AI function builder for TypeScript
  • LangChain tutorial

Links mentioned:


LangChain AI ▷ #tutorials (3 messages):

  • LangChain article on Medium by Harshit Ambalia
  • Guide for deploying a RAG app
  • Scheduler Agent guide using Composio, LangChain, and ChatGPT

Links mentioned:


LAION ▷ #general (34 messages🔥):

  • sdxl vae latents
  • HF hosting advantages
  • new BUD-E demo
  • local LLM on Linux terminal
  • Kolors diffusion model

Links mentioned:


LAION ▷ #announcements (1 message):

  • Bud-E voice assistant
  • Daily Online-Hackathons
  • BUD-E Discord Server

LAION ▷ #research (14 messages🔥):

  • Plotting Loss Curves
  • Mem0 AI Memory
  • Datadog Time Series Modeling
  • Research Recruitment

Links mentioned:


LlamaIndex ▷ #blog (8 messages🔥):

  • PostgresML for Reranking
  • LLMs as Judges
  • Merlinn: Open-source On-call Copilot
  • Multimodal RAG with Ollama and Qdrant
  • Deasie RAG Workshop

Links mentioned:


LlamaIndex ▷ #general (33 messages🔥):

  • llama-parse API issues
  • ReActAgent max iterations
  • VectorStoreIndex embedding model
  • LlamaIndex webinar
  • Extracting pictures from PDFs

Links mentioned:


LlamaIndex ▷ #ai-discussion (4 messages):

  • ETL for Video and Music Data
  • Sycophancy in LLMs
  • Korvus for RAG Pipeline

Links mentioned:


DSPy ▷ #general (44 messages🔥):

  • Issues with GPT4o-mini Model
  • DSPy Tracing Release
  • TypedPredictors Compatibility
  • DSPy Paper Release
  • Optimizers Reliability in DSPy

Links mentioned:


tinygrad (George Hotz) ▷ #general (8 messages🔥):

  • OpenPilot Model Run Analysis
  • Bitcast Functionality in Tinygrad
  • Promising Pull Requests
  • Weekly Tinygrad Meeting

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (11 messages🔥):

  • Composing LazyBuffers
  • Shapetrackers Tutorial
  • Viability of Tinygrad versus PyTorch

Link mentioned: Tinygrad tutorial - shapetrackers and view merging - machine learning optimization: tinygrad fork with code tutorial: https://github.com/Zaffer/tinygrad/tree/tuturial-notebooktinygrad docs: https://docs.tinygrad.org/tinygrad notes: https://m...


OpenInterpreter ▷ #general (4 messages):

  • Crowdstrike update
  • Python subinterpreters
  • Meta Llama 3.1

Links mentioned:


OpenInterpreter ▷ #O1 (8 messages🔥):

  • Deepseek chat v2 6.28
  • 4o mini performance
  • Apple Watch support
  • Device shipping updates
  • Coqui model on MacOS

OpenInterpreter ▷ #ai-content (1 message):

  • Augmentoolkit on GitHub
  • Pinokio Project Launch

Link mentioned: Pinokio: AI Browser


LLM Finetuning (Hamel + Dan) ▷ #general (4 messages):

  • Finetuning with GPT models
  • Issues with OpenAI credits

LLM Finetuning (Hamel + Dan) ▷ #jarvis-labs (1 message):

vishnu9158: Nope


LLM Finetuning (Hamel + Dan) ▷ #east-coast-usa (1 message):

karmapa: Yes how about late august meetup in NY?


LLM Finetuning (Hamel + Dan) ▷ #openpipe (1 message):

  • Openpipe with other providers
  • Integration of Replicate models
  • Modal model compatibility

LLM Finetuning (Hamel + Dan) ▷ #openai (1 message):

  • Issues with credit allocation
  • Course forms

LLM Perf Enthusiasts AI ▷ #general (3 messages):

  • OpenAI Scale Tier
  • TPS Calculation
  • GPT-4-o throughput

LLM Perf Enthusiasts AI ▷ #jobs (1 message):

  • Websim platform
  • Founding AI Engineer role
  • AI-assisted software creation
  • Non-deterministic programs
  • Human-AI system

Link mentioned: websim.ai: no description found


Alignment Lab AI ▷ #general-chat (2 messages):

  • Building with LLMs
  • BUD-E Voice Assistant

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #team-up (1 message):

ari991963: Hi all, I am Aria a 2D/3D artist, if you are interested to collaborate dm


MLOps @Chipro ▷ #general-ml (1 message):

  • Target Audience Clarification
  • Communication Strategy

DiscoResearch ▷ #general (1 message):

  • 1 year of building with LLMs
  • TLDR series on LLMs
  • Lessons from LLM practitioners

Link mentioned: TLDR: 1 year of building with LLMs – D-Squared: no description found



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}