AI News for 10/25/2024-10/28/2024. We checked 7 subreddits, 433 Twitters and 32 Discords (230 channels, and 5833 messages) for you. Estimated reading time saved (at 200wpm): 601 minutes. You can now tag @smol_ai for AINews discussions!

Congrats to Moondream (a 1.6b vision language model) on their seed funding. With Moonshine (27-61m ASR model) also getting some buzz, there seems to be a little pattern with moon-themed tiny models.

https://youtu.be/T7sxvrJLJ14

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Research and Development

Advanced Language Models and Techniques: @AmandaAskell posed a sincere question on how pattern recognition in LLMs differs from intelligence. @cwolferesearch discussed using RL to optimize prompts for LLMs, highlighting challenges with discrete token optimization. @ophilschmid introduced NotebookLlama, an open-source version of NotebookLM, utilizing various LLaMA models for tasks like text-to-speech.
Model Optimization and Efficiency: @Philschmid shared an example related to NotebookLlama. @StasBekman highlighted async-TP implementation in PyTorch for Tensor Parallelism, improving computation efficiency. @francoisfleuret discussed model hyperparameters, specifically d_model and n_heads in LLMs.
Multi-Modal Machine Learning: @mervenoyann showcased Mini-Omni 2, a model that understands image, audio, and text inputs for voice conversations. @Reach_vb detailed the technical overview of Mini-Omni 2, emphasizing modal alignment and multimodal fine-tuning.

AI Applications and Tools

AI Productivity Tools: @dzhng promoted an AI email writer designed for efficiency without AI email slop, enabling well-researched email sequences. @AravSrinivas introduced a knowledge assistant video series utilizing LlamaCloud for building AI research assistants.
AI-Enhanced Software Development: @sama emphasized the importance of hands-on practice over complex prerequisite plans for skill development. @Lateinteraction discussed using DSPy optimizers to teach Llama3-8B for privacy-conscious AI tool usage.
Generative AI Tools: @DeepLearningAI highlighted the impact of #AIPythonforBeginners in automating tasks and integrating LLMs. @LangChainAI shared resources for GenAI Agents development, focusing on agent architectures in LangGraph.

AI Business and Startups

Startup Execution and AI Integration: @AndrewYNg discussed the importance of speedy execution in AI-powered product development, outlining feedback loop strategies to enhance market fit. @Bindureddy predicted the evolution of new job roles in the post-AI era, such as AI agent supervisors and fallback humans.
AI in Software Industry: @Jerryjliu0 emphasized the challenges in enterprise-grade text-to-SQL and the necessity for advanced retrieval methods. @LangChainAI provided tutorials on optimizing RAG applications using LangChain and MongoDB.

Software Engineering and ML Engineering

Software Development Practices: @scottastevenson critiqued the evolution of software engineering, highlighting issues like the lack of distinction between designing and building software, and the detailed orientation required in software design compared to traditional engineering disciplines.
Machine Learning Engineering: @LangChainAI discussed the use of LangGraph.js in building applications with small, local LMs, promoting the benefits of open-source models.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Small LLMs with RAG: Surprising Capabilities of 1B-3B Models

The glm-4-voice-9b is now runnable on 12GB GPUs (Score: 109, Comments: 24): The glm-4-voice-9b model is now capable of running on 12GB GPUs, enabling more efficient inference. This development allows for broader accessibility and use of the model, potentially expanding its applications in voice-related AI tasks on more modest hardware configurations.
- Users tested the glm-4-voice-9b model on RTX 3060 12GB GPUs, reporting it's functional but not smooth for real-time conversations. Some experienced 30-60 second delays and noise generation issues on Runpod.
- Discussion on AI voice assistants' future development, with predictions ranging from 3 years to as short as 6-12 months for achieving capabilities comparable to current ChatGPT voice. Moshi was mentioned as a potential leader in this space.
- The prompt "cry about your lost cat" sparked amusement, highlighting the model's diverse and sometimes unexpected use cases.
I tested what small LLMs (1B/3B) can actually do with local RAG - Here's what I learned (Score: 542, Comments: 67): Llama3.2 3B was tested for local RAG on a MacBook Pro M1 Pro, using a setup including Nomic's embedding model, Langchain RAG workflow, and Chroma DB. The system performed well for basic Q&A on Nvidia's Q2 2025 financial report, with PDF loading under 2 seconds and simple info retrieval slightly faster than Claude 3.5 Sonnet. The author experimented with LoRA for specialized tasks like generating charts, using an Octopus_v2 action model as a task router, demonstrating potential for a small base model with task-specific "plugins" for local RAG systems.
- Users discussed potential applications for local RAG systems, including a board game rules finder and an educational tool for children. The latter uses Gemma2 27B for explaining concepts and generating questions about school subjects.
- The concept of a small base model with swappable LoRAs was compared to Apple's AI approach for on-device intelligence. A blog post was shared discussing Apple's implementation.
- Discussion around embedding models clarified that 137M parameters is not small for this type of model. The Hugging Face MTEB leaderboard was referenced, showing top models using >1GB of memory.
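The retrieve-then-prompt flow behind a local RAG setup like this one can be sketched without any of the libraries named in the post. This is a minimal illustration, not the poster's code: the bag-of-words `embed` function stands in for a real embedding model such as Nomic's, the in-memory list stands in for Chroma, and the sample chunks are invented.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks, k=2):
    """Assemble the prompt that would be sent to the small local LLM."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical document chunks, invented for illustration.
CHUNKS = [
    "Data center revenue grew during the quarter.",
    "The company announced a stock split.",
    "Gaming revenue was flat compared with last year.",
]

top = retrieve("What happened to gaming revenue?", CHUNKS, k=1)
prompt = build_prompt("What happened to gaming revenue?", CHUNKS, k=1)
```

In a real setup the model only sees what `retrieve` returns, so retrieval quality matters at least as much as model size for basic Q&A.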

Theme 2. Multimodal Models: Llama 3.2 Vision and Pixtral Advancements

Has anyone realized that ollama has launched llama3.2-vision beta? (Score: 76, Comments: 35): Ollama has released a beta version of Llama 3.2 Vision, a multimodal model capable of processing both text and images. This new model requires Ollama 0.4.0, which is currently available as a pre-release, and can be accessed at ollama.com/x/llama3.2-vision.
- Users expressed interest in other models like Qwen-VL and Minicpm 2.6, with some noting Qwen-VL's superior performance and Minicpm's earlier release and compatibility with Ollama.
- Concerns were raised about Ollama potentially hijacking efforts from the llama.cpp ecosystem by working directly with Meta. However, it was clarified that the implementation was done by Ollama contributors and the code remains open-source.
- The Llama 3.2 Vision model in Ollama can process images but cannot generate them. Users discussed its performance, with some finding it satisfactory while others await models like Pixtral.
Pixtral is amazing. (Score: 167, Comments: 40): Pixtral demonstrates impressive performance in both image analysis and text-to-text tasks, outperforming MiniCPM-V-2.6 and Llama3.2 11B Vision in the poster's tests. The model combines strong visual understanding comparable to MiniCPM with excellent text generation capabilities, while maintaining minimal censorship, making it a versatile choice for multimodal AI tasks.
- Qwen2-VL and Molmo-7B are recommended alternatives to Pixtral, with Molmo offering a 7B MoE variant with 1B active params. Users report minimal censorship issues and strong performance, though Pixtral excels in brainstorming/storytelling.
- Pixtral can be run locally using vllm with specific settings for 12GB VRAM and 64GB RAM. The model performs well for personal work, delivering quick descriptions of images at 1.1 tokens/s on limited hardware.
- Llama 3.2 70b vision is reported as SOTA for OCR, with MiniCPM 2.6v as a close second. Pixtral functions similarly to ChatGPT's image analysis capabilities, allowing users to ask questions about given images.

Theme 3. Battle of Inference Engines: Llama.cpp vs MLC LLM vs vLLM

Battle of the Inference Engines. Llama.cpp vs MLC LLM vs vLLM. Tests for both Single RTX 3090 and 4 RTX 3090's. (Score: 86, Comments: 41): The post compares the performance of Llama.cpp, MLC LLM, and vLLM inference engines on both single and multi-GPU setups using RTX 3090 graphics cards. Tests were conducted using various configurations, including a single RTX 3090 and a setup with four RTX 3090s, to evaluate the efficiency and speed of these inference engines for large language models.
- MLC LLM performance impressed users, with its speed and quantization options including q0f16, q3f16_0, q4f16_0, and others. Some noted it excels in short context scenarios (150t/s on 3090 with Qwen 7b q4f16) but slows down with long contexts (>8k).
- Users discussed batch inference capabilities, with one reporting processing 81M input tokens and 5.5M output tokens per hour on a single RTX 3090 Ti using vLLM with W8A8 INT8 models and flashInfer engine in eager mode.
- Suggestions for future benchmarks included testing exllama v2, comparing PCIe bandwidth requirements, and evaluating performance with NVLINK between GPUs. The MMLU Pro test was recommended for batch inference comparisons.
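For comparison with the per-second figures quoted elsewhere in the thread, the hourly batch numbers above convert as follows. This is plain arithmetic on the reported totals; it says nothing about per-request latency.

```python
SECONDS_PER_HOUR = 3600

def tokens_per_second(tokens_per_hour):
    """Convert an hourly token count to an average per-second rate."""
    return tokens_per_hour / SECONDS_PER_HOUR

input_tps = tokens_per_second(81_000_000)   # reported input tokens per hour
output_tps = tokens_per_second(5_500_000)   # reported output tokens per hour

print(f"input:  {input_tps:,.0f} tokens/s")   # input:  22,500 tokens/s
print(f"output: {output_tps:,.0f} tokens/s")  # output: 1,528 tokens/s
```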

Theme 4. Meta's Open-Source NotebookLM: Enhancing Document Interaction

Meta releases an open version of Google's NotebookLM (Score: 76, Comments: 7): Meta has released NotebookLlama, an open-source take on Google's NotebookLM that uses several Llama models for tasks such as text-to-speech.

Theme 5. Top Coding Models: Qwen 2.5 32B and Alternatives Under 70B

Is there anything that beats Mistral-Nemo 12b in coding that's still smaller than a Llama 3.1 70b quant? (Score: 30, Comments: 26): The post titled "Is there anything that beats Mistral-Nemo 12b in coding that's still smaller than a Llama 3.1 70b quant?" discusses the performance of various language models in coding tasks. Qwen 2.5 32B is mentioned as outperforming larger models in coding benchmarks, potentially positioning it as a strong contender in the space between Mistral-Nemo 12B and Llama 3.1 70B.
- Qwen 2.5 32B outperforms Llama 3 70B and approaches Llama 3.1 70B in coding benchmarks, scoring 54.1% on the Aider benchmark. It's recommended for memory-limited setups, allowing Q8 quantization instead of Q3/Q4 for 70B models.
- Several smaller models are suggested as alternatives to Mistral-Nemo 12B, including Qwen Coder 2.5 7B, Yi Coder 9B, and Codestral. An upcoming 32B version of Qwen Coder is mentioned as potentially the largest code-specific model in this size range.
- Users recommend testing models like Qwen2.5 14B, Mistral Small, and Gemma 2 27B for specific coding use cases, as they may perform differently than benchmark results suggest. DeepSeek-Coder-V2 236B is noted as a much larger code-specific model.
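The memory argument for running a 32B model at Q8 rather than a 70B model at Q3/Q4 can be checked with a rough estimate. This sketch assumes nominal bit widths and counts weights only. Real quantization formats store extra scale data per block, and the KV cache and runtime overhead come on top.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

for name, params, bits in [
    ("32B @ 8-bit", 32, 8),
    ("70B @ 4-bit", 70, 4),
    ("70B @ 3-bit", 70, 3),
]:
    print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions the 32B model at 8 bits lands in the same range as a 70B model at 3 to 4 bits.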

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Research and Techniques

Google DeepMind advances multimodal learning: A paper from Google DeepMind demonstrates how data curation via joint example selection can accelerate multimodal learning. (/r/MachineLearning)
Microsoft's MInference speeds up long-context inference: Microsoft's MInference technique enables inference of up to millions of tokens for long-context tasks while maintaining accuracy. (/r/MachineLearning)
Scaling synthetic data creation: A paper on scaling synthetic data creation leverages 1 billion web-curated personas to generate diverse training data. (/r/MachineLearning)

AI Model Releases and Improvements

Salesforce releases xLAM-1b model: Salesforce's 1 billion parameter xLAM-1b model achieves 70% accuracy in function calling, surpassing GPT 3.5. (/r/LocalLLaMA)
Updated Phi-3 Mini with function calling: Rubra AI released an updated Phi-3 Mini model with function calling capabilities, competitive with Mistral-7b v3. (/r/LocalLLaMA)
IC-Light V2 demo released: A demo for IC-Light V2, based on Flux models, was released on Hugging Face. The weights are not yet released and the model will be non-commercial. (/r/StableDiffusion)

AI Training and Fine-tuning Techniques

Detailed SDXL fine-tuning process shared: A developer shared extensive details on fine-tuning SDXL for 40M samples, including dataset preparation, quality modeling, captioning, and training specifics. (/r/StableDiffusion)

AI Ethics and Societal Impact

Debate over AI training data ethics: Discussions arose around the ethics of using copyrighted material to train AI models, with some arguing for fair use and others concerned about potential impacts on content creators. (/r/singularity)
James Cameron expresses concerns about AGI: Filmmaker James Cameron voiced concerns about AGI leading to superintelligence and potential conflicts, sparking debate about AI safety and anthropomorphization of AI. (/r/singularity)

AI Applications and Demonstrations

Neuralink competitor's eye implant restores vision: A Neuralink competitor reported that its experimental 2mm eye implant placed under the retina restored vision in blind people during a clinical trial. (/r/singularity)
AI demonstrates Minecraft building capabilities: Sonnet 3.6, an AI model, demonstrated the ability to build complex structures in Minecraft without specific training, showcasing emergent capabilities. (/r/singularity)
AI-generated music for videos: A new AI system called MuVi can generate music that matches video visuals by analyzing important features and using rhythmic synchronization. (/r/singularity)

AI Development and Policy

US National Security Advisor urges AI acceleration: Jake Sullivan, the US National Security Advisor, called for accelerated AI development and deployment to maintain the US lead, citing concerns about other countries' AI development. (/r/singularity)
Google's Project Jarvis leak: A leak about Google's Project Jarvis highlighted potential advancements in Gemini 2.0, suggesting significant improvements in AI capabilities. (/r/singularity)

AI Discord Recap

A summary of Summaries of Summaries by O1-mini

Theme 1: Model Breakthroughs and Woes

Llama and Phi Push Performance Boundaries: Llama-3.1-405B and Phi-3.5 models showcase impressive advancements in tasks like automated penetration testing and image generation. While Llama benefits from reinforcement learning, Phi-3.5 grapples with overzealous censorship, limiting its practical applications.
Stable Diffusion's Rollercoaster Ride: Stable Diffusion 3.5 sparks mixed reactions, with users debating its speed and quality compared to version 1.5. Galleries featuring 120 artists and 140 styles highlight its expanded artistic capabilities.
Dualformer and Grok 2 Take on Multimodality: Dualformer integrates both fast (System 1) and slow (System 2) reasoning, enhancing transformer efficiency. Meanwhile, Grok 2 introduces multimodal understanding, enabling it to process images alongside text, broadening its application scope.

Theme 2: Tool Tango - Building, Troubleshooting, and Integrations

LM Studio Juggles Multiple GPUs Like a Pro: LM Studio now effectively recognizes and utilizes multiple GPUs, optimizing performance for high-demand projects. Engineers highlight the need for specific configuration tweaks to maximize computational efficiency.
OpenRouter Connectivity Chaos Continues: Persistent Cloudflare errors plague OpenRouter, causing disruptions despite status indicators showing normal operations. Users explore workarounds like switching browsers or locations to regain stable connectivity.
Tinygrad's Quest for Complex Numbers: Tinygrad faces challenges in complex number integration, crucial for tasks like Discrete Fourier Transforms. Community-driven emulation strategies and contributions aim to enhance this functionality.
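One common way to emulate complex numbers in a framework without a complex dtype is to carry the real and imaginary parts as separate real-valued arrays. The sketch below shows the idea for a naive DFT in plain Python. It is not tinygrad code and does not use its API; the function name is illustrative.

```python
import math

def dft_real(re_in, im_in):
    """Naive O(n^2) DFT on (real, imaginary) lists, using real arithmetic only."""
    n = len(re_in)
    re_out, im_out = [], []
    for k in range(n):
        re_k = im_k = 0.0
        for t in range(n):
            angle = -2.0 * math.pi * k * t / n
            c, s = math.cos(angle), math.sin(angle)
            # (a + bi)(c + si) = (ac - bs) + (as + bc)i
            re_k += re_in[t] * c - im_in[t] * s
            im_k += re_in[t] * s + im_in[t] * c
        re_out.append(re_k)
        im_out.append(im_k)
    return re_out, im_out

re_part, im_part = dft_real([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0])
```

In a tensor library the same trick is usually expressed with a trailing dimension of size 2, so the two parts travel together through real-valued ops.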

Theme 3: Collaborative Constellations - Meetups, Study Groups, and Shared Projects

Toronto's Tech Titans Set to Meet Up: A planned Toronto meetup on NVIDIA GPUs and CUDA programming is generating buzz, with the first session scheduled for November 15. Organizers invite speakers and collaborators to join the AI knowledge exchange.
Study Groups Spark Sync in LLM Agents MOOC: Enthusiastic learners propose forming virtual study groups to delve into LLM Agent lectures, fostering peer collaboration and collective learning to tackle complex course materials effectively.
AdaletGPT - The Turkish Legal AI Assistant: Introduction of AdaletGPT, a Turkish legal chatbot, showcases the community's drive towards domain-specific AI tools. Built using RAG frameworks, it invites collaborative inputs and open-source contributions for legal assistance.

Theme 4: Privacy and Policy - Navigating AI with Ethics in Mind

Phi-3.5 Gets Canceled for Censorship: The Phi-3.5 model faces backlash for its overly censored responses, hindering its effectiveness in technical and coding tasks. Debates emerge on the appropriate level of AI censorship and its impact on usability in professional settings.
Meta Forges Its Own AI Search Engine: In an effort to reduce dependency on Google and Bing, Meta is developing a proprietary search engine, reflecting a broader industry trend towards AI-driven information retrieval and data sovereignty.
Apple's Million-Dollar AI Security Bounty: Apple announces a $1M bounty for successfully hacking their AI servers, underscoring the critical importance of AI security and proactive vulnerability identification in AI deployment.

Theme 5: Deployment Dilemmas - Configurations, GPU Setups, and Performance Tuning

CUDA Configuration Fiasco Solved: After multiple reinstallations and upgrades, engineers resolve their CUDA compatibility issues with GPT-NeoX, stabilizing their training environments for robust model deployment.
Gorilla's Function Call Leaderboard Clarified: Clarification on 'multiple' functionality in Gorilla's leaderboard highlights its role in evaluating multi-step reasoning and function selection in LLMs, enhancing transparency in performance assessments.
Torchtune Tweaks Tune Performance: Recent LoRA bug fixes and config flag proposals in Torchtune streamline model fine-tuning, improving single and multi-device training setups and ensuring more consistent performance across various LLM configurations.

PART 1: High level Discord summaries

HuggingFace Discord

Hugging Face Spaces Explored: Users discussed various models available on Hugging Face Spaces, focusing on those for image generation and model quantization, while sharing useful links to specific models.
- Recommendations included using lighter models like Llama and Flux for local projects, emphasizing their unique functionalities.
New Benchmark for Automated Penetration Testing: A recent paper introduced a benchmark for LLM-based automated penetration testing, showcasing models like GPT-4o and Llama 3.1-405B with the PentestGPT tool.
- While Llama 3.1 holds an edge, both models struggle in penetration testing, prompting discussions on improvements through reinforcement learning.
Stable Diffusion 3.5 Galleries...: A user showcased galleries demonstrating how Stable Diffusion 3.5 interprets artistic styles, featuring over 120 artists and 140 styles.
- Both galleries are accessible via Artists Gallery and Styles Gallery, and each lists the prompts used.
AI Development Tools Discussed: A member inquired about offline AI development tools due to company internet restrictions, receiving suggestions for using portable virtual machines or Docker.
- Community members warned that exporting and importing environments could be cumbersome.
Bionic Reading Hub Repository Launched: A GitHub project titled Bionic Reading Hub was shared, allowing PDFs to be transformed into a Bionic Reading format for enhanced readability.
- This tool could aid in processing complex materials, especially beneficial for users in cybersecurity fields.

Notebook LM Discord

NotebookLM Daily Limits Cause Frustration: Users express frustration over newly imposed daily limits on Audio Overview generations, speculating on potential subscription models in the future.
- Frustrations arose as many felt blindsided by the lack of communication regarding these limits, emphasizing the need for transparency from Google.
Join UXR Team's Insightful Study: The UXR team is hosting remote 1:1 interviews from October 31 to November 6 to gather participant feedback on upcoming developments, offering a $75 thank you reward.
- Only 6 slots are available for this research, pressing interested participants to complete the eligibility questionnaire.
Introducing PodCraft: Personalized Podcasts: A user proposed an app called PodCraft that delivers personalized podcast content, eliminating the need to sift through numerous episodes.
- The app aims to provide instant access to content in the voice of favorite creators, catering to frustrated listeners struggling to find relevant insights.
Successful Integration of HeyGen Avatars: One user shared a project that enhances HeyGen avatars to behave more realistically in a Halloween special video.
- Excitement was expressed regarding the capabilities of AI-generated content and the notable improvements achieved.
Opinions on Open Source AI Models: Users favored various open-source AI image generation tools, particularly those with less restrictive usage guidelines.
- Discontent was noted towards Google's Imagen, with many expressing that better models exist without limitations on usage.

Unsloth AI (Daniel Han) Discord

Unsloth Performance sees improvements amidst bugs: Recent Unsloth upgrades have improved speed, though community feedback reports crashes caused by unsloth-zoo assuming that datasets are indexable.
- Users have engaged actively on GitHub trying to solve model quantization issues, reflecting a community-driven troubleshooting approach.
Multimodal models integration discussed: Discussion highlighted complexities in merging vision and language models where adapters play a key role, leading to users considering potential solutions to enhance compatibility.
- GLM-4 emerged as a robust example supporting both audio and textual inputs, stirring interest in audio adapters for better multimodal interactions.
Gradient Accumulation impacts training workflows: Members shared experiences surrounding gradient accumulation improvements post-fix, emphasizing training efficiency but noting challenges related to batch sizes and memory management.
- Feedback indicated a learning curve for users adapting to the latest gradient accumulation capabilities within Unsloth.
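For readers new to gradient accumulation, a general pitfall is how the accumulated gradient gets normalized when micro-batches differ in size. The source does not describe what the Unsloth fix changed, so the sketch below is only a generic illustration on a toy one-parameter model with invented data.

```python
def grad_sum(w, batch):
    """Sum of per-sample gradients of (w*x - y)^2 with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)

def full_batch_grad(w, data):
    """Reference: mean gradient over the whole batch at once."""
    return grad_sum(w, data) / len(data)

def accumulated_grad(w, micro_batches):
    """Accumulate summed gradients, then normalize once by the total sample count."""
    total, count = 0.0, 0
    for mb in micro_batches:
        total += grad_sum(w, mb)
        count += len(mb)
    return total / count

def mean_of_means_grad(w, micro_batches):
    """Average of per-micro-batch means; biased when micro-batch sizes differ."""
    return sum(grad_sum(w, mb) / len(mb) for mb in micro_batches) / len(micro_batches)

data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]  # invented (x, y) pairs
micro = [data[:1], data[1:]]  # deliberately unequal sizes: 1 and 3
w = 1.0

reference = full_batch_grad(w, data)
good = accumulated_grad(w, micro)
biased = mean_of_means_grad(w, micro)
```

Here `good` matches the full-batch gradient while `biased` does not, which is why accumulation needs care when sequence or batch sizes vary.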
AI Video Generation with 3D Models: A member proposed the development of AI video generators using 3D models, incorporating features like camera controls and consistent environments, potentially leveraging Unreal Engine physics.
- This sparked inquiries about existing projects merging AI with video generation, hinting at a collaborative community interest.
Introducing Dualformer for reasoning efficiency: The Dualformer model proposes a new approach integrating both fast (System 1) and slow (System 2) reasoning for improved transformer efficiency, outpacing predecessors like Searchformer.
- Connecting cognitive systems theory with AI models, it unveils performance advancements in complex reasoning tasks such as maze navigation and mathematics.

LM Studio Discord

LM Studio Runs with Multiple GPUs: Users confirmed that LM Studio effectively recognizes and utilizes multiple GPUs, with one member highlighting their successful setup of two RTX 3060 Ti cards.
- However, specific configurations are necessary to optimize performance across both GPUs.
NPU Faces Functionality Shortcomings: A user expressed disappointment in their NPU due to its lack of software support for AI tasks compared to standard PC setups.
- Discussion included speculations on how Intel might improve NPU capabilities in partnership with Microsoft for better AI performance.
Apple M3 and Future M4 Performance Skepticism: The conversation about Apple's M4 emphasized concerns over memory limitations in new Mac models, leading to doubts about their ability to handle large AI models efficiently.
- Participants criticized the high costs involved in upgrading RAM, seeing it as a major deterrent for serious workloads.
Cost-Effective AI System Building Tips: Members highlighted the affordability of building custom systems with higher RAM capacities compared to purchasing Apple hardware that lacks sufficient resources.
- The consensus is that constructing a powerful AI-capable machine remains a more budget-friendly option.
Mixed GPU Setup Performance Questions: Concerns arose regarding mixed setups of RTX 3090 and 4090 GPUs, with users debating whether to sell off the more powerful card for compatibility reasons.
- The emphasis was on optimizing rigs for handling large models, prioritizing compatibility over inference speed.

OpenRouter (Alex Atallah) Discord

Inflection Returns Online: Inflection has resolved its recent billing issue, restoring access for users eager to use its latest offerings, Inflection 3 Pi and Inflection 3 Productivity.
- Along with the billing fix, Inflection clarified its offerings aimed at improving user productivity.
OpenRouter Connectivity Chaos Continues: Users are facing ongoing connectivity issues with OpenRouter, reporting Cloudflare errors like 520 and 524 despite everything appearing operational on the status page.
- Some users suspect the issues are more severe for those in Europe and have suggested testing with various browsers as a workaround.
Sonnet Model's Troubling Response Quality: Many users pointed out a noticeable drop in the response quality of the Sonnet model, now generating more generic follow-up questions than before.
- This decline seems linked to adjustments made after restricting the free version, prompting users to express frustration over decreased model interactivity.
Grok 2 Brings Multimodal Understanding: The community buzzed about the announcement of Grok 2, which now features the ability to process images and text together, expanding its potential applications.
- Users are excited to explore how these multimodal capabilities compare with existing models in the marketplace.
Demand for Integration Access Grows: A chorus of users is actively seeking access to integrations, indicating a strong community interest in this capability.
- Polite requests for integration permissions reveal an engaged user base eager for feature expansion, with consistent messaging thanking community members for potential help.

Latent Space Discord

Whisper vs Moonshine in ASR: Participants analyzed how Whisper stacks up against new ASR technologies like Moonshine, which boasts enhanced performance with lower computational costs on edge devices.
- While Moonshine outshines Whisper's smaller models, critics argue that Whisper's larger models maintain a performance upper hand.
Apple's Big Step in Homomorphic Encryption: Apple's announcement on homomorphic encryption marks a notable innovation, allowing private data use in AI without sacrificing confidentiality, akin to the HTTPS moment for AI.
- Experts discussed potential implementations, like data retrieval without exposing private info, though speed issues for inference remain a concern.
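To make "computing on data without seeing it" concrete, here is a toy additively homomorphic scheme (Paillier) in plain Python. It is not the scheme Apple announced and the primes are far too small to be secure. It only illustrates that a server can combine ciphertexts and the client still decrypts the correct sum.

```python
import math
import random

# Toy parameters for illustration only; real deployments use very large primes.
P, Q = 61, 53
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+), valid for G = N + 1

def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Recover the plaintext from a ciphertext."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N

def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2

c_sum = add_encrypted(encrypt(20), encrypt(22))
result = decrypt(c_sum)  # the party doing the addition never sees 20 or 22
```

The speed concern raised in the discussion shows up even here: every operation is modular exponentiation on numbers much larger than the plaintext.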
Moondream Secures $4.5M Funding: Moondream confirmed a successful funding round, raising $4.5 million to test the effectiveness of smaller AI models in competitive landscapes.
- This funding has ignited a debate regarding the capability limitations of smaller models in overcoming prevalent industry hurdles.
Cursor Pro Tips for Better Coding: Members shared Cursor Pro Tips, highlighting shortcuts like ctrl+k for localized edits, enhancing coding workflows significantly.
- There’s interest in follow-up sessions to dive deeper into these tips, as only a slice of potential practices was explored.
Audio Concerns in Discord: Users reported audio issues during meetings, which hindered their ability to stay engaged and track discussions effectively.
- Concerns emerged regarding Discord's server performance, suggesting it might be a factor in the ongoing audio problems.

Perplexity AI Discord

Perplexity Curators Program Launch: The Perplexity Team announced the Curators Program aimed at creating engaging content for the Discover Feed. Interested individuals can apply or tag friends here to be part of this initiative.
- The program invites users who enjoy creating Pinterest boards, editing Wikipedia pages, and diving into YouTube video essays to inspire a global audience.
Mixed Reviews on MacOS App Usability: Users reported issues with the Perplexity MacOS app, mentioning crashes and problematic pop-ups which affect performance. Some highlighted limitations on copy-pasting images compared to the web version.
- Frustrations grow over the lack of adequate feedback options, indicating an urgent need for improvements in usability.
New Features Spark Debate: The introduction of shopping features in Perplexity drew mixed reactions, with some users calling for these features to be more compartmentalized. Speculation continues about the strategic implications of these developments.
- Users express eagerness to see how these features will affect their daily interactions with the platform.
Anticipation for Next-Gen AI Models: Chatter indicates the possible release of GPT-5 by December 2024, with competitive dynamics evolving among AI developers. Meta's move to create its own search engine adds to this competitive landscape.
- Users are curious about how advancements will shape functionality in the coming months.
Clarifications on Perplexity API Access: Members discussed how to get sources for API results, linking to a Discord message for help. Users are seeking to replicate results similar to those found in the standard Perplexity chat.
- Concerns were raised about the citations closed beta access, with one user stressing the need for better communication regarding the status of their request.

Nous Research AI Discord

Ultrasonic Device Sales Surge: A developer's ultrasonic device for repelling mice saw sales jump from 15% to 28% in its target demographic, a rise members analyzed with a logistic growth curve.
- Confusion regarding A and B values in the logistic model was clarified by redefining time variables appropriately.
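The A and B values can be pinned down from two observations once time is measured from the first one. The sketch below assumes the common form P(t) = 1 / (1 + A·exp(−B·t)) with a saturation level of 100%; the source does not state which form was used, and the time between the two observations (`t1 = 1.0`) is a hypothetical unit.

```python
import math

def fit_logistic(p0, p1, t1):
    """Solve P(t) = 1 / (1 + A * exp(-B * t)) for A and B, given P(0)=p0 and P(t1)=p1."""
    a = (1.0 - p0) / p0                      # follows from setting t = 0
    b = -math.log(((1.0 - p1) / p1) / a) / t1
    return a, b

def logistic(t, a, b):
    """Evaluate the fitted logistic curve at time t."""
    return 1.0 / (1.0 + a * math.exp(-b * t))

A, B = fit_logistic(0.15, 0.28, t1=1.0)  # t1 is an assumed time unit
```

Setting t = 0 at the first data point is what makes A depend only on the initial share, which is one reading of "redefining time variables" in the discussion.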
AI Distillation Techniques Spark Debate: The conversation around distillation highlighted Arcee's Llama-3.1 model, efficiently training smaller models using logits from larger frameworks.
- Concerns arose about the insufficient technical documentation from Meta, prompting deeper discussions on their training methodologies.
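Training on a larger model's logits usually means matching the student's output distribution to the teacher's. The sketch below shows the classic temperature-softened KL formulation in plain Python. It is a generic illustration, not Arcee's or Meta's actual recipe, and the temperature default is arbitrary.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [2.0, 1.0, 0.1]          # invented logits for one token position
same = distillation_loss(teacher, teacher)
different = distillation_loss(teacher, [0.1, 1.0, 2.0])
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so the student learns from the full distribution and not only the top token.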
Hermes 3 Dataset Remains Closed Source: Members confirmed that the Hermes 3 SFT dataset is not open source, in contrast to its predecessors Hermes 1 and 2.
- Nonetheless, a link to the OpenHermes-2.5 dataset was provided for resources.
Thought Preference Optimization Boosts Performance: The paper 'Thinking LLMs' suggests that Thought Preference Optimization (TPO) can enhance instruction-following in LLMs, yielding a 4% performance increase.
- TPO's implementation on the Llama 3 8B Instruct model revealed that improper prompts might diminish performance.
Apple Rolls Out Ferret-UI for iOS Integration: Apple introduced Ferret-UI, a multimodal LLM designed for optimized usage on iPhone/iOS, enhancing user experience when integrated with Hugging Face transformers.
- Ferret-UI showcases impressive capabilities in mobile UI understanding, surpassing even GPT-4V in icon recognition and text location.

Eleuther Discord

Training LLMs on Limited Resources: Members discussed the challenges of training large language models like LLaMA-2 on limited GPU resources, noting extensive hardware requirements for effective reproduction.
- Deploying nanoGPT was suggested as a lightweight model for newcomers looking for easier training.
Contributions to Open Source AI Projects Gain Traction: A user expressed interest in participating in EleutherAI's projects despite mainly proprietary experience, highlighting open-source contributions as valuable learning experiences.
- Responses emphasized that smaller projects can offer significant insights and facilitate transition opportunities for software engineers.
Stick-Breaking Attention Mechanism Design Discussion: A novel stick-breaking attention mechanism offers improvements to Transformer models by addressing positional embedding and softmax limitations, as described in a recent arXiv paper.
- Community feedback underlined the need for clearer introductions to such mechanisms, with mentions of related projects like IBM's ModuleFormer.
Python 3.10 Compatibility Issues Unraveled: Setting up GPT-NeoX with Python 3.10 requires overriding the Torch version to 1.11.0 to resolve import failures, with users documenting the installation fixes in a specific Colab notebook.
- Members also warned of compatibility issues across Torch versions, noting that torch 2.4 causes failures while 2.3 might be viable.
Challenges with Distributed GPU Training Explored: Concerns about networking difficulties arise when sharing consumer GPUs for GPT-NeoX training, leading users to recommend checking out INTELLECT-1 for decentralized efforts.
- A shared link to ongoing work by PrimeIntellect highlighted an initiative for contributing compute resources.

Stability.ai (Stable Diffusion) Discord

Stable Diffusion 3.5 Draws Mixed Reactions: Users reported mixed experiences with Stable Diffusion 3.5, questioning its speed and quality compared to 1.5. It was suggested to run the same prompt across different models to effectively compare outcomes.
- Some members have shared this guide aimed at maximizing the performance of the new model.
Deploying Juggernaut on Runpod Gets Attention: A user explored deploying a custom model named Juggernaut on Runpod and noted the absence of Forge templates. Others highlighted that using Auto1111 could provide a more user-friendly approach.
- This discussion pointed towards the need for clearer resources for custom model deployment.
AMD GPUs Show Promise for Local Generation: The community discussed local generation capabilities using AMD GPUs, encouraging adherence to pinned guides for optimal performance. Users shared insights on VRAM limitations and model testing, specifically noting Gemma 2.
- Much emphasis was placed on experimenting with various models to find the best fit for AMD setups.
Sketch to Render Workflow in Architectural Design: Interest grew around utilizing Stable Diffusion for a 'sketch to render' process tailored for architectural design. Members recommended leveraging tools like ControlNet to enhance detail and accuracy.
- This approach aims at improving transformations from simple sketches to high-fidelity renders.
Discord Bot for Flux Inpainting Developments: Developers brainstormed creating a Discord bot to facilitate inpainting in Flux, noting the limited availability of models for this use case. One participant showed eagerness to implement functional inpainting features for community tools.
- This conversation reflects the growing interest in integrating advanced image manipulation directly into community platforms.

aider (Paul Gauthier) Discord

Aider and PearAI Feature Face-off: Members highlighted the overlap between PearAI and Aider, especially in their integration capabilities with open-source tools, raising ethical concerns over feature replication.
- They referenced the Open Source Pledge, emphasizing the need for tech firms to contribute more to open-source development.
Claude 1022 Drives Productivity: A user reported a productive experience using Claude 1022 alongside Aider for a Flutter application, costing $18 in credits for 4300 lines of code generated.
- They noted spending 15 hours on the project, showcasing significant productivity gains through effective prompting.
Troubleshooting Nvidia Nemotron Setup: Users faced challenges configuring Nvidia Nemotron with Aider, specifically around custom model metadata settings and exec commands.
- One member encouraged overlooking model warnings during connection and suggested reviewing the troubleshooting guide for guidance.
Benchmarking Sonnet 3.5: Request for Files: Users expressed the need for benchmark data files for Sonnet 3.5, especially concerning code edits and refactoring to assist in avoiding costly tests.
- One specific request was made for the .aider.chat.history.md and .aider.results.json files for empirical evaluation.
Privacy Issues with Local Models in Aider: Concerns arose about data privacy when using local models in Aider, particularly regarding the handling of sensitive information.
- Users were reassured that Aider maintains privacy by not storing user data when using local models.

Modular (Mojo 🔥) Discord

Mojo API Documentation Needs Examples: A discussion highlighted the lack of examples for Collections in the Mojo API documentation, leading to suggestions to contribute to the docs via GitHub.
- Members emphasized the importance of community engagement and preparing pull requests as a step towards improving documentation.
Mojo vs C++ for Learning: A user contemplating learning Mojo or C++ received advice that Mojo, being a modern systems language, might be better suited for their explorations, particularly in ML and data science.
- Community members shared insights on language choices suggesting a focus on Rust or building libraries in Mojo.
Mutable Tensors Set to Enhance Training Objects: Current nightly builds are introducing mutable tensors, enabling the representation of training objects such as trained weights and KVCaches.
- This feature is still under development from an API perspective but is expected to be included in the next release.

GPU MODE Discord

High Performance Mixed Precision Computing Ready: An upcoming talk on high performance mixed precision computing, scheduled to take place shortly, is generating excitement within the community.
- Members are reacting positively, indicating strong interest in performance optimization strategies.
Challenges with H100 and CUDA Profiling: Users discussed 'Command Buffer Full' errors encountered on H100 during CUDA profiling, an issue not seen on A100.
- Members are seeking advice on dealing with CUDA limitations and whether to explore alternative channels for solutions.
FLUX and the LLM.int8 Refactor: Insights emerged regarding Sayak's Twitter findings pointing to improved performance in FLUX, raising intrigue around the LLM.int8 refactor.
- Collaboration discussions centered around refining models and unlocking better functionality.
Toronto Meetup Focused on NVIDIA GPUs: Plans for a Toronto meetup on NVIDIA GPUs and CUDA programming are in the works, with the first session slated for November 15.
- Organizers are calling for speakers to contribute to the event aimed at enhancing collaboration among AI professionals.
Resolving Compounding CUDA Issues: A user shared their tumultuous experience troubleshooting CUDA installation issues which stemmed from a recent Ubuntu update, confirming success after upgrading to CUDA 12.4.
- The chaos led to humorous reflections, emphasizing the typical hurdles developers face when setting up robust environments.

Cohere Discord

Connector Queries Stumble in Cohere: Users faced issues retrieving data via the Cohere connector, receiving messages like 'I was unable to find any information' when querying specific user IDs.
- It's recommended to reach out to Cohere support by email for assistance with these problems.
Lagging in the Playground: Discussion highlighted persistent lag issues within the Cohere playground, especially after multiple messages, which hindered user experience.
- Starting fresh chats or clearing cache were suggested as potential fixes, linked to device limitations and context overload.
Tidbits from Algorithmic Trading Discussions: Members exchanged insights on algorithmic trading, focusing on AI sentiment influence on market movements and the nuances of media bias.
- It's noted that significant trading insights are better sourced from platforms like EDGAR rather than human perspectives.
Accessing the Cohere Community Server: Inquiries about joining the Cohere For AI community server led to sharing the application page.
- Information about a research lab aimed at addressing complex machine learning challenges was also provided.
Configuring Cohere Connectors: Users sought guidance on utilizing connectors for the Cohere chat endpoint, prompting sharing of necessary documentation.
- It's crucial to use the v1 API for connector setups, as v2 is not supported yet.

OpenAI Discord

Exploring AI Research Grants Experience: A member inquired about experiences applying for grants in AI research, reflecting a growing interest in funding opportunities within the community.
- This exchange highlighted the diverse pathways for securing resources to advance AI projects.
Challenges in AI Customization: Concerns arose about how ChatGPT often ignores customization commands, leading to unpredictable outputs.
- Participants shared instances where guidance wasn’t followed, raising questions about AI's reasoning capabilities.
Understanding Limitations of LLMs: It was noted that LLMs excel in language generation but struggle with math, prompting suggestions to use Python for calculations.
- A member emphasized the importance of providing step-wise guidance to improve LLM functionality.
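One way to act on the "use Python for calculations" suggestion is to hand arithmetic to a small calculator function instead of asking the model to compute it. The sketch below is an assumed illustration (the `calculate` name and the set of supported operators are not from the discussion); it parses an expression with the standard `ast` module and evaluates only basic arithmetic, rejecting anything else.

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculate(expression):
    """Evaluate a plain arithmetic expression without using eval()."""
    def _eval(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)
```

A model would then only need to emit the expression string (for example `"2 + 3 * 4"`), and the surrounding application would call `calculate` and feed the result back.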
Utilizing Multiple LLMs for AI Solutions: Discussions highlighted the necessity of using multiple LLMs to handle different tasks effectively, as a single model may not suffice.
- Participants explored the benefits of ‘prompt chaining’ and agentic workflows for enhanced results.
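Prompt chaining, as mentioned above, can be sketched as a pipeline in which each step's output becomes the next step's input, with each step free to call a different model. The two "models" below are stand-in functions invented for illustration; in a real setup they would wrap calls to whichever LLM APIs are in use.

```python
from typing import Callable, List

def run_chain(steps: List[Callable[[str], str]], user_input: str) -> str:
    """Feed the output of each step into the next one, in order."""
    text = user_input
    for step in steps:
        text = step(text)
    return text

# Hypothetical stand-ins for real model calls.
def summarizer_model(prompt: str) -> str:
    return "SUMMARY(" + prompt + ")"

def translator_model(prompt: str) -> str:
    return "TRANSLATION(" + prompt + ")"

result = run_chain([summarizer_model, translator_model], "long report text")
```

Keeping each step behind the same `str -> str` signature makes it easy to swap one model for another per task, which is the main point raised in the discussion.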
AI Consistency is a Myth: Members pointed out that AI is not consistent, marking unpredictability as a fundamental challenge for users.
- Engagement with AI tasks is seen as both intricate and enjoyable, presenting a blend of excitement and complexity.

Interconnects (Nathan Lambert) Discord

OpenAI and Google in a December Showdown: OpenAI aims for a December launch of its next AI model while Google is also working on releasing its Gemini 2.0, intensifying competition in the AI space. While OpenAI's rollout is phased, Google seeks a wide release, although performance expectations might not be fully met.
- December is shaping up to be a month of dueling AI announcements, making it crucial for engineers to stay updated on these developments.
Meta Builds Its Own Search Engine: Meta is developing a new web search engine under engineering manager Xueyuan Su to minimize reliance on Google and Bing data feeds. This project aims to provide more independent AI solutions for Meta's platforms, avoiding another Apple-like situation.
- The shift reflects Meta's strategy to enhance control over its information ecosystem, potentially impacting data sourcing practices.
Generative AI Adoption is Slow: A recent paper claims that while 40% of US adults engage with generative AI, only 0.5% – 3.5% of work hours actually involve its assistance. The adoption rate is much slower than expected, revealing a disparity between usage and the anticipated impact on productivity.
- This raises questions about how AI integration in workflows can be improved to maximize efficiency.
Concerns Over Gemini's Releases: The release of Gemini models has faced criticism for declining performance compared to previous versions and issues in marketing to consumers. The launch has been deemed one of the most botched releases, with significant regressions affecting user experience.
- Shifting user experiences raise concerns about the legacy of product development in high-stakes AI environments.
Pricing for Human-Generated Examples Inquiry: A member inquired about where to find information on the prices for human-generated examples versus annotating them as good or bad. This question highlights the need for clarity in the value proposition of manual versus automated annotation processes.
- Establishing clear criteria for evaluating generated examples is essential as AI systems continue to proliferate.

tinygrad (George Hotz) Discord

Fast Math Mode sparks discussion: Members highlighted how fast math mode in Metal automatically performs algebraic transforms, requiring manual disabling for strict floating point compliance. The use of -fassociative-math was mentioned as an optimization for mathematical expressions.
- Reassociation was cited as a potential enhancement to explore within the math settings.
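To see why reassociation matters for strict floating point compliance, note that floating point addition is not associative, so regrouping terms can change the result. The snippet below demonstrates this in plain Python with IEEE 754 doubles; it does not use Metal or any compiler flag, it only shows the kind of difference an algebraic transform could introduce.

```python
# Same three terms, grouped differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
small_difference = left != right

# A larger effect: the 1.0 is absorbed when added to 1e16 first.
absorbed = (1e16 + 1.0) - 1e16
preserved = 1.0 + (1e16 - 1e16)
```

Here `absorbed` evaluates to 0.0 while `preserved` evaluates to 1.0, even though both expressions are algebraically identical.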
Tinygrad limps with complex number integration: Users reported issues with complex numbers in Tinygrad, particularly when creating a DFT, encountering an AssertionError due to insufficient support. George expressed a desire for easier complex number handling, suggesting a potential emulation with a 2D axis.
- The need for complex number support is critical for users aiming to implement advanced algorithms.
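The idea of emulating complex numbers with an extra axis can be sketched without any complex dtype: each value is stored as a `[real, imag]` pair, and the DFT is computed from cosines and sines. The code below is plain Python for illustration only; it does not use Tinygrad's API, and the function name is an assumption.

```python
import math

def dft_pairs(signal):
    """Naive DFT of a real signal, returning one [real, imag] pair per bin."""
    n = len(signal)
    out = []
    for k in range(n):
        re = 0.0
        im = 0.0
        for t, x in enumerate(signal):
            angle = -2.0 * math.pi * k * t / n
            re += x * math.cos(angle)  # real part of x * exp(i * angle)
            im += x * math.sin(angle)  # imaginary part of x * exp(i * angle)
        out.append([re, im])
    return out

spectrum = dft_pairs([1.0, 0.0, -1.0, 0.0])
```

In a tensor library the same layout would correspond to a trailing axis of size 2, with the two accumulations expressed as real-valued matrix multiplications against cosine and sine tables.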
Tinygrad rolls out on Android with OpenCL: A user inquired about using Tinygrad on an Android device with OpenCL for model compilation, seeking guidance on setup. Resources like compile_efficientnet.py were shared as potential pathways to establish the necessary OpenCL kernels and buffers.
- Members emphasized the ability to run models without relying on Python as a significant advantage for mobile applications.
Strict PR Submission Guidelines to follow: George Hotz stressed the importance of reviewing existing PRs before submitting new ones, indicating that poorly understood changes may face rejection. He urged contributors to prioritize bug fixes rather than duplicating PRs with similar information.
- This approach ensures the integrity of Tinygrad's development process and meaningful contributions.
Tinygrad ecosystem development takes shape: George discussed the evolution of Tinygrad's ecosystem, hinting at a shift towards performance enhancements and a broader implementation. The community expressed interest in developing model conversion tools similar to HuggingFace's offerings to streamline model management.
- The conversation centered on the importance of these tools as Tinygrad matures, reinforcing the focus on usability and compatibility.

LlamaIndex Discord

Exploring Intelligent Knowledge Assistants at Ray Summit: The Ray Summit workshop showcased the vision for building intelligent knowledge assistants that process complex data in various ways, now available on YouTube.
- All components needed to go beyond simple tasks were discussed during the session, which can be found here.
NVIDIA Case Study Cookbook Needed: Several members expressed interest in a cookbook for the NVIDIA case study, particularly focusing on streaming use cases with Chainlit.
- One member highlighted struggles with nesting parent/child steps within Chainlit's framework while pursuing a custom agent workflow.
Mastering Text-to-SQL with 500 Tables: A reliable text-to-SQL tutorial demonstrates constructing a SQL agent capable of operating over 500 tables, available on YouTube.
- This resource stands out as one of the best for navigating complex data setups; further information is accessible here.
Deepfake Voice Generation Impresses: A user experienced impressive deepfake voice generation, where the system auto-predicted replies as if they were responding as the user on a Teams Tier plan.
- The AI not only asked questions in its own voice but also answered in the user's voice, demonstrating real-life auto-predict capabilities.
Retriever Issues Reported: A member reported issues with retrievers returning empty nodes despite successfully testing the index with a chat engine.
- Another member recommended sharing code to troubleshoot further since retriever configurations seemed incorrectly set.

DSPy Discord

Automatic Prompt Generation with MIPROv2: A member shared a thread on implementing automatic prompt generation techniques using the MIPROv2 optimizer with the gsm8k dataset, structured into three clean modules for demos, instructions, and outputs.
- This streamlined approach enhances the prompt crafting process, as discussed in a detailed tweet.
Swiss Citizens Collaborate on Laws: A member is developing a collaborative software application enabling Swiss citizens to participate directly in law-making using the popular initiative process, a topic of personal academic interest.
- The project showcases significant involvement in civic engagement and is linked to the broader discourse on participatory democracy.
DSPy 2.5 Mapping Clarification: Discussion emerged on the transition to DSPy 2.5, with members consulting the migration documentation to understand implementation changes.
- No major differences are anticipated, suggesting smooth continuity for existing users.
Development of Audio Input Features: Members explored ongoing developments related to audio input features in DSPy, referencing a potential GitHub pull request that discusses supporting architectures like Ultravox with LLaMa.
- This integration could advance multimodal capabilities, pivoting DSPy into broader applications.
Examples for NER and Relation Extraction: A member provided a code snippet for Named Entity Recognition (NER) in DSPy, highlighting the modern dspy.ChainOfThought implementation as preferred over deprecated methods.
- Attention was also directed towards relation extraction, with suggestions of leveraging relevant datasets from Hugging Face to enhance project insights.

OpenInterpreter Discord

Open Interpreter Performance with Spreadsheets: During discussions on improving Open Interpreter performance, users experimented with insights from the YouTube tutorial but found that local models like qwen2.5:32b-instruct struggled significantly with execution.
- A member suggested that enhancing performance hinges on using quality models and effective prompting techniques, even recommending the creation of a profile for task clarification.
Guidance on Open Interpreter Setup: Beginners faced challenges setting up Open Interpreter via the Windows terminal, prompting another member to share setup instructions complete with pip installation commands.
- This streamlined setup guidance aimed to facilitate easier introductory experiences for new users embarking on their journey with the tool.
Local Model Restrictions in Open Interpreter: Inquiries about local models' requirements for visual capabilities led to the understanding that no local model could match the performance of Sonnet, undermining local operations.
- A tech-savvy member emphasized the importance of correctly importing the computer API to enable local models to function effectively.
Markdown Love with Obsidian: Members celebrated their passion for Markdown, with one hinting at imminent exciting demos related to Obsidian tools set to impress.
- This reflects a growing enthusiasm for implementing Markdown practices within AI coding environments, pushing for creative utilization.
OpenAI Introduces Advanced Voice Features: OpenAI's announcement revealed that Advanced Voice is now accessible for free users in the EU, including Switzerland and Norway, enhancing their mobile app functionalities.
- This accessibility milestone signifies an important step towards democratizing advanced AI features for broader user demographics.

LAION Discord

Discord LLM Helper Demand: Members expressed a strong desire for a 'Discord LLM helper' to summarize chats and field questions on demand, while noting the current limitations of Discord's beta feature.
- It’s a missed opportunity, especially since providing ephemeral responses could streamline interactions by keeping them user-specific.
Custom Bots with Ephemeral Responses: Interest was shown in developing a custom Discord bot that could efficiently handle question-answering and summaries, utilizing ephemeral responses.
- This approach could significantly improve the clarity of chat interactions by making responses only visible to the user executing the command.
Mindcraft LLM Projects: Engaging discussions around integrating Minecraft with LLMs have sparked enthusiasm within the community for creative projects.
- Participants remarked that these combination projects are not only enjoyable but also present unique challenges in implementation.
Clarification on Llama3-8B-1.58 Model: Discussions clarified the Llama3-8B-1.58 model's lineage, stating it derives from Llama-3-8B-Instruct, not BitNet as previously assumed.
- Members referred to a blog on extreme quantization for further details and guidance.
Confusion About Model Specifications: Clarifications emerged surrounding the model specifications of Llama3-8B-1.58, particularly a misconception about it being a 100B model.
- Members acknowledged the misunderstanding and found commonality in the need for better communication on 8B parameters in model descriptions.

OpenAccess AI Collective (axolotl) Discord

Mixtral AI model is outdated: A member humorously suggested upgrading from the Mixtral AI model to a newer version of MistralAI, hinting at obsolescence.
- At least upgrade to a newer MistralAI model.
Inquiry on SymNoise implementation code: Members discussed the need for code implementation related to the SymNoise fine-tuning technique to enhance language models using symmetric noise.
- Tried implementing it myself, but it seems to double the batch size of the embeddings through concatenation, and I don't know how to deal with that.
Incomplete SAT reading test scrape revealed: A member reported an incomplete scrape of the SAT reading tests and several AP tests, igniting formatting discussions.
- "Thank you for bringing it to my attention!" came the reply, expressing appreciation for feedback regarding the scrape.
Concerns on multimodal question inclusion: Members raised issues about whether images should accompany questions after observing formatted_prompt and rewritten_answer fields in the SAT dataset.
- The original scraper confirmed that while the full set does include images for some questions, the dataset was intended to remain unimodal.
Clarifications on Qwen model configuration needed: In a detailed discussion, members highlighted the necessity of specifying exact model types for Qwen/Qwen2.5-32B rather than generic placeholders like AutoModel.
- Concerns were also raised regarding potential security issues tied to the trust_remote_code setting.

LangChain AI Discord

Creating ReAct Agent using HuggingFace Local Model: A member initializing a ReAct Agent with a local model ran into a parserException during invocation.
- They are seeking help as they couldn't find a solution online for this specific error.
Exploration of Advanced RAG Methods: Questions arose about the most advanced techniques for Retrieval-Augmented Generation (RAG) and the relevance of traditional methods.
- Common practices mentioned include data cleaning and storage in Pinecone/vector databases, while recent references were sought.
Using create_sql_agent to Return Pandas DataFrame: A query was raised on utilizing create_sql_agent to generate a Pandas DataFrame instead of just a text string.
- The member specifically inquired about the necessity of SQLDatabaseToolkit in this scenario.
Introducing AdaletGPT - A Turkish Legal Chatbot: AdaletGPT is a Turkish legal chatbot based on RAG, built with LangChain, Pinecone, and OpenAI for legal assistance.
- This platform allows users to engage in AI-driven interactions for legal inquiries.
bootstrap-rag v0.0.11 Launches with Exciting Updates: The new release of bootstrap-rag v0.0.11 incorporates an LLM as Judge template with enhancements from Arize AI Phoenix.
- This update includes key bug fixes and improved documentation for a smoother user experience.

LLM Agents (Berkeley MOOC) Discord

Lecture 8 Kicks Off Today!: The 8th lecture begins today at 3:00pm PST, available via livestream here, promising to cover integral aspects of LLM Agents.
- Attendees look forward to insights from guest speaker Yuandong Tian, who will explore the fusion of neural and symbolic decision-making frameworks.
Study Group Buzz Gains Momentum: Participants are keen to form a study group for collaborative discussions, with a Google Form shared for scheduling preferences.
- The response has been positive, suggesting increased engagement among late joiners eager to dissect lecture content together.
Hackathon Timeline Released: Details about the upcoming hackathon, including various tracks like Applications, Benchmarks, and Safety, are now available on the hackathon website.
- Hosted by Berkeley RDI, this event aims to bring together diverse talents to enhance the field of LLM agent technology.
Datasets Discussion Lingers: Members seek guidance on suitable datasets for the benchmarking track of the hackathon, prompting an open-ended conversation on resources.
- Despite interest, no specific dataset resources have been shared yet, indicating a need for further exploration and collaboration.

Torchtune Discord

Embedding Config Flags Proposal: A member proposed exposing two boolean flags (embedding_trainable=False and norms_trainable=False) in the configs to mitigate future configuration issues, as TransformerDecoder may necessitate more significant changes.
- This approach seeks to simplify transitions from boolean flags to lists, preventing numerous configuration adaptations.
LoRA Bug Fix Submitted: A fix for the LoRA bug was submitted via pull request #1909, addressing NaN loss during single device fine-tuning when use_dora=True.
- However, there are uncertainties about the fix's compatibility across all recipes, particularly in distributed setups.
Hyperparameter Optimization Recipe Discussion: A GitHub issue proposes a recipe for hyperparameter optimization allowing users to input configurations along with datasets and parameters for sweeping common defaults.
- Interestingly, no one has requested this feature explicitly, indicating potential gaps in user needs.
Skepticism Surrounds muP Utility: Members questioned the practicality of muP for fine-tuning, noting its primary mention relates to pretraining, with calls for improved generation and early stopping taking priority.
- Concerns persisted over whether implementing muP is worth the investment over addressing existing issues.
Prioritizing Development Issues: A member highlighted the excessive backlog of 200 open issues, emphasizing the urgent need to tackle faster reinforcement learning generation and improved LLM classification.
- Furthermore, support for distributed shampoo was flagged as another high-priority item.

Mozilla AI Discord

Human Native AI Marketplace launches: The new Human Native AI Marketplace allows creators to license their content for AI training and receive compensation.
- Co-founder James Smith will discuss progress at the upcoming Mozilla Data Futures Lab Speaker Series.
Exciting November Member Programming lined up: November hosts a range of member-organized events including sessions on Sqlite-Vec and Refact.ai, along with remote conferences and a San Francisco meetup.
- Members should RSVP to join the discussions that matter.
Showcase of Open Source Projects: Highlighted projects at Mozilla AI include Open Interpreter, Homebrew, and Sentry.io's open source auto fix.
- There’s anticipation for featuring even more projects from the 3300 member community on Public AI.
OSS4AI Meetup brings local members together: The upcoming OSS4AI San Francisco IRL Meetup invites members to connect and collaborate.
- It's a golden chance for local enthusiasts to engage in meaningful project discussions.
Sqlite-Vec Metadata Filtering techniques discussed: An event on Metadata Filtering in Sqlite-Vec will tackle crucial strategies for efficient data management.
- This initiative emphasizes preserving data integrity while supporting AI training.

Gorilla LLM (Berkeley Function Calling) Discord

Clarification on Leaderboard's 'Multiple' Functionality: Users queried the meaning of 'multiple' in the leaderboard context, suspecting it indicates the ability to choose the appropriate function from several available options.
- It was suggested that while this aspect is clear, the evaluation of true multi-step functionality is still unclear.
GitHub Reference for Function Call Leaderboard: A GitHub link was shared as an example related to the Gorilla project, aimed at training and evaluating LLMs for function calls.
- The referenced page provides vital context for understanding the leaderboard's operational mechanics.

The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

HuggingFace ▷ #general (1071 messages🔥🔥🔥):

Hugging Face Spaces

Fine-Tuning Models

Image Generation Models

NSFW Content Discussion

Model Quantization

Exploration of Hugging Face Spaces and Models: Users discussed various models available on Hugging Face Spaces, including those for image generation and quantization, sharing links to specific models.
- Recommendations for using lighter models for local projects included Llama and Flux, with insights into specific functionalities.
Controversy Over NSFW Content: A debate arose regarding the appropriateness of discussing NSFW material on the server, with community members emphasizing the server's SFW standards.
- Cakiki reminded everyone to adhere to the code of conduct and suggested continuing the discussion in a designated thread.
Technical Discussions on AI Training: Users explored training models on local GPUs, with suggestions about optimizing performance and reducing overfitting by adjusting sample sizes.
- There was an emphasis on the importance of understanding model parameters and quantization techniques.
Sharing Educational Resources: Community members shared beginner-friendly resources, including a video on building Hugging Face Spaces and courses on model quantization.
- The conversation included recommendations for practical applications of AI and machine learning, encouraging new users to explore available tools.
General AI Community Interaction: Participants engaged in light-hearted banter about AI-generated content and expressed interest in various AI projects.
- The chat highlighted the collaborative spirit of the community and the willingness to assist newcomers with their queries.

Links mentioned:

Here, Let Me Google That For You: Passive-aggressively teach your friends how to Google. For all those people who find it more convenient to ask you rather than search it themselves. Not associated with Google.
Rayleigh quotient - Wikipedia: no description found
Code of Conduct – Hugging Face: no description found
Intel's Hala Point, the world's largest neuromorphic computer, has 1.15 billion neurons: The Hala Point machine brings compute efficiency that rivals GPUs and CPUs on some tasks, Intel said.
Streamlit: no description found
@Tonic on Hugging Face: "boomers still pick zenodo.org instead of huggingface ??? absolutely clownish…": no description found
Api for access AI recommendation - a Hugging Face Space by KaliumPotas: no description found
Tonic/open-gpt-Flirty-Friend · Hhh: no description found
FlyWire: no description found
Vegeta Dragonballsuper GIF - Vegeta Dragonballsuper Dragonball - Discover & Share GIFs: Click to view the GIF
city96/stable-diffusion-3.5-large-gguf · Hugging Face: no description found
Models - Hugging Face: no description found
Quantization: no description found
Compressed Tensors: no description found
I Saw W Gus Fring GIF - I Saw W Gus Fring Gus - Discover & Share GIFs: Click to view the GIF
Leonardo Dicaprio Rick Dalton GIF - Leonardo Dicaprio Rick Dalton Point Finger - Discover & Share GIFs: Click to view the GIF
Alone Glitch GIF - Alone Glitch Film - Discover & Share GIFs: Click to view the GIF
Embed No GIF - Embed No No Embed - Discover & Share GIFs: Click to view the GIF
calcuis/sd3.5-large-gguf · Hugging Face: no description found
Reddit - Dive into anything: no description found
Models - Hugging Face: no description found
Let's build GPT: from scratch, in code, spelled out.: We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections t...
meta-llama/Llama-3.1-8B · Hugging Face: no description found
Allegro: New Open Source SOTA Text to Video Model - 27 Amazing Examples With Prompts, Apache License: Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input. Currently ...
How to Create a Hugging Face Space: A Beginner's Guide: If you're new to Hugging Face and want to set up a space for your machine learning model or app, you're in the right place. Follow these simple steps to get ...
But what is a neural network? | Chapter 1, Deep learning: What are the neurons, why are there layers, and what is the math underlying it? Help fund future projects: https://www.patreon.com/3blue1brown Written/interact...
Interview with a Postdoc, Junior Python Developer: Python programming language. Interview with a Postdoc, Junior Python developer with Ph.D. Carl Kron - aired on © The Python. Programmer humor Python humor Progr...
Tesla robotaxi and humanoid robots, OpenAI MLE-bench, DeepLearning.AI multimodal course, NVIDIA mixture of experts, AMD GPU performance surpasses Nvidia: _
GitHub - huggingface/hub-docs: Docs of the Hugging Face Hub: Docs of the Hugging Face Hub. Contribute to huggingface/hub-docs development by creating an account on GitHub.
GitHub - RayFernando1337/LLM-Calc: Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference.: Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference. - RayFernando1337/LLM-Calc
nroggendorff/profession · Datasets at Hugging Face: no description found
argilla (Argilla): no description found
19,500+ Beautiful Sad Girl Hurt Silhouette Stock Photos, Pictures & Royalty-Free Images - iStock: no description found
[Bug] The specified tag is not a valid quantization scheme. · Issue #1476 · huggingface/hub-docs: Bug description. When trying to pull a specific quantization tag for a model through Ollama I was getting the following error: The specified tag is not a valid quantization scheme. At first I thoug...

HuggingFace ▷ #today-im-learning (2 messages):

Byte Pair Encoding tokenizer

Shakespeare dataset

GitHub project DeepLLMs

Custom BPE Tokenizer Created: A member finished creating a custom implementation of a Byte Pair Encoding (BPE) tokenizer trained on 100k chars of the tiny Shakespeare dataset with a vocab size of 3k.
- The implementation can be explored further on their GitHub repository for those interested in learning about LLMs and transformers.
Exploration of DeepLLMs: The GitHub repository titled DeepLLMs is aimed at learning the basics of LLMs and transformers, as well as exploring other interesting topics along the way.
- This project serves as a valuable resource for anyone looking to deepen their understanding of large language models.
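
For readers who want to see what such a tokenizer involves, here is a minimal sketch of byte-level BPE training in Python. It is not the member's implementation (the digest does not show their code); the function names `train_bpe`, `merge`, `encode`, and `decode` are hypothetical, and the sketch assumes merges are learned greedily over raw UTF-8 bytes.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, vocab_size):
    """Learn merges until the vocabulary (256 byte ids + merges) reaches vocab_size."""
    ids = list(text.encode("utf-8"))  # start from raw bytes, ids 0..255
    merges = {}                       # (left_id, right_id) -> new_id, in learned order
    next_id = 256
    while next_id < vocab_size:
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break                     # no pair repeats, so nothing is worth merging
        ids = merge(ids, pair, next_id)
        merges[pair] = next_id
        next_id += 1
    return merges


def encode(text, merges):
    """Tokenize text by replaying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Map token ids back to bytes and then to a string."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

With the setup described above, one would call something like `train_bpe(shakespeare_text[:100_000], vocab_size=3000)`, where `shakespeare_text` is assumed to hold the dataset as a string.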

Link mentioned: GitHub - its-nmt05/DeepLLMs: Meant for learning the basics of LLMs and transformers and exploring other interesting stuff along the way: Meant for learning the basics of LLMs and transformers and exploring other interesting stuff along the way - its-nmt05/DeepLLMs

HuggingFace ▷ #cool-finds (56 messages🔥🔥):

AI Offline Development Environments

Bee Agent Framework

Medical AI Research

Quantum Computing and Machine Learning

Reading and Understanding Complex Papers

Seeking Offline AI Development Tools: A member asked about offline development environments for AI because of company internet restrictions; others suggested using a portable virtual machine or a Docker image built offline.
- The community discussed various options but warned that exporting and importing virtual environments can be cumbersome.
IBM's New Bee Agent Framework: A new Bee Agent Framework was released by IBM developers, featuring agents refined for Llama 3.1, sandboxed code execution, and enhanced memory management.
- The framework allows integration using an OpenAI-compatible API and is designed for serving agents with improved workflow control.
Last Week in Medical AI Updates: Members highlighted the latest Medical AI podcast, discussing significant research breakthroughs and new LLM models from the week of October 19-26, 2024.
- Highlighted studies included a paper on safety principles for medical summarization, along with various applications of medical AI in diagnostic tasks.
Quantum Computing in Black Hole Research: Discussion emerged around a recent paper on ML and Quantum Computing aimed at studying the insides of black holes, noted for its advanced complexity.
- Members noted the difficulty of the paper, suggesting prior knowledge of quantum mechanics as necessary for comprehension.
Improving Paper Understanding through AIs: Members shared strategies for understanding complex academic papers by backtracking to more basic resources and using AIs for explanations and quizzes.
- A suggestion was made to create a HuggingFace space where users can interact with an LLM embedded within academic papers for better understanding.

Links mentioned:

HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly: There have been many benchmarks for evaluating long-context language models (LCLMs), but developers often rely on synthetic tasks like needle-in-a-haystack (NIAH) or arbitrary subsets of tasks. It rem...
Tweet from elvis (@omarsar0): IBM devs release Bee Agent Framework, an open-source framework to build, deploy, and serve agentic workflows at scale. Features include: - Bee agents refined for Llama 3.1 - sandboxed code execution...
[Distributed w/ TorchTitan] Introducing Async Tensor Parallelism in PyTorch: with Horace He, Less Wright, Luca Wehrstedt, Tianyu Liu, Wanchao Liang TL;DR We implemented experimental async tensor parallelism support in PyTorch. We integrated it in TorchTitan and observed: Up...
🤖 Latest Medical AI Breakthroughs | Weekly Research Review Oct 19-26 (Part 1/2): This week in medical AI research, we explore groundbreaking work in deepfake detection, explainable AI (XAI), LLM applications, and multimodal foundation mod...
@aaditya on Hugging Face: "Last Week in Medical AI: Top Research Papers/Models 🔥 🏅 (October 19-26…": no description found

HuggingFace ▷ #i-made-this (11 messages🔥):

Stable Diffusion 3.5

Fast Apply Qwen2.5 Model

Bionic Reading Hub

Google Shopping Dataset

LoRA Models

Stable Diffusion 3.5 Galleries Showcase: A user created galleries demonstrating how Stable Diffusion 3.5 Large interprets artistic styles with over 120 artists and 140 styles showcased in two separate galleries.
- Each entry presents unique artistic interpretations, accompanied by the prompts used, as seen in the Artists Gallery and Styles Gallery.
Introducing Fast Apply Qwen2.5 Coder Model: A new open-source Fast Apply model has been released, optimizing the code update process for large files using the Qwen2.5 Coder Model.
- The model facilitates fast and accurate code updates, handling natural snippets without needing full file edits, significantly improving efficiency.
Bionic Reading Hub Launch: A GitHub project titled Bionic Reading Hub was introduced, aimed at transforming PDFs into a Bionic Reading format for enhanced readability.
- The repository details tools and methods to read like a superhero, enhancing the reading experience for users.
Launch of Marqo Google Shopping Dataset: The Marqo-GS-10M dataset has been released on Hugging Face, with 10M rows of multimodal product retrieval data.
- This dataset includes significant details for research and development, providing useful benchmarks and relevance scores for each query-document pair.
Exploration of LoRA Model Opportunity: A user highlighted the need for more LoRA models that emulate the aesthetics of old pixel-style games.
- This comment reflects an interest in retro gaming styles within generative AI art, prompting opportunities for creative exploration.

Links mentioned:

SD3.5L Artist Gallery: no description found
SD3.5L Style Test Gallery: no description found
Reddit - Dive into anything: no description found
Reddit - Dive into anything: no description found
goin to war with the guitar and gary4ive - ableton speedrun on fastforward - captains chair s2 ep9: trying something different with this one. if i actually get a subscriber from this i'll be surprised.fk the algorithmi use a plugin called gary4live in here....
GitHub - SanshruthR/Bionic_Reading_Hub: Read like a superhero: Turn PDFs into Bionic Reading format: Read like a superhero: Turn PDFs into Bionic Reading format - SanshruthR/Bionic_Reading_Hub
ChromePunk - SDXL 1.0 - V1 | Stable Diffusion XL LoRA | Civitai: Do whatever you want, but do something cool... 👉🏻 Hugging Face LINK 👉🏻 Dataset LINK
rbourgeat/ChromePunk-SDXL-LoRA · Hugging Face: no description found
Marqo/marqo-GS-10M · Datasets at Hugging Face: no description found

HuggingFace ▷ #reading-group (12 messages🔥):

Automated Penetration Testing

LLMs in Cybersecurity

Bionic Reading Hub Repo

Introducing a New Benchmark for Automated Penetration Testing: The paper under discussion introduces a novel benchmark for LLM-based automated penetration testing, addressing the lack of comprehensive evaluation tools in cybersecurity. It evaluates LLMs such as GPT-4o and Llama 3.1-405B using the PentestGPT tool.
- Llama 3.1 shows an edge over GPT-4o, yet both models currently struggle with effective penetration testing.
Excitement for the Future of Automated Testing: Members expressed enthusiasm for advancements in automated penetration testing, with discussions about training LLMs using reinforcement learning to improve their performance. Challenges remain, as the effectiveness of LLMs in pentesting is still in question, given their current limitations.
- Chad_in_the_house noted that while the journey is lengthy, the potential benefits in this field are significant.
Supportive Community Encourages Progress: Community members, including platinumfawn95, emphasized that taking small steps can lead to eventual success in the complexities of automated pentesting. Support for the ongoing effort reassures researchers as they work through identified failure points.
- Lunarflu expressed eagerness to participate in events related to this research, highlighting collective enthusiasm.
Useful GitHub Resource Shared: A member shared the Bionic Reading Hub GitHub repository, which transforms PDFs into Bionic Reading format, along with a brief description of its capabilities.
- This tool could enhance reading efficiency for those engaging with complex materials in cybersecurity.

Links mentioned:

Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements: Hacking poses a significant threat to cybersecurity, inflicting billions of dollars in damages annually. To mitigate these risks, ethical hacking, or penetration testing, is employed to identify vulne...
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements: no description found
GitHub - SanshruthR/Bionic_Reading_Hub: Read like a superhero: Turn PDFs into Bionic Reading format: Read like a superhero: Turn PDFs into Bionic Reading format - SanshruthR/Bionic_Reading_Hub

HuggingFace ▷ #NLP (23 messages🔥):

Langchain SQL Agent Memory Integration

Contributing to Open Source

Qwen Models Dominance

Healthcare Application Models

File Renaming Models

Integrate Memory in Langchain SQL Agent: A user is seeking help to add memory to their Langchain SQL agent for natural language queries on a database, expressing urgency in resolving the issue.
- Another user mentioned that there are buffer and summary memory integrations available, allowing for cached queries and responses to improve follow-up answers.
Getting Started with Open Source Contributions: A user expressed interest in contributing but was unsure of the initial steps, concerned about the necessary communications and agreements.
- Guidance was offered on needing a GitHub account and familiarity with git operations like cloning, branching, committing, and pushing.
Qwen Models Lead the Pack: A member pointed out the surprising dominance of the Qwen2.5 models in the open-source community, noting that newer models often omit them from comparisons.
- The community is baffled by how Qwen has established a strong lead, with many voicing curiosity about the reasons behind this success.
Seeking Medical Models for Healthcare Apps: A user is in search of both open and closed source medical models for a healthcare application, currently using GPT-4o but looking for something more specialized.
- They expressed a need for a small model suitable for use as an LLM in a judicial context.
Need for Lightweight File Renaming Models: A user asked if anyone has models capable of renaming files based on their contents, specifying a preference for a really small model.
- This request highlights the ongoing need for lightweight solutions in file management and automation tasks.
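
To make the buffer-memory idea from the SQL agent discussion concrete, here is a minimal sketch in plain Python. It deliberately does not use the Langchain API; `BufferMemory` and `build_prompt` are hypothetical names, and the prompt layout is an assumption chosen for illustration.

```python
from collections import deque


class BufferMemory:
    """Keeps the most recent question/answer turns (hypothetical helper, not a Langchain class)."""

    def __init__(self, max_turns=3):
        # A bounded deque silently drops the oldest turn once max_turns is exceeded.
        self.turns = deque(maxlen=max_turns)

    def save(self, question, answer):
        self.turns.append((question, answer))

    def as_context(self):
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


def build_prompt(memory, question):
    """Prepend remembered turns so a follow-up like 'and last month?' can be resolved."""
    history = memory.as_context()
    prefix = f"Previous conversation:\n{history}\n\n" if history else ""
    return f"{prefix}Current question: {question}"
```

A summary memory would follow the same shape, except that older turns are condensed into a short text by a model instead of being dropped.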

HuggingFace ▷ #diffusion-discussions (7 messages):

Contributing to Diffusers

AutoencoderKL model parameters

Custom workflows with LLMs

Training models from scratch

Noise addition in models

Get started with Diffusers: New contributors are encouraged to read the contributing readme and search for the good first issue label.
- This helps streamline the onboarding process for those looking to contribute.
Inquiry About AutoencoderKL Parameters: Schrodingersneko asked how many parameters the AutoencoderKL model (the original SD2.x pretrained autoencoder) has, but no direct answers were provided in the conversation.
- The community may have insights, but further discussion on this topic seems needed.
Exploring LLM Web Browsing Workflows: Ritzfy inquired about custom workflows with LLMs beyond standard tasks like summarization and creative writing.
- No specific responses were provided, indicating a potential area of interest for exploration.
Challenging Noise Addition in Model Training: Cocos9762 expressed thoughts on training models from scratch, questioning the assumptions about noise addition affecting different channels differently based on their characteristics.
- Ozzy_gt responded with insights on how noise is currently added uniformly across channels, suggesting this might suffice but advocating for experimental validation.
Consensus on Noise Dynamics in Channels: Following Ozzy_gt's input, Cocos9762 acknowledged the practice in Flux and newer SD models and plans to proceed with this understanding.
- They discussed the appropriateness of learned weights to handle the noise addition dynamics.
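
The "uniform across channels" point in the noise discussion can be illustrated with a small sketch. The linear interpolation schedule below is an assumption chosen for simplicity, not something stated in the conversation, and `add_noise` and `channel_std` are hypothetical helpers; the only thing the sketch is meant to show is that every channel receives noise drawn from the same distribution.

```python
import random
import statistics


def add_noise(latent, t, rng):
    """Blend a clean latent with Gaussian noise at strength t in [0, 1].

    latent is a list of channels, each a flat list of floats. Every channel
    gets noise from the same N(0, 1) distribution: no per-channel scaling.
    """
    return [
        [(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in channel]
        for channel in latent
    ]


def channel_std(latent):
    """Standard deviation of each channel, useful for checking noise statistics."""
    return [statistics.pstdev(channel) for channel in latent]
```

Testing a per-channel alternative, as suggested in the discussion, would amount to replacing the fixed `1.0` standard deviation with a value looked up per channel and comparing training results.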

Link mentioned: [Issues · huggingface/diffusers](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22): 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX. - Issues · huggingface/diffusers

Notebook LM Discord ▷ #announcements (1 messages):

UXR Team Study

Remote Interviews

Participant Incentives

Join UXR Team's Insightful Study: The UXR team is hosting remote 1:1 interviews via Google Meet from October 31 to November 6 to gather participant feedback on upcoming developments.
- Participants can qualify for a $75 thank you reward upon successful completion of the interview.
Limited Slots for Research Participation: Only 6 slots are available for this research opportunity, emphasizing the need for quick responses.
- Interested participants are encouraged to complete the eligibility questionnaire to check if they qualify.

Link mentioned: Participate in an upcoming Google concept testing study!: Hello, I’m contacting you with a short questionnaire to verify your eligibility for an upcoming research study with Google. This study is an opportunity to provide feedback on something that's cu...

Notebook LM Discord ▷ #use-cases (86 messages🔥🔥):

Personalized Podcast App

Deep Dive Podcasts

Audio Overview Improvements

Using NotebookLM for Bible Study Lessons

AI Avatar Synchronization

Introducing PodCraft: Personalized Podcasts: A user proposed an app called PodCraft that delivers personalized podcast content based on specific requests, eliminating the need to sift through numerous episodes.
- The app promises instant access to tailored content in the voice and style of favorite creators, appealing to frustrated listeners who find it hard to locate relevant insights.
Positive Feedback on Deep Dive Podcasts: Many users said they enjoy deep dive podcasts, with one noting an increased interest in using NotebookLM for weekly Bible study lessons.
- Creating a podcast based on prepared lessons helps attendees understand the topics better before class.
Concerns Over AI Podcast Generation Changes: Some users reported recent changes in podcast generation that resulted in undesired artifacts, such as repeated prompts and references to commercial breaks.
- It is speculated that these changes may be the result of adjustments in training data or model updates, affecting the quality of output.
Challenges with AI Audio Overview Lengths: Users noted difficulties in generating short audio podcasts, as recent outputs tend to standardize around 20 minutes, impacting variety.
- Experiments showed longer generations often yielded repetitive content, underscoring potential issues in the output consistency.
Successful Integration of HeyGen Avatars: One user shared a project focused on making HeyGen avatars behave more realistically when not speaking, particularly for a Halloween special video.
- The user expressed excitement about the capabilities and potential of AI-generated content, showing particular satisfaction with the improvements made.

Links mentioned:

Home | Podcraft: no description found
How to create Audio Reactive Circle Wave Spectrums in Adobe After Effects | CC Tutorial: My Effects Shop: https://justinodisho.com/shop Adobe Software Download: https://prf.hn/l/dlXjya5 Support the Channel: https://www.youtube.com/channel/UCy7DyWXJ...
LTT Reads: Abbott and Costello: Who's On First: Ever thought comedy could teach you about the human condition? In this episode, we’re dissecting Abbott and Costello’s legendary 'Who’s on First?' routine—no...
Vocaroo | Online voice recorder: no description found
OK. This is Serious… China Is Taking Over Electric Vehicles with BYD: Dive into the unstoppable rise of BYD, China’s electric vehicle powerhouse, as it prepares to challenge global markets and take on giants like Tesla. This vi...
JungleTV study (scary): Do not visit: https://www.jungletv.live Podcast generated by NotebookLM based on a text written by me. The same text was then used to generate a study with C...
Weekly Deep Dive 28Oct24: Election, High Yield Spreads, China Buy backs, Palladium
Descript Multi-track Sequences and Compositions: Understand how files and compositions and sequences are organized in Descript. In this video I show you a detailed explanation of what it means to import fil...
Best Tesla for the Money? Stock Market Secrets Behind Tesla’s AI & Robotaxi Revolution!: In this video, we dive deep into the future of Tesla—exploring the latest surge in investor optimism and the ambitious steps Elon Musk and his team are takin...
UNREAL MYSTERIES 4: The Halloween Special: Unreal Mysteries with David and Hannah - the HALLOWEEN SPECIAL! This is 100% AI generated. Every person, every picture, every word, every sound, every no...
PocketAI Speaker Separation Tool: Generate separate MP3 files for each speaker with synchronized silence and a downloadable SRT file. Perfect for enhancing AI avatar videos with realistic conversations and lip-sync animations.
Create an AI-Powered Podcast & Avatars with Your Own Voice! | Control the Conversation: In this tutorial, learn how to create an AI-powered podcast using your voice and ChatGPT Advanced Voice. Unlike NotebookLM, where AI directs the conversation...
DeepDive | Instagram, Facebook | Linktree: 🎤 Daily voice vibes, front and center! Join the journey! 12k Active DEEPDIVERS

Notebook LM Discord ▷ #general (516 messages🔥🔥🔥):

NotebookLM daily limits

Audio Overview generation

Open source AI models

Image generation tools

NotebookLM features and updates

NotebookLM Daily Limits Cause Frustration: Users express frustration over newly imposed daily limits on Audio Overview generations, speculating on potential subscription models in the future.
- Many feel blindsided by the lack of communication regarding these limits, emphasizing the need for transparency from Google.
Varying Length of Audio Outputs: Members discuss the variability in the lengths of generated podcasts, with some outputs reaching up to 39 minutes based on source length and prompts used.
- It was noted that the source material heavily influences the duration of the generated content.
Opinions on Open Source AI Models: Users share their preferences for various open-source AI image generation tools, significantly favoring those with less restrictive usage guidelines.
- Discontent is voiced toward Google's Imagen, with many stating that better models exist without limitations on usage.
Anonymous User Queries about Notebook Functions: Concerns were raised regarding the quantity of notebooks allowed, with indications that the limit is around 100 notebooks per user.
- Participants inquire about effective updates and re-parsing of sources within notebooks, especially connected to Google Drive folders.
Mix of Personal Experiences and Preferences: Contributors share personal anecdotes about driving, AI technology habits, and college experiences, highlighting community interests.
- Conversations include sentiments about responsible driving and the quality of various AI tools, sparking debates among users.

Links mentioned:

Discord - Group Chat That’s All Fun & Games: Discord is great for playing games and chilling with friends, or even building a worldwide community. Customize your own space to talk, play, and hang out.
NotebookLM: no description found
Ban Hammer GIF - Ban Hammer Futurama - Discover & Share GIFs: Click to view the GIF
Absolutely Brilliant Simon Cowell GIF - Absolutely Brilliant Simon Cowell Britains Got Talent - Discover & Share GIFs: Click to view the GIF
Chris Evans Laugh GIF - Chris Evans Laugh - Discover & Share GIFs: Click to view the GIF
Tweet from Vicente Silveira (@vicentes): @joshtwoodward of Notebooklm fame sharing at the Google/Deepmind event in SF some cool things about it:
Tweet from Anthropic (@AnthropicAI): Even while recording these demos, we encountered some amusing moments. In one, Claude accidentally stopped a long-running screen recording, causing all footage to be lost. Later, Claude took a break ...
Best Tesla for the Money? Stock Market Secrets Behind Tesla’s AI & Robotaxi Revolution!: In this video, we dive deep into the future of Tesla—exploring the latest surge in investor optimism and the ambitious steps Elon Musk and his team are takin...
Revolutionize Collaboration with Real Time Messaging Tools: This summary is talking about the book "Beginning React and Firebase - Nabendu Biswas". It serves as an introduction to Firebase, a set of tools provided by G...
GitHub - souzatharsis/podcastfy: An Open Source alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI: An Open Source alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI - souzatharsis/podcastfy
TikTok - Make Your Day: no description found
TikTok - Make Your Day: no description found
[TikTok - Make Your Day](https://www.tiktok.com/@therealsoniczed/photo/7430616173437390086?is_from_webapp=1&sender_device=pc&web_id=7430710704338454021): no description found

Unsloth AI (Daniel Han) ▷ #general (484 messages🔥🔥🔥):

Unsloth Performance

Multimodal Models

Gradient Accumulation Issues

CuDNN Optimization

Training Custom Models

Unsloth's New Features and Issues: Users discussed the latest update to Unsloth, highlighting improved speed but also reporting crashes, specifically due to an assumption of indexable datasets in the unsloth-zoo. A pull request addressing this issue was mentioned, along with community concerns about effectively utilizing the new features.
- While exploring Unsloth's capabilities, some users encountered problems quantizing models and sought solutions on GitHub, reflecting an engaged community troubleshooting specific issues.
Discussion on Multimodal AI Models: The dialogue shifted towards the complexities of merging vision and language models, with insights shared about how adapters can facilitate compatibility between different modalities. Users noted that while sound integration is possible, it remains a challenging area in current AI development.
- GLM-4-Voice was highlighted as a robust example of a model that supports audio and textual inputs, leading to discussions on the potential of audio adapters for enhanced multimodal interactions.
Gradient Accumulation Enhancements: There was significant discourse around the impact of gradient accumulation on training efficiency, as users shared personal experiences and optimizations following a recent fix in Unsloth. Community members noted the importance of understanding the balance of batch sizes and memory when configuring their training environments.
- Feedback indicated a learning curve with the new gradient accumulation features, leading to trials and adjustments in user training workflows.
Future Developments in AI Training Frameworks: The pursuit of developing a Gradio UI for automated training processes was presented, aiming to simplify model training with adjustable variables. Users discussed the integration of this framework with Hugging Face for uploading adapted and merged models.
- Interest was expressed in enhancing user experience through streamlined tools for AI model training, particularly for those with less technical proficiency.
Community Engagement and Learning: The return of a community member after an extended absence sparked discussions regarding contributions and shared experiences with various AI projects. Participants emphasized the value of code contributions and collaborative problem-solving as essential components of the open-source community.
- Members encouraged discussions about new ideas while clarifying that effective collaboration typically involves sharing tangible contributions like code implementations.
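
The digest does not spell out what the gradient accumulation fix changed. One widely discussed pitfall, assumed here to be the kind of issue involved, is that averaging per-micro-batch mean losses does not match the full-batch loss when micro-batches contain different numbers of tokens. The worked example below uses made-up per-token losses and hypothetical function names.

```python
def naive_accumulated_loss(micro_batches):
    """Average the mean loss of each micro-batch (each micro-batch counts equally)."""
    means = [sum(losses) / len(losses) for losses in micro_batches]
    return sum(means) / len(means)


def token_weighted_loss(micro_batches):
    """Sum every token loss and divide by the total token count (each token counts equally)."""
    total = sum(sum(losses) for losses in micro_batches)
    count = sum(len(losses) for losses in micro_batches)
    return total / count


# Two micro-batches of per-token losses: one long sequence, one short one.
micro_batches = [[2.0, 2.0, 2.0, 2.0], [4.0]]

print(naive_accumulated_loss(micro_batches))  # mean of 2.0 and 4.0
print(token_weighted_loss(micro_batches))     # 12.0 spread over 5 tokens
```

The two numbers agree only when every micro-batch has the same token count, which is why variable-length sequences make the difference visible.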

Links mentioned:

Arcee-SuperNova: Training Pipeline and Model Composition: We trained Arcee SuperNova-70B and Arcee SuperNova-8B to be generally intelligent Llama-3.1-405B derivatives using intelligent distillation, novel post-training, and model merging techniques.
Oh No GIF - Oh No Computer Saysno - Discover & Share GIFs: Click to view the GIF
Sad Sigh GIF - Sad Sigh Monkey - Discover & Share GIFs: Click to view the GIF
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1 \\ /| - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
Confused Confused Look GIF - Confused Confused look Confused face - Discover & Share GIFs: Click to view the GIF
llama-recipes/recipes/quickstart/NotebookLlama at main · meta-llama/llama-recipes: Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&...
unsloth (Unsloth AI): no description found
Bug Fixes in LLM Training - Gradient Accumulation: Unsloth's Gradient Accumulation fix solves critical errors in LLM Training.
physics_of_llms/model_edit.py at main · symato/physics_of_llms: Experiments related to LLMs for Vietnamese (inspired by the Physics of LLMs series) - symato/physics_of_llms
GitHub - THUDM/GLM-4-Voice: GLM-4-Voice | End-to-end Chinese-English spoken dialogue model: GLM-4-Voice | End-to-end Chinese-English spoken dialogue model. Contribute to THUDM/GLM-4-Voice development by creating an account on GitHub.
Issues · unslothai/unsloth: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - Issues · unslothai/unsloth
GitHub - unslothai/unsloth: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
GitHub - ggerganov/llama.cpp: LLM inference in C/C++: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
unsloth/Llama-3.1-Nemotron-70B-Instruct at main: no description found
Home: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
Weights & Biases: The AI Developer Platform: Weights & Biases is the leading AI developer platform to train and fine-tune models, manage models from experimentation to production, and track and evaluate GenAI applications powered by LLMs.

Unsloth AI (Daniel Han) ▷ #off-topic (15 messages🔥):

Existential crises in school

Unsloth Salute Emote

Making friends in school

Nostalgia for school

Coping with school pressures

Existential crises from school pressure: A member said school was giving them an existential crisis, prompting others to suggest ways to cope with it.
- One advised, 'endure.. it doesn't last forever,' and another humorously noted, 'Thank God it doesn't,' reflecting common feelings.
Celebrating the Unsloth Salute: A member acknowledged the existence of the unsloth salute emote, expressing their love for it with a cheerful remark.
- The use of emotes like :slothsalute: contributes to the group's playful atmosphere.
Finding Like-Minded Friends in School: One member suggested that making like-minded friends could enhance school experiences, reducing feelings of isolation.
- Another agreed, stating, 'This is true,' emphasizing the importance of social connections.
Nostalgia About School Life: Members discussed the notion that older individuals often wish to return to school for its simplicity and fun.
- This sentiment was shared when one remarked, 'the fun part is .. when you are older you want to go back to school.'

Unsloth AI (Daniel Han) ▷ #help (91 messages🔥🔥):

Fine-tuning with SFTTrainer

Errors in Unsloth functions

Using multiple datasets for training

Quantization errors in model conversion

Using system prompts in training

Fine-tuning model with class weights: A user inquired about the effectiveness of using class weights during fine-tuning and how to implement this in SFTTrainer.
- This indicates an interest in improving model performance by handling class imbalances.
Runtime error during eval dataset usage: A user hit a 'cuDNN Frontend error' when passing an eval dataset to Unsloth; the error did not occur without the eval dataset.
- They later found that they needed to upgrade their libraries, suggesting a version mismatch was the cause.
Challenges in model quantization: Users hit errors while converting a model to Q8_0 GGUF format; others suggested compiling llama.cpp to resolve them.
- The conversion error highlighted issues related to directory paths and the need for specific build configurations.
Using system prompts in training: A user was confused about how to properly incorporate a system prompt while using the train_on_responses_only method.
- The discussion clarified that system prompts may need different handling to ensure they are effectively integrated into dataset inputs.
Running Ollama with Docker: A user faced difficulties running Ollama on Docker for use with a Python client and sought assistance from others.
- This highlighted common integration challenges when using Docker containers for specific applications.
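
The class-weights question above can be illustrated with a minimal sketch in plain Python. It assumes inverse-frequency weights and a weighted mean of the per-example cross-entropy, normalised by the total weight; the labels and probabilities are made up for illustration. How to wire this into SFTTrainer was not covered in the discussion; one plausible route, stated here as an assumption, is overriding the trainer's loss computation.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * count), so rare classes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the total weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)

# Hypothetical, imbalanced toy data: three "neg" examples and one "pos".
labels = ["neg", "neg", "neg", "pos"]
weights = inverse_frequency_weights(labels)

# Hypothetical predicted probabilities; the model is weakest on the rare class.
probs = [{"neg": 0.9, "pos": 0.1}] * 3 + [{"neg": 0.6, "pos": 0.4}]

weighted = weighted_cross_entropy(probs, labels, weights)
unweighted = weighted_cross_entropy(probs, labels, {"neg": 1.0, "pos": 1.0})
print(weights, weighted, unweighted)
```

With these toy numbers the weighted loss is larger than the unweighted one, because the poorly predicted rare class contributes more to the average.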

Links mentioned:

Google Colab: no description found
Home: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
GitHub - ggerganov/llama.cpp: LLM inference in C/C++: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
unsloth/unsloth/tokenizer_utils.py at d76eda4f66828d66aa6a1b01a0d03323e43810dd · unslothai/unsloth: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
llama3.1-finetune-quick-tuto/main.py at main · donglinkang2021/llama3.1-finetune-quick-tuto: Use unsloth to fine-tune llama3.1 on your own dataset. - donglinkang2021/llama3.1-finetune-quick-tuto
iamtarun/python_code_instructions_18k_alpaca · Datasets at Hugging Face: no description found
GitHub - unslothai/unsloth: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. - huggingface/transformers

Unsloth AI (Daniel Han) ▷ #showcase (3 messages):

Llama 405B Performance

Llama 3.1-405B-Instruct Breakthrough

Llama 405B hits 142 tok/s on Nvidia H200 SXM: The performance of Llama 405B has reached 142 tokens per second on the Nvidia H200 SXM, showcasing significant processing capabilities.
- This achievement highlights the efficiency of the model in handling complex tasks.
Llama 3.1-405B-Instruct surpasses 100 t/s barrier: Llama 3.1-405B-Instruct has broken the 100 tokens per second barrier without any dirty tricks or performance loss, maintaining a full 128k context length.
- The user emphasized that this milestone was the result of months of work, marking a significant advancement in model performance.

Link mentioned: Reddit - Dive into anything: no description found

Unsloth AI (Daniel Han) ▷ #community-collaboration (2 messages):

AI video generators

3D models integration

Camera controls

Consistent environments

Unreal Engine physics

Exploring AI Video Generators with 3D Models: A member is interested in developing AI video generators using 3D models as the fundamental framework, proposing features like camera controls and consistent environments.
- They are particularly interested in integrating Unreal Engine or similar basic physics to enhance the overall experience.
Seeking Insights on AI Video Generator Projects: The same member inquired if anyone has seen or is currently working on similar projects involving AI and video generation.
- This indicates a potential interest in community collaboration on these advanced technologies.

Unsloth AI (Daniel Han) ▷ #research (2 messages):

Dualformer Model

Attention Calculation on CPU

System 1 and System 2 Thinking

Reasoning in Transformers

Attention on CPU? That's New!: A member expressed surprise regarding the computation of attention on CPU, questioning the effectiveness of such an approach.
- The discussion hints at underlying implications for computational efficiency and performance in AI models.
Introducing Dualformer for Enhanced Reasoning: The Dualformer model integrates both fast (System 1) and slow (System 2) reasoning in Transformers, addressing efficiency issues in reasoning tasks.
- It surpasses existing models like Searchformer and Solution-Only, significantly improving performance in maze navigation and math problems through tailored training strategies.
Cognitive Systems in AI: The underlying theory presented in the paper connects human cognition's System 1 and System 2 with Transformer performance, enhancing reasoning capabilities.
- This connection reveals critical insights into the design of AI models for better cognitive mimicry during problem-solving tasks.

Link mentioned: Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces: In human cognition theory, human thinking is governed by two systems: the fast and intuitive System 1 and the slower but more deliberative System 2. Recent studies have shown that incorporating System...

LM Studio ▷ #general (203 messages🔥🔥):

LM Studio Plugins

Headless Mode and Model Loading

Performance Comparisons of GPU vs CPU

ROCm vs CUDA

ChatGPT Generated Webpage for LM Studio

LM Studio Supports Multiple GPUs: Users discussed that LM Studio can utilize multiple GPUs, with one member confirming that their two RTX 3060 Ti GPUs are both recognized by the system.
- However, the application may require specific configurations to work optimally with both GPUs.
Concerns Over Security for Publicly Exposed Servers: A user raised concerns about exposing their LM Studio server to the public internet and received recommendations on using VPNs and proper firewalls.
- Members noted that LM Studio is not ready to be exposed directly to the internet without additional precautions.
Performance Metrics on Different Models: Members shared performance metrics, noting that the 9B model achieves approximately 62 tokens per second on GPU, while the CPU handles around 7.5 tokens per second.
- This highlights the significant difference in performance between GPU and CPU for large language models.
Transition from Version 0.2 to 0.3: Members pointed out that LM Studio 0.2.x does not automatically update to 0.3.x, which can cause confusion about available features.
- Users encouraged others to manually update to the newer version for the best experience.
Webpage to View LM Studio Conversations: A user shared a link to a simple webpage they developed for viewing LM Studio conversation JSON files, allowing customization of user display names.
- This tool aims to help users better manage their conversation logs in a more user-friendly interface.

Links mentioned:

Gotta Catch Em All GIF - Gotta Catch Em All - Discover & Share GIFs: Click to view the GIF
RYZEN AI MAX + Is Coming! 40CU iGPU APU, The End Of Mid Range GPU's?: Ryzen AI Max 3 New Models with Powerful AI and Graphics. Get ready for a performance boost! The Ryzen AI Max lineup may contain three exciting models. From the...
Chat Log Viewer for LM Studio 0.3.x - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
GitHub - p4stoboy/AI-Junction: AI Junction exposes a Discord interface for generating text and images using LMStudio (or any OpenAI API interface) and ComfyUI back-ends. The bot allows users to manage their own configurations for both LLM prompting and image generation, as well as generate content in-server using custom prompts.: AI Junction exposes a Discord interface for generating text and images using LMStudio (or any OpenAI API interface) and ComfyUI back-ends. The bot allows users to manage their own configurations fo...
GitHub - ml-explore/mlx: MLX: An array framework for Apple silicon: MLX: An array framework for Apple silicon. Contribute to ml-explore/mlx development by creating an account on GitHub.
lmstudio.js/packages/lms-client/src/embedding/EmbeddingDynamicHandle.ts at 582eeed1536ce937d0d51bf3e5fbbee1698a51e0 · lmstudio-ai/lmstudio.js: LM Studio TypeScript SDK (pre-release public alpha) - lmstudio-ai/lmstudio.js
Run LM Studio as a service (headless) - Advanced | LM Studio Docs: GUI-less operation of LM Studio: run in the background, start on machine login, and load models on demand
lmstudio.js Code Examples - SDK (TypeScript) | LM Studio Docs: Examples of using lmstudio.js in TypeScript applications.

LM Studio ▷ #hardware-discussion (232 messages🔥🔥):

Mixing GPUs performance

NPU utility concerns

Motherboard slot spacing

Apple M3 and future M4 performance

Cost-effective AI and Computer builds

Mixing GPUs performance concerns: Users expressed concerns about the performance of mixed GPU setups, particularly between RTX 3090 and 4090 models, questioning whether to sell the higher-end card for better compatibility.
- The need for a rig capable of running large models without focusing on inference speed led to discussions about optimal configurations.
NPU utility concerns: A user shared disappointment with their NPU, concluding that it lacks software support and functionality, especially in AI tasks compared to regular PC setups.
- There was speculation about how Intel could better leverage NPUs in collaboration with Microsoft to improve their viability in AI workloads.
Motherboard slot spacing: Discussion about motherboard slot spacing highlighted how E-ATX and specialized boards could provide better airflow for multi-GPU setups.
- Suggestions included measuring current distances and researching boards that could accommodate better spacing.
Apple M3 and future M4 performance: The announcement of Apple's M4 sparked debate about its suitability for larger models, with skepticism about its capabilities dominating the conversation.
- Users expressed frustration over minimal memory allotments in new Macs and criticized the high upgrade costs for increased RAM.
Cost-effective AI and Computer builds: Comparisons were made between building custom systems with higher RAM capacities and the costs associated with purchasing Apple's systems.
- Users noted that building a powerful AI-capable machine would be more cost-effective than Apple's offerings, which often lack sufficient hardware for serious workloads.

Links mentioned:

NVIDIA Jetson AGX Orin: Next-level AI performance for next-gen robotics.
GitHub - XiongjieDai/GPU-Benchmarks-on-LLM-Inference: Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?: Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? - XiongjieDai/GPU-Benchmarks-on-LLM-Inference

OpenRouter (Alex Atallah) ▷ #announcements (1 message):

Inflection

Inflection Billing Issue Resolved: Inflection is back online after resolving last week's billing issue, restoring access for users.
- For more details, visit the Inflection 3 Pi page and Inflection 3 Productivity page.
Inflection Product Offering Clarification: Alongside the billing resolution, Inflection clarified its product offerings, showcasing features aimed at productivity enhancement.
- Details can be explored through their dedicated product links shared previously.

Links mentioned:

Inflection 3 Pi - API, Providers, Stats: Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional intelligence, productivity, and safety. Run Inflection 3 Pi with API
Inflection 3 Productivity - API, Providers, Stats: Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. Run Inflection 3 Productivity with API

OpenRouter (Alex Atallah) ▷ #general (364 messages🔥🔥):

OpenRouter Connectivity Issues

Sonnet Model Performance Changes

Grok 2 Multimodal Release

Prompt Engineering Techniques

Model Parameters and Providers

OpenRouter Connectivity Issues Persist: Users have reported intermittent connectivity issues with OpenRouter, experiencing Cloudflare errors such as 520 and 524, despite the status page showing green.
- Some users suggested testing connectivity through different browsers, and there were indications that issues might be more pronounced from Europe.
Sonnet Model Performance Changes Observed: Several users noted a decline in response quality and an increase in follow-up questions from the Sonnet model, which they found to be more generic than before.
- There are suspicions that the model's behavior may have changed following the restriction of the free version, with users feeling that it is less responsive compared to previous interactions.
Grok 2 Multimodal Capabilities Released: Discussion centered on the recent announcement of Grok 2's multimodal features, which allow it to understand images alongside text.
- Users expressed curiosity about the implications and capabilities of this new model, especially in comparison to existing models.
Effective Prompt Engineering Techniques: Tips for crafting effective prompts were shared, focusing on how to elicit longer responses from models by providing detailed instructions.
- Users shared examples of structured prompts designed to maximize output length and quality, highlighting the importance of specificity.
Understanding Model Parameters and Providers: The conversation included discussions about listing model capabilities and the potential differences across providers concerning output length and response quality.
- Users queried the existence of models with extended token limits and shared information about different providers' capabilities in handling them.
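
For the intermittent 520 and 524 errors mentioned above, a common client-side mitigation is to retry with exponential backoff. This was not proposed in the channel; the sketch below is an illustration only. `send` is a hypothetical zero-argument callable standing in for a real HTTP request, and the attempt count and delays are arbitrary assumptions.

```python
import time

RETRYABLE = {520, 524}  # Cloudflare error codes mentioned in the discussion

def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning a (status_code, body) tuple.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    return status, body

# Usage with a stand-in for a real request: fails twice, then succeeds.
responses = iter([(520, ""), (524, ""), (200, "ok")])
delays = []  # record the requested delays instead of actually sleeping
result = call_with_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the example fast to run and makes the backoff schedule easy to inspect.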

Links mentioned:

Internet Speed Test - Measure Network Performance | Cloudflare: Test your Internet connection. Check your network performance with our Internet speed test. Powered by Cloudflare's global edge network.
Activity | OpenRouter: See how you've been using models on OpenRouter.
Open VLM Leaderboard - a Hugging Face Space by opencompass: no description found
Pixar Draw GIF - Pixar Draw Single - Discover & Share GIFs: Click to view the GIF
Tweet from Niels Rogge (@NielsRogge): A new video LLM by @Meta dropped on the hub, and it's the new SOTA for open-source video understanding > builds on top of SigLIP/DINOv2 and Qwen2/Llama 3.2 > includes a 3B parameter model f...
OpenRouter Status: OpenRouter Incident History
Automated clearing house - Wikipedia: no description found
The killer app of Gemini Pro 1.5 is video: Last week Google introduced Gemini Pro 1.5, an enormous upgrade to their Gemini series of AI models. Gemini Pro 1.5 has a 1,000,000 token context size. This is huge—previously that …
Hermes 3 405B Instruct - API, Providers, Stats: Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coheren...
Tweet from Elon Musk (@elonmusk): Grok now understands images, even explaining the meaning of a joke. This is an early version. It will rapidly improve. https://x.com/i/grok/share/roBaNzwhuOYzfHUuQLo4OWXjO
GitHub - sashabaranov/go-openai: OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go: OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go - sashabaranov/go-openai

OpenRouter (Alex Atallah) ▷ #beta-feedback (7 messages):

Access to Integrations

Many Users Seek Access to Integrations: Numerous users have expressed their interest in obtaining access to integrations, indicating a strong demand for this feature.
- The requests were polite and consistent, with phrases like 'I would get access to integrations. Thanks' signaling a unified approach.
Polite Appeals for Integration Permissions: Users have been courteously requesting to gain access to integrations, pointing to an engaged community eager for new features.
- Messages repeatedly highlighted the desire for access, with multiple instances of users thanking others for potential assistance.

Latent Space ▷ #ai-general-chat (60 messages🔥🔥):

OpenAI Whisper

Gemini AI Model

Homomorphic Encryption

Moonshine ASR

Moondream Funding

Whisper vs New ASR Models: Whisper is being compared to new ASR technologies like Moonshine, which claims to perform better with less computational demand on edge devices, achieving substantial speed improvements.
- Critics note that while Moonshine may exceed Whisper's smaller models, Whisper's larger models still hold a performance advantage.
Apple's Homomorphic Encryption: Apple's announcement around homomorphic encryption suggests a significant advancement in how private data can be utilized for AI without compromising confidentiality, likened to the 'HTTPS moment for AI'.
- Discussions highlight potential applications such as retrieval and embedding without exposing private information, although the feasibility for inference remains uncertain due to speed concerns.
Moondream Secures Funding: Moondream announced a successful funding round, raising $4.5 million to explore the effectiveness of smaller AI models in competitive AI applications.
- The move has sparked discussions about the limitations of smaller models, with some questioning how far they can progress without encountering known industry challenges.
Chat on Inference Hardware Options: An ongoing conversation about choosing inference hardware reflected concerns about dependence on cloud APIs and explored alternatives such as Groq.
- The dialogue seeks insights into how development teams make hardware choices and whether they await broader adoption from major cloud providers.
AI Evaluation and Hallucinations: Amid discussions on AI transcription tools like Whisper, concerns over hallucinations highlight the importance of proper evaluation methods to mitigate inaccuracies in sensitive environments like healthcare.
- Experts emphasize the role of user participation and continuous evaluation to enhance the reliability of AI outputs, especially in critical applications.

Links mentioned:

Tweet from Brad (@brad_agi): Fast Apply dropped - an open source cursor like model designed to apply edits to a file
Tweet from Varun (@varun_mathur): This is a game-changer announcement by Apple around cryptography. It is the “HTTPS moment for AI” in some ways.. Here is what this means: your private confidential data can be pooled with other data...
Tweet from VentureBeat (@VentureBeat): Moondream raises $4.5M to prove that smaller AI models can still pack a punch https://venturebeat.com/ai/moondream-raises-4-5m-to-prove-that-smaller-ai-models-can-still-pack-a-punch/
Google plans to announce its next Gemini model soon: December is shaping up to be a month of dueling AI announcements from OpenAI and Google.
Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said: Whisper is a popular transcription tool powered by artificial intelligence, but it has a major flaw. It makes things up that were never said.
The State of AI Infrastructure at Scale 2024: How are fortune 1000 companies handling the growing demands of AI on their infrastructure? Can they move fast enough to deploy Gen AI but at the same time keep that AI on a tight leash to deliver fant...
Zight Recording 2024-10-25 at 08.23.03 PM: no description found
Tweet from Tom Dörr (@tom_doerr): Claims to beat Whisper + live transcription
The future of AI needs more flexible GPU capacity: Why Modal is obsessed with serverless AI infrastructure
A Comparison of Open Source LLM Frameworks for Pipelining: Optimize your LLM projects with the best open source LLM frameworks, Python libraries, and orchestration tools.
Tweet from Paul Calcraft (@paul_cal): LLMs play pictionary! Quoting Simon Willison (@simonw) Playing with a new LLM benchmark: how good are they at drawing an SVG of a pelican on a bicycle? Here's the difference between Claude 3.5...
James Cameron: Special Video Message at the SCSP AI+Robotics Summit: Three-time Academy Award-winning director James Cameron gives special remarks at the AI+ Robotics Summit. The Special Competitive Studies Project’s AI+Summit ...
How NotebookLM Was Made: Raiza Martin and Usama Bin Shafqat are the lead PM and AI engineer behind the NotebookLM feature flag that gave us the first viral AI voice experience, the “...
Tweet from vik (@vikhyatk): i started a company... Quoting VentureBeat (@VentureBeat) Moondream raises $4.5M to prove that smaller AI models can still pack a punch https://venturebeat.com/ai/moondream-raises-4-5m-to-prove-tha...
GitHub - protectai/vulnhuntr: Zero shot vulnerability discovery using LLMs: Zero shot vulnerability discovery using LLMs. Contribute to protectai/vulnhuntr development by creating an account on GitHub.
GitHub - Marker-Inc-Korea/AutoRAG: AutoML tool for RAG: AutoML tool for RAG. Contribute to Marker-Inc-Korea/AutoRAG development by creating an account on GitHub.
GitHub - usefulsensors/moonshine: Fast and accurate automatic speech recognition (ASR) for edge devices: Fast and accurate automatic speech recognition (ASR) for edge devices - usefulsensors/moonshine
Moonshine: Speech Recognition for Live Transcription and Voice Commands: This paper introduces Moonshine, a family of speech recognition models optimized for live transcription and voice command processing. Moonshine is based on an encoder-decoder transformer architecture ...
Introducing Moonshine, the new state of the art for speech to text: Can you imagine using a keyboard where it took a key press two seconds to show up on screen? That’s the typical latency for most voice interfaces, so it’s no wonder they’ve failed…
GitHub - n8n-io/n8n: Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.: Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services. - n8n-io/n8n
n8n.io - a powerful workflow automation tool: n8n is a free and source-available workflow automation tool
Best apps & software integrations | n8n: Optimize your workflows with these top software integrations. Seamlessly move and transform data between different apps with n8n.
LLM platform for enterprise AI teams | deepset: Check out deepset Cloud, our platform to build faster with LLMs. Enterprise NLP accelerated.
AI Infrastructure Alliance: The AI Infrastructure Alliance brings together the best AI/ML infrastructure companies in the world. Together we're creating the Canonical Stack for MLOps.

Latent Space ▷ #ai-in-action-club (260 messages🔥🔥):

Cursor Pro Tips

Audio Issues in Discord

LLM Integration Concerns

Markdown File Generation Challenges

Upcoming Event Suggestions

Cursor Pro Tips for Enhanced Coding Workflow: Participants discussed various Cursor Pro Tips, emphasizing workflows like using ctrl+k for localized edits and multifile high-level edits for coding efficiency.
- Suggestions were made for a follow-up session to explore even more tips, as only a fraction of useful content was covered.
Audio Issues Affecting Meeting Quality: Multiple attendees reported experiencing audio issues and drops during the session, affecting their ability to follow along.
- Concerns about Discord's performance were raised, suggesting that server-related problems may have contributed to the disruptions.
Challenges with LLM Usage in Workplaces: Discussion emerged around employers' security concerns that restrict the use of LLMs like Cursor and Claude due to code privacy issues.
- Participants shared potential workarounds, including using AWS Bedrock to proxy interactions while keeping code secure.
Markdown Generation Issues in Cursor: A participant mentioned difficulties with generating markdown files directly in Cursor, opting instead for a quick setup using Claude.
- Markdown formatting needs improvement, and engagement with Cursor developers for potential enhancements was suggested.
Suggestions for Future Events and Collaborations: Suggestions were made for planning additional NYC tech events and discussions around Cursor protips in a series format.
- There was a collective interest in exploring LLM tool integrations and how they improve coding outputs, indicating a rich area for future discussion.

Links mentioned:

Hp Harry Potter GIF - Hp Harry Potter Snape - Discover & Share GIFs: Click to view the GIF
YOLO11 🚀 NEW: Discover YOLO11, the latest advancement in state-of-the-art object detection, offering unmatched accuracy and efficiency for diverse computer vision tasks.
Trolling Is An Honored Profession Leland Townsend GIF - Trolling Is An Honored Profession Leland Townsend Evil - Discover & Share GIFs: Click to view the GIF
yikes’s Substack | Substack: My personal Substack. Click to read yikes’s Substack, a Substack publication. Launched a year ago.
RDoc Documentation: no description found
GitHub - twilwa/crawler.nvim: uses firecrawl, jina, and/or jsondr to render webpages in neovim buffers: uses firecrawl, jina, and/or jsondr to render webpages in neovim buffers - twilwa/crawler.nvim
GitHub - sigoden/aichat: All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI tools & agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.: All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI tools & agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more. - sigoden/aichat

Perplexity AI ▷ #announcements (1 message):

Curators Program

Discover Feed

Join the Curators Revolution!: The Perplexity Team announced the launch of the Curators Program to develop engaging content for the Discover feed.
- Interested individuals can apply or tag friends at this link to join the initiative.
Calling all Creative Enthusiasts!: The program invites those who enjoy activities like making Pinterest boards, editing Wikipedia pages, and diving into YouTube video essays to participate.
- Curators will play a key role in inspiring, surprising, and informing a global audience of millions directly through the product.

Perplexity AI ▷ #general (212 messages🔥🔥):

MacOS app experiences

Perplexity AI new features

Upcoming AI models

User concerns about LLMs

Language options in Perplexity

Mixed reviews on MacOS app functionality: Users reported issues with the Perplexity MacOS app crashing and difficulties with pop-ups, raising concerns about usability and performance.
- Some users expressed frustration over limitations on copy-pasting images compared to the web app and a need for better feedback options.
New features stir excitement and skepticism: The introduction of shopping features was unexpected and has led to mixed feelings among users, with calls for these features to be compartmentalized for better user experience.
- There is ongoing speculation about the strategic reasons behind new developments, with users eager to see their impacts on everyday use.
Anticipation for next-gen AI models: Industry chatter hints at the potential release of GPT-5 by December 2024, reflecting the competitive landscape among AI developers.
- Meta's move to create its own search engine also highlights the evolving dynamics, with users interested in how these advancements will influence functionality.
User concerns about LLM performance: Some users noted discrepancies in model responses, particularly with Claude Sonnet and GPT-4, leading to frustrations about reaching answer limits.
- Concerns were raised about the reliability of model outputs, particularly when handling translations or complex queries.
Language support issues observed: Language switching issues caused confusion as multiple users reported their Perplexity app defaulting to unintended languages despite settings.
- Feedback indicated a desire for clearer solutions to language options within the app to avoid hindrances in user experience.

Links mentioned:

Google plans to announce its next Gemini model soon: December is shaping up to be a month of dueling AI announcements from OpenAI and Google.
Search: no description found
Yuki Kaguya Shinomiya GIF - Yuki Kaguya Shinomiya - Discover & Share GIFs: Click to view the GIF
Tweet from Aravind Srinivas (@AravSrinivas): @laqd99 Soon
Chost Machine GIF - Chost Machine Ai - Discover & Share GIFs: Click to view the GIF
Tweet from Aravind Srinivas (@AravSrinivas): We’ll build and debug together the future of how people ask questions and get their answers and solve their problems
Tweet from Tibor Blaho (@btibor91): Meta is reportedly building a web search engine led by engineering manager Xueyuan Su for at least 8 months to break free from using Google Search and Bing data feeds, with some websites like NY Times...

Perplexity AI ▷ #sharing (21 messages🔥):

800-Year-Old Well Man

Haunted Houses

Carbon Capture Technology

Web3

Culinary Health

Exploring the Tale of the 800-Year-Old Well Man: A member shared a link related to the 800-Year-Old Well Man, shedding light on historical nuances.
- No further details were provided, but the topic seems intriguing to those interested in history.
Haunted Houses of NE Revealed: Several members discussed haunted houses in NE, sharing a link that lists various locations notorious for their eerie reputations.
- The link could serve as a guide for thrill-seekers or those intrigued by the paranormal.
Carbon Capture: A New Frontier: A discussion emerged around a technology report on how powder can capture carbon, potentially influencing environmental efforts.
- Members expressed curiosity about the implications of this technology on climate change strategies.
Understanding Web3: Members seemed eager to understand Web3, sharing resources that provide insights into its implications on the digital landscape.
- This topic could appeal to those wanting to explore emerging web technologies.
Healthiest Cooking Oils Explored: One member posted a query about the healthiest cooking oils, indicating an interest in nutrition and cooking practices.
- Such discussions highlight a growing awareness of health-conscious culinary choices.

Link mentioned: YouTube: no description found

Perplexity AI ▷ #pplx-api (11 messages🔥):

Getting sources from the API

Requesting Perplexity API results

Access to citations closed beta

Communication on request status

Accessing sources from the Perplexity API: Members discussed how to get sources for results from the Perplexity API, with one linking to a Discord message that may contain helpful information.
- A user confirmed they seek to replicate results similar to those in the Perplexity chat, questioning which model achieves that goal.
Understanding Perplexity API model differences: In response to questions about which model returns similar results to the Perplexity chat, a member directed attention to a FAQ link explaining discrepancies.
- This discussion stemmed from a user's inquiry about obtaining matching results from the API.
Questions on citations closed beta access: A user expressed urgency in obtaining access to the citations closed beta for their project, indicating it was critically important.
- After checking the pinned message for information, the user noted a lack of communication regarding their request status.
Concerns about request timelines: In follow-up messages, the same user expressed frustration at not knowing where their request stands, whether there is a waitlist, or what the timeline is.
- Despite having requested access, they remained unsure whether the team would respond to their query.


Nous Research AI ▷ #general (206 messages🔥🔥):

Ultrasonic Sound Device Sales

Logistic Growth Curve Analysis

AI Distillation Techniques

AI Technical Newsletters

DisTrO GitHub Contributions

Sales Growth of Ultrasonic Sound Device: A developer's ultrasonic device for keeping mice away saw sales rise from 15% to 28% in the target group, assuming a logistic growth curve.
- There was confusion regarding A and b values in the logistic function calculation, which was resolved by redefining time variables in context.
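
The summary does not record which parameterisation the members used, so the following is only a minimal sketch under stated assumptions: the curve is f(t) = 1 / (1 + A * exp(-b * t)), saturation is 100% of the target group, t = 0 is the moment the share was 15%, and t = 1 is the moment it reached 28%. The function names are hypothetical.

```python
import math


def fit_logistic(share_t0: float, share_t1: float) -> tuple[float, float]:
    """Solve f(t) = 1 / (1 + A * exp(-b * t)) for A and b from f(0) and f(1)."""
    # At t = 0 the exponential equals 1, so f(0) = 1 / (1 + A).
    A = 1.0 / share_t0 - 1.0
    # At t = 1: 1 / f(1) - 1 = A * exp(-b), hence b = ln(A / (1 / f(1) - 1)).
    b = math.log(A / (1.0 / share_t1 - 1.0))
    return A, b


def logistic(t: float, A: float, b: float) -> float:
    """Evaluate the assumed logistic curve at time t."""
    return 1.0 / (1.0 + A * math.exp(-b * t))


# Assumed data points: 15% at t = 0 and 28% at t = 1.
A, b = fit_logistic(0.15, 0.28)
print(f"A = {A:.4f}, b = {b:.4f}")
print(f"projected share at t = 2: {logistic(2.0, A, b):.3f}")
```

Under this form, moving the time origin rescales A but leaves b unchanged, while changing the time unit rescales b, so the values of A and b only make sense once the time variable is pinned down.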
Understanding AI Distillation Processes: Discussion around distilled models highlighted Arcee's SuperNova models, Llama-3.1-405B derivatives that use logits from the larger model to train smaller versions efficiently, capitalizing on 'grokked' features.
- Concerns were raised about the lack of detailed technical documentation from Meta regarding their distillation processes, sparking further discussion on their training pipelines.
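
To make "uses logits from a larger model" concrete, here is a minimal sketch of the classic temperature-scaled distillation loss (KL divergence between softened teacher and student distributions). This is the textbook formulation, not necessarily the pipeline used by Arcee or Meta; the logit values and the temperature are made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


# Hypothetical logits over a 3-token vocabulary.
teacher = [4.0, 1.0, 0.5]
loss_far = distillation_loss(teacher, [0.0, 2.0, 1.0])
loss_near = distillation_loss(teacher, [3.8, 1.1, 0.4])
print(loss_far, loss_near)
```

The student whose logits sit closer to the teacher's gets the smaller loss, which is the signal a smaller model would be trained on.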
Interest in AI Technical Newsletters: A user sought recommendations for AI newsletters targeted at technical audiences rather than consumer-focused hype, expressing dissatisfaction with the current offerings.
- Other users shared the frustration with hype in AI news, noting that the 'hype' sometimes covers developments that were already known.
Potential Contributions to DisTrO Project: A member inquired about opportunities to contribute to the DisTrO GitHub, expressing interest in helping out with the project.
- The channel engaged positively with the inquiry, indicating a collaborative spirit within the community.
Limitations of AI in Coding: Concerns were shared regarding the performance of certain AI models in coding tasks, specifically noting that they may require additional training on code-related data.
- Discussion acknowledged the prevalent issue of AI tools falling short in technical capabilities, aligning with broader community concerns.

Links mentioned:

conversation_1719549125_scenario_backrooms-sonnet-opus.txt • infinite backrooms: a conversation between two ais:
Claude Artifact: Try out Artifacts created by Claude users
Open LLM Leaderboard 2 - a Hugging Face Space by open-llm-leaderboard: no description found
Arcee-SuperNova: Training Pipeline and Model Composition: We trained Arcee SuperNova-70B and Arcee SuperNova-8B to be a generally intelligent Llama-3.1-405B derivatives using intelligent distillation, novel post-training, and model merging techniques.
Tweet from TDM (e/λ) (@cto_junior): Puts to rest all theories about Sonnet being smarter due to feature steering
Bi-Weekly vLLM Office Hours - Neural Magic: Learn about vLLM, deep dive into exciting topics, ask questions, and engage with the vLLM community.
Tweet from Gabriel Elbling (@gabeElbling): Amazing graphical representation of a neural net, never seen anything like it.
Not A Tunnell Its Dimming GIF - Not A Tunnell Its Dimming Dark Tunnel - Discover & Share GIFs: Click to view the GIF
From Black Holes Entropy to Consciousness: The Dimensions of the Brain Connectome: The provided text is an excerpt from a scientific article exploring the connection between consciousness, the brain's connectome, and the concepts of spaceti...
[Distributed w/ TorchTitan] Introducing Async Tensor Parallelism in PyTorch: with Horace He, Less Wright, Luca Wehrstedt, Tianyu Liu, Wanchao Liang TL;DR We implemented experimental async tensor parallelism support in PyTorch. We integrated it in TorchTitan and observed: Up...
Tweet from Jiaxin Pei (@jiaxin_pei): It's common to add personas in system prompts, assuming this can help LLMs. However, through analyzing 162 roles x 4 LLMs x 2410 questions, we show that adding a persona mostly has *no* statistica...
meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 · Hugging Face: no description found
Current team tasks • QubesOS: Current team tasks

Nous Research AI ▷ #ask-about-llms (13 messages🔥):

Hermes 3 SFT Dataset

Training and Inference for Nous Models

Runpod and Modal Performance

DRY Sampler Implementation

Hermes 3 SFT dataset not open source: Members confirmed that the Hermes 3 SFT dataset is not open source, unlike previous versions, including Hermes 1 and 2.
- However, a link to the OpenHermes-2.5 dataset on Hugging Face was shared as a related resource.
Best platforms for fine-tuned Nous models: A member suggested Runpod for training and inference, mentioning a need for better real-time inference due to high cold start times.
- They also expressed interest in testing performance on Fireworks and noted that Runpod's developer experience has limitations around function calling.
Modal's cold start times impact Nous models: Concerns were raised about Modal's cold start times, which some members felt hindered the performance of Nous models.
- A member expressed a wish for Groq to support custom fine-tune deployments as a potential solution.
DRY sampler integration needed for bots: A member pointed out a recent GitHub pull request for integrating a DRY sampler, indicating it could benefit bot performance.
- They noted that implementation awaits inclusion in vllm or similar tools used for inference with Hermes 3.
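
For readers unfamiliar with DRY ("Don't Repeat Yourself") sampling, the sketch below illustrates the general idea only: a token that would extend a run already seen in the context is penalized, and the penalty grows exponentially with the length of the repeated run. It is a simplified illustration, not the code from the linked pull request; the parameter names and default values are assumptions, and features of real implementations (such as sequence breakers) are omitted.

```python
def dry_penalties(context, multiplier=0.8, base=1.75, allowed_length=2):
    """Return {token: penalty} for tokens that would extend a repeated run.

    `context` is the list of token ids generated so far.
    """
    penalties = {}
    last = len(context)
    for i, token in enumerate(context):
        # Count how many tokens just before position i match the end of the context.
        n = 0
        while n < i and context[i - 1 - n] == context[last - 1 - n]:
            n += 1
        if n >= allowed_length:
            # Emitting `token` next would repeat a run of n + 1 tokens.
            penalty = multiplier * base ** (n - allowed_length)
            penalties[token] = max(penalties.get(token, 0.0), penalty)
    return penalties


def apply_dry(logits, context, **kwargs):
    """Subtract the penalties from a {token: logit} mapping."""
    adjusted = dict(logits)
    for token, penalty in dry_penalties(context, **kwargs).items():
        if token in adjusted:
            adjusted[token] -= penalty
    return adjusted


# The context ends in "1 2 3", which earlier was followed by 4.
context = [1, 2, 3, 4, 1, 2, 3]
print(dry_penalties(context))
print(apply_dry({4: 5.0, 7: 4.0}, context))
```

Only token 4 is penalized here, because it is the only token that would continue a sequence already present in the context.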

Links mentioned:

teknium/OpenHermes-2.5 · Datasets at Hugging Face: no description found
added implementation of DRY sampler (post-refactor) by wwoodsTM · Pull Request #9702 · ggerganov/llama.cpp: I have read the contributing guidelines Self-reported review complexity: Low Medium High This is a continuation of the previous PR #6839 to implement DRY sampling (and I think a pretty faith...

Nous Research AI ▷ #research-papers (4 messages):

Thinking LLMs

Medical AI Developments

Dualformer Integration

Thought Preference Optimization

Medical AI Podcast

Thinking LLMs propose effective instruction-following: The paper Thinking LLMs: General Instruction Following With Thought Generation introduces Thought Preference Optimization (TPO), targeting Chain-of-Thought-style prompting to enhance instruction-finetuned LLMs.
- Results indicate that TPO significantly boosts performance while also revealing that employing a thought prompt without appropriate finetuning can worsen outcomes.
Medical AI Podcast delivers latest breakthroughs: The highlighted podcast episode discusses recent advancements in medical AI, focusing on topics like deepfake detection and explainable AI.
- View the full episode titled Latest Medical AI Breakthroughs to explore detailed insights on new research and technologies.
Introducing Dualformer for enhanced reasoning capabilities: The Dualformer model integrates fast and slow reasoning modes by utilizing randomized reasoning traces during training, improving efficiency and performance.
- In tests, Dualformer achieves up to 97.6% success on maze navigation tasks in slow mode, outperforming existing baselines while reducing computation.
Medical LLMs see broad advances: Recent developments include medical models such as BioMistral-NLU for medical vocabulary understanding and ONCOPILOT for CT tumor analysis.
- These innovations demonstrate the expanding role of AI in healthcare and promise new capabilities in clinical analysis.
Discussion on Thought Preference Optimization: In the conversation about TPO, a community member reflected on its efficacy compared to OpenAI's model, emphasizing the role of thinking in instruction-following.
- The discussion highlights the importance of understanding model prompts to maintain response accuracy in LLM applications.

Links mentioned:

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces: In human cognition theory, human thinking is governed by two systems: the fast and intuitive System 1 and the slower but more deliberative System 2. Recent studies have shown that incorporating System...
Tweet from Sebastian Raschka (@rasbt): I just read the "Thinking LLMs: General Instruction Following With Thought Generation" paper (I), which offers a simple yet effective way to improve the response quality of instruction-finetun...
🤖 Latest Medical AI Breakthroughs | Weekly Research Review Oct 19-26 (Part 1/2): This week in medical AI research, we explore groundbreaking work in deepfake detection, explainable AI (XAI), LLM applications, and multimodal foundation mod...

Nous Research AI ▷ #interesting-links (4 messages):

Ferret-UI release

Homomorphic Encryption announcement

Apple's AI Cloud security initiative

Apple launches Ferret-UI for iOS: Two weeks ago, @Apple released weights for Ferret-UI, a new multimodal LLM built specifically for iPhone/iOS screens; it is now available in @HuggingFace transformers.
- Ferret-UI excels at mobile UI screen understanding with capabilities like icon recognition and text locating, outpacing even GPT-4V.
Apple's game-changer in cryptography: According to @varun_mathur, Apple has announced a revolutionary approach to homomorphic encryption which protects private data while allowing it to be used for improving user experience anonymously.
- This technology allows private data to remain encrypted end-to-end, facilitating a non-zero-sum game where individual user data can contribute to a broader UX.
Big bounty for AI Cloud security: Ahead of the launch of its private AI cloud, dubbed Private Cloud Compute, Apple announced it will pay security researchers up to $1 million to identify vulnerabilities in its security.
- The announcement includes bounties for reporting exploits capable of executing malicious code, with rewards reaching $250,000 for reporting sensitive data vulnerabilities.

Links mentioned:

Tweet from Jade Choghari (@jadechoghari): 2 weeks ago @Apple released weights for Ferret-UI -> a new Multimodal LLM made specifically for iPhone/IOS screens !! I worked on the HF integration - now available in @HuggingFace transformers - ...
Apple will pay security researchers up to $1 million to hack its private AI cloud | TechCrunch: Ahead of the debut of Apple's private AI cloud next week, dubbed Private Cloud Compute, the technology giant says it will pay security researchers up to
Tweet from Varun (@varun_mathur): This is a game-changer announcement by Apple around cryptography. It is the “HTTPS moment for AI” in some ways.. Here is what this means: your private confidential data can be pooled with other data...

Nous Research AI ▷ #research-papers (4 messages):

Thought Preference Optimization

Medical AI Podcast

Dualformer Model

Medical LLM Applications

AI in Healthcare Ethics

Thought Preference Optimization enhances LLM responses: The paper on 'Thinking LLMs: General Instruction Following With Thought Generation' proposes Thought Preference Optimization (TPO), incorporating Chain-of-Thought prompting in LLM training to improve instruction response quality.
- Results show a performance boost of 4% with TPO when applied to the Llama 3 8B Instruct model, with prior finetuning yielding significant gains.
Weekly Medical AI insights available: The weekly Medical AI podcast discusses groundbreaking advancements, including deepfake detection and explainable AI, helping listeners stay updated efficiently.
- Featured models include BioMistral-NLU for medical vocab understanding and applications such as ONCOPILOT for CT tumor analysis.
Introducing Dualformer for flexible reasoning: The Dualformer model integrates fast and slow reasoning modes, significantly enhancing LLM reasoning capabilities while reducing computational costs.
- In tests, Dualformer achieved 97.6% success on maze navigation tasks in slow mode, outperforming the previous baseline while using 45.5% fewer reasoning steps.
Diverse Medical LLM developments: Recent advancements include models like the Bilingual Multimodal LLM designed for biomedical tasks and the Metabolic-Enhanced LLMs aimed at clinical analysis.
- The developments emphasize applications across various areas of medicine, including personalized health initiatives and evaluation methodologies.
Exploring ethical considerations in AI healthcare: Discussions around AI in healthcare ethics highlight the importance of bias analysis and reflection-aware clinical agents to ensure fair and accurate AI usage.
- Projects like Healthcare XAI Through Storytelling aim to improve understanding and trust in AI applications among medical professionals.

Links mentioned:

Tweet from Sebastian Raschka (@rasbt): I just read the "Thinking LLMs: General Instruction Following With Thought Generation" paper (I), which offers a simple yet effective way to improve the response quality of instruction-finetun...
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces: In human cognition theory, human thinking is governed by two systems: the fast and intuitive System 1 and the slower but more deliberative System 2. Recent studies have shown that incorporating System...
🤖 Latest Medical AI Breakthroughs | Weekly Research Review Oct 19-26 (Part 1/2): This week in medical AI research, we explore groundbreaking work in deepfake detection, explainable AI (XAI), LLM applications, and multimodal foundation mod...

Eleuther ▷ #general (99 messages🔥🔥):

Training LLMs on Limited Resources

Contributing to Open Source AI Projects

Optimizers in Machine Learning

Running Models on Multiple GPUs

Community Support for AI Frameworks

Training LLMs on Limited Resources: Members discussed the challenges of training large language models (LLMs) like LLaMA-2 or Pythia on limited GPU resources, highlighting that many papers require extensive hardware for effective reproduction.
- One member shared their experience deploying nanoGPT as a lightweight model for easier training, suggesting it for newcomers despite its simplicity.
Contributing to Open Source AI Projects: A user expressed interest in getting involved with EleutherAI's work despite having mostly proprietary tech experience, asking for advice on contributing to projects.
- Responses indicated that open-source contributions, especially for smaller projects, can provide invaluable learning experiences and help in transitioning from a software engineering background.
Optimizers in Machine Learning: A discussion of the Shampoo optimizer highlighted its lack of momentum compared to optimizers like Adam and questioned how it is implemented in various libraries.
- Members pointed to papers providing updated insights into Shampoo, leading to a deeper exploration of how it works.
Running Models on Multiple GPUs: Concerns were raised about the complexities of using frameworks like GPT-NeoX for training on multiple GPUs, especially regarding the efficiency of CUDA communications and project dependencies.
- Suggestions included using simpler frameworks like nanoGPT for small-scale experiments before moving on to more complicated codebases.
Community Support for AI Frameworks: Participants noted the importance of community and documentation when working with different AI codebases, especially when navigating system requirements and dependency management.
- One user mentioned the frustration with abandoned projects in the PyTorch Lightning space and advocated for clearer, simplified code structures to facilitate learning.

Links mentioned:

SymNoise: Advancing Language Model Fine-tuning with Symmetric Noise: In this paper, we introduce a novel fine-tuning technique for language models, which involves incorporating symmetric noise into the embedding process. This method aims to enhance the model's func...
EleutherAI/gpt-neox - Sourcegraph: no description found
Some testing from me · Issue #407 · pytorch/torchtitan: I tried torchtitan. Here's a grab bag of issues. My setup is CoreWeave-provided PyTorch nightly image, on a CW-hosted HGX in slurm. PyTorch and torchtitan builds are nightlies from a few days ago....
GitHub - facebookresearch/lingua: Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.: Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. - facebookresearch/lingua
GitHub - tysam-code/hlb-gpt: Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).: Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 secon...
GitHub - tomogwen/LitGPT: Minimal GPT Implementation in PyTorch Lightning: Minimal GPT Implementation in PyTorch Lightning. Contribute to tomogwen/LitGPT development by creating an account on GitHub.
Megatron-LM/megatron/core/distributed/distributed_data_parallel.py at main · NVIDIA/Megatron-LM: Ongoing research training transformer models at scale - NVIDIA/Megatron-LM
pythia/models/14M/pythia-14m.yml at main · EleutherAI/pythia: The hub for EleutherAI's work on interpretability and learning dynamics - EleutherAI/pythia
GitHub - google-research/google-research at 2a1b8acddba7ed1bcf308427586564322a66225a: Google Research. Contribute to google-research/google-research development by creating an account on GitHub.
google-research/scalable_shampoo/optax/distributed_shampoo.py at 2a1b8acddba7ed1bcf308427586564322a66225a · google-research/google-research: Google Research. Contribute to google-research/google-research development by creating an account on GitHub.

Eleuther ▷ #research (89 messages🔥🔥):

Stick-Breaking Attention Mechanism

Differential Transformers

Counting in Transformers

Latent Collaborative Recommendations

Universal Transformers Scaling

Innovative Stick-Breaking Attention Proposed: A new attention mechanism based on the stick-breaking process aims to improve the self-attention of Transformers, addressing issues related to positional embeddings and softmax limitations, as detailed in the arXiv paper.
- Community feedback suggested a need for clearer introductions for such mechanisms, while links to related projects like IBM's ModuleFormer were shared.
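
As a rough illustration of the stick-breaking idea for a single query, the sketch below walks from the most recent key back to the oldest; each key claims a sigmoid-gated fraction of whatever "stick" remains. This follows the generic stick-breaking construction and is an assumption about the mechanism's shape, not the paper's exact formulation or indexing; the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stick_breaking_weights(scores):
    """Attention weights for one query.

    scores[j] is the query's logit for key j, ordered oldest to newest.
    Returns (weights, unclaimed remainder of the stick).
    """
    weights = [0.0] * len(scores)
    remaining = 1.0  # part of the stick not yet claimed by more recent keys
    for j in reversed(range(len(scores))):
        beta = sigmoid(scores[j])
        weights[j] = beta * remaining
        remaining *= 1.0 - beta
    return weights, remaining


# Equal logits for three keys: the newest key still gets the largest weight.
w, rest = stick_breaking_weights([0.0, 0.0, 0.0])
print(w, rest)
```

Even with identical logits, more recent keys receive more weight, which hints at how ordering can be encoded without separate positional embeddings. Unlike softmax, the weights need not sum to 1; the remainder is left unassigned.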
Discussion on Differential Transformers: Questions arose about the use of GroupNorm vs. RMSNorm in the Differential Transformer paper, pointing to potential confusion in methodology.
- Community members clarified the terminology and the implications of the normalization approach in relation to each attention head.
Exploring Counting Capabilities in Transformers: A recent paper on Transformers highlights architectural limitations that affect reasoning depth, specifically in counting tasks, raising questions about general-purpose LLM capabilities as noted in the arXiv discussion.
- Community interest in previously established work and similarities in tokenization approaches for reasoning tasks was expressed, indicating ongoing exploration in the field.
Optimization Strategies for LC-Rec: A user sought community input on optimizing the Latent Collaborative Recommendations (LC-Rec) approach, focusing on enhancing performance and methodological tweaks for their master's project.
- Suggestions included referencing baseline papers and exploring optimizations reflective of community standards.
Updates on Universal Transformers Scaling: A member reported on their efforts to scale Universal Transformers, expressing challenges faced while improving their performance amidst compute limitations.
- The conversation shaped around the potential of achieving favorable results if scaling issues could be resolved in future research endeavors.

Links mentioned:

Value Residual Learning For Alleviating Attention Concentration In Transformers: Transformers can capture long-range dependencies using self-attention, allowing tokens to attend to all others directly. However, stacking multiple attention layers leads to attention concentration. O...
Democratic Representations: Minimization of the $\ell_{\infty}$ (or maximum) norm subject to a constraint that imposes consistency to an underdetermined system of linear equations finds use in a large number of practical applica...
when trees fall...: no description found
Counting Ability of Large Language Models and Impact of Tokenization: Transformers, the backbone of modern large language models (LLMs), face inherent architectural limitations that impede their reasoning capabilities. Unlike recurrent networks, Transformers lack recurr...
Stick-breaking Attention: The self-attention mechanism traditionally relies on the softmax operator, necessitating positional embeddings like RoPE, or position biases to account for token order. But current methods using still...
Dirichlet process - Wikipedia: no description found
GitHub - IBM/ModuleFormer: ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.: ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Languag...
Sparse Universal Transformer: The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers. Empirical evidence shows that UTs have better compositional generalization than Vanilla Transfo...

Eleuther ▷ #interpretability-general (4 messages):

Mech Interp Research Critique

Definition of Feature in SAEs

Outdated Task List

Apollo Research Project Ideas

Future Projects in MI

Mech Interp Research Critique: A member shared a project proposal criticizing mech interp research, arguing that both small groups and large AI labs lack scientific integrity, and proposing improvements.
- "We should do better" was a key takeaway from their critique.
Feature Definition in SAEs debated: There was a question raised about the accuracy of defining a 'feature' as 'a column of numbers in the matrix that consistently has a large dot product when certain text is inputted/outputted'.
- This sparked discussion regarding the nuances and interpretations of feature definitions in the context of SAEs.
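
To make the debated definition concrete, the sketch below computes the dot product of one activation vector with each column of a matrix, so each column plays the role of a "feature" direction. The numbers are invented, and a real sparse autoencoder typically also applies a bias and a nonlinearity such as ReLU, which this deliberately omits.

```python
def feature_activations(activation, matrix):
    """Dot product of one activation vector with every column of `matrix`."""
    n_features = len(matrix[0])
    return [
        sum(activation[r] * matrix[r][c] for r in range(len(activation)))
        for c in range(n_features)
    ]


# Hypothetical 3-dimensional activation and a 3 x 2 matrix (two "features").
matrix = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, -1.0],
]
acts = feature_activations([2.0, 0.5, 1.0], matrix)
print(acts)
```

Under the quoted definition, the first column would count as a feature for this input because its dot product is large, while the second would not.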
Outdated Task List Concerns: A newcomer noted that the unclaimed tasks section in the Notion doc may be outdated and asked for confirmation on recent updates.
- They expressed concern that Neel's list of 200 problems also seems a bit dated.
Apollo Research's New Project Ideas: A member shared a list of 45 mech interp project ideas generated by the Apollo Research team to identify new potential projects.
- They noted the previous lists are outdated and highlighted the team's computational constraints.
Seeking Involvement in Future MI Projects: Another member expressed their desire to pivot into mechanistic interpretability research and sought information on ongoing or upcoming projects they can join.
- They requested others to react to their message for direct communication regarding potential project opportunities.

Links mentioned:

Tweet from Alice Rigg (@woog09): I wrote this project proposal on Sunday. TL;DR I shit on mech interp research saying both small groups and big AI labs lack scientific integrity, we should do better, and I suggest some ways of doing ...
A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team — AI Alignment Forum: Why we made this list: • * The interpretability team at Apollo Research wrapped up a few projects recently[1]. In order to decide what we’d work on…

Eleuther ▷ #lm-thunderdome (1 messages):

Eval Tasks Limit

Accuracy Measurement

Determining Effective Eval Tasks Limit: A member inquired about an appropriate --limit value that effectively measures accuracy without executing all evaluation tasks, considering that full evaluations are time-consuming.
- They suggested --limit 100 as a potential value, seeking insights on whether this would suffice for faithful and accurate measurement.
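
One way to reason about a candidate --limit value is the binomial standard error of an accuracy estimate. The sketch below assumes the limited examples behave like a random sample of the task and that --limit caps the number of examples evaluated per task; both are assumptions, and the function names are hypothetical.

```python
import math


def accuracy_stderr(accuracy, n_samples):
    """Binomial standard error of an accuracy measured on n_samples examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


def ci95_half_width(accuracy, n_samples):
    """Half-width of an approximate 95% confidence interval."""
    return 1.96 * accuracy_stderr(accuracy, n_samples)


# Worst case is accuracy = 0.5, where the variance is largest.
for n in (100, 1000, 10000):
    print(n, round(ci95_half_width(0.5, n), 3))
```

At 100 examples the worst-case interval is roughly plus or minus 10 accuracy points, so --limit 100 can separate clearly different models but not ones within a few points of each other.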
Concerns on Evaluating with Limited Tasks: Another discussion highlighted concerns regarding the use of limited evaluation tasks to gauge model performance as it may not yield comprehensive results.
- Members debated the impact of using lower limits on the accuracy of the evaluations in the context of model reliability.

Eleuther ▷ #gpt-neox-dev (10 messages🔥):

GPT-NeoX Colab Notebooks

Distributed GPU Training for LLMs

Python 3.10 Compatibility

Llama3.x YAML Configs

GPT-NeoX Colab Notebooks in Development: A user aims to create simple examples of GPT-NeoX in Colab notebooks, focusing on small GPT models and Python data for code completion. They shared specifications and are seeking feedback from the community, including links to the specifications on GitHub.
- Two links to notebooks detailing tasks were provided: notebook1 and notebook2.
Concern About Distributed GPU Training: A user asked if there's a setup for sharing consumer GPUs for training a large language model with GPT-NeoX. A response indicated difficulties due to networking issues and suggested checking out INTELLECT-1 for similar efforts.
- The link to INTELLECT-1 was shared, highlighting some ongoing work in this area: PrimeIntellect.
Python 3.10 Compatibility Fixes: Setting up GPT-NeoX with Python 3.10 requires overriding the Torch version to 1.11.0 to avoid import failures. One user shared a Colab notebook documenting the installation issue and how it was resolved after applying the fix.
- They noticed possible compatibility issues with later versions of Torch, suggesting that torch 2.3 might be acceptable while 2.4 causes failures.
Patching DeepSpeed for Python Compatibility: A discussion revealed issues related to patches in deepspeed/elasticity/elastic_agent.py that affect Python 3.10 support on GPT-NeoX. Users shared links to related GitHub pull requests addressing these compatibility concerns.
- A specific PR related to this issue can be found here: PR #62, showing ongoing work to improve compatibility.
YAML Configs for Llama3.x Models: A user inquired about adding YAML configs for smaller instruct models like Llama3.1/3.2 to the GPT-NeoX repository. They also asked if it's possible to automatically generate these configs from HuggingFace files, mentioning contributions from another user.
- The inquiry included a reference to a commit that established existing config files for Llama2: commit link.

Links mentioned:

INTELLECT-1 | Prime Intellect | Decentralized Training of a 10B Model: The first decentralized training run of a 10-billion-parameter model, inviting anyone to contribute compute and participate.
GPT-NeoX-Colab/notebooks/neox_failing_python3.10.ipynb at main · markNZed/GPT-NeoX-Colab: Example Colab notebooks for GPT-NeoX. Contribute to markNZed/GPT-NeoX-Colab development by creating an account on GitHub.
GPT-NeoX-Colab/notebook1task.md at main · markNZed/GPT-NeoX-Colab: Example Colab notebooks for GPT-NeoX. Contribute to markNZed/GPT-NeoX-Colab development by creating an account on GitHub.
GPT-NeoX-Colab/notebook2task.md at main · markNZed/GPT-NeoX-Colab: Example Colab notebooks for GPT-NeoX. Contribute to markNZed/GPT-NeoX-Colab development by creating an account on GitHub.
Python 3.10 support by markNZed · Pull Request #1313 · EleutherAI/gpt-neox: In this issue Python 3.10 support was added #1122
[BE] minor logging cleanup in distributed (#122921) · pytorch/pytorch@b6201a6: Summary: Minor logging cleanup in distributed library 1. Don't use "f" formatted strings - address linter issues. 2. Nits: Make use of unused e (error) in a...
Merge upstream torch-compatibility changes for elastic_agent by jacobthebanana · Pull Request #62 · EleutherAI/DeeperSpeed: Pull in the following changes from DeepSpeed upstream. microsoft@a4cd550 microsoft@6a4b96a Related discussion: message in EleutherAI channel Tested for torch==2.5.0.

Stability.ai (Stable Diffusion) ▷ #general-chat (197 messages🔥🔥):

Stable Diffusion 3.5 usage

Custom model deployment on Runpod

Local generation with AMD GPU

Sketch to render in architectural design

Discord bot for Flux inpainting

Stable Diffusion 3.5 performance concerns: Users shared mixed experiences with Stable Diffusion 3.5, questioning its speed and quality compared to earlier versions like 1.5.
- It was suggested to run the same prompt across different models to compare results effectively.
Deploying custom models on Runpod: A user inquired about deploying a custom Stable Diffusion model (Juggernaut) on Runpod and discussed the lack of Forge templates.
- Others pointed out that using simpler interfaces like Auto1111 might be more straightforward.
Local generation with AMD GPUs: Discussion included the viability of local generation using an AMD GPU, with advice to follow pinned guides for success.
- Participants shared their experiences with VRAM capacities and recommended testing various models like Gemma 2.
Sketch to render for architectural design: A user expressed interest in utilizing Stable Diffusion for architectural design, focusing on a 'sketch to render' workflow.
- Suggestions included using tools like ControlNet for maintaining detail and accuracy in transformations.
Creating a Discord bot for Flux inpainting: Developers discussed the creation of a Discord bot capable of inpainting in Flux but mentioned limited availability of such models online.
- One participant showed intent to implement functionality that supports inpainting within the community tools.

Links mentioned:

Stable Diffusion 3.5 (Large): What You Need to Know: Stable Diffusion 3.5 (Large) and (Turbo) released. Compare with Flux—master fast AI generation with low VRAM tips & workflows. Tutorial.
Reddit - Dive into anything: no description found
Tweet from AK (@_akhaliq): x.infer So, a new computer vision model just dropped last night. It's called GPT-54o-mini-vision-pro-max-xxxl. It's a super cool model, open-source, open-weights, open-data, all the good stuf...
AnythingLLM | The all-in-one AI application for everyone: AnythingLLM is the AI application you've been seeking. Use any LLM to chat with your documents, enhance your productivity, and run the latest state-of-the-art LLMs completely privately with no technic...
GitHub - open-webui/open-webui: User-friendly AI Interface (Supports Ollama, OpenAI API, ...): User-friendly AI Interface (Supports Ollama, OpenAI API, ...) - open-webui/open-webui
GitHub - anapnoe/stable-diffusion-webui-ux-forge: Stable Diffusion web UI UX Forge: Stable Diffusion web UI UX Forge. Contribute to anapnoe/stable-diffusion-webui-ux-forge development by creating an account on GitHub.
ComfyUI Workflows - Developer Community | OpenArt: Discovery, share and run thousands of ComfyUI Workflows on OpenArt.

aider (Paul Gauthier) ▷ #general (127 messages🔥🔥):

Aider vs PearAI Discussion

Claude 1022 Experience

Homebrew vs Pipx Installation

Benchmarking Sonnet 3.5

Privacy with Local Models

Aider and PearAI Feature Comparisons: Members discussed the similarities between PearAI and Aider, noting that PearAI seems to mimic features of both Continue.dev and Aider, particularly in its integration with open-source tools.
- Concerns were raised about the ethical implications of such overlap, with references to the Open Source Pledge asking businesses to contribute to the development of open source tools.
Using Claude 1022 with Aider: A user shared a positive experience utilizing Claude 1022 alongside Aider for developing a complex Flutter application, claiming a significant productivity boost.
- They spent 15 hours and $18 in credits to generate 4,300 lines of code solely through prompting.
Best Installation Method for Aider: Discussion on the drawbacks of using Homebrew for Aider installation emerged, noting potential conflicts with dependency requirements.
- Many advocated for using Pipx instead, which keeps Aider's dependencies isolated for better compatibility.
Requests for Benchmarking Files: Users expressed a need for benchmark data files specifically for Sonnet 3.5's code edit and refactoring benchmarks to avoid running expensive tests themselves.
- One user specifically requested the .aider.chat.history.md and .aider.results.json files.
Privacy Concerns with Aider: A user inquired about data privacy regarding the use of local models in Aider, fearing risks associated with sensitive information.
- Another member confirmed that Aider does not store user data when local models are used, ensuring privacy.

Links mentioned:

Introducing PearAI Creator (Beta) — Powered By aider: PearAI Creator can build apps, fix your bugs, and implement new features for you — all automatically. Learn how to use this powerful new feature powered by aider.
Install with pipx: aider is AI pair programming in your terminal
Home: aider is AI pair programming in your terminal
SF ads call out tech firms for not paying for open source: Puts Chief Tightwad Officers on notice
GitHub - caseymcc/aider-code: vscode extension for aider: vscode extension for aider. Contribute to caseymcc/aider-code development by creating an account on GitHub.
GitHub - caseymcc/aider at commandio: aider is AI pair programming in your terminal. Contribute to caseymcc/aider development by creating an account on GitHub.

aider (Paul Gauthier) ▷ #questions-and-tips (60 messages🔥🔥):

Nvidia Nemotron setup

Unit Testing with Aider

VSCode Extensions for Aider

Gemini Model Performance

Fine-tuning Models and Context Issues

Setting Up Nvidia Nemotron with Aider: A user is struggling to configure Nvidia Nemotron on Aider, seeking guidance on settings and exec commands, particularly regarding custom model metadata.
- Another member mentioned that if it's connecting, model warnings can be ignored and encouraged reviewing the docs to handle custom metadata.
Enhancing Unit Testing Coverage: A user reported issues with Aider generating unit tests that fail to compile, asking for suggestions on improving coverage and reliability.
- Responses included discussions on specific prompts and experiences with unit testing using Aider, with some suggesting ways to improve isolation of tests.
VSCode Extensions for Aider Efficiency: A user proposed creating a VSCode extension that allows easy transfer of text editor content to Aider, potentially with buttons for different command formats.
- The idea was positively received, highlighting the need for efficient workflows when integrating with Aider.
Performance of Gemini Models for Editing: Members discussed whether Gemini models are effective for editing tasks, with some noting performance advantages of Gemini Flash over Claude 3.5.
- Concerns were raised about inconsistencies when using various models, emphasizing the need for better document creation commands in Aider.
Fine-tuning Models and Context Limitations: A user inquired about using Aider with fine-tuned models, particularly on AngelScript, expressing frustration with error handling and hallucinated functions.
- Another user shared detailed output issues encountered with models, including errors related to API rate limits and resource exhaustion while using Aider.

Links mentioned:

File editing problems: aider is AI pair programming in your terminal
Models: 'google' | OpenRouter: Browse models on OpenRouter
Release v1.44.7 · BerriAI/litellm: What's Changed [Docs] use litellm sdk with litellm proxy server by @ishaan-jaff in #5367 [Feat] Add support for fine tuned vertexai models by @ishaan-jaff in #5371 [Refactor] Refactor cohere prov...
GitHub - CEDARScript/cedarscript-grammar: A SQL-like language for efficient code analysis and transformations: A SQL-like language for efficient code analysis and transformations - CEDARScript/cedarscript-grammar

Modular (Mojo 🔥) ▷ #general (20 messages🔥):

Mojo documentation contributions

Learning programming languages

Mojo vs Python for ML

C++ learning resources

Community engagement and humor

Mojo API Documentation Needs Examples: A discussion highlighted the lack of examples for Collections in the Mojo API documentation, leading to suggestions to contribute to the docs via GitHub.
- Members emphasized the importance of community engagement and preparing pull requests as a step towards improving documentation.
Choosing Between Mojo and C++ for Learning: A user contemplating learning Mojo or C++ received advice that Mojo, being a modern systems language, might be better suited for their explorations, especially in fields like ML and data science.
- Community members shared insights on language choices with recommendations on focusing on Rust or building libraries in Mojo.
Rust's Role in Safety Critical Applications: Discussions underscored that Rust is gaining traction in domains needing safety-critical certifications, appealing to those transitioning from C++.
- Users debated the utility of Rust in various programming contexts, especially for safety and correctness in systems-level programming.
Community Humor with Code Revelations: The channel enjoyed a light-hearted moment with humorous code comments from a user, referencing issues in a function implementation with a playful tone.
- Comments like these foster a sense of community engagement and entertain while discussing programming challenges.
Exploring Mojo's Potential in Medical Research: A user expressed a keen interest in learning Mojo for its applicability in fields like data science and ML, intertwined with medical research aspirations.
- The dialogue indicated a positive outlook on Mojo as a viable option for users in specialized fields, despite its early-stage development.

Link mentioned: GitHub - modularml/mojo: The Mojo Programming Language: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.

Modular (Mojo 🔥) ▷ #mojo (151 messages🔥🔥):

Mojo language features

InlineArray slicing

Protobuf plugin experience

Zig language comparisons

Argument handling in functions

Slicing InlineArray with Span: A user discussed slicing an InlineArray in Mojo, expressing difficulties in deducing parameters, particularly with the integration of Span which could not deduce field parameters correctly.
- The solution proposed included utilizing as_span to simplify the process and handle slices effectively.
Inline Assembly and Protobuf: There was a discussion around the implementation of inline assembly in Mojo and whether Protobuf's complexity would justify writing a custom plugin for it.
- One user highlighted the challenge of zero-copy deserialization with Protobuf and suggested considering ASN.1 as an alternative for better debugging.
Zig Language Features: Users expressed interest in Zig's type system, specifically in how it handles types, comptime values, and unions for interface semantics.
- The conversation touched on how Zig's eager code generation could lead to large amounts of code and the potential drawbacks of this approach in application libraries.
Comptime Argument Handling: Users discussed the challenges of handling arguments that can be comptime or runtime, particularly in functions requiring optimization for specific cases.
- Solutions included the use of switch statements combined with comptime checks to manage various argument scenarios effectively.
Unions and Interfaces: Concerns were raised regarding the complexity of using unions as interfaces in programming, especially regarding the need to instantiate unions for passing values.
- The discussion concluded with a desire for simpler syntax and functionality in working with unions to improve usability.

Links mentioned:

Physical Address Extension - Wikipedia: no description found
mojo/stdlib/src/sys/_assembly.mojo at nightly · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
GitHub - mzaks/flatbuffers-mojo: Flatbuffers lib in Mojo: Flatbuffers lib in Mojo. Contribute to mzaks/flatbuffers-mojo development by creating an account on GitHub.

Modular (Mojo 🔥) ▷ #max (2 messages):

Mutable Tensors

Nightly Builds

Mutable Tensors Set to Enhance Training Objects: Current nightly builds are introducing mutable tensors, enabling the representation of training objects such as trained weights and KVCaches.
- This feature is still under development from an API perspective but is coming together nicely and expected to be included in the next release.
Community Excitement for New Features: A community member expressed enthusiasm for the introduction of mutable tensors, indicating positive reception of the new feature.
- The noted sentiment was 'Fantastic', reflecting eagerness for the upcoming advancements.

GPU MODE ▷ #general (13 messages🔥):

High Performance Mixed Precision Computing

CUDA Performance Issues on H100

Llama 3.2 Inference Discrepancies

Unsloth Kernels Guide

Parallel Function Calls in CUDA

High Performance Mixed Precision Computing Talk: A talk on high performance mixed precision computing is set to start in 5 minutes, as mentioned in the channel.
- Community members showed excitement with reactions like ⚡🤌👍.
Profiling CUDA Inference Code: A user reported experiencing 'Command Buffer Full' issues on H100 during profiling with nsight systems, which was not seen on A100.
- They are seeking guidance on potential solutions or if they should ask in another channel.
Discrepancies in Llama 3.2 Inference: One user questioned whether slight differences in model outputs when running L3.2 inference on MLX and Torch could indicate an error in implementation.
- Another member suggested that evaluating in FP16 or BF16 could contribute to these discrepancies compared to FP32.
Seeking Guidance on Unsloth Kernels: A user inquired about obtaining a guide for unsloth kernels and shared a link to the GitHub project.
- The project claims to fine-tune Llama 3.2, Mistral, Phi, and Gemma LLMs 2-5x faster with 80% less memory.
Parallelizing Function Calls using CUDA: A beginner posed a challenge in creating a parallel function for arbitrary methods using CUDA or Numba/CuPy.
- They expressed a need for advice on automatic parallelization of function calls on multiple inputs.
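
The FP16/BF16 point in the Llama 3.2 item above can be illustrated with a small, framework-free sketch. It is not the MLX or Torch code from the discussion: it only assumes that one backend rounds every intermediate result to half precision while the other keeps full precision, and the helper names `to_fp16` and `dot` are made up for this example. Python's double-precision floats stand in for the higher-precision reference.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (struct's 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dot(a, b, rnd=lambda v: v):
    """Dot product where every intermediate result is passed through `rnd`."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = rnd(acc + rnd(rnd(x) * rnd(y)))
    return acc

a = [0.1] * 1000
b = [0.1] * 1000

reference = dot(a, b)            # Python floats, no extra rounding
half = dot(a, b, rnd=to_fp16)    # inputs, products and sums rounded to FP16

print(f"reference={reference:.6f}  fp16={half:.6f}  diff={abs(reference - half):.6f}")
```

The two results differ even though the arithmetic is "the same", so small output differences between two backends do not by themselves indicate an implementation bug; comparing both against a higher-precision run is one way to tell rounding noise from a real error.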

Link mentioned: GitHub - unslothai/unsloth: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth

GPU MODE ▷ #triton (15 messages🔥):

FA3 Performance Insights

Triton MXFP4 Support

GPU Performance Discrepancies

Triton Debugging Strategies

Triton Hardware Support

FA3 shows impressive kernel tricks: FA3 reportedly employs numerous tricks to achieve SOTA performance despite some register spill, which makes it quite hacky.
- One member humorously noted, 'must say FA3 has so many tricks' with real model workloads contributing to its performance.
Triton set to support MXFP4 x FP16 matmul: Official support for (emulated) MXFP4 x FP16 matmul in Triton is expected soon, aiming to enhance performance.
- A member advised using lower-level languages than Triton for higher performance ceilings.
GPU performance varies notably: Some members pointed out that tl.load with repeated indices is slower on GPUs like A100/H100 but runs fine on others like 2080 Ti and 3090.
- One user inquired about debugging this issue and suspected it relates specifically to tl.load performance.
Debugging strategies for Triton calls: Discussion around the right debugging approach for slower tl.load indicated that TMA does not support interleaved indices and hence is not viable for A100/H100.
- It was noted that using contiguous blocks can further degrade performance if not handled correctly.
Exploring Triton hardware support: There was a query regarding which hardware Triton supports beyond GPUs and Intel NPUs, mentioning AWS AI chips as a potential but uncertain candidate.
- The performance of Triton's compiler on these chips was also brought into question, highlighting the need for comparative analysis.

Link mentioned: Poor performance on Ampere vs. Ada with bitpacked weights · Issue #4906 · triton-lang/triton: I am writing a library to perform different low-bit matmul kernels in Triton/CUDA. The Triton kernels work great on Ada gpus like the 4090 RTX and the A6000 Ada - on par with Marlin on large matric...

GPU MODE ▷ #torch (10 messages🔥):

Torch Compile Performance

CUDA Graphs and Triton Layers

GEMM Optimization on Different GPUs

Custom Operations Impact on Performance

Max Autotune vs Reduce Overhead

Torch Compile slows down models with Triton layers: A user noted that their simple MLP model with Triton layers runs much slower with torch.compile, specifically when using reduce-overhead mode due to the extra copy required for CUDA graphs.
- They found using max-autotune-no-cudagraphs significantly improved performance.
Custom Operations hinder performance: Another user observed that wrapping Triton kernels in custom_op negatively affects performance with reduce-overhead mode, while other operations like tinygemm perform well even when wrapped.
- They noted that without custom_op, they faced limitations using torch.compile in version 2.4.1.
GEMM optimization varies by GPU architecture: A user shared that torch.compile optimizes GEMM differently depending on a GPU's SM count, with performance improvements noted on devices with more than 80 SMs compared to those with fewer.
- They referenced a related discussion about how this architectural decision impacts performance, particularly for the NVIDIA A10G with 72 SMs.
Manual CUDA Graphs boost performance: One member revealed they resolved their Triton layer performance issues by switching to max-autotune-no-cudagraphs and incorporating manual CUDA Graphs for faster execution.
- For instance, they achieved 186 tokens/sec end-to-end decoding speed using Llama2-7B on the RTX 4090, surpassing the 173 tokens/sec with reduce-overhead.
GEMM restrictions based on version: Discussion arose regarding versioning in PyTorch, with users pointing out that previous versions limited GEMM optimization based on SM counts.
- It's suggested that the behavior may have changed as early as version 2.3 or before.

Link mentioned: torch_compile.webm: no description found

GPU MODE ▷ #cool-links (5 messages):

Tune Llama3 on AMD MI300x

Advancements in Contrastive Loss Techniques

Epilogue Visitor Tree in GEMM

FP8 Training Framework by NVIDIA

Tune Llama3 Journey on AMD MI300x: A detailed article shares insights on tuning the Llama3 405B model using AMD MI300x hardware.
- The article outlines practical steps and challenges faced during the tuning process.
Innovations in Contrastive Loss Training: A new approach to contrastive loss proposes a tile-based computation strategy that avoids full materialization of the similarity matrix, optimizing GPU resources.
- This method scales batch sizes effectively, showcasing benefits when implementing concepts like online softmax and cross-GPU tiling.
Exploring Epilogue Visitor Tree in GEMM: A supplemental article dives into the Epilogue Visitor Tree (EVT) for efficient post-processing in GEMM-based computations for NVIDIA GPUs.
- It explains how to integrate EVT into a CUTLASS GEMM kernel, enhancing performance and facilitating complex processes such as elementwise activations.
NVIDIA's FP8 Training Recipe Unveiled: NVIDIA's recent paper introduces a new FP8 training framework called COAT, aimed at optimizing memory usage during large model training.
- Key features include Dynamic Range Expansion and Mixed-Granularity Activation Quantization, which significantly enhance training efficiency.
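
The tile-based idea in the contrastive loss item above can be sketched on a single device in plain Python. This is not the paper's implementation: it only shows how an online softmax lets each row's log-sum-exp be accumulated tile by tile, so the full similarity matrix is never held at once. The function names, the InfoNCE-style loss with `keys[i]` as the positive for `queries[i]`, and the `tile` and `temperature` parameters are assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def full_loss(queries, keys, temperature=0.1):
    """Reference: materializes every row of the similarity matrix."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)
        lse = m + math.log(sum(math.exp(z - m) for z in logits))
        total += lse - logits[i]
    return total / len(queries)

def tiled_loss(queries, keys, tile=2, temperature=0.1):
    """Same loss, but only `tile` similarities per row exist at any time."""
    total = 0.0
    for i, q in enumerate(queries):
        running_max, running_sum = -math.inf, 0.0
        for start in range(0, len(keys), tile):
            logits = [dot(q, k) / temperature for k in keys[start:start + tile]]
            new_max = max(running_max, max(logits))
            # Online softmax: rescale the old partial sum to the new maximum.
            running_sum = (running_sum * math.exp(running_max - new_max)
                           + sum(math.exp(z - new_max) for z in logits))
            running_max = new_max
        lse = running_max + math.log(running_sum)
        total += lse - dot(q, keys[i]) / temperature
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
keys = [[0.9, 0.1], [0.2, 0.9], [0.5, 0.7]]
print(full_loss(queries, keys), tiled_loss(queries, keys, tile=2))
```

Both functions return the same value up to floating-point rounding; the tiled version trades one extra rescaling per tile for memory that no longer grows with the number of negatives.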

Links mentioned:

Tune Llama3 405B on AMD MI300x (our journey) - Felafax Blog - Obsidian Publish: Tune Llama3 405B on AMD MI300x (our journey) - Felafax Blog - Powered by Obsidian Publish.
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss: Contrastive loss is a powerful approach for representation learning, where larger batch sizes enhance performance by providing more negative samples to better distinguish between similar and dissimila...
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training: FP8 training has emerged as a promising method for improving training efficiency. Existing frameworks accelerate training by applying FP8 computation to linear layers while leaving optimizer states an...
Epilogue Fusion in CUTLASS with Epilogue Visitor Trees: Welcome to a supplemental article for our tutorial series on GEMM (GEneral Matrix Multiplication). Posts in the main series (1, 2) have discussed performant implementations of GEMM on NVIDIA GPUs b…

GPU MODE ▷ #jobs (1 messages):

ML / Research Engineer Job Search

AI Engineering Skills

Public Learning via Blogging

ML / Research Engineer seeks full-time position: A member announced their search for a full-time position in ML, highlighting expertise in AI engineering, data, evals, finetuning, and deployment.
- They expressed flexibility for roles beyond their primary focus and requested feedback or job referrals.
Expertise in NLP and STT: The member detailed their primary experience in NLP (including LLMs, RAG, and NER) and STT (including ASR and Speech Translation).
- They aim to apply this expertise in their job search, contributing significant knowledge in these specialized areas.
Learning in public through blogging: The member promotes their personal blog on Substack, sharing insights on their learning journey in AI engineering.
- They launched the blog 10 months ago to engage with a broader community and document their experiences.

Link mentioned: Amgad’s Substack | Amgad Hasan | Substack: My personal Substack. Click to read Amgad’s Substack, by Amgad Hasan, a Substack publication. Launched 10 months ago.

GPU MODE ▷ #beginner (26 messages🔥):

QAT Framework for VITs

GTX 780 with PyTorch

Learning Resources for Beginners

Getting Started with Triton

CUDA Learning Recommendations

Seeking Configurable QAT Framework for VITs: A user inquired about easily configurable QAT frameworks for VITs, seeking repositories that allow direct changes to quantizers.
- No specific recommendations were provided, but assistance was encouraged.
GTX 780 is a Classic Challenge for PyTorch: A user experienced compatibility issues using a GTX 780 with their PyTorch installation, citing CUDA compatibility concerns.
- An experienced user commented that while the toolkit supports it, the torch kernels might not be optimized for such an old architecture due to its 3GB VRAM limitation.
Need for Beginner-Friendly Learning Resources: One user expressed a need for easier introductory resources instead of complex topics like profiling in available YouTube lectures.
- Recommendations included a book titled PMPP and an interest in beginner tutorials available in the channel.
Triton Tutorials for Beginners: Another user received guidance to explore official Triton tutorials to aid their learning process in building CNNs.
- They were directed towards interactive resources such as Triton-Puzzles for practical learning.
CUDA Learning Options: A user questioned whether to invest in a mobile NVIDIA GPU laptop or a desktop for learning CUDA, considering a budget of $3.5k.
- Responses suggested leveraging platforms like Colab and Kaggle that provide free access to powerful GPUs, making hardware purchases less immediate.

Links mentioned:

Course Detail | NVIDIA: no description found
Start Locally: Start Locally
Tutorials — Triton documentation: no description found
GitHub - srush/Triton-Puzzles: Puzzles for learning Triton: Puzzles for learning Triton. Contribute to srush/Triton-Puzzles development by creating an account on GitHub.

GPU MODE ▷ #pmpp-book (4 messages):

Race Conditions in GPU Programming

Independent Thread Scheduling Changes

Memory Optimization Concerns

Race Conditions Can Still Occur: Misusing __syncthreads in GPU programming can lead to race conditions, raising concerns from members about potential issues.
- One member emphasized that without proper synchronization, the compiler might incorrectly optimize memory operations, leading to wrong results.
Volta/Turing Thread Coordination: Discussions revealed that with the introduction of Volta/Turing architectures (sm_70/75), threads in a warp are no longer guaranteed to run in lockstep.
- This change allows for cheaper warp synchronization instructions to be used instead of traditional barriers, providing greater flexibility for programmers.
Memory Optimization Avoids Conflicts: A member noted that omitting __sync instructions can lead to the compiler making incorrect assumptions about caching, potentially affecting memory read/store operations.
- Such optimizations could inadvertently lead to incorrect outcomes if synchronization is not properly handled.

GPU MODE ▷ #youtube-recordings (1 messages):

mr.osophy: L33: BitBLAS

GPU MODE ▷ #torchao (8 messages🔥):

Quantization Techniques

Model Training Challenges

Dequantization Strategies

LLM.int8 Refactor

LoRA Usage

Option 3: Just Don't Merge: A member suggested adding 'Option 3: don't merge' after successfully training HQQ+ models with dequantize(W) + sAB without issues.
- They proposed creating a fused kernel to avoid separate dequantization and LoRA runs, which could benefit GemLite.
Gibberish Outputs in Experiments: Concerns were raised about gibberish outputs from experiments, specifically with the dequantization step impacting outliers.
- Suggestions included adjusting parameters and using LoRAs separately in vLLM instead of merging to avoid these issues.
Sayak's Findings on FLUX: A member recalled Sayak's Twitter posts suggesting improved results for FLUX, although the specifics were uncertain.
- There's ongoing discussion to resolve this through collaborations, particularly focused on the LLM.int8 refactor.
Improving Quantization Processes: Various options were discussed for improving quantization processes, including methods to handle weights without compromising performance.
- Among these options, utilizing cuda streams was mentioned as a practical solution without the need for a fused kernel.
Experiences with LoRA Usage: Members discussed using LoRAs separately within models like vLLM, but recognized this might resemble merging into original weights.
- Maintaining the integrity of weight quantization during this process is crucial to prevent degradation of model performance.
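
The "don't merge" option above, written in the discussion as dequantize(W) + sAB, can be sketched with tiny matrices. Everything concrete here is an assumption for illustration: the affine quantizer (`scale`, `zero`), the shapes (row-vector input `x`, `A` of shape in×r, `B` of shape r×out), and the helper names. It is not the HQQ+ or GemLite code.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dequantize(q, scale, zero):
    """Assumed affine scheme: w = (q - zero) * scale."""
    return [[(v - zero) * scale for v in row] for row in q]

def forward_unmerged(x, q, scale, zero, A, B, s):
    """y = x @ dequantize(W) + s * (x @ A) @ B, keeping the LoRA path separate."""
    base = matmul(x, dequantize(q, scale, zero))
    lora = matmul(matmul(x, A), B)
    return [[b + s * l for b, l in zip(b_row, l_row)] for b_row, l_row in zip(base, lora)]

def forward_merged(x, q, scale, zero, A, B, s):
    """y = x @ (dequantize(W) + s * A @ B), merging in full precision."""
    w = dequantize(q, scale, zero)
    delta = matmul(A, B)
    merged = [[wv + s * dv for wv, dv in zip(w_row, d_row)] for w_row, d_row in zip(w, delta)]
    return matmul(x, merged)

x = [[1.0, 2.0]]
q = [[3, 5], [7, 2]]          # quantized weight codes (hypothetical values)
A = [[0.1], [0.2]]            # LoRA down-projection, rank 1
B = [[1.0, -1.0]]             # LoRA up-projection
args = (x, q, 0.5, 4, A, B, 2.0)

print(forward_unmerged(*args), forward_merged(*args))
```

In full precision the two paths agree, so skipping the merge costs nothing in accuracy. The practical difference is that a merged weight would have to be quantized again to stay compact, which is where degradation could enter, whereas the unmerged path leaves the quantized codes untouched at the price of an extra low-rank matmul.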

GPU MODE ▷ #off-topic (2 messages):

Gear toxicity

Interesting YouTube videos

Your Gear is Poisoning You! video: A member shared a link to the YouTube video titled Your Gear is Poisoning You! (Not Clickbait), which discusses the potential dangers of everyday gear.
- They described the video as really interesting, highlighting their investment of time and money into its production.
Discussion on Video Content: The video sparked a conversation about the impact of gear on health, with members expressing curiosity about the claims made.
- One viewer remarked, “It really makes you think about what you use daily!”

Link mentioned: Your Gear is Poisoning You! (Not Clickbait): Thank you for watching this video. I spent a bunch of time and money making this video possible. The Sponsor I originally had helping me fund this project un...

GPU MODE ▷ #irl-meetup (2 messages):

Toronto Meetup Series

NVIDIA GPUs

Collaboration in AI

Exciting Toronto Meetup Series on Compute: A member announced plans to organize a meetup series in Toronto focused on compute, specifically discussing NVIDIA GPUs and CUDA programming.
- The first meetup is tentatively scheduled for November 15th, with hopes to explore other HPC companies like Tenstorrent and Cerebras in future sessions.
Call for Speakers in Toronto Meetup: The organizer is seeking speakers for the Toronto meetup series and encourages anyone interested to reach out.
- They aim to foster collaboration and unify talent among the various AI meetups already in the city.
Community Support for the Meetup Initiative: Another member expressed support for the upcoming meetup series.

GPU MODE ▷ #llmdotc (9 messages🔥):

CUDA Installation Issues

Cache Modifiers

CUDA Version Compatibility

Troubleshooting Techniques

Struggles with CUDA Compilation: A user faced errors regarding __ldcs and __stcs function overloads after reinstalling Ubuntu and CUDA, mentioning issues with floatX type.
- They ran a test with train_gpt2.cu but were unclear about the origins of the errors.
Hints on Problem Diagnosis: Another user suggested that the error may stem from unexpected types being passed to the functions and recommended looking into Cache Modifiers.
- They expressed surprise that the code didn't work unchanged for the user.
CUDA Version Suggestion: A member pointed out that the problem might be related to the CUDA version and suggested upgrading to a higher version, such as 12.4.
- The original user confirmed their CUDA version was 12.2 and resolved to try the newer version.
Resolution and Installation Saga: After upgrading CUDA, the user reported success and shared a humorous account of their chaotic installation process, which included breaking and fixing CUDA multiple times.
- They detailed mishaps involving partitions and applications, ultimately leading to a working status but needing to import bookmarks and other minor adjustments.

GPU MODE ▷ #rocm (9 messages🔥):

AMD consumer GPUs and ROCm support

Driver stability and kernel differences

Performance comparison with PyTorch

CK Flash Attention Backend PR

AMD consumer GPUs supported by ROCm: Currently, only gfx11 is supported, and issues like driver instabilities and segfaults seem to persist.
- One member noted that after transitioning to Ubuntu 24.04 and removing one GPU, their system has been stable for several months.
Kernel differences impact performance drastically: A user highlighted that using torch SDPA yields around 50 iter/s on a 4090 but only 15 iter/s on the XTX for forward pass, indicating a major performance disparity.
- It was suggested that backward pass performance could be even worse based on user experience.
Fix for MES overruns: An issue causing servers to freeze due to MES overruns has been recently addressed, although it took about 2-3 years post-launch.
- A workaround was shared, recommending the use of an upstream version of the kernel on Arch Linux instead of Ubuntu's older kernel.
CK Flash Attention Backend PR discussion: A member pointed to a recent GitHub pull request that focuses on updating the CK gemm backend for ROCm.
- This PR replaces an earlier one, suggesting ongoing improvements in ROCm integration within PyTorch, with multiple contributors tagged.

Link mentioned: [ROCm] CK Flash Attention Backend by alugorey · Pull Request #138947 · pytorch/pytorch: Replaces ROCm#1592 Updated implementatioon of CK gemm backend. Can close previous PR cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero7...

GPU MODE ▷ #webgpu (1 messages):

marksaroufim: https://docs.pygfx.org/stable/index.html

GPU MODE ▷ #liger-kernel (6 messages):

Nanogpt model training

Torch Compile Usage

Batch Normalization Implementation

Optimizing Nanogpt Training with Triton: Members discussed that Nanogpt training can be accelerated with optimized Triton ops when only eager PyTorch is used.
- However, support is currently limited to HF compatible models like Llama and Qwen, requiring modifications for custom Nanogpt models.
Enhancements with Torch Compile: It was noted that using torch.compile could fuse RMS well, but it struggles with chunking cross-entropy loss, which requires a different implementation.
- Members were advised to compile the model itself and use custom implementations for the loss functions.
Discussion on Functions for Speedup: A member inquired about specific functions like RMS Norm or cross-entropy loss that could speed up training when using torch compile.
- Responses indicated familiarity with the functions in the liger-kernel, suggesting testing for performance boosts.
Batch Norm Added to Liger-Kernel: A recent Pull Request (#321) was highlighted, which introduced batch normalization to the Liger Kernel.
- This update was tested against Keras's batch norm using a 4090 setup, ensuring accuracy and performance compliance.
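
As a rough picture of what "chunking cross-entropy loss" in the torch.compile item above can mean, here is a minimal plain-Python sketch. It assumes the loss follows a final linear projection and that chunking is done over tokens, so logits for only a few tokens exist at a time. This is one common way to chunk, not necessarily how Liger-Kernel implements it, and the function names and `chunk_size` parameter are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Loss for one token: logsumexp(logits) - logits[target]."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=2):
    """Mean loss over all tokens, projecting only `chunk_size` tokens at a time."""
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        chunk = hidden[start:start + chunk_size]
        # Logits are materialized for this chunk only: (chunk_size, vocab).
        logits = [[sum(h * w for h, w in zip(row, vocab_row)) for vocab_row in weight]
                  for row in chunk]
        for offset, token_logits in enumerate(logits):
            total += cross_entropy(token_logits, targets[start + offset])
    return total / len(hidden)

hidden = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8], [0.0, 1.0]]   # 4 tokens, dim 2
weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]                  # vocab 3, dim 2
targets = [0, 1, 2, 1]

print(chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=2))
```

Setting `chunk_size` to the number of tokens reproduces the unchunked computation, which makes it easy to check that chunking changes peak memory but not the loss value.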

Link mentioned: added batch norm by vulkomilev · Pull Request #321 · linkedin/Liger-Kernel: Summary Аdded batchNorm Testing Done I have compared it against Keras's batch norm.I have used 4090 Hardware Type: [ X] run make test to ensure correctness [X ] run make checkstyle to ensure...

GPU MODE ▷ #self-promotion (1 messages):

geri8904: is there a meeting invite?

GPU MODE ▷ #diffusion (2 messages):

Stable Diffusion

Cross Attention Maps

Attention Map Tools

Exploring Cross Attention Maps in Stable Diffusion: A member is conducting a research project using Stable Diffusion models and seeks guidance on capturing Cross Attention maps in the Unet to understand word-image interactions.
- They specifically asked for code implementations to assist with their work on this challenging task.
Resource Found for Attention Maps: After some time, the member shared that they found a helpful resource for Cross Attention maps on GitHub.
- This repository provides tools for utilizing attention maps within huggingface/diffusers, which should aid similar inquiries.

Link mentioned: GitHub - wooyeolBaek/attention-map: 🚀 Cross attention map tools for huggingface/diffusers: 🚀 Cross attention map tools for huggingface/diffusers - wooyeolBaek/attention-map

GPU MODE ▷ #🍿 (8 messages🔥):

Discord Cluster Manager

CPU Execution with AVX and NEON

Development Sprint Timeline

Discord Cluster Manager Document Drafted: A member shared a document detailing how the Discord cluster manager will function, planning to actively work on it from November 3 to November 10.
- Others expressed willingness to contribute and join the development efforts once their schedules allowed.
Exploring CPU Execution with AVX and NEON: A member indicated plans to mirror progress on CPU execution using AVX and NEON instructions, acknowledging it won't merge with the main project immediately.
- They aim to have this alternative available should the team decide to expand in that direction later.
Development Team Availability Shift: One member, currently busy with midterms, plans to join the development sprint after November 10.
- Others mentioned they would have more free time soon and would start looking into contributing around November 1.

Link mentioned: Discord Cluster Manager: Our code will be here https://github.com/gpu-mode/discord-cluster-manager User experience Start on Nov 4 and be feature complete by at most Nov 10. For this work we only need a single node. Claud a...

Cohere ▷ #discussions (53 messages🔥):

Connector Usage in Cohere

Playground Performance Issues

Algorithmic Trading Insights

Struggle with Connector Queries in Cohere: A user reported difficulties retrieving data through the Cohere connector, indicating consistent issues with messages like 'I was unable to find any information' while querying a specific user ID.
- Another member recommended emailing Cohere support for further assistance with account issues.
Playground's Lagging Performance: Discussion arose over the lag issues in the playground, especially after several messages, with some members agreeing that it affects interaction experience.
- One suggested starting a fresh chat or clearing cache to improve performance, noting that lag could be linked to device limitations and context window overload.
Insights on Algorithmic Trading: Members shared experiences in algorithmic trading, with one discussing their weekend work on analyzing AI sentiment relative to trading, stressing the nuances of media bias.
- Another pointed out that the impact of human perspectives on market movements is minimal, suggesting that major trading insights are better sourced from comprehensive platforms like EDGAR.

Cohere ▷ #questions (33 messages🔥):

Cohere community server

Using connectors in Cohere

Reranker models for multimodal data

News generation applications

Cohere API models and limits

Cohere Community Server Access: A user inquired about accessing the Cohere For AI community server, and another member shared a link to the application page for joining.
- This was followed by information about Cohere's research lab aimed at addressing complex machine learning problems.
Configuring Connectors with Cohere: A user sought help on how to use a connector for querying the Cohere chat endpoint and received a link to documentation for setup.
- It was noted that connectors must be set up with the v1 API, as they are unsupported in v2.
Interest in Multimodal Reranker Models: A member asked whether there are reranker models that support multimodal data, specifically both text and images, leading to a response noting none exist yet.
- The conversation indicated active interest in model capabilities and user feedback on this topic.
Choosing Models for News Generation: When questioned about which model to choose for generating longer articles, it was highlighted that Command R is faster and cheaper, while Command R+ offers better performance.
- Users were advised to experiment with the Free API before upgrading to avoid hitting rate limits.
Understanding API Token Limits: Discussion revealed that the Command R model has a 128k-token context window that covers input and output combined, with some users reporting responses averaging 3,000 to 4,000 characters.
- Users were informed that the max_tokens parameter is capped by the context window, and that effective prompts can extend the response length.
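
The relationship between max_tokens and the context window can be sketched as simple arithmetic. This is a minimal illustration, not Cohere's implementation: the function name `output_token_budget` is hypothetical, and the 128,000 figure is taken from the "128k" mentioned in the discussion.

```python
def output_token_budget(context_window: int, input_tokens: int,
                        requested_max_tokens: int) -> int:
    """Return the longest output that still fits in the context window.

    The window is shared by the prompt and the response, so whatever the
    prompt consumes is no longer available for generation.
    """
    if input_tokens >= context_window:
        raise ValueError("the prompt alone fills the context window")
    remaining = context_window - input_tokens
    return min(requested_max_tokens, remaining)


# Assumed window size, based on the "128k" figure quoted in the discussion.
CONTEXT_WINDOW = 128_000

# A 120k-token prompt leaves room for only 8k tokens of output,
# even when 16k were requested.
print(output_token_budget(CONTEXT_WINDOW, 120_000, 16_000))
```

A model may also enforce a separate, smaller cap on output length; that cap is not modelled here.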

Links mentioned:

Retrieval Augmented Generation (RAG) — Cohere: Generate text with external data and inline citations using Retrieval Augmented Generation and Cohere's Chat API.
Tokens and Tokenizers — Cohere: This document describes how to use the tokenize and detokenize API endpoints.
Creating and Deploying a Connector (v1 API) — Cohere: Learn how to implement a connector, from setup to deployment, to enable grounded generations with Cohere's Chat API.
Cohere For AI (C4AI): Cohere For AI (C4AI) is Cohere's non-profit research lab that seeks to solve complex machine learning problems.
Command R+ — Cohere: Command R+ is Cohere's model for conversational interaction and long-context tasks, best suited for complex RAG workflows and multi-step tool use.
The Command R Model — Cohere: Command R is a conversational model that excels in language tasks and supports multiple languages.
Login | Cohere: Login for access to advanced Large Language Models and NLP tools through one easy-to-use API.
Pricing: Access our models directly through our API to create scalable production workloads.

Cohere ▷ #api-discussions (17 messages🔥):

Weekend Timeout Issues

API Timeout Reporting

Intermittent Timeout Errors

Returning Weekend Timeout Problems: Members noted that the weekend timeout issues have re-emerged, with logs indicating a persistent 'read operation timed out' error on October 26th and 27th.
- This seems to be an intermittent issue that predominantly occurs on weekends, as reported.
Inquiries on Timeout Frequency: Members inquired about the frequency and start time of the reported timeouts, establishing that these issues seem to start early on weekends.
- One member expressed gratitude for users flagging this problem, ensuring proper escalation.
Engagement on Timeout Solutions: A member offered to investigate the account details to provide further insight on timeout occurrences.
- Another member conveyed appreciation for addressing these concerns, highlighting the community's collaborative efforts.
Detailed Logs of Timeout Events: Recent logs shared by users reveal consecutive timeout errors that took place at specific UTC timestamps on October 28th.
- Although the user's local error was resolved, the timeouts have continued, as per the logs.
Community Support in Reporting Errors: Members exhibited support for each other while reporting these timeout issues to internal teams for resolution.
- Recognition of prompt reporting by users was expressed, reinforcing the importance of teamwork in troubleshooting.
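
While an intermittent timeout like this is being investigated, a client can soften its impact by retrying with exponential backoff. The sketch below is a generic pattern, not something from the thread: `call_with_retries` and its parameters are hypothetical names, and the exception class to catch depends on the HTTP client in use (the built-in `TimeoutError` is only a placeholder default).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on timeout with delays of 0.5s, 1s, 2s, ..."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except retry_on:
            # Out of retries: surface the original error to the caller.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. Retries only hide the symptom, so reporting the timestamps to the provider, as members did here, remains the actual fix.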

OpenAI ▷ #ai-discussions (80 messages🔥🔥):

AI research grants

Customization and personalization in AI

Limitations of LLMs

Using multiple LLMs in AI applications

Agent interactions in AI

Exploring AI research grants experience: A member inquired about experiences related to applying for grants for AI research, seeking insight from others who may have received funding.
- This reflects an increasing interest in funding opportunities within the AI research community.
Challenges in AI customization: Concerns were raised regarding how ChatGPT sometimes ignores or counters customization commands, leading to unexpected outputs.
- Participants discussed experiences where their guidance wasn't followed as intended, prompting questions about AI reasoning and intent.
Understanding LLMs and their limits: It was acknowledged that LLMs excel in generating language but struggle with math, prompting suggestions for using Python tools for calculations.
- Participants emphasized the need for step-wise guidance when interacting with LLMs to improve their functionality.
Utilizing multiple LLMs for AI solutions: A user shared insights from their experience building an AI app, noting that a single LLM isn't sufficient for all tasks.
- The conversation highlighted the practicality of prompt chaining and agentic workflows for enhanced results.
Anthropomorphism of AI behavior: Discussions included the tendency to anthropomorphize AI as it mimics human conversation and decision-making.
- Members cautioned against viewing LLMs as possessing intentions or reasoning like humans, emphasizing their fundamental nature as machines.
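
The suggestion to hand arithmetic to Python rather than to the model can be sketched as a small calculator tool. This is an illustrative assumption about what such a tool might look like, not code from the discussion; `calculate` is a hypothetical name, and the tool deliberately accepts only numbers and basic operators so that model-written text cannot run arbitrary code.

```python
import ast
import operator

# Only these operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculate(expression: str):
    """Evaluate a plain arithmetic expression such as '12345 * 6789'."""

    def _eval(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


print(calculate("12345 * 6789"))
```

In an agentic workflow the model would emit the expression as a tool call and receive the exact result back, instead of predicting the digits itself.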

OpenAI ▷ #gpt-4-discussions (10 messages🔥):

Evolving Prompt Engineering

Custom AI Languages

AI Interoperability in Language Definition

Managing AI Models

Using APIs for Specific AI Calls

Prompt Engineering as a Programming Language: A member pondered whether prompt engineering could evolve into a form of 'programming language' for AI, enabling modular and reusable interactions across various models.
- Another added, 'anything the model can do, we can get the model to do' when communicating clearly.
Creating New Languages for AI Interaction: Members discussed the potential to 'create' new languages for AI interaction, similar to the various programming languages used today.
- One member mentioned that just as with programming, we can employ strategies to effectively communicate our intents to AI models.
Interoperability and Defining New Languages: A user claimed to have successfully taught AI to define a never-translated language, indicating that interoperability can extend language definition beyond traditional boundaries.
- They hinted at undefined dimensions where AI might be successful in defining its required languages.
Challenges with Model Customization: A member expressed frustration that their models ignore customization prompts and name themselves independently.
- Another user inquired about limiting API calls to specific models to enhance efficiency, reflecting on similar frustrations.
Image Segmentation with GPT Vision: One user sought assistance with cutting images into panels using GPT Vision, struggling with inaccurate coordinates and restrictions on GPU usage.
- They specifically aimed to avoid cutting characters or panels in half, looking for effective solutions.
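
One CPU-only way to avoid cutting panels in half is to cut only along blank gutters. The sketch below swaps in a classical row-scan technique rather than GPT Vision, and it rests on loud assumptions: the page is already a grid of grayscale values (0 to 255), gutters are near-white rows spanning the full width, and `split_rows_on_gutters` is a hypothetical helper name.

```python
def split_rows_on_gutters(page, white=250):
    """Return (top, bottom) row ranges of panels separated by blank rows.

    `page` is a list of rows, each a list of grayscale pixel values.
    A row counts as gutter only if every pixel is at least `white`,
    so a cut never lands on a row that contains artwork.
    """
    panels = []
    start = None
    for y, row in enumerate(page):
        is_gutter = all(px >= white for px in row)
        if not is_gutter and start is None:
            start = y                    # first inked row of a panel
        elif is_gutter and start is not None:
            panels.append((start, y))    # panel ended on the previous row
            start = None
    if start is not None:
        panels.append((start, len(page)))
    return panels


# Toy 6-row page: blank, two inked rows, blank, one inked row, blank.
toy_page = [
    [255, 255, 255],
    [0, 255, 255],
    [255, 0, 255],
    [255, 255, 255],
    [10, 10, 10],
    [255, 255, 255],
]
print(split_rows_on_gutters(toy_page))
```

Running the same scan over columns inside each row band would split side-by-side panels. Pages with borderless or overlapping panels would need a different approach.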

OpenAI ▷ #prompt-engineering (2 messages):

AI Consistency

Challenging AI Learning

AI Consistency is a Myth: A member highlighted that AI is not consistent at all, emphasizing the inherent unpredictability in its behavior.
- First thing you have to learn in the AI field is acknowledging this inconsistency.
Fun in AI Challenges: Another member expressed that working with AI can be a fun challenge indeed, pointing out the engaging nature of AI-related tasks.
- This perspective showcases the balance between enjoyment and complexity when dealing with AI technologies.

OpenAI ▷ #api-discussions (2 messages):

AI consistency

Challenge in AI understanding

AI's Inconsistent Nature Revealed: A member pointed out that the first thing to learn about AI is its inherent lack of consistency.
- This raises an interesting challenge for anyone trying to deeply understand or work with AI models.
A Fun Challenge in AI!: Another member remarked that engaging with AI presents a fun challenge indeed.
- This highlights the playful yet insightful aspects of exploring AI capabilities and behaviors.

Interconnects (Nathan Lambert) ▷ #news (82 messages🔥🔥):

AI Race between OpenAI and Google

Meta's New Search Engine Development

User Adoption Rates of Generative AI

Issues with Gemini Model Releases

Challenges Facing Major AI Companies

OpenAI and Google in a December Showdown: OpenAI aims for a December launch of its next AI model while Google is also working on releasing its Gemini 2.0, intensifying competition in the AI space.
- While OpenAI's rollout is phased, Google seeks a wide release, although performance expectations might not be fully met.
Meta Builds Its Own Search Engine: Meta is developing a new web search engine under engineering manager Xueyuan Su to minimize reliance on Google and Bing data feeds, with some news sites blocking their crawler.
- This project aims to provide more independent AI solutions for Meta's platforms, avoiding another Apple-like situation.
Generative AI Adoption is Slow: A recent paper claims that while 40% of US adults engage with generative AI, only 0.5% – 3.5% of work hours actually involve its assistance.
- The adoption rate is much slower than expected, revealing a disparity between usage and the anticipated impact on productivity.
Concerns Over Gemini's Releases: The release of Gemini models has faced criticism for declining performance compared to previous versions and issues in marketing to consumers.
- The launch has been deemed one of the most botched releases, with significant regressions affecting user experience.
Dysfunction in Major AI Companies: Members discussed how larger companies like Microsoft and Google are perceived as not performing up to their potential despite vast resources.
- Concerns were raised about inefficiencies, product release delays, and the pressure to remain competitive in a fast-evolving market.

Links mentioned:

Tweet from Tibor Blaho (@btibor91): Meta is reportedly building a web search engine led by engineering manager Xueyuan Su for at least 8 months to break free from using Google Search and Bing data feeds, with some websites like NY Times...
Google plans to announce its next Gemini model soon: December is shaping up to be a month of dueling AI announcements from OpenAI and Google.
Tweet from Arvind Narayanan (@random_walker): Here's an AI hype case study. The paper "The Rapid Adoption of Generative AI" has been making the rounds based on the claim that 40% of US adults are using generative AI. But that includes...
Tweet from The Information (@theinformation): Exclusive: Google is preparing ‘Project Jarvis’—an AI program that takes over computers to help with everyday web tasks. https://www.theinformation.com/articles/google-preps-ai-that-takes-over-comput...
OpenAI CFO Says AI Isn't Experimental Anymore: OpenAI CFO Sarah Friar says artificial intelligence isn't experimental anymore. She says banks, financial institutions and fintech's are using it everyday in...

Interconnects (Nathan Lambert) ▷ #ml-questions (1 message):

Pricing for human-generated examples

Annotation quality comparison

Pricing for human-generated examples inquiry: A member inquired about where to find information on the prices for human-generated examples versus annotating them as good or bad.
- This question highlights the need for clarity in the value proposition of manual versus automated annotation processes.
Request for Annotation Guidelines: A follow-up question was posed on the annotation guidelines that would clearly define the quality of good or bad examples.
- This suggests a growing interest in establishing standard criteria for evaluating generated examples.

Interconnects (Nathan Lambert) ▷ #random (1 message):

rjvs: Everyone is getting in on the 🍓 https://underworld.lnk.to/strawberryhotel

Interconnects (Nathan Lambert) ▷ #memes (7 messages):

Lex Fridman

Prompt Optimization

Hiking Stories

Lex Fridman Unblocks and Follows: A member expressed excitement that Lex Fridman unblocked and followed them on Twitter, saying it feels like they are early in the game.
- They humorously noted, 'I have some popcorn' to celebrate this unexpected interaction.
Inquiry on Prompt Optimization Resources: A member asked about the best resource for prompt optimization, highlighting interest in optimizing prompts for better AI interactions.
- Another member suggested DSPy as a suitable option, reflecting the community's shared knowledge.
Mom's Hiking Experience Humor: A user shared a humorous tweet from Andrew Schmidt, where their mom said, 'That hike almost killed me!' after a tough hike.
- This generated lighthearted discussions reflecting on personal hiking experiences and challenges.

Link mentioned: Tweet from Schmidt (@AndrewSchmidtFC): My mom: That hike almost killed me! Apple’s AI summary:

tinygrad (George Hotz) ▷ #general (24 messages🔥):

Fast Math Mode in Metal

Tinygrad PR Submission Guidelines

Backend Testing for LLVM and ONNX

Tinybox Updates

Active Bounties

Fast Math Mode complexities in Metal: A discussion arose about fast math mode in Metal, with members noting it performs algebraic transforms by default and requires manual disabling for strict floating point rules.
- Reassociation with the -fassociative-math flag was also mentioned as a potential optimization for mathematical expressions.
Strict PR Submission Guidelines: George Hotz emphasized the importance of reviewing merged PRs before submitting new ones, stating that changes without clear understanding or tests will not be merged.
- He criticized redundant PR submissions that encode the same information differently, urging contributors to focus on bug fixes over new features.
Testing Output for LLVM and ONNX Backends: A member proposed creating tests for the output of LLVM and ONNX backends, highlighting the lack of such checks in tinygrad.
- George confirmed there are tests comparing tinygrad's output to Torch, indicating the existing testing infrastructure can be expanded.
Tinybox Meeting Scheduled: A meeting was scheduled for 8 PM Hong Kong time to discuss several topics, including a tinybox update and various technical comments.
- Agenda items include updates on uop canonical order, MLPerf, and ongoing active bounties.
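
Why reassociation matters for strict floating point can be seen without Metal at all: floating-point addition is not associative, so regrouping terms can change the result. The snippet is a plain Python illustration of that property, with a hypothetical helper name, and says nothing about what any particular compiler actually emits.

```python
def reassociation_changes_result(a: float, b: float, c: float) -> bool:
    """True when (a + b) + c and a + (b + c) differ in floating point."""
    return (a + b) + c != a + (b + c)


# Rounding after each step makes the two groupings disagree.
print(reassociation_changes_result(0.1, 0.2, 0.3))  # True

# With a large magnitude the effect is stark: the 1.0 is absorbed
# when it is added to 1e16 first, but survives when added last.
big = 1e16
print((big + 1.0) - big, (big - big) + 1.0)
```

A compiler allowed to reassociate may pick either grouping, which is why strict floating-point behaviour requires turning such transforms off.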

Links mentioned:

Clang Compiler User’s Manual — Clang 20.0.0git documentation: no description found
MSE in tensors.py and tests implemented by littlemountainman · Pull Request #7107 · tinygrad/tinygrad: MSE with testing implemented

tinygrad (George Hotz) ▷ #learn-tinygrad (27 messages🔥):

Tinygrad complex number support

Tinygrad on Android with OpenCL

Tinygrad tensor contiguity

Model Conversion Tools

Tinygrad ecosystem development

Tinygrad struggles with complex number support: Users discussed the challenge of using complex numbers in Tinygrad for tasks like creating a DFT, noting an AssertionError indicating lack of direct support.
- George Hotz expressed interest in easier support for complex numbers, suggesting they could be emulated with a 2D axis.
Tinygrad on Android with OpenCL: A user inquired about using Tinygrad to compile models for an Android device with an OpenCL accelerator, seeking guidance on the setup process.
- Members pointed to various resources, including compile_efficientnet.py, which helps generate required OpenCL kernels and buffers to run without Python.
Tinygrad tensor contiguity issues: A conversation emerged about why tensors created with Tensor.ones in Tinygrad may appear non-contiguous, referencing broadcasting behavior.
- Davidgonmar_ clarified that Tinygrad employs different memory management techniques compared to PyTorch, which fills a full buffer.
Interest in Model Conversion Tools: A user expressed a desire for an equivalent to HuggingFace's 'transformers' in Tinygrad to help standardize models, noting the organizational challenge with extra and examples folders.
- George Hotz welcomed the initiative, emphasizing the importance of such libraries as Tinygrad matures.
Future ecosystem development in Tinygrad: There are discussions about the evolution of Tinygrad's ecosystem, with George Hotz indicating a shift towards speed enhancements and broader implementation.
- He acknowledged the current importance of converters but noted their typical inefficacy, suggesting a potential focus on newer tools.
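
The idea of emulating complex numbers with an extra axis of size two can be sketched with a DFT that uses only real arithmetic. Plain Python lists stand in for tensors here, so this is not tinygrad code; `dft_real_pairs` is a hypothetical name, and each value is stored as a `[real, imag]` pair.

```python
import math


def dft_real_pairs(signal):
    """Discrete Fourier transform over [real, imag] pairs.

    Computes X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n) using only real
    multiplications and additions on the two components.
    """
    n = len(signal)
    out = []
    for k in range(n):
        re_sum, im_sum = 0.0, 0.0
        for t, (re, im) in enumerate(signal):
            angle = -2.0 * math.pi * k * t / n
            c, s = math.cos(angle), math.sin(angle)
            # (re + i*im) * (c + i*s), expanded into real and imaginary parts
            re_sum += re * c - im * s
            im_sum += re * s + im * c
        out.append([re_sum, im_sum])
    return out


# A real-valued signal has zero in the imaginary slot of every pair.
signal = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
for pair in dft_real_pairs(signal):
    print([round(v, 6) for v in pair])
```

In a tensor library the same arithmetic would be written as elementwise operations on a trailing axis of length two, which avoids needing a native complex dtype.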

Links mentioned:

tinygrad/examples/openpilot/compile2.py at master · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ - tinygrad/tinygrad
my_notebooks/mnist-from-scratch.ipynb at main · NinoRisteski/my_notebooks: implementing ai papers. Contribute to NinoRisteski/my_notebooks development by creating an account on GitHub.
tinygrad/examples/compile_efficientnet.py at master · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ - tinygrad/tinygrad
tinygrad/extra/export_model.py at master · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ - tinygrad/tinygrad

LlamaIndex ▷ #blog (5 messages):

Intelligent Knowledge Assistants

Advanced RAG Techniques

Text-to-SQL Tutorials

Agentic Workflows

Exploring Intelligent Knowledge Assistants at Ray Summit: The Ray Summit workshop showcased the vision for building intelligent knowledge assistants that process complex data in various ways, now available on YouTube.
- All components needed to go beyond simple tasks were discussed during the session, which can be found here.
New Video Series on Advanced Knowledge Assistants: A new video series is launching to cover agentic workflows that can reason over data for output generation and decision making, with links available on YouTube.
- Core topics discussed include advanced retrieval-augmented generation techniques and can be found here.
Mastering Text-to-SQL with 500 Tables: A reliable text-to-SQL tutorial by @kiennt_ demonstrates constructing a SQL agent capable of operating over 500 tables, available on YouTube.
- This resource stands out as one of the best for navigating complex data setups; further information is accessible here.
Implementing Agentic RAG Techniques: Two new full-length tutorial videos are releasing that instruct on implementing agentic RAG techniques, including adding LLM layers to process inputs, found on YouTube.
- The tutorials will cover using LLMs to reason over vector databases and apply metadata filters; more details can be found here.
Dynamic Retrieval Strategies Explained: The advanced RAG series includes an upcoming tutorial demonstrating dynamic retrieval, adjusting context retrieval based on questions, with early access on YouTube.
- The tutorial will also address dynamic querying of SQL databases; further insights can be explored here.

LlamaIndex ▷ #general (32 messages🔥):

NVIDIA case study cookbook

LlamaIndex workflows and streaming

Retriever issues in LlamaIndex

VectorStoreIndex and embeddings

Internship opportunities in RAG solutions

Cookbook Needed for NVIDIA Case Study: Several members expressed interest in a cookbook for the NVIDIA case study, particularly focusing on streaming use cases with Chainlit.
- One member highlighted struggles with nesting parent/child steps within Chainlit's framework while pursuing a custom agent workflow.
LlamaIndex Workflow Guidance: A newcomer to LlamaIndex asked for guidance on making chained LLM calls with loops and handling multiple requests simultaneously.
- An experienced member suggested using workflows for structured outputs and provided documentation links for further assistance.
Retriever Issues Reported: A member reported issues with retrievers returning empty nodes despite successfully testing the index with a chat engine.
- Another member recommended sharing code to troubleshoot further since retriever configurations seemed incorrectly set.
Exploring VectorStoreIndex and Embeddings: One user sought help with viewing generated embeddings in VectorStoreIndex but found that embeddings returned None.
- An expert pointed out that the underlying vector store clients typically don't return embeddings to conserve memory during operations.
Internship Opportunities in RAG Solutions: A member detailed their experience building RAG solutions for an enterprise client and expressed interest in internship opportunities for the upcoming summer.
- They invited members to reach out if anyone had leads on intern or entry-level positions in software engineering or related fields.
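
The newcomer's question about chained LLM calls with loops and simultaneous requests can be illustrated with plain asyncio. This sketch does not use the LlamaIndex Workflow API that was recommended; `stub_llm`, `chained_call`, and `handle_many` are hypothetical names, and the stub simply appends a marker so the control flow is visible.

```python
import asyncio
from typing import Awaitable, Callable, List

LLM = Callable[[str], Awaitable[str]]


async def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    await asyncio.sleep(0)   # yield control, as a network call would
    return prompt + "*"


async def chained_call(question: str, llm: LLM, rounds: int = 3) -> str:
    """Feed each answer back into the model for a fixed number of rounds."""
    answer = await llm(question)
    for _ in range(rounds - 1):
        answer = await llm(answer)
    return answer


async def handle_many(questions: List[str], llm: LLM) -> List[str]:
    """Run one chain per question concurrently; results keep input order."""
    return await asyncio.gather(*(chained_call(q, llm) for q in questions))


results = asyncio.run(handle_many(["a", "b"], stub_llm))
print(results)
```

A workflow framework adds typed events, step retries, and streaming on top of this basic shape, which is why it was suggested for structured outputs.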

Links mentioned:

Workflows - LlamaIndex: no description found
Create cookbook for LlamaIndex Workflow abstraction by tituslhy · Pull Request #138 · Chainlit/cookbook: This cookbook aims to provide an example of how to use LlamaIndex's latest Workflow abstraction with Chainlit.
OpenAI - LlamaIndex: no description found
create-llama/templates/components/services/python/suggestion.py at main · run-llama/create-llama: The easiest way to get started with LlamaIndex. Contribute to run-llama/create-llama development by creating an account on GitHub.

LlamaIndex ▷ #ai-discussion (2 messages):

OpenAI training practices

Deepfake voice capabilities

OpenAI's Training Raises Eyebrows: Concern was raised about OpenAI allegedly training on individuals who did not opt-in, highlighting issues of privacy and consent.
- It was pointed out that this practice could lead to unexpected outcomes, such as generating personalized responses without explicit permission.
Deepfake Voice Generation Impresses: A user experienced impressive deepfake voice generation, where the system auto-predicted replies as if they were responding as the user on a Teams Tier plan.
- The AI not only asked questions in its own voice but also answered in the user's voice, demonstrating real-life auto-predict capabilities.

DSPy ▷ #show-and-tell (12 messages🔥):

Automatic Prompt Generation using MIPROv2

Collaborative Law Crafting Application

DSPy Plugin for Etherpad

Unique Research on Ancient Manuscripts

Translation and Interpretation of Historical Texts

Automatic Prompt Generation Unveiled: A member shared a thread on implementing automatic prompt generation techniques from the MIPROv2 optimizer using the gsm8k dataset with a simplified program structured into three modules.
- The modules include generating demos, crafting instructions, and combining outputs for the final prompt, providing a clean approach.
Swiss Citizens Crafting Laws Directly: A member is developing a collaborative software application that allows Swiss citizens to discuss and create laws through a process called popular initiative.
- This application is part of their master's thesis, showcasing the power of direct citizen participation in law-making.
DSPy Plugin Enhances Law Discussions: A member created a DSPy plugin for Etherpad to facilitate discussions on laws and generate legal text from community input.
- This innovative tool aims to transform discussions into actionable law drafts, allowing direct public participation in legislation.
RAG Research Queries for Ancient Texts: A member is studying biblical languages and aims to utilize DSPy for translating ancient manuscripts like the Dead Sea Scrolls and Egyptian Papyri for modern audiences.
- They are seeking collaboration on setting up a specialized RAG system that integrates historical text analysis and multi-language document processing.
Praise for Community Efforts: One member expressed appreciation for the impressive projects being produced by others in the community, acknowledging their high output quality.
- This highlights the commitment to innovation and collaboration within the community, fostering an inspiring atmosphere.
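
The three-module structure described in the thread (generate demos, craft an instruction, combine both into a prompt) can be outlined in plain Python. This is a rough sketch of the shape only, not the thread's code and not the MIPROv2 optimizer, which does considerably more than picking the first few examples. All function names, the stub proposer, and the toy examples are invented for illustration.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, answer)


def generate_demos(trainset: List[Example], k: int) -> List[Example]:
    """Module 1: pick demonstrations (here, naively the first k)."""
    return trainset[:k]


def craft_instruction(task: str, demos: List[Example],
                      propose: Callable[[str], str]) -> str:
    """Module 2: ask a proposer model for an instruction grounded in demos."""
    shown = "\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    return propose(f"Task: {task}\nExamples:\n{shown}\n"
                   "Write a one-line instruction.")


def build_prompt(instruction: str, demos: List[Example], question: str) -> str:
    """Module 3: combine instruction, demos, and the new question."""
    parts = [instruction]
    parts += [f"Q: {q}\nA: {a}" for q, a in demos]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def stub_propose(request: str) -> str:
    """Hypothetical proposer; a real one would call a language model."""
    return "Solve the math problem step by step."


# Made-up toy examples in the style of grade-school math questions.
toy_trainset = [("What is 2 + 3?", "5"), ("What is 4 * 6?", "24"),
                ("What is 9 - 7?", "2")]
demos = generate_demos(toy_trainset, k=2)
instruction = craft_instruction("grade-school math", demos, stub_propose)
print(build_prompt(instruction, demos, "What is 8 + 5?"))
```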

Links mentioned:

Tweet from Karthik Kalyanaraman (@karthikkalyan90): 🧵A simplified implementation of "automatic prompt generation" using the techniques used in MIPROv2 optimizer. This program uses the gsm8k dataset consisting of math problems and is made up of...
CustomGPTs: In this video, I delve into the evolution of my GPTs, starting from my first language model 8 years ago to the latest version, DSLmodels. I discuss the significance of sparse priming representations (...
Introducing DSL Model Framework! 🚀: https://github.com/seanchatmangpt/dslmodel Hey DSLModel community, it's Sean Chatman here. I'm excited to introduce you to DSL Model, a framework that revolutionizes structured text and langu...
Tweet from Karthik Kalyanaraman (@karthikkalyan90): A short thread on solving classification/name-entity recognition class of problems using DSPy and Outlines from @dottxtai . This approach is not only ergonomic and clean but also guarantees schema adh...

DSPy ▷ #general (27 messages🔥):

DSPy 2.5 mapping

Audio input development

Liquid40B implementation

Named entity recognition examples

Relation extraction use cases

Clarification on DSPy 2.5 Mapping: Members discussed the transition to DSPy 2.5 and if the implementation would significantly differ from earlier notebooks, with suggestions to follow the migration doc.
- No major differences are expected, emphasizing that existing implementations should still be valid.
Audio Input Development Inquiry: There was a discussion about ongoing work related to audio input features in DSPy, referencing a potential GitHub PR.
- Participants shared links and discussed architectures like Ultravox, which support audio functionality integrated with LLaMa models.
Implementing Liquid40B in DSPy: Inquiry was made about the implementation of Liquid40B into DSPy, questioning whether it works with LiteLLM.
- The reply affirmed that support exists for any models compatible with LiteLLM.
Examples for Named Entity Recognition: A member shared a code snippet for a Named Entity Recognition (NER) implementation in DSPy, highlighting a way to extract entities from a text input.
- The newer method via dspy.ChainOfThought was recommended for reliability over deprecated TypedPredictor methods, showcasing modernized typing support.
Relation Extraction Use Cases: Members explored the availability of examples on relation extraction alongside NER, suggesting relevant datasets that could yield valuable insights during projects.
- A dataset from Hugging Face was mentioned as a possible resource for this task.

Links mentioned:

Getting Started I - DSPy: None
Hugging Face – The AI community building the future.: no description found
dspy/examples/vlm/mmmu.ipynb at mm-adapter · stanfordnlp/dspy: DSPy: The framework for programming—not prompting—foundation models - stanfordnlp/dspy
dspy/examples/migration.ipynb at main · stanfordnlp/dspy: DSPy: The framework for programming—not prompting—foundation models - stanfordnlp/dspy
GitHub - fixie-ai/ultravox: A fast multimodal LLM for real-time voice: A fast multimodal LLM for real-time voice. Contribute to fixie-ai/ultravox development by creating an account on GitHub.
GitHub - stanfordnlp/dspy at mm-adapter: DSPy: The framework for programming—not prompting—foundation models - GitHub - stanfordnlp/dspy at mm-adapter
Google Colab: no description found
GitHub - stanford-oval/storm: An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.: An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. - stanford-oval/storm
Google Colab: no description found
dspy/examples/coding/hackercup.py at main · stanfordnlp/dspy: DSPy: The framework for programming—not prompting—foundation models - stanfordnlp/dspy
GitHub - jjovalle99/DSPy-Text2SQL: DSPY on action with OpenSource LLMs.: DSPY on action with OpenSource LLMs. Contribute to jjovalle99/DSPy-Text2SQL development by creating an account on GitHub.

OpenInterpreter ▷ #general (35 messages🔥):

Open Interpreter Performance

Setting Up Open Interpreter

Local Model Limitations

Beta Testing Opportunity

OS Mode Flexibility

Open Interpreter Performance with Spreadsheets: Members discussed running the test from the YouTube video titled 'Improve Open Interpreter Performance with Spreadsheets,' but local models like qwen2.5:32b-instruct struggled to follow the steps.
- One member mentioned that performance depends on model quality and prompting, suggesting making a profile to clarify tasks.
Help for Open Interpreter Setup: A beginner asked for help setting up Open Interpreter in the Windows terminal, mentioning issues following the GitHub demo video.
- Another member shared detailed setup instructions including installation commands via pip.
Local Model Limitations with Open Interpreter: A user asked whether local models need vision capabilities for Open Interpreter and was told that no local model currently matches Sonnet's performance.
- A suggestion was made to ensure the computer API is properly imported, as this is crucial for local models to perform actions effectively.
Interest in Beta Testing for Web3 Platform: A member offered a Web3 platform opportunity seeking developers, moderators, beta testers, and more, inviting those interested to DM for details.
- Another member expressed their interest specifically in being a beta tester.
Flexibility of OS Mode with Different APIs: Inquiries about using different APIs with OS mode led to clarification that it should be possible with code modification, mainly set for Anthropic, Vertex, and Bedrock.
- A member provided guidance on accessing and modifying the loop.py file under the computer use folder to implement these changes.

Links mentioned:

Setup - Open Interpreter: no description found
All Settings - Open Interpreter: no description found
Improve Open Interpreter Performance with Spreadsheets: Does your AI agent lose focus during a multi-step task? Here's an approach that helps Open Interpreter stay on track PLUS it speeds up prompt engineering ite...
open-interpreter/interpreter/core/computer/vision/vision.py at 36ec07125efec86594c91e990f68e0ab214e7edf · OpenInterpreter/open-interpreter: A natural language interface for computers. Contribute to OpenInterpreter/open-interpreter development by creating an account on GitHub.

OpenInterpreter ▷ #ai-content (4 messages):

Markdown usage

Obsidian demos

Advanced Voice features

Apple AI server hacking incentive

Markdown Lovers Unite!: A member expressed their love for Markdown, stating that Obsidian is great and hinted at a cool demo coming out this week.
- This enthusiasm reflects a growing interest in using Markdown tools in AI coding contexts.
OpenAI Expands Advanced Voice Access: OpenAI announced that Advanced Voice is now available to free users in the EU, Switzerland, Iceland, Norway, and Liechtenstein within their mobile apps.
- This marks a significant enhancement in accessibility for mobile users in these regions.
Apple's $1M Hacking Challenge: A tweet shared that Apple will pay up to $1M to anyone who can successfully hack into their AI servers.
- This news raises intriguing questions about security measures surrounding AI technologies.
Exciting YouTube Content on AI Coding: A YouTube video titled 'Cline & Aider + Obsidian: AI CODING with MARKDOWN is AMAZING!' was shared, emphasizing the combination of Cline, Aider, and Markdown in AI coding.
- The video aims to showcase rules and long prompts, highlighting innovative ways to enhance coding practices.

Links mentioned:

Tweet from OpenAI (@OpenAI): And finally, Advanced Voice is now available to Free users in the EU, Switzerland, Iceland, Norway, and Liechtenstein in our mobile apps.
Tweet from Culture Crave 🎃 (@CultureCrave): Apple will pay up to $1M to anyone that can hack into their AI servers
Cline & Aider + Obsidian : AI CODING with MARKDOWN is AMAZING! (Rules & Long Prompts!): Join this channel to get access to perks:https://www.youtube.com/@AICodeKing/joinIn this video, I'll be telling you that how you can use Cline & Aider combin...

LAION ▷ #general (12 messages🔥):

Discord LLM Helper

In-Channel Summaries

Voicebot Project

Ephemeral Bot Responses

Desire for a Discord LLM Helper: A user expressed interest in having a feature similar to a 'ask Discord' LLM helper that could summarize conversations and answer questions on demand.
- Another member pointed out that while Discord has a beta feature for summaries, asking questions isn't currently supported.
Ephemeral Responses in Custom Bots: A member suggested that creating a custom Discord bot to handle questions and summaries might be straightforward.
- They noted that ephemeral responses could make interactions cleaner by ensuring responses are only visible to the user who executed the command.
Inquiry about Active Voicebot Project: A user inquired whether the project for a voicebot at LAION is still active and if they could connect with someone involved.
- This reflects ongoing interest in LAION's advancements in voice technology, highlighting community engagement.
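
For readers curious what an ephemeral reply looks like at the payload level, here is a minimal sketch. It assumes a bot that answers slash commands by returning a raw interaction response; the `build_reply` helper and the summary text are hypothetical, and a real bot would more likely go through a client library than build the dict by hand.

```python
# Sketch only: builds the JSON body a bot could return for a slash command.
EPHEMERAL_FLAG = 1 << 6          # message flag: only the invoking user sees the reply
CHANNEL_MESSAGE_WITH_SOURCE = 4  # interaction callback type for a regular reply


def build_reply(content: str, ephemeral: bool = True) -> dict:
    """Return an interaction response payload (hypothetical helper)."""
    data = {"content": content}
    if ephemeral:
        data["flags"] = EPHEMERAL_FLAG
    return {"type": CHANNEL_MESSAGE_WITH_SOURCE, "data": data}


reply = build_reply("Summary of the recent discussion: ...")
print(reply)
```

Keeping the flag behind a parameter lets the same helper serve both private answers and public summaries.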

LAION ▷ #research (11 messages🔥):

Mindcraft and LLMs

Llama3-8B-1.58 Model

Misunderstandings about model specifications

Mindcraft fun with LLMs: People are engaging in interesting projects combining Minecraft with LLMs.
- It’s fun stuff, as noted in the discussion.
Clarifications on Llama3-8B-1.58 Model: The Llama3-8B-1.58 models are derived from the base model Llama-3-8B-Instruct and are not built on BitNet architecture, contrary to some claims.
- For detailed information, they referred to a blog post on extreme quantization.
Confusion over model parameters: There was some misunderstanding regarding the 100B mentioned in the model link, with others clarifying that the model itself has 8B parameters; the link name, Llama3-8B-1.58-100B-tokens, attaches the 100B to tokens.
- Members expressed that this confusion was common and appreciated the corrections exchanged.
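
A quick back-of-the-envelope check can help keep the numbers in the model name apart. The sketch below assumes that "1.58" refers to the information content of a ternary weight in {-1, 0, 1}, which is not stated in the discussion itself, and it ignores embeddings and any layers kept in higher precision.

```python
import math

# Assumption: "1.58" is the bits needed to encode one ternary weight in {-1, 0, 1}.
bits_per_weight = math.log2(3)

params = 8e9  # the "8B" in the model name: parameter count
ideal_weight_bytes = params * bits_per_weight / 8

print(f"bits per ternary weight: {bits_per_weight:.3f}")
print(f"ideal packed weight size: {ideal_weight_bytes / 1e9:.2f} GB")
```

This is an idealized lower bound for the weights only, not a measurement of any published checkpoint.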

Links mentioned:

HF1BitLLM/Llama3-8B-1.58-100B-tokens · Hugging Face: no description found
GitHub - kolbytn/mindcraft: Contribute to kolbytn/mindcraft development by creating an account on GitHub.

OpenAccess AI Collective (axolotl) ▷ #general (2 messages):

Mixtral AI Upgrades

SymNoise Implementation

Mixtral AI is considered outdated: A member humorously suggested upgrading from the Mixtral AI model to a newer version of MistralAI, implying its obsolescence.
- At least upgrade to a newer MistralAI model.
Seeking code for SymNoise technique: A member inquired about code implementation for a paper introducing the SymNoise fine-tuning technique, which enhances language models using symmetric noise.
- Tried implementing it myself, but it seems to double the batch size of the embeddings through concatenation, and I don't know how to deal with that.
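
One possible way to cope with the doubled batch is to repeat the labels in the same order as the concatenated embeddings. The sketch below is not the paper's reference code: it assumes the doubling comes from concatenating a "+noise" and a "-noise" copy of each example, as the member describes, and it borrows an `alpha / sqrt(L * d)` scaling from NEFTune as a further assumption. The function name and arguments are hypothetical, and plain Python lists stand in for tensors.

```python
import random


def symnoise_batch(embeds, labels, alpha=5.0, seed=0):
    """embeds: list of sequences, each a list of token vectors (lists of floats).
    labels: one label sequence per example.
    Returns (noisy_embeds, noisy_labels) with twice as many examples."""
    rng = random.Random(seed)
    plus, minus = [], []
    for seq in embeds:
        seq_len, dim = len(seq), len(seq[0])
        scale = alpha / (seq_len * dim) ** 0.5  # NEFTune-style scaling (assumption)
        # Symmetric noise: each entry is +scale or -scale with equal probability.
        noise = [[rng.choice((-1.0, 1.0)) * scale for _ in range(dim)]
                 for _ in range(seq_len)]
        plus.append([[x + n for x, n in zip(tok, nz)] for tok, nz in zip(seq, noise)])
        minus.append([[x - n for x, n in zip(tok, nz)] for tok, nz in zip(seq, noise)])
    # Concatenation doubles the batch, so the labels are repeated in the same order.
    return plus + minus, labels + labels


embeds = [[[0.0, 0.0], [1.0, 1.0]]]
noisy, noisy_labels = symnoise_batch(embeds, [[3, 7]])
print(len(noisy), noisy_labels)
```

Attention masks and any other per-example inputs would need the same repetition, and the effective batch size doubles, so memory use rises accordingly.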

Link mentioned: SymNoise: Advancing Language Model Fine-tuning with Symmetric Noise: In this paper, we introduce a novel fine-tuning technique for language models, which involves incorporating symmetric noise into the embedding process. This method aims to enhance the model's func...

OpenAccess AI Collective (axolotl) ▷ #datasets (11 messages🔥):

SAT reading test scraping

Sonnet formatting for questions

Presence of original answers

Issues with multimodal questions

Dataset sharing

Incomplete scrape of SAT reading tests observed: A member shared a scrape of the SAT reading tests and several AP tests that turned out to be incomplete, leading to a discussion on formatting results.
- The scraper replied "Thank you for bringing it to my attention!" in appreciation of the feedback on the scrape.
Questions raised about image inclusion: Concerns were raised regarding whether images should be included with questions after noticing the presence of formatted_prompt and rewritten_answer fields.
- The original scraper mentioned that the full set does include images for some questions, but that the dataset was intended to remain unimodal.
Clarification on original answers and potential errors: A member inquired whether the original answers were retained or if Sonnet was used to generate them, raising potential error concerns.
- The scraper confirmed they had both the original answer and rationale as scraped from other sources for reference.
Sonnet used for improved formatting: The main motivation for using Sonnet was to enhance the visual formatting of the answers and rationale, avoiding starting with the answer itself.
- This was aimed at creating a clearer presentation of the details involved in each question and its reasoning.
Dataset sharing link provided: A member shared a link to the dataset at Hugging Face, specifically for Dans-Logicmaxx-SAT-AP.
- The link can be accessed here: Dans-Logicmaxx-SAT-AP.

OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (8 messages🔥):

Qwen Model Configuration

Fine-tuning Parameters

DeepSpeed Integration

Training Token Definitions

Discussion on Qwen Model Configuration: Members discussed the provided configuration for fine-tuning Qwen/Qwen2.5-32B, highlighting the potential need to specify exact model types instead of using placeholders like AutoModel and AutoTokenizer.
- Concerns were raised about the base_model_config not being necessary if using defaults, and the security implications of setting trust_remote_code to true.
DeepSpeed Configuration Checks: The focus was on verifying the deepspeed configuration path to ensure it points to an existing and correctly set file for optimized training.
- Members noted that misconfigurations could hinder performance and stressed confirming the compatibility of the optimizer settings in relation to DeepSpeed's requirements.
Clarification on Special Tokens Use: The importance of correctly integrating defined special tokens and ensuring they function within the model and tokenizer was emphasized.
- Failures to integrate these tokens might lead to issues during both training and inference phases.
Lighthearted Exchange on Knowledge Gaps: A member jokingly admitted to not knowing everything about the configuration process and welcomed further clarification on potential issues.
- This led to a light-hearted moment as members acknowledged the complexities inherent in fine-tuning configurations.
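
The checks raised in this thread can be approximated with a small pre-flight script. This is a sketch under assumptions: the config is taken to be already loaded into a dict, the key names (`model_type`, `tokenizer_type`, `trust_remote_code`, `deepspeed`) mirror the ones discussed and may not match every setup, and `check_config` and the example path are hypothetical.

```python
import os


def check_config(cfg: dict) -> list:
    """Return human-readable warnings for a fine-tuning config given as a dict."""
    warnings = []
    # Generic Auto* classes work as placeholders but hide the exact model type.
    for key in ("model_type", "tokenizer_type"):
        if str(cfg.get(key, "")).startswith("Auto"):
            warnings.append(f"{key} is a generic placeholder: {cfg[key]}")
    if cfg.get("trust_remote_code"):
        warnings.append("trust_remote_code is enabled: remote model code will be executed")
    ds_path = cfg.get("deepspeed")
    if ds_path and not os.path.isfile(ds_path):
        warnings.append(f"deepspeed config not found: {ds_path}")
    return warnings


example = {
    "base_model": "Qwen/Qwen2.5-32B",
    "model_type": "AutoModelForCausalLM",
    "tokenizer_type": "AutoTokenizer",
    "trust_remote_code": True,
    "deepspeed": "path/to/your/deepspeed_config.json",  # hypothetical path
}
for warning in check_config(example):
    print("WARNING:", warning)
```

The script only flags items for review; whether a placeholder or `trust_remote_code` is acceptable remains a judgment call for the person running the fine-tune.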

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.

LangChain AI ▷ #general (7 messages):

ReAct Agent with HuggingFace

Advanced RAG methods

Using create_sql_agent with Pandas

Public benchmarks for RAG systems

Image handling in Langchain

Creating ReAct Agent using HuggingFace Local Model: A member shared their current approach to initializing a ReAct Agent with a local model and encountered a parserException during invocation.
- They requested assistance as they were unable to find a solution online for this specific error.
Exploration of Advanced RAG Methods: A member asked which methods for Retrieval-Augmented Generation (RAG) are currently the most advanced and whether the traditional methods are still applicable.
- Both data cleaning and storage in Pinecone/vector databases were mentioned as common practices alongside a search for recent references.
Using create_sql_agent to Return Pandas DataFrame: A query was made concerning how to utilize create_sql_agent to output a Pandas DataFrame rather than just a text string.
- The member inquired specifically about the need for SQLDatabaseToolkit in this context.
Public Benchmarks for RAG Systems: A member asked whether any public benchmarks exist specifically for RAG systems.
- A suggestion was made to consider RAGAS as a potential resource to evaluate LLM applications.
Convenient Image Handling in Langchain: A discussion took place regarding the need for a convenient method to handle multiple images in Langchain messages.
- This highlights the community's interest in improved image processing capabilities within the language model framework.
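
On the ReAct parsing error mentioned above: the cause of the member's exception is not stated, but one possibility worth ruling out is a local model that drifts from the expected "Action / Action Input / Final Answer" layout. The sketch below is a simplified stand-in written for illustration, not LangChain's own parser, and `parse_react` is a hypothetical name.

```python
import re

# Matches "Action: <tool>" followed by "Action Input: <input>".
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*Action\s+Input\s*:\s*(.*)", re.DOTALL)


def parse_react(text: str):
    """Return ("final", answer) or ("action", tool, tool_input); raise ValueError otherwise."""
    if "Final Answer:" in text:
        return ("final", text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(text)
    if match:
        return ("action", match.group(1).strip(), match.group(2).strip())
    raise ValueError(f"Could not parse model output: {text!r}")


print(parse_react("Thought: I need data\nAction: search\nAction Input: weather in Paris"))
print(parse_react("Thought: done\nFinal Answer: 42"))
```

Logging the raw model output just before parsing is usually the fastest way to see whether the format, rather than the agent wiring, is at fault.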

Link mentioned: GitHub - explodinggradients/ragas: Supercharge Your LLM Application Evaluations 🚀: Supercharge Your LLM Application Evaluations 🚀. Contribute to explodinggradients/ragas development by creating an account on GitHub.

LangChain AI ▷ #share-your-work (6 messages):

AdaletGPT

bootstrap-rag v0.0.11

Appine

Financial Agentic System

Wordle Clone Tutorial

Introducing AdaletGPT - A Turkish Legal Chatbot: AdaletGPT is a Turkish legal chatbot based on RAG, utilizing LangChain, Pinecone, and OpenAI to assist users.
- This platform aims to facilitate legal inquiries and services through AI-driven interaction.
bootstrap-rag v0.0.11 Launches with Exciting Updates: The release of bootstrap-rag v0.0.11 includes an LLM-as-Judge template now integrated with Arize AI Phoenix for enhanced observability and analysis.
- The update also brings crucial bug fixes and improvements in documentation, streamlining user experience.
Appine: No-Code AI App Creation Made Easy: Appine provides a no-code platform for building and sharing AI-powered apps, allowing users to create applications visually through a drag-and-drop interface.
- As a minimum viable product, it is currently free of charge, and further enhancements will be made based on user feedback.
Innovative Financial Agentic System Built: A new financial reporting system combines Langgraph, GROQ, and various APIs for real-time analysis, gathering data like stock prices and income statements.
- It features a flexible execution sequence, robust error handling, and supports multiple LLM providers including OpenAI.
Quick Tutorial on Building a Wordle Clone: A tutorial was published detailing the creation of a Wordle clone in just 30 minutes using V0 and Next.js, geared towards all coding levels.
- This resource is designed to inspire and guide users in creating their own simple game applications.

Links mentioned:

bootstrap-rag: None
Release v0.0.11 · pavanjava/bootstrap-rag: What's Changed Modify maindoc by @pavanjava in #62 Modify maindoc by @pavanjava in #63 Observability migration by @pavanjava in #64 Observability migration by @pavanjava in #65 Observability migr...
Appine: no description found

LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 message):

Lecture 8

Yuandong Tian's Presentation

Neural vs Symbolic Decision Making

Lecture 8 Starts Soon!: The 8th lecture is happening today at 3:00pm PST. You can watch the livestream here.
- Be sure not to miss the insights shared during this special lecture!
Yuandong Tian Takes the Stage: Today's guest speaker is Yuandong Tian, a Research Scientist Director at Meta AI Research. He will discuss integrating neural and symbolic decision-making frameworks to tackle complex tasks.
- His talk addresses how Large Language Models can enhance reasoning compared to traditional symbolic solvers.
Understanding Neural and Symbolic Integration: The talk will explore how neural models excel with flexibility while symbolic solvers provide guaranteed solutions. The goal is to find a unified framework that can leverage the strengths of both approaches.
- This integration is essential for improving the performance of LLMs in complex reasoning tasks.
Questions? Reach Out!: For any questions regarding the lecture, course staff can be contacted in the designated channel. It's a great opportunity to engage with the course material and ask for clarification.
- Don’t hesitate to seek help to deepen your understanding of the topics discussed!

Link mentioned: CS 194/294-196 (LLM Agents) - Lecture 8: no description found

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (8 messages🔥):

Confirmation emails for MOOC signup

Peer study group initiative

Hackathon timeline and tracks

Datasets for benchmarking track

No Confirmation Emails for MOOC Signup: Members confirmed that there are no confirmation emails sent after signing up for the LLM Agents MOOC, only a copy of the Google Form response titled 'LM Agents MOOC Signup Form'.
- Participants can expect weekly session reminder emails after signing up, but no additional confirmation will be provided.
Peer Study Group Suggestion: A member proposed creating a peer study group on Zoom for those who joined the course later to facilitate discussions and brainstorming sessions.
- Although there is no formal resource from the staff, others are encouraged to form their own groups or join existing ones.
Hackathon Timeline and Details Unveiled: The hackathon's timeline is available on the official hackathon website, detailing various tracks including Applications, Benchmarks, Fundamentals, and Safety.
- This event, hosted by Berkeley RDI, seeks to unite students, researchers, and industry practitioners to advance LLM agent technology.
Need for Datasets in Benchmarking Track: A member inquired about appropriate datasets for the benchmarking track of the hackathon, indicating interest in resources for their projects.
- No responses or specific resources were provided for datasets, leaving the inquiry open for further discussion.

Link mentioned: LLM Agents Hackathon: Hackathon on LLM Agents hosted by RDI at UC Berkeley.

LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (2 messages):

Study Group Formation

Interest in Collaborating

Proposal to Form a Study Group: A member proposed to gauge interest in starting a study group for discussing lectures and reading materials, suggesting various time slots for virtual meetings.
- They provided a link to a Google Form for others to express their availability and interest.
Support for Study Group Idea: Another member expressed enthusiasm for the study group proposal, stating it sounds cool.
- This shows a positive response towards fostering collaboration among late joiners.

Link mentioned: LLM Agents Peer Study Group (Virtual): Use this to express your interest in joining a peer study group virtually. We might use Discord events or Zoom. We will go through the lectures starting from the first and also discuss the additional...

Torchtune ▷ #general (4 messages):

Embedding Config Flags

LoRA Bug Fix

TransformerDecoder Changes

Embedding and Norm Layer Configuration Proposal: A member suggested exposing two boolean flags (embedding_trainable=False and norms_trainable=False) in the configs instead of a list of names, to prevent future configuration issues if layer names change.
- The idea follows the logic that transitioning from boolean flags to a list is less disruptive than adapting countless configs, especially as TransformerDecoder may require more expressive changes for various model types.
LoRA Bug Fix Submitted: A member submitted a fix for the LoRA bug through pull request #1909, addressing issues with single device fine-tuning checkpoint saving and NaN loss when use_dora=True.
- They expressed uncertainty about whether it was functional across all recipes, noting bugs for distributed setups and NaN results for single device use.
Discussion on Configuration Flexibility: Another member weighed in on the configuration proposal, acknowledging the pros and cons of both the boolean flag and name options, suggesting a preference for flexibility.
- This indicates a consensus on the importance of adaptability in configurations to meet future needs.
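
To make the flag-based proposal concrete, here is a minimal sketch of how two booleans could map onto parameter groups. Only the flag names `embedding_trainable` and `norms_trainable` come from the discussion; the helper, the substring matching, and the parameter names are hypothetical and are not torchtune's actual API.

```python
def select_trainable(param_names, embedding_trainable=False, norms_trainable=False):
    """Return the set of parameter names that the two flags would unfreeze."""
    trainable = set()
    for name in param_names:
        if embedding_trainable and "tok_embeddings" in name:
            trainable.add(name)
        elif norms_trainable and "norm" in name:
            trainable.add(name)
    return trainable


# Hypothetical parameter names, for illustration only.
names = [
    "tok_embeddings.weight",
    "layers.0.attn.q_proj.weight",
    "layers.0.sa_norm.scale",
    "norm.scale",
]
print(sorted(select_trainable(names, norms_trainable=True)))
```

The matching rule lives in one place, so a renamed layer means updating this helper once instead of editing a name list in every config, which is the trade-off the proposal highlights.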

Link mentioned: Fix lora single device fine tune checkpoint saving & nan loss when use_dora=True by mirceamironenco · Pull Request #1909 · pytorch/torchtune: Context What is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here) Fixes #1903 . Changelog What are the changes made in thi...

Torchtune ▷ #dev (5 messages):

Hyperparameter Optimization Recipe

Discussion on muP Utility

Priority Issues in Development

External Tools for Tuning

Exploring Hyperparameter Optimization in Torchtune: A GitHub issue discusses creating a recipe for hyperparameter optimization where users can provide a config along with datasets and parameters to sweep.
- The suggestion emphasizes the need for common defaults in a grid format, although it is noted that no one has explicitly requested it yet.
Debating muP Usefulness for Fine-tuning: Members expressed skepticism about the utility of muP for fine-tuning, with one noting that it is discussed largely in the context of pretraining.
- Concerns were raised regarding prioritizing other issues like faster generation and early stopping over the implementation of muP.
Higher Priority Issues Discovered: A member pointed out that there are many open issues that should take precedence, including faster reinforcement learning generation and better classification with LLMs.
- They specifically highlighted the challenge of managing 200 open issues and suggested support for distributed shampoo as another priority.
Relying on Existing Tuning Tools: Discussion shifted towards leveraging existing external solutions like Wandb sweeps and Ray Tune for hyperparameter tuning instead of incorporating a new script into Torchtune.
- One participant expressed a preference to leave these functionalities out of the library, as the existing tools already provide suitable solutions.
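
For a sense of scale, the grid-style sweep described in the issue can be sketched in a few lines without any library support. The `grid` helper and the swept values are hypothetical; only the 3e-4 default learning rate is taken from the linked issue text.

```python
from itertools import product


def grid(base_config: dict, sweep: dict):
    """Yield one config per point in the Cartesian product of the swept values."""
    keys = list(sweep)
    for values in product(*(sweep[k] for k in keys)):
        # Swept values override the matching keys of the base config.
        yield {**base_config, **dict(zip(keys, values))}


base = {"lr": 3e-4, "batch_size": 4}
sweep = {"lr": [3e-4, 1e-4, 3e-5], "weight_decay": [0.0, 0.01]}
configs = list(grid(base, sweep))
print(len(configs))  # 6
```

Scheduling, budgets, and early stopping are the parts this sketch leaves out, and those are what external tools such as Wandb sweeps or Ray Tune already cover.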

Link mentioned: recipe for hyperparameter sweep · Issue #1752 · pytorch/torchtune: Torchtune could provide a recipe to do HPO, where the user provides a config, the recipe, eval dataset, params to sweep and budget. I just played with optimizer. Our default in lr 3e-4. I tried 3e-...

Mozilla AI ▷ #announcements (2 messages):

Human Native AI Marketplace

November Member Programming

Public AI Events

OSS4AI San Francisco Meetup

Sqlite-Vec Metadata Filtering

Human Native AI Marketplace launched: A new AI data marketplace by Human Native AI allows creators to license their content for AI training and receive compensation.
- Co-founder James Smith will discuss their progress at the upcoming event, part of Mozilla's Data Futures Lab Speaker Series.
Exciting November Member Programming: November brings a lineup of member-organized events including topics from Sqlite-Vec and Refact.ai, alongside remote conferences and an in-person meetup in San Francisco.
- Members are encouraged to RSVP based on event descriptions to join these significant gatherings.
Showcasing member projects: Highlighted projects presented on the Mozilla AI stage include Open Interpreter, Homebrew, and Sentry.io's open source auto fix among others.
- The community looks forward to featuring more members' open-source endeavors within Public AI, emphasizing the contributions of the 3300 members.
Upcoming OSS4AI San Francisco IRL Meetup: The OSS4AI San Francisco IRL Meetup is set to take place, inviting interested members to connect and share insights.
- This serves as an opportunity for local members to engage in collaborative projects and discussions.
Metadata Filtering in Sqlite-Vec discussed: An event on Metadata Filtering in Sqlite-Vec will address important techniques for handling data efficiently.
- This initiative highlights the importance of maintaining data integrity while contributing to AI training.

Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (2 messages):

Leaderboard multiple functionality

GitHub example clarification

Clarification on 'Multiple' Functionality in Leaderboard: A user inquired about what 'multiple' on the leaderboard refers to, questioning if it involves multiple queries, functions, or steps.
- In response, it was noted that 'multiple' seems to indicate the capability to select the correct function from several options, though the evaluation of multi-step functionality remains unclear.
Reference to GitHub Example: The bot provided a GitHub link as a random example.
- The GitHub page details the project 'Gorilla: Training and Evaluating LLMs for Function Calls', providing context for understanding the leaderboard functionality.

Link mentioned: gorilla/berkeley-function-call-leaderboard/data/BFCL_v3_exec_multiple.json at 2101b11f6d03d9f323715d7d2012a955d7f4114e · ShishirPatil/gorilla: Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls) - ShishirPatil/gorilla

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}