> AI Discords for 1/30/2024. We checked **21** guilds, **311** channels, and **7688** messages for you. Estimated reading time saved (at 200wpm): **577 minutes**.

There’s been a lot of speculation about Miqu, a surprisingly good open-access model (not open source, since it ships with no license), scoring 74 on MMLU (vs 75 for mistral-medium) and 83.5 on EQ-Bench, an emotional intelligence benchmark that some consider a better signal than MMLU. Much of the debate was over whether it is a leaked Mistral model, and the CEO of Mistral has now come out and confirmed it.

[image: the Mistral CEO’s confirmation]

So technically we can’t use this model, but it’s an interesting leak for sure.


Table of Contents

[TOC]

PART 1: High level Discord summaries

TheBloke Discord Summary

  • Miqu Takes the Lead in Model Discussions: Engineers are buzzing about Miqu, comparing it favorably against Llama-2-70B-chat and Mixtral, with Miqu notably excelling in instruction-following and critiquing tasks. The debate extends to sampling methods where dynatemp and min-p are being dissected for their utility in enhancing model confidence levels and outcomes, sparking a dialogue on what qualifies as “superior” results in AI performance.

  • Developers Dive Into Niche Browser Uses and UI Preferences: In less model-centric conversations, developers exchange views on optimal browsers for both development and personal use, ranging from Internet Explorer to more niche choices like Vivaldi, and even Docker for Ubuntu on Arch systems. This also spirals into a discussion on Discord’s UI aesthetics, particularly the debate over dark vs. light themes, and the color choices exclusive to Discord Nitro.

  • Creative Use of Chat Models Unleashed in Role Play Channels: Members found a creative outlet in using various chat models for role-playing, with models like BagelMistery Tour v2 and Psyfighter v2 standing out for their role-playing finesse. Technical discussions sprang from the functional fp16 dequant of Miqu, where a notable fp16 conversion of Miqu-1-70b was shared, highlighting both advancements and challenges in model quantization.

  • Training and Fine-Tuning Discussions Illuminate a Path for Novices and Experts Alike: Tips on budget-friendly model training with Unsloth and on optimizing Mistral 7B’s training process were the highlights, along with calls for comprehensive tutorials on fine-tuning Hugging Face models, hinting at a collective need for accessible, advanced guidance in model optimization and customization.

  • Coding Conversations Reflect on Development Culture and Challenges: The sentiment that reading code is harder than writing it resonated with developers, alongside critiques of contemporary web development practices, including the over-reliance on external libraries and the Not Invented Here syndrome. There’s a growing concern over the risk of fostering developers as mere “framework technicians” rather than problem solvers, pointing to a deeper need for a foundation in computer science principles in the programming community.


Nous Research AI Discord Summary

  • Activation Beacon Paves Way for Unlimited Context: The Activation Beacon method has been discussed as a groundbreaking approach to overcome LLM context length limitations, enabling models trained on 4K contexts to generalize to 400K contexts with linear inference time growth. This method, incorporating “global state” tokens, could radically change how LLMs manage memory consumption. Read the paper and check out the implementation.

  • SQLCoder-70B Excels in Text-to-SQL Generation: Introduced as the new leader in Postgres text-to-SQL conversion, SQLCoder-70B, fine-tuned on AIatMeta’s CodeLlama-70B, has set a new standard for LLMs in SQL generation. This model is now accessible on Hugging Face, offering significant advancements for related tasks. Explore SQLCoder-70B.

  • Miqu Model Shatters Expectations on EQ-Bench: The Miqu model has beaten previous benchmarks, scoring an 83.5 on EQ-Bench, surpassing Mistral Medium and sparking community discussions about its potential origins and capabilities as perhaps the best publicly available model yet. Details on Miqu and its performance can be found on Hugging Face.

  • MoE’s Scaling Laws and Impact on Model Efficiency: Two key papers were highlighted in the discussion on MoE scaling laws, shedding light on the efficiency and performance benefits of Mixture-of-Experts models. The exploration of these models represents a significant interest in enhancing computing resource utility and model performance. Reference: Scaling Laws for Autoregressive Generative Modeling and ST-MoE: Designing Stable and Transferable Sparse Expert Models.

  • Advancements and Needs in Vision-Language Models (VLMs): The community has expressed a pressing need for more efficient and accessible inference libraries for VLMs, highlighting ongoing efforts and innovations in batch LoRA inference and potential extensions to support VLMs more effectively. This ongoing development aims to improve accessibility and computational efficiency, addressing the current lack of resources in this domain.


LM Studio Discord Summary

  • LM Studio API Connection Woes Resolved: @.ben.com clarified to @.mchinaga that issues in chat_gpt_bot.py were due to failing to reach the API server, not a bug, with @dagbs advising on correct API base setting and response key configurations. This common pitfall underlines the necessity of accurate endpoint configuration for successful API interactions.

  • Innovative Text to Video Puppeteering Explored: LM Studio users, led by @toddmiller, delved into the feasibility of converting text scripts into video puppets, discussing current limitations and the potential need for models beyond LM Studio’s current capabilities for such sophisticated tasks.

  • GPU Acceleration Optimizations for LM Studio: Detailed advice was shared on GPU acceleration settings for an RTX 4070 by @fabguy, highlighting the importance of n_gpu_layers adjustment for enhancing performance without straining the CPU; a brief llama-cpp-python sketch of the same knob follows this list. This insight underpins the critical balance between GPU utilization and overall system efficiency in AI applications.

  • Compatibility and Execution Challenges with LM Studio Across Various Hardware: The discussion covered strategies to overcome challenges when running LM Studio on diverse platforms, particularly focusing on Linux library issues, GPU selection, and ARM CPU compatibility concerns. Notably, mobile platforms like Android and iOS pose significant compatibility hurdles, reinforcing the importance of platform-aware development in AI tools.

  • Emerging Trends and Performance in AI Model Discussion: The community reported mixed performances and unique behaviors in models like Tiny Models, CodeLlama 70B, and MIQU, with an amusing incident of Tiny Models producing absurd jokes post-reboot. The discourse extended to practical discussions on LangChain’s potential integration with LM Studio for dataset generation, emphasizing the ongoing innovation and troubleshooting within AI model utilization and development.
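
For those who want to poke at the same trade-off outside LM Studio’s UI, here is a minimal sketch using llama-cpp-python, whose `n_gpu_layers` parameter mirrors the setting discussed above; the model path and layer count are placeholders to adjust to your model size and VRAM.

```python
from llama_cpp import Llama

# Offload a fixed number of transformer layers to the GPU; the rest run on CPU.
# n_gpu_layers=-1 offloads everything (only sensible if the model fits in VRAM).
llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=24,  # raise until VRAM is nearly full; lower if you hit OOM
    n_ctx=4096,       # the context window also consumes VRAM
)

print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])
```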



OpenAI Discord Summary

  • VRAM Choice Conundrum for AI Beginners: aqua_dawn_67525 is deliberating between 16GB and 24GB of VRAM for starting AI projects, with a fallback plan of migrating to cloud computing if requirements outgrow local hardware. Meanwhile, toror vouched that 16GB of VRAM is quite capable for contemporary optimized models, shedding some light on the practical sufficiency of lower VRAM for starters.

  • Limits and Confusions in GPT Plus Unveiled: Users like @iyons and @myroslava_35196_08143 encountered unexpected message limit warnings on GPT Plus, igniting confusion given that they hadn’t reached the advertised 40-message threshold. This issue pointed to broader concerns regarding communication and support from OpenAI for its users.

  • GPT Mentions Usher in New AI Collaboration Possibilities: Introduction of GPT mentions, allowing GPTs to share context and actions, has stimulated community excitement due to its potential for enhanced AI composability. However, despite the innovative leap, users like @darthgustav. and @blckreaper are still mapping out the practical capabilities and limitations of inter-GPT communication.

  • GPT’s Gaming and Creative Challenge: In the realm of word games and creative projects, challenges like effectively managing GPT in grid-based word games were highlighted, with community members suggesting advanced strategies involving 2D arrays and Python for better game management. Additionally, the integration of DALL-E 3 and GPT, facilitating projects that span across text and visual generation, represents a pioneering step in AI-assisted creative endeavors, although currently requiring manual orchestration.

  • Community Tackles API Complexities and Feature Limitations: The community has been actively engaging in troubleshooting and brainstorming around the use of OpenAI’s APIs for complex project workflows, such as chaining different models for multi-step processes. Despite the excitement around features allowing for inter-model communication, discussions revealed that such integrations necessitate manual navigation due to current technological constraints.


Eleuther Discord Summary

  • Ratchet Revolutionizes ML in Browsers: Ratchet, a novel ML framework for browsers, promises optimized speed and developer experience using Rust and WebGPU, as showcased in whisper-turbo.com.

  • EleutherAI’s Research Triumph: EleutherAI celebrates the acceptance of 6 out of 10 papers at ICLR, spotlighting advancements like “Llemma: An Open Language Model for Mathematics” and marking a significant achievement for authors and contributors.

  • Sparse Fine-Tuning Outshines LoRA: A new method for sparse fine-tuning large language models presents a more parameter- and memory-efficient alternative over (q)LoRA, potentially revolutionizing instruction tuning as evidenced in this study and its implementation.

  • CUDA and cuDNN on ppc64le Present Challenges: With NVIDIA’s CUDA container being effectively the only supported route on the ppc64le platform, cuDNN installation issues and the struggle to build wheels underscore the difficulties of setting up AI development environments on less common architectures.

  • Tokenizing Strokes for Vector Graphic Synthesis: StrokeNUWA introduces a method for tokenizing strokes to facilitate vector graphic synthesis, indicating a novel approach in multimodal AI research.


Mistral Discord Summary

  • Mistral 7B VRAM Requirements and Workarounds: The community discussed the feasibility of running Mistral 7B models on 6GB VRAM GPUs like the 1660 Ti, with mixed results. While out-of-memory issues were common, a quantized model on Hugging Face was identified as capable of running within the 5.5GB VRAM limitation, providing a solution for users with lower-end hardware.

  • Optimizing Mistral Performance Across Hardware: Users reported varied performance of Mistral 7B on both low-end and high-end systems, including the RTX4090. The discussions emphasized the importance of optimizing GPU utilization and considering resources like Colab notebooks for efficiently loading models without fully consuming VRAM.

  • Fine-Tuning for Specific Output Preferences: In the finetuning channel, there was a request for advice on getting more concise responses from the LLM, aiming for direct answers like “4” to straightforward questions. Enhanced finetuning strategies were suggested, including increasing the number of steps beyond 60 and possibly lowering the learning rate to 2e-5 for improved model performance (a sketch of these settings follows this list).

  • Enterprise-Level Web UI for Mistral Debuts: The showcase channel introduced uMdali, an open-source project providing a Web UI for the Mistral API. This tool supports connections to Ollama, OpenAI, and Gemini, aiming to serve as an “Enterprise Chat Front End”.

  • Collaboration and Internships with Mistral: The community was encouraged to contribute to Mistral’s public documentation on GitHub for collaborative improvement. Queries about internship opportunities at Mistral highlighted the competitive nature and high qualifications required to join the Mistral team, suggesting a vibrant and engaged developer community.
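
For anyone wanting to act on that finetuning advice, here is a minimal sketch of how those two knobs map onto Hugging Face `TrainingArguments`; every other value here is illustrative rather than a recommendation.

```python
from transformers import TrainingArguments

# Sketch of the advice above: train well past 60 steps and lower the
# learning rate to 2e-5. All other values are illustrative defaults.
args = TrainingArguments(
    output_dir="mistral-7b-finetune",
    max_steps=500,                    # well beyond the original 60 steps
    learning_rate=2e-5,               # lowered for more stable convergence
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_steps=10,
)
```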


HuggingFace Discord Summary

  • Code Llama 70B and Sentence Transformers v2.3.0 Unleashed: The Hugging Face community announced the release of Code Llama 70B, a new AI chat model, and the updated Sentence Transformers v2.3.0, boasting bug fixes and performance enhancements. Check out Code Llama 70B here and the Sentence Transformers release notes here.

  • Multimodal LLM Dataset for the Malaysian Context: The Multimodal Malaysian LLM dataset, aimed at advancing LLM training with multimodal inputs including translated LLaVA instructions, is now available on Hugging Face as part of the mesolitica collection.

  • Innovative AI Tools Created by Community Members: Community developers have introduced new tools: a Gradio app for transforming Excel/CSV files into database queries accessible on HuggingFace Spaces, and a novel application of AI to Magic: The Gathering for multi-label classification of card colors which you can explore here.

  • CUDA Troubleshooting and Unknown Compiler Option Challenges: A user on an RTX 2080ti reported issues while configuring GPU acceleration, encountering an nvcc fatal: Unknown option 'fPIC' error, indicating a host-compiler flag being passed directly to the nvcc compiler – detailed in this GitHub issue (a build-flag sketch follows this list).

  • Discussions on LLMs and Diffusion Models Deepen: The community explored various topics, from the quest to improve the robustness of QA datasets beyond DPO and whether lokr and loha support inference via the “loading with peft” method, to dissatisfaction with a 70B coder chat model’s limited knowledge of the 🤗 Diffusers library. Additionally, members discussed the challenge of replicating specific art styles using Stable Diffusion, with an attempt to capture an anime style outlined here.
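
On the nvcc error above: nvcc rejects bare host-compiler flags like `-fPIC`, which must be forwarded with `-Xcompiler`. Below is a hedged sketch using PyTorch’s cpp_extension build helpers (the extension name and source file are placeholders, and this assumes the flag was reaching nvcc directly).

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# nvcc does not accept bare host-compiler flags such as -fPIC; they must be
# forwarded with -Xcompiler. Extension name and source are placeholders.
setup(
    name="my_cuda_ext",
    ext_modules=[
        CUDAExtension(
            "my_cuda_ext",
            sources=["my_kernel.cu"],
            extra_compile_args={
                "cxx": ["-O3", "-fPIC"],                 # host compiler flags
                "nvcc": ["-O3", "-Xcompiler", "-fPIC"],  # forwarded to the host compiler
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```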


OpenAccess AI Collective (axolotl) Discord Summary

  • Axolotl’s Continuous Advancements Draw Acclaim: The axolotl framework is praised by @nafnlaus00 for its enhancements leading to lower VRAM usage, faster training, and improved outcomes, sparking discussions about sharing these successes on platforms like Twitter. Additionally, @dreamgen shared their experiences with hardware and VM enhancements that contributed to speedup in AI projects, emphasizing the crucial role of hardware in development.

  • MIQU-1-70b Makes a Leap: The dequantization of MIQU-1-70b from q5 to f16 and its PyTorch integration caught the community’s attention; @dreamgen shared a link to the model on Hugging Face. This breakthrough is particularly noted for its potential applications and ease of use.

  • Technical Troubles and Triumphs in Development and Deployment: From VRAM usage concerns in axolotl’s new implementations to Docker dilemmas resolved for better deployment practices, the community is heavily engaged in troubleshooting and sharing solutions, such as the pip install -e . command to fix module errors. In the realm of developing with Axolotl, @stefangliga’s exploration of LoftQ and its approximation using SVD is a highlight (a rough sketch follows this list), showcasing the community’s inventive spirit.

  • Llamacpp Lights Up Community Showcase: @mistobaan shared an innovative project involving function calls using llamacpp, leveraging various community tools and models. This experiment, detailed through a shared Gist and highlighted resources from GitHub and Hugging Face, represents a prime example of collaborative innovation within the community.

  • Deployment Concerns Call for Community Insight: In deployment-help, an oblique mention by yamashi hints at challenges with parallel requests, indicating ongoing discussions or troubleshooting within deployment contexts. While minimal in detail, it spotlights the areas within AI engineering that require attention.
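
On the LoftQ exploration above: the core idea in the LoftQ paper is to alternate between quantizing the weight matrix and fitting a low-rank correction to the quantization residual via truncated SVD. A toy NumPy sketch, with a stand-in uniform quantizer in place of the real NF4 round trip:

```python
import numpy as np

def quantize_dequantize(w, bits=4):
    # Toy uniform quantizer standing in for the real NF4/int4 round trip.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def loftq_init(W, rank=16, iters=5, bits=4):
    """Sketch of LoftQ-style initialization: alternate quantization of the
    residual and truncated SVD so that Q + A @ B approximates W."""
    A = np.zeros((W.shape[0], rank))
    B = np.zeros((rank, W.shape[1]))
    for _ in range(iters):
        Q = quantize_dequantize(W - A @ B, bits)       # quantize what low-rank misses
        U, S, Vt = np.linalg.svd(W - Q, full_matrices=False)
        A = U[:, :rank] * S[:rank]                     # refit low-rank to the residual
        B = Vt[:rank]
    return Q, A, B
```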


Perplexity AI Discord Summary

  • Mistral Medium’s API Access Sparks Interest: Community members, led by @arcinarci, are eagerly awaiting API access for Mistral Medium, signaling a growing demand for broader API capabilities.

  • Perplexity’s Free Trial Potentially Paused: Discussions initiated by @aarav7024 suggest that the 7-day free trial offer might have been discontinued, leading to confusion among new users.

  • Perplexity Enhances App Development Creativity: Users like @gammagames found Perplexity effective for generating creative content, such as names and addresses for app development, spotlighting Perplexity Labs as a resource for exploring the AI’s functionalities.

  • Exploring Efficient Integration of Perplexity into Web Applications: A detailed guide for seamlessly integrating the Perplexity API into web apps, including handling text inputs for chat interactions and referencing documentation for API token creation, was shared (a minimal call sketch follows this list), although it was noted that file uploads are not currently supported.

  • Pioneering Local Model Training and Execution with Ollama: The feasibility of local training and execution of large models without requiring high-end hardware was discussed, pointing out tools like Ollama for local model utilities, alongside community-driven support for addressing API access issues, such as the 401 authentication error troubleshooting.
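
As a starting point for the integration discussed above: Perplexity’s chat endpoint is OpenAI-compatible, so a minimal server-side call looks like the sketch below. The model name is illustrative, and the key comes from the token-creation flow in the docs.

```python
from openai import OpenAI

# Perplexity exposes an OpenAI-compatible chat endpoint; the model name
# below is illustrative (check the current docs for available models).
client = OpenAI(
    api_key="YOUR_PPLX_API_KEY",          # created via the API settings page
    base_url="https://api.perplexity.ai",
)

resp = client.chat.completions.create(
    model="mistral-7b-instruct",
    messages=[
        {"role": "system", "content": "Be precise and concise."},
        {"role": "user", "content": "Name three uses of a vector database."},
    ],
)
print(resp.choices[0].message.content)
```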


CUDA MODE Discord Summary

  • Triton and CUDA discussions converge on performance tuning: Discussions of the Triton programming language emphasized its limited access to low-level CUDA features, with data control stopping at the GPU block level and Triton’s synchronization capabilities needing enhancement. Participants recommended delving into Triton’s implementation through its original paper for a deeper understanding.

  • Vectorized Memory Access in CUDA discussed: The significance of vectorized memory access for optimizing performance in CUDA programs was underscored, with an NVIDIA blog post cited as a key resource. Additionally, a unique approach to simplifying CUDA programming using Numba was shared through a Twitter post by @HaseoX94.

  • Learning Resources and Offers for GPU Access Provided: Offers to run code on A100/H100 GPUs and a dual-GPU machine were made to the community, aiming to facilitate testing and performance measurements. CUDA beginners were guided to comprehensive resources, including a book recommendation, free GPU access at lightning.ai, and a YouTube channel CUDA MODE for CUDA learning.

  • CUDA Programming Basics and Setup Guidance: Queries about CUDA programming basics and setup, particularly for an RTX 3070 laptop, sparked discussions. Advice ranged from book recommendations and environment setup to using Visual Studio for CUDA integration on Windows, denoting a preference for Torch over TensorFlow when working with Conda.

  • CUDA Timing and Memory Management: Technical exchanges focused on CUDA memory indexing and the use of CUDA events for accurate timing. It was clarified that synchronization is crucial for timing measurements and for understanding the behavior of operations like cudaMemcpy, reinforcing that CUDA kernel launches are fundamentally asynchronous and require explicit synchronization before performance numbers can be trusted (a PyTorch sketch follows this list).
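
The timing pitfall above is easy to reproduce from Python: kernel launches return immediately, so timing without synchronization measures only the launch. A small PyTorch sketch using CUDA events:

```python
import torch

assert torch.cuda.is_available()
x = torch.randn(4096, 4096, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

torch.cuda.synchronize()   # drain pending work so we time only the matmul
start.record()
y = x @ x                  # launched asynchronously
end.record()
end.synchronize()          # wait for the kernel; without this the time is garbage
print(f"matmul took {start.elapsed_time(end):.2f} ms")
```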


LangChain AI Discord Summary

  • LangChain Forks Make a Comeback: After a mysterious issue where forks of the LangChain repository weren’t recognized as forks on GitHub, leading to PRs vanishing, the situation has been resolved. Contributors should manually reopen any still-affected PRs as detailed in the GitHub discussion.

  • Expertise Sought for Custom Tool Parameters: Georg.ort seeks expert consultation on defining essential and optional parameters for a custom tool, offering payment for valuable insights, with communication links provided within the general channel.

  • Innovative AI Tool Announcements Stir Excitement: Pre-launch and launch announcements including Oranscribe on Product Hunt, SkillForge V1’s demonstration of agent skill creation on YouTube, and JACoB’s introduction as a production-ready AI coding assistant at jacb.ai sparked interest and anticipation.

  • LangServe’s Access and Resource Management Challenges: Discussions around LangServe touched on efforts for quicker access for educational purposes and the necessity of an additional layer for hardware resource management to prevent server crashes during high LLM usage rates.

  • Building and Prompting AI with Precision: Detailed explorations and resources shared, such as a guide on using Qdrant’s Multitenancy for a RAG chatbot, a video on empowering AI developers, and insights on prompt engineering for open-source LLMs, reveal an active pursuit of precision and innovation in AI development and application.


LlamaIndex Discord Summary

  • LlamaIndex and Replit Join Forces for RAG Bounties: LlamaIndex has announced a $2,000 bounty in collaboration with Replit for the development of open source templates leveraging advanced Retrieval-Augmented Generation (RAG). Dive into this opportunity here.

  • Insightful Exploration on RAG at LlamaIndex: In a guest post by @CobusGreylingZA, LlamaIndex’s recent foray into handling complex queries through RAG is featured, detailing the integration of multi-agent coordination and chain-of-thought reasoning with re-ranking from Cohere. Gain valuable insights here.

  • LlamaIndex Discourse: From Fine-Tuning to Query Enhancements: Discussions in the LlamaIndex community explore technical aspects ranging from fine-tuning embeddings using text and metadata and transforming CSVs to JSON for better data handling, to integrating pre-trained models from Hugging Face for embedding fine-tuning within LlamaIndex’s infrastructure.

  • Integration Challenges and Platform Connectivity: Community members delve into practical issues like embedding an improved model into LlamaIndex’s SubQuestionQueryEngine, utilizing AWS Sagemaker for deploying AI applications, and the intricacies of llama packs with existing databases and file formats in conversation building scenarios.

  • Tracking AI’s Imagination with the Hallucination Leaderboard: andysingal shared a resource for those particularly interested in measuring and combating hallucinations in AI outputs, pointing to the hallucination leaderboard available here.


LAION Discord Summary

  • AI Crafts Warhammer 40k’s Grim Dark Future: A fan-created Imperium Of Man - Warhammer 40k trailer, utilizing AI generative tools, has been commended for its remarkable visual effects, notably the fire and explosions segment at 0:54, viewable here. The discussion around AI video generative tools suggests that despite some uncanny results, these tools offer great temporal consistency and potential for creative industries.

  • AI’s Creative Limitations Unveiled: An intriguing output from a Terminus model, shared in the discussion, illustrates how AI-generated content can be exceptional yet occasionally flawed, underscoring the limitations inherent in current training datasets. The visual example can be seen here.

  • Querying the Capabilities of AI in Image Generation: An inquiry surfaced regarding the comparative efficiency and advancements between DALL-E 2 - PyTorch and Stable Diffusion in the realm of AI image generation, pinpointing the community’s growing interest in understanding the nuances of these powerful tools.

  • MoE-LLaVA Framework Steps Up LVLMs: The introduction of MoE-tuning and the MoE-LLaVA framework, detailed in a paper, offers a novel approach to enhancing Large Vision-Language Models’ efficiency by invoking only the top-k experts during deployment, promising high-parameter models with maintained computational cost (a generic routing sketch follows this list). The framework is further explored in Hugging Face’s implementation, accessible here.

  • Advances in Multilingual AI and Ethical AI Codes: CodeLlama 70b Instruct illuminates the delicate balance between ethical predispositions and efficiency in code generation, available on Hugging Face, while MAGBIG, a new multilingual text-to-image benchmark shared here, aims at broadening the linguistic applicability of AI models, spotlighting the AI community’s strides towards inclusivity and responsible AI development.
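
For intuition on the top-k expert routing mentioned above, here is a generic sparse-MoE routing sketch in PyTorch; it illustrates the general technique, not MoE-LLaVA’s actual implementation.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts layer (illustrative only)."""
    def __init__(self, dim=256, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):                          # x: (tokens, dim)
        gate = self.router(x)                      # per-token expert scores
        weights, idx = gate.topk(self.k, dim=-1)   # keep only the k best experts
        weights = weights.softmax(dim=-1)          # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens routed to expert e here
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(TopKMoE()(torch.randn(4, 256)).shape)        # torch.Size([4, 256])
```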


DiscoResearch Discord Summary

  • Unlocking German Language Capabilities with DiscoLM German 7b and RAG Datasets: The release of DiscoLM German 7b and GermanRAG datasets marks a significant step in enhancing German language model performance, introducing comprehensive datasets for RAG finetuning and broadening applications in native language processing.

  • Prometheus Meets Mistral for Enhanced English Models: An inquiry about progress on the Prometheus Mistral model for English applications hints at ongoing efforts to develop cutting-edge language models.

  • Code Llama 70B and Llama Factory Insights: Meta’s launch of Code Llama 70B garners attention, alongside discussions revolving around Llama Factory’s recommended practices for parameter tuning, underscoring the continuous evolution in code generation AI technology.

  • Boosting Retrieval Performance with Multilingual BGE-M3 and ColBERT: Innovations like BGE-M3 and ColBERT demonstrate advances in embedding techniques, offering multilingual support and improved search through nuanced retrieval. Practical advice was shared for BGE-large users to prepend the model’s query instruction for enhanced retrieval outcomes (a short sketch follows this list).

  • Diverse Strategies for German-language AI Development Emerge: Discussions encompass a spectrum from data augmentation techniques like Web Rephrase Augmented Pre-training (WRAP) and explorations of German Orca DPO datasets, to novel dataset initiatives that leverage GPT-4 for enriched training material, signaling a vibrant ecosystem of German language AI research and development.
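
To make the BGE-large advice above concrete, here is a minimal sentence-transformers sketch; the instruction string is the one published for the English BGE v1.5 models, so check the model card for other variants.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# BGE models expect a fixed instruction prepended to *queries* (not passages).
instruction = "Represent this sentence for searching relevant passages: "
query = instruction + "How do I finetune a German language model?"
passages = [
    "DiscoLM German 7b is finetuned for German RAG applications.",
    "ColBERT performs late-interaction retrieval over token embeddings.",
]

q_emb = model.encode(query, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
print(util.cos_sim(q_emb, p_emb))  # higher score = better match
```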


Latent Space Discord Summary

  • VFX Studios Eye AI Integration: A tweet by @venturetwins reveals major VFX studios, including one owned by Netflix, are now seeking professionals skilled in stable diffusion technologies. This new direction in hiring underscores the increasing importance of generative imaging and machine learning in revolutionizing storytelling, as evidenced by a job listing from Eyeline Studios.

  • New Paradigms in AI Job Requirements Emerge: The rapid evolution of AI technologies such as Stable Diffusion and Midjourney is humorously noted to potentially become standard demands in future job postings, reflecting a shift in employment standards within the tech landscape.

  • Efficiency Breakthroughs in LLM Training: Insights from a new paper by Quentin Anthony propose a significant shift towards hardware-utilization optimization during transformer model training. This approach, focusing on viewing models through GPU kernel call sequences, aims to address prevalent inefficiencies in the training process.

  • Codeium’s Leap to Series B Funding: Celebrating Codeium’s progress to Series B, a complimentary tweet remarks on the team’s achievement. This milestone highlights the growing optimism and projections around the company’s future.

  • Hardware-Aware Design Boosts LLM Speed: A new discovery highlighted by a tweet from @BlancheMinerva and detailed further in their paper on arXiv:2401.14489, outlines a hardware-aware design tweak yielding a 20% throughput improvement for 2.7B parameter LLMs, previously overlooked by many due to adherence to GPT-3’s architecture.

  • Treasure Trove of AI and NLP Knowledge Unveiled: For those keen on deepening their understanding of AI models and their historic and conceptual underpinnings, a curated list shared by @ivanleomk brings together landmark resources, offering a comprehensive starting point for exploration in AI and NLP.


Alignment Lab AI Discord Summary

  • Lilac Garden Launches, Revolutionizes Dataset Transforms: Lilac Garden, a new cloud service for accelerated dataset transforms, has been announced by @nikhil_thorat, featuring LLM-powered clustering as its first service. The announcement along with details can be found on Twitter.
  • Explore Precomputed OpenOrca Clusters: The OpenOrca dataset, along with embeddings and clusters precomputed, is now available on Lilac Garden, providing an advanced toolkit for dataset analysis. Users can explore the dataset via this direct link.
  • Founding Engineer Wanted at WashU Startup: DoubleMint is seeking a founding engineer for a new venture in collaboration with Washington University in St. Louis, emphasizing proficiency in Next.js, TailwindCSS, and Supabase. The project has secured a $50,000 Letter of Intent and is poised for rapid scaling.

LLM Perf Enthusiasts AI Discord Summary

  • Gratitude Expressed in LLM Perf Enthusiasts: User @an1lam simply expressed their gratitude with a “Thanks!” in the discussion.
  • Seeking Insights on Gemini Pro: @res6969 asked for insights or results from anyone who has experimented with Gemini Pro in a production setting, aiming to understand its performance and applicability.

AI Engineer Foundation Discord Summary

  • Join the Open Source Movement at AI Engineer Foundation: @hackgoofer has made a call to @everyone to submit and recommend open source projects for the AI Engineer Foundation, emphasizing the importance of community involvement. Here’s the Guide to Submit Projects for anyone interested in contributing.

PART 2: Detailed by-Channel summaries and links

TheBloke ▷ #general (1209 messages🔥🔥🔥):

  • Miqu Model Discussion Intensifies: Users have been comparing Miqu with other models like Llama-2-70B-chat and Mixtral, finding Miqu exceptionally good at taking criticism and following instructions. Miqu outperforms even frankenmerge 120b models according to some.
  • Explorations in AI Sampling and Samplers: There’s ongoing discussion about different sampling methods, with a specific focus on the practicality of dynatemp and min-p in improving results (a short sketch of min-p follows this list). The dialogue circles around the challenge of defining what constitutes “better” results and the potential overconfidence of models in their estimations.
  • Navigating Browser Choices for Development: Beyond AI, there’s chatter about browser choices for development and personal use. Internet Explorer, Brave, Vivaldi, and Docker for Ubuntu on Arch systems were mentioned, alongside mentions of arcane tech like NCSA Mosaic and VRML browser plugins.
  • Discord UI and Themes: Conversation touched on Discord’s UI, specifically the preferences for dark vs. light theme, and the limited color choices available without Discord Nitro. The discussion briefly highlighted the impact of color choices on visual perception and neuro-linguistics.
  • General AI Enthusiasm and Critique: Users expressed concerns about AI models refusing to engage in certain topics or responding with disclaimers. There was also a lighter conversation on AI models’ ability to replicate catgirl behavior, showcasing the wide range of interests within the AI community, from technical to playful.
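
For readers new to the samplers under debate: min-p keeps only tokens whose probability is at least a fixed fraction of the top token’s probability, so the cutoff scales with model confidence (dynatemp instead adjusts temperature on the fly). A toy NumPy sketch of min-p:

```python
import numpy as np

def min_p_filter(logits, min_p=0.1):
    """Keep tokens with probability >= min_p * max probability, renormalize."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    cutoff = min_p * probs.max()      # threshold scales with model confidence
    probs = np.where(probs >= cutoff, probs, 0.0)
    return probs / probs.sum()

logits = np.array([3.0, 2.5, 0.5, -1.0])
filtered = min_p_filter(logits, min_p=0.2)
token = np.random.choice(len(logits), p=filtered)   # sample from the survivors
```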



TheBloke ▷ #characters-roleplay-stories (285 messages🔥🔥):

  • Exploration and Utilization of Chat Models: The discussions have revolved around finding and applying various chat models for roleplaying and general tasks. @funtimedaddyohyea touched on exploring BagelMIsteryTour-v2-8x7B-GGUF among others, while @frammie and @_dampf prefer BagelMistery Tour v2 and Psyfighter v2, respectively, for roleplaying purposes.

  • Conversations on Quantization and Model Performance: The debate on the functional fp16 dequant of Miqu sparked technical discussions. @doctorshotgun shared an fp16 conversion of Miqu-1-70b, highlighting a significantly lower perplexity compared to previous conversions, yet facing challenges with exl2 quanting.

  • Miqu vs. Other Models: Users like @mrdragonfox and @goldkoron discussed Miqu’s performance in roleplay (RP) and its superiority in character comprehension over Mixtral and Yi-34b, citing anecdotal evidence of its effectiveness.

  • Exploration of AI for Creative Content: @c.gato mentioned an experimental approach of generating RP answers from GPT-4 using Yahoo Answers data, aiming to diversify the chatbot responses. This highlights the continuous search within the community for more dynamic and human-like interactions.

  • Technical Challenges and Community Experimentation: Various users, including @doctorshotgun and @dreamgen, discussed the technical aspects and challenges of working with models like Miqu, from quantization issues to the exploration of model efficiencies and potential improvements through fine-tuning and testing against benchmarks.



TheBloke ▷ #training-and-fine-tuning (17 messages🔥):

  • Training on A Budget with Unsloth: @superking__ mentioned that it’s possible to train models with just 16GB of memory using Unsloth, and pointed out that Colab offers free resources to run examples (a minimal sketch follows this list). This could be a valuable tip for developers with limited resources.
  • Optimization Tips for Mistral 7B Training: In a discussion on training Mistral 7B, @bishwa3819 shared struggles with train loss not decreasing despite specific LoRA configurations. @dirtytigerx responded by suggesting they first try to overfit the model as a troubleshooting step, and noted that the provided graph, showing training for only 80 steps, might reflect insufficient data.
  • High Hardware Demands for Yarn-Scaled Models: The conversation between @blackl1ght and @sao10k highlighted high cost as the primary reason behind the lack of instruct tunes for large-scale models like yarn-scaled 128k models. This points to the scalability challenges in machine learning projects.
  • Seeking Tutorial for Fine-tuning Hugging Face Models: @chovii requested recommendations for comprehensive guides or tutorials for fine-tuning Hugging Face models, particularly expressing difficulty in using Trainer for the TheBloke/TinyLlama-1.1B-Chat-v1.0-AWQ model. This underscores the need for accessible information for newcomers to model fine-tuning.
  • Direct Guidance Requests: Both @givan_002 in search of high-quality datasets for role-play and @222gate seeking advice on quantizing multimodal models underscore the demand for specialized guidance and resources in the ML community. These inquiries show the varied and specific nature of challenges faced by practitioners.
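
As a rough illustration of the low-memory path @superking__ described, here is a minimal Unsloth-style QLoRA setup; the model name and hyperparameters are placeholders, and Unsloth’s own Colab notebooks carry tested configs.

```python
from unsloth import FastLanguageModel

# 4-bit loading keeps a 7B model within roughly 16GB (values illustrative).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",   # placeholder model repo
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                        # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# From here, train with the usual TRL SFTTrainer / transformers Trainer loop.
```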

TheBloke ▷ #model-merging (4 messages):

  • Community Shout-Out: @kquant expressed gratitude towards @284810978552578050 for providing feedback that helped improve their work. No specific details about the work were provided.
  • Hope for Engagement: @kquant shared a hopeful message about the interest level of their content, without specifying the topic.
  • Sharing a Research Document: @kquant posted a Google Docs link regarding k-NN in Mixture of Experts, though details about the content are not described within the message.
  • Code Snippet Cut Off: @kquant mentioned an issue with a portion of code being accidentally cut off in their documentation, indicating that readers might face difficulties without this part if they attempt to implement the discussed methods.

Links mentioned:

k-NN In Mixture of Experts: no description found


TheBloke ▷ #coding (19 messages🔥):

  • Reading vs. Writing Code: @zachmayer briefly mentioned a common sentiment among developers: Reading code is harder than writing code, sparking a nod among coders everywhere.
  • Comparative Analysis of AI Coders: @technomancer73 tested an unnamed AI code generator and found it more comprehensive in its answers than Bard, engaging in manual comparisons of generated code based on past prompts.
  • The Web Development Paradox: @wbsch and @dirtytigerx discussed the contradiction in web development culture, citing both Not Invented Here (NIH) syndrome and overreliance on external libraries as prevalent issues. They touched on the historical lack of a standard library in JavaScript, highlighting the incident with left-pad as a symptom of deeper experience and knowledge gaps among developers.
  • Programmers as Framework Technicians: @dirtytigerx lamented the emergence of a generation of programmers trained primarily to use frameworks and APIs, without the foundational skills needed to tackle novel, complex challenges. This comment sparked a conversation about the importance of understanding basic computer science principles and how the lack of specialization could be both a job security factor and a concern for those in leadership roles.
  • Struggles of Spec Writing and Project Management: @wbsch and @dirtytigerx shared their frustrations with team/project management, particularly the challenge of communicating basic computer science concepts to team members and the art of translating common sense into technical specifications. This conversation touched on both the practical and existential dilemmas faced by those responsible for leading development teams.

Nous Research AI ▷ #ctx-length-research (1 messages):

dreamgen: “global state” tokens sound similar to attention sinks, did not read the paper though


Nous Research AI ▷ #off-topic (26 messages🔥):

  • Exploring Large Language Models: @tempus_fugit05 sought resources on understanding Large Language Models (LLMs), and @teknium recommended watching a YouTube tutorial by Karpathy titled “[1hr Talk] Intro to Large Language Models”. This is aimed at providing a general-audience introduction to LLMs like ChatGPT. Watch here.
  • Upcoming Paper on Chatbot Roleplay Personality Evaluation: @lorenzoroxyolo announced they are in the process of publishing a paper focused on chatbot roleplay personality evaluation and is seeking means of advertising it. They mentioned benchmarking several models, though not exhaustive, and encouraged following @lrzneedresearch on Twitter for updates.
  • Sharing Fun with GIFs: @Error.PDF shared a couple of humorous GIFs, including one of a cat (Cat Nyash GIF) and another featuring a turtle and a dog (Turtle Dog GIF), adding a light-hearted touch to the conversation.
  • Teasing and Jokes Among Members: After @Error.PDF posted a sleeping emoji, @teknium humorously suggested that no one cared, triggering a playful exchange that included laughter and affirmative emojis from @Error.PDF.



Nous Research AI ▷ #benchmarks-log (14 messages🔥):

  • Discussing the Impact of Assembly Knowledge: @euclaise mentioned that having knowledge in assembly might not be greatly beneficial for typical code benchmarks and isn’t aware of benchmarks that assess assembly skills specifically.
  • Debate on Chain of Thought (CoT) Methodology: @euclaise suggested using the Chain of Thought (CoT) approach for fairness in evaluation, while @teknium responded it would not be a fair comparison as he has not used CoT before and would need to reevaluate all models.
  • BBH Benchmark and CoT: @euclaise highlighted that the BBH benchmark is specifically designed to test the CoT methodology, advising to always utilize CoT for assessments.
  • Future Model Evaluation Strategies: @euclaise recommended adopting CoT for all future model assessments to ensure consistent and fair evaluations.
  • Stablelm Zephyr’s Performance on BBH with CoT: @euclaise noted with surprise that Stablelm Zephyr scored 0.9% on the BBH benchmark using CoT, indicating an unusually low performance.

  • Unlimited Context Length Breakthrough with Activation Beacon: @nonameusr highlighted a significant advancement with the introduction of the Activation Beacon method for extending LLMs context, potentially solving the context length limitation by generalizing a model trained on 4K context length to 400K. The method proposes the addition of “global state” tokens to maintain fixed memory consumption and ensure linear inference time growth. Read more about the research paper and explore the implementation code.

  • SQLCoder-70B Revealed: @if_a shared news about SQLCoder-70B, a new model outperforming all publicly accessible LLMs for Postgres text-to-SQL generation, made available on Hugging Face. This model, fine-tuned on AIatMeta’s CodeLlama-70B, showcases the potential for significant advancements in SQL generation tasks. Access SQLCoder-70B on Hugging Face.

  • Temporary Halt on Memphis-CoT Due to Bug: @euclaise warned users about a discovered bug in the Memphis-CoT training code and advised against making quants, merges, or any alterations until a retraining process is completed. The model, initially aimed at improving reasoning-focused outcomes, is based on human data and undergoing an iterative corrective finetuning procedure.

  • Nous Research Unveils New Open Source Model Evaluation System: @manojbh shared Nous Research’s announcement of a novel system for evaluating open-source models through a subnet on Bittensor, combating the limitations of traditional benchmarking reliant on public datasets. This system aims to offer a dynamic, fair, and continuously evolving evaluation platform. Explore the Nous Subnet Leaderboard.

  • Call for Caution in LLMs’ Use for Cryptography: @deki04 relayed a cautionary note from @moyix on the limitations of LLMs in handling cryptography tasks, suggesting that despite the advancements in AI and machine learning, certain domains like cryptography remain challenging for these models.



Nous Research AI ▷ #general (502 messages🔥🔥🔥):

  • Debating AI and Human Language Efficiency: @nonameusr sparked a discussion about the inefficiency of AIs using human language for both comprehension and cross-model communication, suggesting that adapting AI to a more efficient form of communication might be better.
  • Miqu Shines on the Benchmarks: @n8programs highlights that Miqu stands out by achieving an 83.5 on EQ-Bench, which he claims surpasses Mistral Medium, pointing out that Miqu could be the best openly accessible model available, despite skepticism from the community.
  • AI Community Buzzes About Miqu’s Performance and Origin: The AI Discord community is abuzz with talks about Miqu’s performance on benchmarks like MMLU and EQ-Bench, with some questioning whether it’s a Mistral Medium leak and discussing its potential as a top-performing open-source model.
  • Quantization and Compression Strategies Explored: Discussions about quantization strategies such as 2-bit Qlora and the challenges of maintaining performance metrics like GSM8K score during finetuning, alongside the technicalities of dequantization to improve AI models’ efficiency and accessibility, took center stage.
  • Subnet Discussions and GPU Leasing Queries: There were inquiries about subnet 6 functionality and whether it’s possible to lease GPUs through the Akash network for model serving in conjunction with running inference via subnet, indicating a keen community interest in optimizing resource use for AI development.



Nous Research AI ▷ #ask-about-llms (58 messages🔥🔥):

  • Exploring the Scaling Laws of MoE: @vikas.p responded to @joey00072’s query about papers on MoE scaling laws by sharing two significant papers (Scaling Laws for Autoregressive Generative Modeling and another paper’s PDF) that explore the efficiency and performance of Mixture-of-Experts (MoE) models.

  • Discussing Nous Models’ Large Context Windows: @rememberlenny inquired about the trade-offs of Nous models, such as the NousResearch_YarnMistral128k, with 200k context windows. The conversation highlighted concerns regarding the capacity to scale position embeddings and the potential for position truncation due to Λ-shaped context windows.

  • Scaling Position Embeddings in Large Context Models: Both @teknium and @bloc97 contributed to the discussion on scaling position embeddings, with bloc97 explaining the advantages and disadvantages of not truncating position embeddings in models like YaRN, which allows for attention across the entire context window.

  • Desire for Better Inference Libraries for VLMs: @gabriel_syme and @carsonpoole discussed the need for more accessible and efficient inference libraries for Vision-Language Models (VLMs), highlighting the current lack of such resources and the efforts being made by CarsonPoole and Max_paperclips towards this goal.

  • Technical Insights into Batch Inference and Framework Capabilities: @carsonpoole detailed some of the underlying work on inference libraries, particularly focusing on features like batch LoRA inference, converting dense models to LoRAs, and the potential for incorporating multimodal language model (MLM) extensions, indicating ongoing development to support VLMs more effectively.



LM Studio ▷ #💬-general (184 messages🔥🔥):

  • Troubleshooting LM Studio’s Code and Connection Issues: @.mchinaga sought assistance with an issue in chat_gpt_bot.py, suspecting a bug, but @.ben.com clarified the problem was due to not reaching the API server. Users discussed various errors, including invalid response keys and endpoint issues, with @dagbs advising on correct API base setting and response key adjustments (a minimal client sketch follows this list).

  • Exploring Text Script to Video Puppet Solutions: @toddmiller inquired about a model or app equivalent for converting text scripts to video puppets, drawing parallels with other OpenAI and related technology equivalences. The conversation evolved into discussing the limitations of existing models for this purpose, with @dagbs suggesting LM Studio might not be suitable for such advanced video manipulation.

  • GPU Acceleration Tips for Optimal Performance: @rahulg1981 questioned the best GPU acceleration settings for a RTX 4070 to improve performance without overloading the CPU. @fabguy provided detailed recommendations for adjusting n_gpu_layers based on the model size, offering a practical approach to optimizing GPU usage.

  • LM Studio’s Model Compatibility and Execution Challenges: Various users shared their struggles and solutions around running LM Studio on different hardware setups and operating systems. Topics ranged from dealing with Linux library issues, selecting the correct GPU for acceleration, and challenges related to ARM CPU compatibility on Mac and non-supported platforms like Android and iOS.

  • Discussions on LM Studio Features and Future Improvements: Members discussed the functionality and future development directions of LM Studio, including running multiple models, sorting chat histories, and integrating local models. @yagilb provided insights into version updates, bug fixes, and workarounds for using local models within LM Studio’s framework.
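
On the API-server confusion in the first bullet: the usual fix is pointing an OpenAI-style client at LM Studio’s local server. A minimal sketch, assuming the default local address (confirm host and port in LM Studio’s server tab):

```python
from openai import OpenAI

# LM Studio serves an OpenAI-compatible API locally; the address below is
# the assumed default, and the API key can be any placeholder string.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",   # LM Studio routes to whichever model is loaded
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```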



LM Studio ▷ #🤖-models-discussion-chat (61 messages🔥🔥):

  • Tiny Models Go Absurd After Reboot: @pudlo reported that highly quantized tiny models, after a reboot, started producing hilariously absurd jokes, making them unintentionally funny.

  • Suggestion for a “Top Models of the Month” Channel: @666siegfried666 proposed creating a channel to highlight top models of the month, sparking a lively discussion on how to make model recommendations more accessible and organized. Suggestions included voting systems and admin-only posting to ensure readability.

  • Challenges with CodeLlama 70B Model: Multiple users, including @unskilless and @dave000000, reported problems with the CodeLlama 70B model, noting it was “terribly broken” for specific tasks. However, @heyitsyorkie suggested using the “Codellama Instruct” preset for better results.

  • MIQU Model Generates Mixed Reviews: Conversations about the MIQU model highlighted varying experiences, with @n8programs praising it for domain-specific knowledge and stating it sits between Mistral medium and GPT-4 in terms of capability. However, @ptable found its performance on their setup not significantly better than Mixtral and highlighted speed issues.

  • Requests for Help with Functionary Model: @vbwyrde sought assistance for using the Functionary model from Hugging Face for CrewAI, emphasizing the model’s ability to execute functions intelligently but expressing uncertainty about the correct presets or prompt formats for optimal usage.



LM Studio ▷ #🧠-feedback (5 messages):

  • Model Compatibility Confusion Cleared: @rasydev faced an error when loading a model in LM Studio. @heyitsyorkie clarified that LMStudio is only compatible with GGUF models, pointing out that the safetensors file downloaded by rasydev is not compatible.
  • CodeLlama Models Misunderstanding: In a follow-up query, @rasydev asked if the codellama/CodeLlama-70b-hf model supported LM Studio. @heyitsyorkie responded that CodeLlama models, being raw PyTorch models, do not work with LMStudio by default, and recommended searching for GGUF quants by TheBloke to ensure compatibility (a download sketch follows this list).
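
To act on that advice programmatically, a single GGUF file can be pulled from one of TheBloke’s quant repos with huggingface_hub; the repo and filename below are examples, so browse the repo’s file list for a quant that fits your RAM.

```python
from huggingface_hub import hf_hub_download

# Download one GGUF quant (not the whole repo); names are examples.
path = hf_hub_download(
    repo_id="TheBloke/CodeLlama-70B-hf-GGUF",
    filename="codellama-70b-hf.Q4_K_M.gguf",
)
print("saved to", path)
```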

LM Studio ▷ #🎛-hardware-discussion (99 messages🔥🔥):

  • Retro Tech in Modern Train Systems: @hexacube shared that German railway automation operates on MSDOS and Windows 3.1, sparking discussions on the simplicity and efficiency of older systems for certain applications. This segued into a wider conversation about the appreciation of coding efficiency in the demoscene community and speculation on government IT investments.

  • Performance Insights on Minimal Tech: Discussions revealed scenarios where minimal hardware (e.g., 125MHz, 8MB RAM) efficiently runs specific applications like observatories, with @hexacube mentioning successful operation on an old i5 mini PC before upgrading for more intensive processing. This highlights ongoing relevance and effectiveness of seemingly outdated hardware in certain niches.

  • AI and Gaming Evolving Together: Several users, including @cihiris and @goldensun3ds, speculated on the future intersection of AI and gaming, with possibilities ranging from AI NPCs to whole games generated on-the-fly by AI. There was enthusiasm about AI’s potential to revolutionize game development and player interaction.

  • Hardware and AI Development Constraints: The conversation touched on various considerations for running large language models (LLMs) and AI-related tasks, including the importance of GPU over RAM for speed, the potential for multi-GPU setups for different tasks within gaming, and the intriguing concept of hardware specifically designed for AI acceleration.

  • Navigating the Best Hardware Setup for AI Applications: Users, including @pudlo and @heyitsyorkie, debated the merits of investing in high powered GPUs versus ample RAM for running AI models, with consensus leaning towards the significant performance gains provided by advanced GPUs. Links to resources like a LocalLLaMA LLM GPU Buying Guide were shared, offering insights on hardware selections tailored for AI development purposes.



LM Studio ▷ #🧪-beta-releases-chat (18 messages🔥):

  • Troubleshooting Exit Codes: @fabguy suggests to those experiencing Exit Code 1 errors to check the FAQ in the pinned messages for solutions, often related to C++ Redist issues.

  • Anticipation for New Release: @oldandnew. shares the community’s anticipation and eagerness to see if a new beta release will be announced soon.

  • Queries on Download Resume Feature: @greg0403 inquires about the possibility of adding a download resume feature, with @dagbs and @senecalouck pointing him towards specific channels for further information.

  • Reporting and Diagnosing Model Errors: @epicureus reports a model error with exit codes and detailed system information, leading to a dialogue with @yagilb who appreciates the error report and commits to fixing the issue after attempting to load the problematic model themselves.

  • Quick Fix for Model Loading Issue: After discussing with @yagilb, @epicureus identifies that Openhermes 2.5 works but experiences issues with other models, hinting at potential RAM detection problems. @yagilb further seeks information on which specific models are failing and states that a fix is incoming upon identifying the issue with the Dr Samantha 7b model shared by @epicureus.

Links mentioned:

TheBloke/Dr_Samantha-7B-GGUF · Hugging Face: no description found


LM Studio ▷ #autogen (1 messages):

  • Wizard Coder 15b struggles with terminating outputs: @strangematter shared their experience using Wizard Coder 15b for Python code generation, finding that although it produced coherent code, it had difficulty with terminating outputs despite satisfactory results. They inquired if anyone has had better success with another model for code generation tasks.

LM Studio ▷ #langchain (2 messages):

  • LangChain Integration with LM Studio: @circulustreme inquired about the possibility of integrating LangChain with a local LM Studio running Mixtral to generate a dataset of 100 responses. @yagilb affirmed it’s possible and referred to a previous message for instructions on the connection process (a brief sketch follows).
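
Since LM Studio exposes an OpenAI-compatible endpoint, the connection itself is short; here is a sketch of the dataset-generation idea, where the address and model name are assumptions to adapt to your local server settings.

```python
from langchain_openai import ChatOpenAI

# Point LangChain's OpenAI chat wrapper at the local LM Studio server
# (address and model name are assumptions; check your server settings).
llm = ChatOpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio",
                 model="local-model", temperature=0.9)

dataset = [llm.invoke(f"Write reply #{i} to: 'Describe your favorite tool.'").content
           for i in range(100)]   # the 100-response dataset discussed above
print(dataset[0])
```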

OpenAI ▷ #ai-discussions (3 messages):

  • Choosing VRAM for AI tasks: aqua_dawn_67525 is contemplating whether to get 16GB VRAM or 24GB VRAM for AI projects as a beginner, wondering if 16GB will be sufficient for a few years.
  • Moving to Cloud for More Power: aqua_dawn_67525 considers the possibility of moving onto the cloud for more computing power if personal hosting proves to be expensive.
  • Personal Experience with 16GB VRAM: toror shares that having 16GB VRAM on a 4080 GPU is ample for many of the modern, optimized models, providing a data point for aqua_dawn_67525’s consideration.

OpenAI ▷ #gpt-4-discussions (170 messages🔥🔥):

  • GPT Plus Users Encounter Message Limit Confusions: Several users, including @iyons and @myroslava_35196_08143, expressed confusion and frustration over hitting a message limit warning on GPT Plus despite not reaching the supposed 40 messages per 3 hours limit. They reported receiving no response from support concerning this issue.

  • Introduction of GPT Mentions Sparks Excitement: The introduction of GPT mentions, as announced by @kumquatexpress, has generated excitement among users. This new feature allows for the sharing of context and custom actions between mentioned GPTs in a conversation, promising enhanced composability and flexibility in AI applications.

  • Effective Strategies for Long Conversations Shared: @darthgustav. shared tips for managing long conversations with GPT, advising periodically asking for summaries to maintain context and effectively using the conversation token budget.

  • Users Explore the Potentials and Limitations of GPT Mentions: Users like @darthgustav. and @blckreaper explored and debated the capabilities of GPT mentions, discussing the ability to switch contexts between different GPTs and the constraints related to each GPT’s knowledge base.

  • Calls for Expanded Functionality in GPT Plus: @peter07082 expressed difficulties accessing new features such as Explore GPTs in Australia, despite being an early GPT Plus subscriber. They, along with other users, faced challenges in obtaining support or clear answers from OpenAI’s help system.


OpenAI ▷ #prompt-engineering (32 messages🔥):

  • Exploring GPT for Word Games: .berns inquired whether prompts can be used effectively for grid-based word games, noting GPT’s prior struggles with such tasks. 7877 and eskcanta provided insights, suggesting the use of a 2D array and Python tools for tracking information to potentially overcome these issues, albeit recognizing the challenge’s complexity (a minimal grid sketch follows this list).

  • New Era with DALL-E 3 and GPT Integration: darthgustav. shared their positive experience with a new feature allowing GPTs to call on each other using the @ sign, particularly highlighting successful interactions between Custom GPT and DALL-E 3. They emphasized this integration’s potential, noting it preserved entire contexts including code and visual prompts, marking a significant step forward in chat realm engineering.

  • Clarification on Inter-GPT Communication Limits: novumclassicum sought advice on using the @ sign to chain multiple GPTs for a complex task involving creation, proofing, and translation. solbus and bambooshoots clarified that this feature currently only works in-conversation, not within the instructions field of a GPT, meaning tasks must be managed manually, step by step.

  • Powerful Potential of Full Context Preservation: darthgustav. discussed the groundbreaking potential of the feature that allows for full context preservation in requests between GPTs. This opens doors for seamless multi-disciplinary projects, such as writing and illustrating a children’s book in a single session without switching tools or tabs.

  • Insight on Manual Process for Now: Despite the enthusiasm for the new @ sign feature’s potential, novumclassicum discovered that for now, their project—a complete chapter lesson for language teachers including illustrations—would require manual calls to various GPTs. darthgustav. offered guidance on navigating this “labyrinth,” suggesting there’s a way to leverage the feature effectively even with current limitations.
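
A minimal version of the 2D-array bookkeeping suggested in the first bullet: keep an explicit grid in Python and feed its rendered state back into the prompt, rather than asking the model to track letter positions in prose.

```python
# Sketch of explicit grid state for a word game; sizes and word are examples.
grid = [["_"] * 5 for _ in range(5)]          # 5x5 board, "_" = empty cell

def place_word(word, row, col, horizontal=True):
    """Write a word onto the grid; returns False if it would run off the board."""
    dr, dc = (0, 1) if horizontal else (1, 0)
    if row + dr * (len(word) - 1) > 4 or col + dc * (len(word) - 1) > 4:
        return False
    for i, ch in enumerate(word):
        grid[row + dr * i][col + dc * i] = ch
    return True

place_word("CRANE", 2, 0)
print("\n".join(" ".join(r) for r in grid))   # feed this state back into the prompt
```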


OpenAI ▷ #api-discussions (32 messages🔥):

  • Exploring GPT’s Potential in Word Games: @.berns expressed concern about GPT’s effectiveness in word games like hangman, suggesting it often makes mistakes such as making up words. In response, @7877 and @eskcanta provided solutions involving the use of 2D arrays and Python tools for better tracking and letter placement within games, highlighting both the challenges and potential strategies for success.

  • DALL-E 3’s New Features Frustrate and Fascinate: @darthgustav. shared his testing experience with the new feature that allows calling different GPT models using the @ sign, illustrating how it preserves full context across different models and functionalities. He highlighted the power of seamlessly combining custom GPTs and DALL-E 3 for image generation, noting both the limitations and surprising capabilities of this feature.

  • Master GPT Routine and GPT Calls Clarified: @novumclassicum inquired about the capability to program master GPT routines that call subroutines in other GPT models for processes like creating, proofing, and translating texts. However, @solbus and @bambooshoots clarified that while the concept is intriguing, the functionality to automate this process through instruction fields currently does not exist, requiring manual intervention for each step.

  • Potential for Creative Project Integration: @darthgustav. and @novumclassicum discussed the exciting potential of using the new GPT and DALL-E features for comprehensive projects, like writing children’s books or creating full chapter lessons with illustrations. Despite current limitations and the necessity of manual operations, they remain optimistic about future developments that could streamline such creative endeavors.

  • Community Engagement and Problem-Solving: Several users, including @darthgustav. and @solbus, demonstrated community-driven troubleshooting and sharing of experiences with recent OpenAI features. Their dialogue underscores the community’s role in discovering, testing, and providing feedback on the evolving capabilities of OpenAI’s models.


Eleuther ▷ #general (74 messages🔥🔥):

  • ML in the Browser Takes a Leap with Ratchet: @frazermc asked about machine learning inference engines in the browser, leading to @carsonpoole introducing Ratchet, a cross-platform browser ML framework using Rust and WebGPU. It powers whisper-turbo.com and promises quantization support and optimization for speed and developer experience.

  • Flash Attention 2 Under the Microscope: @nshepperd sparked a technical discussion about a potential use-after-free issue in Flash Attention 2; the code “works” anyway, likely because the freed memory happens not to be overwritten before the kernel executes. This oddity opened up a detailed dialogue about Tensor memory management in PyTorch with @nlab_enthusiast.

  • EleutherAI Celebrates ICLR Acceptances: @stellaathena shared the exciting news that 6 out of 10 EleutherAI-affiliated papers were accepted to ICLR, listing accepted publications like “Llemma: An Open Language Model for Mathematics” and congratulating first-time authors and contributors.

  • AI+Music Intersection Explored with New Survey: @loubb encouraged community members to help evaluate AI-driven music models through a survey found at http://survey.loubbrad.com:8501/. The announcement, supported by @stellaathena, aims to gather insights on the latest model developments.

  • Interest in GitHub’s Annual Growth Data and Anthropic’s Interpretation Efforts: Users expressed curiosity about GitHub’s year-over-year stats on commits and pull requests, and @digthatdata delved into Anthropic’s research on “OV” circuits, sharing links to transformer-circuits.pub as a resource for understanding emerging technology and interpretability efforts.


Eleuther ▷ #research (96 messages🔥🔥):

  • Cluster Performance vs. Accuracy Paradox: @llm_enjoyer sparked a discussion on the expectation that better clustered embeddings (measured by metrics like Davies–Bouldin and Calinski–Harabasz indices) should lead to higher classification accuracy. However, they observed the opposite in their experiments, finding models with better clustering metrics performing worse in accuracy, leaving them puzzled. Davies-Bouldin Index Wiki, Calinski-Harabasz Index Wiki

  • Exploring the Limits of muP Trained Models: A series of inquiries about the largest models trained with muP revealed a 3B Cerebras model as the largest specifically cited case, along with speculation that GPT-4 may have been trained with muP. Through this discussion, @jstephencorey, @ad8e, and @thatspysaspy explored which large-scale models might have benefited from muP.

  • Sparse Fine-Tuning (SFT) Beats LoRA in Latest Research: @random_string_of_character shared a breakthrough in sparse fine-tuning for large language models like Llama 2, presenting a method that’s both parameter- and memory-efficient while outperforming the (q)LoRA approach. The research suggests a significant advancement in instruction tuning performance with both the paper and code available for further exploration.

  • Potential for “Citations” Using RAG Q&A Explored: @carsonpoole proposed the idea of generating detailed “citations” by analyzing attention maps during RAG Q&A sessions, prompting @johnryan465 and @kharr.xyz to point out relevant tools and approaches used previously for similar objectives, such as Bertviz and fine-tuning strategies to produce line citations.

  • Mixture of Softmaxes Proposed for LLM Training: @alstroemeria313 shared an intriguing method of training models by blending logits from the last k layers of a transformer, an approach inspired by the concept of avoiding the “softmax bottleneck”. This method, which seems to show promise at a small scale, involves softmaxing each set of logits and then blending them according to softmaxed weights for the output (a small PyTorch sketch follows this list).
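
To make the mechanics concrete, here is a minimal PyTorch sketch of the blending described above, assuming a shared unembedding matrix and one learned mixture weight per layer (all names and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def mixture_of_softmaxes(hidden_states, unembed, mix_logits):
    """hidden_states: last k layers' outputs, each (batch, seq, d_model);
    unembed: shared (d_model, vocab) projection; mix_logits: learnable (k,)."""
    # Softmax each layer's logits separately, then blend the resulting
    # distributions with softmaxed weights -- sidestepping a single
    # low-rank softmax (the "softmax bottleneck").
    probs = [F.softmax(h @ unembed, dim=-1) for h in hidden_states]
    weights = F.softmax(mix_logits, dim=0)              # (k,)
    mixed = sum(w * p for w, p in zip(weights, probs))  # (batch, seq, vocab)
    return mixed.log()                                  # log-probs for an NLL loss

k, d_model, vocab = 3, 16, 100
hs = [torch.randn(2, 5, d_model) for _ in range(k)]
log_probs = mixture_of_softmaxes(hs, torch.randn(d_model, vocab), torch.zeros(k))
```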


Eleuther ▷ #lm-thunderdome (3 messages):

  • Efficient Dataset Caching Technique Shared: @hailey_schoelkopf shared a helpful solution for caching datasets for offline use with HF datasets, referencing a GitHub issue and the documentation on using local datasets in EleutherAI’s lm-evaluation-harness (the general pattern is sketched at the end of this list).

  • PyPI Project Ownership Transfer Request: @hailey_schoelkopf requested to transfer owner level access of the lm_eval PyPI project to haileyschoelkopf or to eleutherai usernames, aiming to manage maintainer permissions more effectively.

  • Ownership Transfer Confirmed: @bmk1476 promptly responded, confirming the completion of the ownership transfer request regarding the lm_eval PyPI project.
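
Independent of the harness specifics, the general offline pattern with HF datasets looks roughly like this; the dataset name is only an example:

```python
from datasets import load_dataset

# First run, with network access: populates the local HF cache
# (~/.cache/huggingface/datasets by default).
ds = load_dataset("hellaswag")  # example dataset name

# Later runs, offline: set HF_DATASETS_OFFLINE=1 in the environment
# *before* Python starts, e.g.
#     HF_DATASETS_OFFLINE=1 python run_eval.py
# and the same load_dataset() call is then served entirely from the cache.
ds = load_dataset("hellaswag")
```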


Eleuther ▷ #multimodal-general (2 messages):

  • Exploring Tokenizing Strokes for Vector Graphic Synthesis: @stellaathena mentioned a theoretical discussion about tokenizing strokes for vector graphic synthesis, linking to StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis, a study focused on exactly that idea.

Eleuther ▷ #gpt-neox-dev (7 messages):

  • CUDA Container Woes on PPC64LE: @catboy_slim_ discovered that the only NVIDIA CUDA container that supports ppc64le is their UBI (Red Hat) one, and lamented over RHEL’s peculiar stance on Python versions, particularly the non-recognition of 3.10.
  • CUDNN Installation Issues: Further issues were noted by @catboy_slim_ with the CUDNN installation on the UBI NVIDIA image, claiming it appears to be broken.
  • Challenges with PPC64LE Wheel Building: @catboy_slim_ expressed frustration and a likelihood of giving up on building a wheel for ppc64le, hinting at the severity of the challenges faced.
  • Apex Cross-Compilation Dilemmas: The complexity and undesirability of setting up a cross-compilation build for Apex was discussed by @catboy_slim_, highlighting the significant difficulties and dependencies involved in NeoX and deepspeed without Apex.
  • Necessity of On-Device Apex Building: Concluding the discussion, @catboy_slim_ stated that building Apex on-device seems to be an unavoidable solution for specific cases, and emphasized the impracticality of prebuilding general-purpose binaries due to NVCC’s limitations concerning cross-compilation.

Mistral ▷ #general (147 messages🔥🔥):

  • Mistral on a Budget: Users debated the possibility of running Mistral 7B models on GPUs with 6GB VRAM like the 1660 Ti. @batot4968 reported failure due to out of memory issues, while @mrdragonfox clarified that it works but not in fp16 and insisted that most users need at least 24GB of VRAM to play with AI locally.

  • Finding the Right Model for Restricted VRAM: Amidst the discussion, @batot4968 found a model on Hugging Face quantized to run in around 5.5GB of VRAM, suitable for a 1660 Ti. The conversation highlights the need to choose quantized models when VRAM is limited (a typical 4-bit loading sketch follows this list).

  • Performance Queries on High-End Systems: Concerns were raised about the performance of Mistral 7B models on high-end systems like RTX4090, with @batot4968 observing inconsistent speeds. @i_am_dom recommended focusing on GPU utilization to solve the issue, implying CPU usage should be minimized for optimal performance.

  • Clarifications on Running Models: As users like @mikifireblue sought clarity on running Mistral models without fully loading them into VRAM, @i_am_dom pointed towards utilizing resources like Colab notebooks for guidance on efficient model loading.

  • Tech Enthusiasts Navigate AI Landscape: Discussions from users new to the scene, like @krahs., who expressed interest in integrating Mistral AI into game design, underline the community’s exploration and experimentation with AI models. The dialogue showed a mix of enthusiasm and the need for detailed guidance to navigate the AI model landscape efficiently.
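
One common route to fitting a 7B model into roughly 5-6GB of VRAM is 4-bit loading with bitsandbytes (GGUF quantizations via llama.cpp are another). A minimal sketch, with the checkpoint name as an example only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example checkpoint

# NF4 4-bit weights with fp16 compute: this is what brings a 7B model
# down into the ~5-6GB range discussed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spills layers to CPU if the GPU is still too small
)
```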


Mistral ▷ #finetuning (2 messages):

  • Seeking Brevity in LLM Responses: @brentnhunter asked how to make the LLM’s responses shorter. Despite instructions demanding brevity and a limited token count, responses remained undesirably long; they wanted straightforward answers like “4” to questions such as “what is 2 plus 2”.

  • Fine-Tuning Tips for Model Improvement: @friendly911 suggested running more steps, considering 60 too few for significant data sizes, and recommended decreasing the learning rate to around 2e-5 for better model performance (an illustrative configuration follows below).
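
As a rough illustration of that advice in plain transformers terms (only the learning rate and the “more than ~60 steps” point come from the discussion; every other value is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    max_steps=500,       # placeholder; well above the ~60 that proved too few
    learning_rate=2e-5,  # the lower rate suggested above
    per_device_train_batch_size=4,
    logging_steps=10,
)
```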


Mistral ▷ #showcase (1 messages):

  • Open Source Web UI for Mistral Released: @darkstar1011 has launched an open source project named uMdali aimed at providing a Web UI for Mistral API. This project also supports connections to Ollama, OpenAI, and Gemini, positioning itself as an “Enterprise Chat Front End”.

Links mentioned:

GitHub - brett-baudin-consulting/uMdali: Enterprise Chat Front End: Enterprise Chat Front End. Contribute to brett-baudin-consulting/uMdali development by creating an account on GitHub.


Mistral ▷ #la-plateforme (4 messages):

  • Open Call for Contributions on Mistral’s GitHub: @sophiamyang encourages the community to submit Pull Requests (PRs) to Mistral’s public documentation on GitHub, inviting collaboration and contributions.
  • Direct Support for Notebook Inquiry: @sophiamyang addressed Patrick with an apology for the oversight and promised to review the submitted notebook, highlighting Mistral’s responsive engagement with its community.
  • Internship Inquiries at Mistral Spark Curiosity: User @bepis4552 inquired about the possibility of applying for an internship at Mistral, indicating interest in joining the team.
  • Tough Competition for Mistral Internships: In response to an internship inquiry, @sublimatorniq points out the high qualifications of Mistral’s developer relations team and suggests that landing an internship could require exceptional talent and a bit of luck.

Links mentioned:

GitHub - mistralai/platform-docs-public: Contribute to mistralai/platform-docs-public development by creating an account on GitHub.


HuggingFace ▷ #announcements (1 messages):

  • Code Llama 70B Launches: @lunarflu announces the release of Code Llama 70B, the community's latest AI chat model. Try it out here.
  • Sentence Transformers v2.3.0 Released: @tomaarsen introduces Sentence Transformers v2.3.0 featuring bug fixes, performance enhancements, and more efficient model loading. Release notes available here.
  • Introducing Serverless Object Detection: @whitphx shares a Gradio-Lite and transformers.js.py collaboration for a serverless object detection app. Check out the app and code here.
  • Autotrain Advances to Local-first: @abhi1thakur declares Autotrain is now "local-first", enabling local training with a UI through a simple pip install. Instructions available here.
  • Hugging Face and Google Cloud Partnership: A strategic partnership between Hugging Face and Google Cloud aims to democratize AI utilizing open models and technologies. More details on the partnership can be found here.


HuggingFace ▷ #general (97 messages🔥🔥):

  • Exploring Multimodality in LLMs: @thegenerativegeneration is catching up on LLM and multimodality understanding, particularly interested in how models handle multiple images and videos simultaneously for context understanding. They also inquired about practical experiences with LLMs that have 3D understanding, seeking any relevant surveys or resources on the topic.

  • Reducing LLM Costs: @jasonzhou1993 shared a YouTube video titled “The REAL cost of LLM (And How to reduce 78%+ of Cost)” discussing strategies to significantly reduce LLM operational costs.

  • Seeking Tips on Twitter Posts for Open Source/ML Projects: @vipitis sought advice on writing announcement-style Twitter posts for open source/ML topics, eliciting tips and feedback from the community. Recommendations included analyzing high-quality announcements and considering visual storytelling as an effective tool.

  • CUDA Memory Allocation Inquiry: @felixsanz and @pixxelkick discussed PyTorch’s torch.cuda.max_memory_allocated function and how it relates to the memory usage reported by NVIDIA’s nvidia-smi tool. Confusion arose over the discrepancy between the two numbers (a short demonstration follows this list).

  • Inquiry about TTS AI Performance on Low-Spec Hardware: @yengecbey queried the community about the feasibility of running text-to-speech AI on a laptop with an i5 processor and 8GB of RAM, indicating an interest in understanding the hardware requirements for TTS AI applications.
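
On the allocator question above: PyTorch’s counters track tensor allocations only, while nvidia-smi reports the whole process, including the CUDA context and the allocator’s cached-but-unused pool, so the numbers rarely match. A quick demonstration, assuming a CUDA-capable machine:

```python
import torch

x = torch.empty(1024, 1024, 256, device="cuda")  # ~1 GiB of fp32

mib = 1024 ** 2
print(torch.cuda.max_memory_allocated() / mib)  # peak tensor allocations
print(torch.cuda.memory_allocated() / mib)      # currently live tensors
print(torch.cuda.memory_reserved() / mib)       # allocator's cached pool
# nvidia-smi will still report more than any of these: it also counts
# the CUDA context itself, which PyTorch's counters never include.
```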


HuggingFace ▷ #cool-finds (1 messages):

  • Unveiling the Multimodal Malaysian LLM Dataset: User @andysingal shared a link to a multimodal Malaysian LLM dataset hosted on HuggingFace, offering a resource for developing LLM models with a focus on the Malaysian context. This dataset, part of the mesolitica collection, includes translated LLaVA instructions and aims to enhance language model training with multimodal inputs.

Links mentioned:

Multimodal Malaysian LLM dataset - a mesolitica Collection: no description found


HuggingFace ▷ #i-made-this (3 messages):

  • Excel/CSV files directly to database magic: @impl66 created a Gradio app that transforms Excel/CSV files into database tables and allows users to query them easily. Check out the app on HuggingFace Spaces.

  • Tackling AI existential fears: @mateomd_dev discusses the often sensationalized fear of AI gaining consciousness and turning against humanity in the latest issue of their newsletter, Recurrent Neural Notes. For a deep dive into the topic, visit RNN #8 - Will AI Become Evil?.

  • Magic: The Gathering meets AI: @joshuasundance introduces what might be the first Magic: The Gathering model on HuggingFace, capable of multi-label classification of card color identity based on card name and text. To explore this innovative use of AI for deck building, visit mtg-coloridentity-multilabel-classification.


HuggingFace ▷ #reading-group (3 messages):

  • New to the Reading Group Channel: @marc.casals.salvador inquired about how the channel operates and if there are meetings set up to discuss readings.
  • Discord Calls for Presentation: @chad_in_the_house responded to @marc.casals.salvador, noting that discord calls are being organized when presenters are available, with the next session planned for Friday around 1-2 pm EST.

HuggingFace ▷ #diffusion-discussions (5 messages):

  • Seeking robustness beyond DPO for QA datasets: @blackbox3993 is exploring how to enhance a question answering dataset with negative answers using DPO (Direct Preference Optimization). They are curious about alternative methods to make their model more robust and accurate.

  • Inference support query for LoKr and LoHa: @forsana asked whether LoKr or LoHa adapters support inference via the “loading with PEFT” method, indicating a specific interest in deployment techniques.

  • CUDA plugin registration errors in Google Colab: @straze007 encounters multiple CUDA-related errors while attempting to fine-tune a text-to-image model using LoRA in Google Colab, specifically with cuDNN, cuFFT, and cuBLAS plugins.

  • Disappointment with 70B coder’s chat capabilities: @pseudoterminalx shares a link (hf.co chat) expressing disappointment with the 70B coder chat model, noting it lacks knowledge of the 🤗 Diffusers library.

  • Struggle to replicate specific art style in Stable Diffusion: @troyfix aims to recreate an anime-style art form using vanilla Stable Diffusion and provides a detailed prompt to capture the sketchy and rough texture of the target style. They link to an example (akimasaweb) which embodies the aspirational art style.

Links mentioned:

works 徳永明正-航空イラストなど-: no description found


HuggingFace ▷ #NLP (7 messages):

  • GPU Acceleration Troubles Detected: User @sgp began troubleshooting an issue where they realized that GPU acceleration was not functioning at all despite their efforts in configuring the setup.
  • RTX 2080ti in the Mix: They disclosed using an RTX 2080ti for their experiments, hinting at their high-end hardware setup.
  • LLama GitHub Issue Consulted: Seeking solutions, @sgp found and shared a GitHub issue related to LLama cpp problem with GPU support, which suggested potential fixes.
  • Encountered nvcc Compilation Error: Following the proposed GitHub solutions, @sgp encountered a compilation error: nvcc fatal : Unknown option 'fPIC', signifying compatibility issues with the nvcc compiler options.
  • Possible Misconfiguration Leads to Larger Issues: The troubleshooting attempts led to a complication where @sgp indicated that their efforts might have inadvertently broken their existing setup, affecting the functioning of their gptq model.

Links mentioned:

LLama cpp problem ( gpu support) · Issue #509 · abetlen/llama-cpp-python: Hello, I am completly newbie, when it comes to the subject of llms I install some ggml model to oogabooga webui And I try to use it. It works fine, but only for RAM. For VRAM only uses 0.5gb, and I…


OpenAccess AI Collective (axolotl) ▷ #general (23 messages🔥):

  • Axolotl Updates Bring Joy: @nafnlaus00 shared their delight in the continuous improvement of axolotl and its dependencies that yield lower VRAM usage, faster training, and better outcomes without needing dataset or configuration adjustments. This feedback was appreciated, leading to a conversation about sharing the positive experience on Twitter and highlighting the importance of reproducibility.

  • Discussions on Hardware and VM Enhancements: @dreamgen attributed part of their speedup in AI projects to a higher GPU power limit (700W versus 300W), while @dangfutures expressed a need for increased VM RAM on GPUs, emphasizing the importance of hardware in AI development.

  • The Excitement around MIQU-1-70b Dequantization: @dreamgen presented the dequantization of MIQU-1-70b from q5 to f16 and its adaptation to PyTorch, including a link to the model on Hugging Face. The conversation involved sharing the usage code snippet for implementation details and encouraging community members to explore this development.

  • Hardware Speculation and Rack Considerations: @dreamgen shared information about a 4xMI250 server being priced at 70K but also pointed out the logistical consideration of needing space for a server rack, highlighting budget and space as critical factors in infrastructural planning for AI projects.

  • Speculation and Inquiry into Mistral Medium’s Authenticity: Discussion emerged around the credibility and performance of Mistral Medium, with @le_mess and others questioning its authenticity and benchmark results, and @dreamgen offering access to the Mistral API for those wanting to run their own tests.

Links mentioned:

152334H/miqu-1-70b-sf · Hugging Face: no description found


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (7 messages):

  • VRAM Usage Concern on New Implementation: @caseus_ highlighted a known issue where a recent implementation consumes 2x VRAM, a concern under investigation by the paper’s author. @suikamelon expressed disappointment, having hoped to test it locally.
  • Potential DIY Solution to LoftQ by StefanGliga: @stefangliga suggested re-implementing LoftQ, reasoning that it may be just an alternative initialization technique for LoRA.
  • Exploring First Order Approximation with LoftQ: In an attempt to approximate LoftQ, @stefangliga shared a code snippet that applies SVD to the difference between the original and dequantized weights, which they believe gives a first-order approximation to LoftQ (a reconstruction of the idea follows below).
  • Debate on How to Address Implementation Concerns: @caseus_ argued that it might be more beneficial to correct the VRAM issue upstream and, failing that, to address it within Axolotl.
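
A rough reconstruction of that idea in PyTorch; this is not the snippet that was shared, `quantize`/`dequantize` stand in for whatever quantizer is in use, and the rank is arbitrary:

```python
import torch

def loftq_style_init(W, quantize, dequantize, rank=16):
    """Fit a rank-r LoRA pair to the quantization error W - dequant(quant(W)),
    so LoRA starts out correcting the quantizer's mistakes."""
    W_dq = dequantize(quantize(W))  # round-tripped weight matrix
    err = (W - W_dq).float()        # quantization residual
    U, S, Vh = torch.linalg.svd(err, full_matrices=False)
    A = U[:, :rank] * S[:rank].sqrt()         # (out_features, rank)
    B = S[:rank].sqrt()[:, None] * Vh[:rank]  # (rank, in_features)
    return A, B                               # A @ B ~= err (first order)
```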

OpenAccess AI Collective (axolotl) ▷ #general-help (65 messages🔥🔥):

  • Docker Dilemmas Solved with pip install: @duke001. encountered an error “No module named 'axolotl.cli'” when using the Docker image for Axolotl. The issue was resolved by @nanobitz’s guidance to cd /workspace/axolotl and pip install -e ., highlighting the importance of proper volume mounting in Docker use.

  • Inference on Merged Models with vLLM: @diabolic6045 asked about running inference on a merged qlora llama2 7B model using vLLM, to which @nanobitz advised ensuring sufficient VRAM for operations. Discussions also included observations about vLLM’s speed versus quality in text generation.

  • Training Troubles Tackled: @jorelosorio faced a ZeroDivisionError during model training with a small dataset. @nanobitz recommended adjustments to num_epochs and micro_batch_size, and subsequently suggested the use of gradient_accumulation_steps: 1 to prevent division by zero errors.

  • Adding Slang to Tokenizers Explained: @arcontex sought advice on training a conversational model with Axolotl using a corpus with country-specific slang. @nanobitz explained how to add new tokens to the tokenizer using the built-in tokens: key in YAML, illustrating the process with chatml tokens and discussing when adding words to the tokenizer is most beneficial (the equivalent plain-transformers calls are sketched after this list).

  • Conversational Model Training Clarified: @arcontex inquired about special considerations for declaring a dataset in YAML when training a conversational model with Axolotl. @nanobitz suggested remapping to sharegpt for ease, outlining how to define the dataset in YAML for conversational model training with data in OpenAI format.
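
Outside of Axolotl’s YAML, the equivalent plain-transformers operations look like this sketch (the model and token strings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # example base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Add slang terms (illustrative) so they stop being split into subwords...
tokenizer.add_tokens(["chamba", "parcero"])

# ...and grow the embedding matrix to match. The new rows start out as
# untrained embeddings, which is why fine-tuning afterwards matters.
model.resize_token_embeddings(len(tokenizer))
```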


OpenAccess AI Collective (axolotl) ▷ #community-showcase (1 messages):

Links mentioned:

Tweet from Fabrizio Milo (@fabmilo): Had fun experimenting on function calling outside #OpenAI API. Sharing my #colab [1] that leverages @ggerganov’s llama.cpp[2] / @abetlen llamacpp python wrapper [3] + @LangChainAI wrapper + @ca…


OpenAccess AI Collective (axolotl) ▷ #runpod-help (9 messages🔥):

  • Troubleshooting axolotl module not found: @gonejiggy encountered an error with the axolotl module not being found in the runpod docker image, which was temporarily fixed with pip3 install -e '.[deepspeed]' and pip uninstall flash_attn commands. However, they were confused as this issue was not present the previous day.
  • Possible Docker Volume Mounting Issue: @caseus_ and @propback discussed a potential problem where Docker might not mount axolotl in the workspace directory properly due to default mounting of network volumes to /workspace, which could overwrite the container’s /workspace.
  • Jupyter Server Extension Warning and Errors: @rss101 shared server logs showing warnings about deprecated Jupyter server extension functions and a critical error stating /workspace is outside root contents directory, resulting in the Http service at port 8888 never being ready.

OpenAccess AI Collective (axolotl) ▷ #deployment-help (1 messages):

yamashi: Parallel req


Perplexity AI ▷ #general (12 messages🔥):

  • Curiosity About Mistral Medium API Access: @arcinarci inquired about when API access for Mistral Medium might be available, indicating community interest in wider API access.
  • Free Trial Confusion Cleared: User @aarav7024 was puzzled about why they couldn’t access the 7-day free trial. This sparked a discussion, possibly suggesting that Perplexity might have halted the 7-day free trial offer.
  • Understanding the Mobile App’s Limitations: @gooddawg10 questioned if multiple uploads were possible on Android. @ok.alex confirmed that the file and image upload feature is not yet available on the app but mentioned it would be added in future releases.
  • Seeking Guidance on Image Generation: @stoop6981 sought advice on how to use the Image Generation Model effectively, after encountering issues. @ok.alex directed them to a helpful thread for more detailed guidance.

Perplexity AI ▷ #sharing (10 messages🔥):

  • Creative Use of Perplexity for App Development: @gammagames explored using Perplexity for generating names and addresses for a delivery-themed Pomodoro app, finding the tool highly effective for creative content creation.

  • Discovering Perplexity Labs: @rowalth highlighted the existence of Perplexity Labs, where users can experiment with various functionalities of the AI.

  • Bias and Decision Making in App Development: @zinovi.eu reflected on personal biases against Apple Inc. while considering iOS app development, ultimately deciding against it despite a constructive inquiry with Perplexity.

  • From Music Classification to Ice Cream Machines: @fungifriendly47 embarked on a journey with Perplexity from music classification to discovering Ice Cream machines on a B2B site, illustrating Perplexity’s diverse utility.

  • Finding the Right Laptop for Gaming in India: @aninokuma95 conducted a thorough search for laptops with good GPUs under $600, identifying options like Lenovo Yoga 7i (2023) and Acer Nitro 5, and highlighting the importance of component consideration for gaming performance.


Perplexity AI ▷ #pplx-api (61 messages🔥🔥):

  • Integrating Perplexity in Web Apps Made Easy: @dogemeat_ helped @andreafonsmortigmail.com_6_28629 by sharing a starting point for integrating the Perplexity API into a web application. Instructions and relevant documentation can be found here and API token creation here. However, it’s noted that the pplx-api does not support file uploads for chat interactions.

  • Perplexity API and File Handling Inquiry: @andreafonsmortigmail.com_6_28629 queried about the capability of pplx-api to handle file uploads and text summarization. @clay_ferguson clarified that while direct file handling might not be supported, users can extract text from files for inclusion in prompts, effectively allowing for summarization within the given text limits.

  • Discovering Cody, the AI Coding Assistant: @thereverendcognomen shared insights on using Cody, a free AI coding assistant that knows the user’s entire codebase, suggesting it as a model for future integrations and expressing interest in consolidating AI-related expenses under one platform. More information on Cody can be found here.

  • Potential for Local Model Training Questioned: @gritknox and @thereverendcognomen discussed the feasibility of training and testing large models without high-end local devices. The conversation highlighted the ability to use certain models locally with basic hardware and the mentioned utility of platforms like Ollama for local model execution and training.

  • API Troubleshooting and Community Support: @mafia_boii faced a 401 authentication error while trying to access the Perplexity API. Community support led by @clay_ferguson provided troubleshooting steps, including rewinding to more basic shell commands to isolate the issue, and confirmed the validity of the snippet found in the documentation (a minimal authenticated request is sketched after this list).
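
For 401-class errors, the first thing to verify is the bearer header. A minimal request sketch against the documented chat endpoint; the model name is an example and the key is a placeholder:

```python
import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_PPLX_API_KEY>",  # a 401 usually means this is missing or wrong
        "Content-Type": "application/json",
    },
    json={
        "model": "pplx-7b-online",  # example model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.status_code, resp.json())
```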


LLM Perf Enthusiasts AI ▷ #triton (5 messages):

  • Flexibility Issues with Triton and CUDA: @mhmdsabry asked which low-level CUDA features Triton lacks flexible access to and how access to them could improve performance, requesting links and resources for a detailed answer.
  • Triton’s Handling of Low-Level Data: @gogators. highlighted that Triton provides limited control over data storage and partitioning at the GPU ‘block’ level, affecting the use of shared memory and registers crucial for algorithms like flash attention. He mentioned that despite this, Triton’s management of these resources could still reach optimal performance levels.
  • Recommended Reading on Triton’s Implementation: For those looking to dive into the specifics, @gogators. recommended reading the original Triton paper and GPT-4 details as excellent resources for understanding Triton’s implementation details.
  • Triton’s Synchronization Features Need Improvement: Addressing part of @mhmdsabry’s query, @andreaskoepf pointed out Triton’s weak synchronization features, specifically highlighting the lack of robust sync primitives beyond the documented debug_barrier, which is mainly used to synchronize all threads in a block.

Links mentioned:

triton.language.debug_barrier — Triton documentation: no description found


LLM Perf Enthusiasts AI ▷ #cuda (24 messages🔥):

  • Vectorized Memory Access Boosts Performance: @andreaskoepf shared insights on improving performance in memory-bound CUDA kernels, suggesting a shift towards reading consecutive memory and using vector loads. He referenced an NVIDIA blog post to emphasize the significance of optimizing bandwidth utilization through vectorization.
  • CUDA Made Easier with Numba: @hamelh highlighted a Twitter post by @HaseoX94 that discusses simplifying CUDA programming using Numba, including a tutorial by Jeremy Howard that demystifies CUDA for Python users (a minimal Numba kernel sketch follows this list).
  • Potential CUDA Presentation Teased: @marksaroufim humorously volunteered @555959391833292811 for a talk on February 24, sparking a conversation about presenting coding work, with @zippika expressing humility about their presentation skills while appreciating the opportunity.
  • RAM vs. VRAM Requirements Debated: @bazlan inquired why some recommend having twice as much RAM as VRAM, leading to a discussion with @marksaroufim and @zippika about the practical benefits of ample RAM for data preprocessing and model manipulation.
  • CUDA Refactoring Trials and Tribulations: @artste shared their experiences with refactoring CUDA code to enhance readability and performance, ultimately finding a “non-float” approach suggested by @555959391833292811 to be the fastest albeit with minor pixel discrepancies. The journey and comparison of various methods are compiled in a GitHub notebook.
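
Since Numba came up as the low-friction entry point, here is a minimal, purely illustrative kernel; it also shows the global-index arithmetic that block- and thread-level optimizations build on:

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    # Global thread index: blockIdx.x * blockDim.x + threadIdx.x,
    # which cuda.grid(1) computes for us.
    i = cuda.grid(1)
    if i < x.size:  # guard the ragged last block
        out[i] = x[i] + y[i]

n = 1 << 20
x = np.ones(n, dtype=np.float32)
y = np.ones(n, dtype=np.float32)
out = np.zeros_like(x)

threads = 256
blocks = (n + threads - 1) // threads
add_kernel[blocks, threads](x, y, out)  # Numba handles host<->device copies here
print(out[:4])  # [2. 2. 2. 2.]
```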


LLM Perf Enthusiasts AI ▷ #torch (1 messages):

andreaskoepf: https://x.com/pytorch/status/1752406904809341165


LLM Perf Enthusiasts AI ▷ #algorithms (2 messages):

  • Offer to Run Code on A100/H100 GPUs: @vim410 has offered to run someone’s codebase on A100/H100 GPUs to generate data points. However, they cannot provide SSH access.
  • Dual-GPU Machine for Testing Available: @jeremyhoward is willing to provide a dual-GPU machine for testing for an unlimited time. Direct messaging him is the way to get this organized.

LLM Perf Enthusiasts AI ▷ #suggestions (1 messages):

vim410: Thanks for sharing, i am one of the person who wrote the article. 🙂


LLM Perf Enthusiasts AI ▷ #jobs (2 messages):

  • ML Performance Wizard Guide Unveiled: User @muhtasham shared a link to the ML Performance Wizard, a comprehensive guide or resource, but did not provide further details in their message.
  • NVIDIA Seeks CUDA and C++ Talent: @vim410 announced that NVIDIA is hiring CUDA and C++ experts. Interested candidates with intermediate or expert knowledge in CUDA were encouraged to contact them for connections to the right NVIDIA team members.

Links mentioned:

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It’s the all-in-one workspace for you and your team


LLM Perf Enthusiasts AI ▷ #beginner (8 messages🔥):

  • Diving Into CUDA Programming: @noobpeen expressed interest in learning CUDA programming and inquired about guides and prerequisites.
  • A Treasure Trove of CUDA Resources Unveiled by @apaz: For CUDA learning, @apaz recommended the “Programming Massively Parallel Processors” book, accessible on eBay or libgen.is. Apaz shared valuable resources, including a free GPU access link at lightning.ai, a twitter thread for setting up an environment by Jeremy Howard (@jeremyphoward), and a YouTube channel for CUDA lectures at CUDA MODE.
  • CUDA Setup Query for RTX 3070 Laptop: @noobpeen sought advice on setting up CUDA library for an RTX 3070 laptop, questioning if there were any special requirements.
  • Conda Preference: Torch Over Tensorflow: In response to @noobpeen’s query regarding the preference between Conda with Torch or TensorFlow, @apaz endorsed using Conda with Torch, humorously commenting on Tensorflow’s waning relevance.
  • Windows CUDA Setup Tip from @lancerts: For CUDA setup on Windows, @lancerts suggested using Visual Studio for its direct integration with CUDA, clarifying not to use Visual Studio Code for this purpose.


LLM Perf Enthusiasts AI ▷ #pmpp-book (6 messages):

  • CUDA Memory Indexing Explained: @andreaskoepf detailed how to calculate the memory index in CUDA, emphasizing the roles of blockDim.x, blockIdx.x, and threadIdx.x in determining an element’s index within a specific section of the memory array.
  • Understanding Through Collaboration: Following an explanation from @andreaskoepf, @ashpun expressed gratitude for the clarity provided on CUDA memory indexing, highlighting the value of community support in resolving technical inquiries.
  • Exploring CUDA Events for Timing: @shindeirou initiated a discussion on the necessity of using cudaEventSynchronize() when measuring the time of cudaMemcpy operations, despite the blocking nature of cudaMemcpy.
  • Clarifying CUDA Timing Mechanisms: @_tvi_ responded to @shindeirou with a clarification that synchronization is needed both for the completion of cudaMemcpy operations and for the recording of the event itself, which might explain unexpected behavior like getting 0.0 in timing measurements (demonstrated in the sketch after this list).
  • Importance of Synchronization in CUDA: @vim410 emphasized that all CUDA API calls should be considered asynchronous by default, underscoring the importance of explicit synchronization when capturing performance metrics.
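
The same synchronization requirement is easy to demonstrate from Python via PyTorch’s wrapper around CUDA events (a sketch, not the book’s C++):

```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

x = torch.randn(4096, 4096, device="cuda")
start.record()
y = x @ x  # kernel launch is asynchronous
end.record()

# Without this sync, elapsed_time() may be read before the events have
# actually been recorded on the GPU -- the "0.0" behavior discussed above.
torch.cuda.synchronize()
print(start.elapsed_time(end), "ms")
```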

LangChain AI ▷ #announcements (2 messages):

  • LangChain Fork Fiasco: @.bagatur reported an incident where forks of the LangChain repository were not being recognized as forks, and corresponding PRs vanished overnight. A GitHub discussion has been opened for tracking the issue.
  • Quick Recovery and Action Steps: @.bagatur followed up announcing that the issue with LangChain forks appears resolved, and many of the closed PRs have been reopened. Contributors whose forks still have problems need to reopen the PRs manually, as the team cannot access those.

Links mentioned:

GitHub Incident: Forks not being recognized, PRs automatically closed · langchain-ai/langchain · Discussion #16796: As of Jan 30, 2024 9:30am PST we’re aware that most LangChain forks have stopped being recognized as forks, and the corresponding PRs have automatically been closed. We’re in contact with the …


LangChain AI ▷ #general (35 messages🔥):

  • Seeking Advice on Custom Tool Parameters: @georg.ort is looking for assistance on defining required and optional parameters for a custom tool. They are open to payment for valuable consultation and shared a link for communication.

  • LangChain Incorporates Handlebars: Handlebars has been experimentally incorporated into LangChain JS as a supported templating language, according to @afirstenberg and confirmed by @jacoblee93.

  • Investigating GitHub Forks Issue with LangChain: @.bagatur highlighted an issue where forks of LangChain on GitHub were not being recognized correctly, with PRs closing automatically. The problem seems to have been resolved, with efforts to reopen affected PRs.

  • Prompt Engineering for Open-Source LLMs: A link shared by @juanpablomesa highlights the nuances of prompt engineering for open-source LLMs like Mistral and Llama compared to closed-source models, derived from Dr. Sharon Zhou’s insights. Full details can be found on juanpml.com.

  • Troubleshooting GPT-4 Integration with Python Application: @lucas1809 shares challenges in integrating GPT-4 into a Python application for creating a chatbot, facing errors when attempting to use it outside the v1/completions endpoint. A series of messages detail the progression towards understanding the error and seeking a solution.


LangChain AI ▷ #langserve (3 messages):

  • Seeking Faster Access to Langserve: @rebelsandrobots_97106 is looking for a quicker way to get access to Langserve for hosting an LLM in a college literature course. They are currently on the waiting list and are exploring other options for quicker access.
  • LangServe and Hardware Resource Management: @veryboldbagel clarified that LangServe doesn’t manage hardware resources for LLMs, meaning users need to add an additional layer for this purpose. Without it, there’s a risk of server crashes during concurrent LLM usage.

LangChain AI ▷ #share-your-work (5 messages):

  • Oranscribe Pre-Launch Tease: @shving90 introduced Oranscribe, a tool designed to enhance writing, flow, and growth, now featured on Product Hunt. Excitement builds with anticipation for its official launch.

  • AI Symphony by ColBERT & Langchain: @andysingal shared a Medium article titled “ColBERT & Langchain’s Symphony with RAGatouille”, heralding a revolution in AI and humor interaction through a novel collaboration.

  • Launch of SkillForge V1: @robot3yes showcased their weekend project, a SkillForge agent prototype that has the capability to create skills for other agents. Here’s the intriguing YouTube video titled “SkillForge V1”.

  • JACoB: A New Dawn for AI Coding Bots: @momentnerd revealed significant progress on their AI coding bot project, now named JACoB (Just Another Coding Bot), which has transitioned from a concept to a production-ready coding assistant. Excitement surrounds the open-source announcement and the offering of a detailed walkthrough on JACoB’s capabilities, further details found at jacb.ai.

  • A Flashback to JACoB’s Origins: Following the big reveal of JACoB, @momentnerd references a post from June ‘23, providing context and continuity to the project’s journey. Unfortunately, the link is missing, leaving readers curious about the origins of JACoB.


LangChain AI ▷ #tutorials (3 messages):

  • Multi-user RAG Chatbot for Students: @rito3281 delved into the concept of Multitenancy to build a multi-user RAG chatbot tailored for students from different departments, ensuring data privacy and security using the Langchain framework and Qdrant Vector Database. A detailed exploration and guide are shared in their blog post, explaining how to set up multi-tenancy in Qdrant DB to make student inquiries department-specific.

  • Empowerment for AI Developers: @lhc1921 highlighted a YouTube video titled “AI Development - The Monthly Dev #37”, presenting a platform where world-class speakers empower the developers’ community with their insights, courtesy of daily.dev.

  • Prompt Engineering Insights for Open-Source LLMs: @juanpablomesa emphasized the differences in prompt engineering between open-source LLMs like Mistral-7B-Instruct-v0.1 and Llama-2-7b-chat-hf, and closed-source models like GPT-3.5 and GPT-4. Their blog summary underlines the unique approaches required for effective prompt engineering in open-source LLMs, as detailed by Dr. Sharon Zhou.


LlamaIndex ▷ #blog (2 messages):

  • LlamaIndex Announces Bounty with @replit: LlamaIndex has partnered with @replit to offer $2,000 in bounties for building open source templates focused on advanced RAG (Retrieval-Augmented Generation). Check out this collaborative opportunity here.

  • Exploring RAG with LlamaIndex - A Guest Post by @CobusGreylingZA: The latest guest post discusses the use of agents for handling complex queries through RAG, showcasing multi-agent coordination and chain-of-thought reasoning across numerous documents and featuring re-ranking from @cohere. Discover the insights here.


LlamaIndex ▷ #general (37 messages🔥):

  • Embedding Fine-Tuning Inquiry: @balanp inquired whether for fine-tuning embeddings, both metadata and text from a textnode should be paired with questions to form a datapoint. The discussion evolved into whether data in both text fields and metadata keys are converted into vector embeddings in LlamaIndex.

  • Converting CSV to JSON for Fine-Tuning: @balanp sought guidance on converting a CSV file, with question and relevant-context columns, into a JSON file suitable for fine-tuning embeddings in LlamaIndex. kapa.ai provided a Python snippet for the conversion (a comparable sketch follows this list).

  • Fine-Tuning with Hugging Face Models: The conversation covered the feasibility of fine-tuning an embedding model from Hugging Face within the LlamaIndex environment, including specifying model IDs and passing tokenizer parameters. @balanp was interested in using "intfloat/e5-mistral-7b-instruct" for fine-tuning and how to handle the max_length parameter.

  • Query Engine Integration challenges: @balanp also asked about incorporating a fine-tuned PyTorch and Hugging Face model into LlamaIndex’s SubQuestionQueryEngine, seeking a method to employ their fine-tuned embedding model within the engine.

  • AWS Sagemaker and Local Database Connections: @refik0727 inquired about tutorials or GitHub code examples for using Llama with AWS Sagemaker and building a chatbot with a CSV file or connecting directly to a local database. Other users like @shinji3046 asked general questions about llama packs and their compatibility with open-source models such as Mistral, while @a3lita reported issues with empty responses when attempting RAG on complex PDFs using a specific llamapack and hosting on Streamlit.
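
For reference, the conversion is a few lines of standard-library code. The column names and the queries/corpus/relevant_docs output layout are assumptions here, so check LlamaIndex’s current finetuning docs for the exact schema:

```python
import csv
import json
import uuid

queries, corpus, relevant_docs = {}, {}, {}

# Assumed CSV layout: one row per datapoint, with "question" and
# "context" columns.
with open("train.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        qid, cid = str(uuid.uuid4()), str(uuid.uuid4())
        queries[qid] = row["question"]
        corpus[cid] = row["context"]
        relevant_docs[qid] = [cid]  # each question maps to its context

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(
        {"queries": queries, "corpus": corpus, "relevant_docs": relevant_docs},
        f, ensure_ascii=False, indent=2,
    )
```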


LlamaIndex ▷ #ai-discussion (1 messages):

andysingal: hallucination-leaderboard. https://github.com/vectara/hallucination-leaderboard


LAION ▷ #general (17 messages🔥):

  • Warhammer 40k AI-generated trailer impresses: @max_voltage shared a fan-made trailer of Imperium Of Man - Warhammer 40k, highlighting its impressive use of AI generative tools. The video, found here, utilizes various AI tools to create an impressive showcase, especially noting the fire and explosions at 0:54 as standout moments.
  • Insights on AI Video Generative Tools: Discussion notes from @pseudoterminalx and @astropulse highlight how AI-generated content, despite occasional uncanny elements, demonstrates good temporal consistency and potential uses in pitching movies and TV shows. A notable comment mentions the use of the same seed for every frame leading to a unique residual noise, akin to “looking at the world through some warped glass.”
  • AI models evoke mixed reactions: The discourse moves to specific AI models with @pseudoterminalx sharing an image produced by Terminus, prompting reflections on its capabilities and limitations. The post, offering a visual here, underscores how extraordinary results can sometimes highlight deficiencies in training datasets.
  • Inquiry about DALL-E 2 - PyTorch vs Stable Diffusion: @homie115 seeks insights into the comparison between DALL-E 2 - PyTorch and Stable Diffusion, questioning improvements and current standings among AI image generation tools.
  • Technical Requests and Model Discussions in AI Community: Users inquire about practical applications and technical setups - from extracting text with OCR models (@twoabove asking for a lost link) to optimizing WhisperSpeech for streaming audio (@normilkyway requesting setup help), and discussions on training conditions for CLIP models (@kal2296 questioning the feasibility of trained models with/out image transformations).

Links mentioned:

Imperium Of Man - Warhammer 40k: Imperium Of Man - Warhammer 40k is a fan-made (unofficial) trailer by JustMovies, produced using various AI generative tools. What started as a project a few…


LAION ▷ #research (17 messages🔥):

  • Introducing MoE-LLaVA for Efficient LVLMs: @nodja shared a paper on arXiv introducing MoE-tuning and the MoE-LLaVA framework, aimed at improving Large Vision-Language Models (LVLMs) efficiency by activating only the top-k experts during deployment. This strategy enables the construction of sparse models with a high number of parameters but constant computational cost.
  • MoE-LLaVA Demonstrated on Hugging Face: Follow-up, @nodja also highlighted the MoE-LLaVA model’s implementation on Hugging Face’s platform, inviting the community for direct exploration.
  • CodeLlama 70b Challenges with Ethical Precautions: @Ivannius introduced the CodeLlama 70b Instruct version, noting its impressive HumanEval score but also its tendency to moralize unnecessarily. He suggested using specific instructions to bypass the model’s ethical guidelines for more straightforward code generation tasks, available on Hugging Face.
  • InternLM-XComposer Excels at Captioning: @mkaic praised the InternLM-XComposer for delivering the best caption among all open source Vision-Language Models (VLMs) tested, especially highlighting its ability to notice details like a vent on the ceiling, showcased on Hugging Face.
  • MAGBIG: A New Multilingual Text-to-Image Benchmark: @felfri_ shared MAGBIG, a newly proposed benchmark for evaluating multilingual text-to-image models, encouraging the community to use and share it. This dataset aims to advance the development and assessment of models on a broader linguistic scale.


DiscoResearch ▷ #disco_judge (1 messages):

huunguyen: <@213644857309134849> - any luck on the prometheus mistral model for en?


DiscoResearch ▷ #general (21 messages🔥):

  • German Orca DPO Dataset Discussions: Debate centers around the existence and preparation of a German Orca DPO dataset. @johannhartmann shared a Hugging Face dataset and mentioned azureml and hermeo tools used for translation. @_jp1_ hinted at work done on an original dataset with an intent to open-source it for improving German model training.

  • Approaches to Data Augmentation and Translation: @bjoernp discussed a novel data augmentation technique, Web Rephrase Augmented Pre-training (WRAP), proposed by Apple and highlighted in a research paper that demonstrates significant improvements in pre-training efficiency.

  • DiscoLM German 7b and GermanRAG Dataset Release: @rasdani announces the release of DiscoLM German 7b and shares a Hugging Face link to the GermanRAG dataset, used for finetuning the model’s retrieval augmented generation capabilities. They highlight this dataset’s usefulness for RAG finetuning with varied contexts and fully formulated answers.

  • New Public Dataset for RAG Fine-tuning by Philipmay: @philipmay introduces a new dataset generated with GPT-4 for RAG fine-tuning, featuring 124,961 German triples of context, question, and answer. He mentions the ongoing addition of “rejected” answers to convert it into a DPO dataset, available on GitHub.

  • Code Llama 70B Release and Llama Factory Discussions: Meta’s release of Code Llama 70B and the tease of Llama 3 are briefly highlighted with a link to Twitter. There is also a conversation about following generic recommendations from llama_factory readme for hyperparameter settings, without specifics for phi-2.


DiscoResearch ▷ #embedding_dev (3 messages):

  • Simon Willison dives into ColBERT’s mysteries: @jp1 shared an insightful article by Simon Willison on ColBERT, a model that challenges the standard embedding approach by allowing scalable BERT-based search. Unlike usual embedding models that store a single vector per document, ColBERT stores multiple vectors, enabling more nuanced retrieval.

  • BGE_M3: A Multilingual Marvel Unveiled: @sebastian.bodza introduced BGE_M3, a new multilingual model that combines dense retrieval models with sparse and multi-vector approaches like ColBERT. Its development is detailed on GitHub.

  • Pro Tip for BGE-large Users: @sebastian.bodza also offered a key update for BGE-large users, suggesting the inclusion of a prompt in short2long retrieval queries to significantly enhance performance.


DiscoResearch ▷ #discolm_german (1 messages):

  • DiscoLM German 7b v1 Drops: User @ustoll sought deployment advice for DiscoLM German 7b v1, a Mistral-based model focused on German language applications, succeeding the EM German model family. They inquired about low-friction services similar to together.ai or anyscale for deploying the model.

Links mentioned:

DiscoResearch/DiscoLM_German_7b_v1 · Hugging Face: no description found


Latent Space ▷ #ai-general-chat (21 messages🔥):

  • Stable Diffusion Spotlight in VFX Jobs: @swyxio highlights the integration of stable diffusion technologies in VFX job descriptions, linking to a tweet by @venturetwins about a major VFX studio owned by Netflix expanding into AI roles. The discussion continues with @coffeebean6887 sharing a job listing from Eyeline Studios detailing the demand for expertise in generative imaging and machine learning for revolutionizing storytelling.

  • Latency Challenges with LLM Responses: @austintackaberry expresses frustration over the extra latency experienced when not seeking complex LLM responses, especially when direct links are not promptly highlighted.

  • Amusement Over Future Job Requirements: @guardiang jokingly shares apprehension about future job postings demanding years of experience in Stable Diffusion and/or Midjourney, reflecting on the rapidly evolving AI landscape and its impact on employment standards.

  • Innovative Paper on LLM Training Efficiency: @swyxio shares a new paper by Quentin Anthony on optimizing hardware utilization for transformer model training, urging a shift in mindset towards viewing models through the lens of GPU kernel calls to mitigate inefficiencies.

  • Codeium Hits Series B: @swyxio celebrates Codeium’s advancement to Series B funding, with a congratulatory note to the team, including a tweet marking the achievement. @.prem, associated with Codeium, acknowledges the milestone, highlighting the excitement around the company’s growth.


Latent Space ▷ #llm-paper-club (2 messages):

  • New Pythia Paper Reveals 20% Speed-up Missed by Many: @swyxio shared insights from BlancheMinerva’s tweet about a crucial hardware-aware design that can lead to a 20% throughput improvement for 2.7B LLMs. This tweak, overlooked due to copying GPT-3’s architecture, has been detailed in the paper found at arXiv:2401.14489.

  • Curated List of Influential AI and NLP Resources: @ivanleomk came across a comprehensive list on Twitter, featuring landmark resources in AI and NLP including The Annotated Transformer, The Unreasonable Effectiveness of RNNs, and more key readings and papers beneficial for understanding AI models and their formulation. This collection serves as a valuable starting point for those looking to deepen their knowledge in AI and NLP.


Alignment Lab AI ▷ #general-chat (3 messages):

  • Announcing Lilac Garden: @nikhil_thorat announced Lilac Garden, a new cloud service for accelerated dataset transforms built on Lilac, featuring LLM-powered clustering as the first service. The announcement was made on Twitter.
  • OpenOrca Dataset Hosted on Lilac: As part of the Lilac Garden launch, the entire OpenOrca dataset, with embeddings and clusters precomputed, is now hosted on Lilac, available here.
  • Exploring OpenOrca Clusters: @nikhil_thorat shared a direct link to explore the clusters within the OpenOrca dataset on Lilac, providing users with a detailed view on how to navigate the dataset clusters.


Alignment Lab AI ▷ #looking-for-workers (1 messages):

  • WashU Startup in Search of Founding Engineer: DoubleMint announces a startup collaboration with Washington University in St. Louis looking for a founding engineer well-versed in Next.js. With a first Letter of Intent signed for $50,000, they’re eager to scale up and are also interested in skills related to TailwindCSS and Supabase.

LLM Perf Enthusiasts AI ▷ #general (2 messages):

  • Short and Sweet Gratitude: User @an1lam expressed their thanks with a simple, “Thanks!”

  • Inquiry about Gemini Pro in Production: @res6969 inquired if anyone has conducted experiments with Gemini Pro in a production environment, seeking insights or results from these experiments.


AI Engineer Foundation ▷ #general (1 messages):

  • Open Source Projects Wanted for AI Engineer Foundation: User @hackgoofer called on @everyone to recommend open source projects to join the AI Engineer Foundation. A Guide to Submit Projects was shared for interested parties.

Links mentioned:

Guide to Submit Projects to AI Engineer Foundation: no description found