> AI News for 3/21/2024-3/22/2024. We checked [**364** Twitters](https://twitter.com/i/lists/1585430245762441216) and **22** Discords (**341** channels, and **5210** messages) for you. Estimated reading time saved (at 200wpm): **526 minutes**.

We save you the most time when we can say an entire day’s worth of news is skippable… and we like the (apocryphal) irony should we be wrong!

Happy peaceful reading, or check out the new Adept episode on Latent Space. We grow our Reddit coverage next week.


Table of Contents

[TOC]


REDDIT

Just starting with /r/LocalLlama for now, and we’ll be summarizing the comments soon, but next we have r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence mapped out. Let us know if we’re missing any major alpha drop subreddits.

/r/LocalLlama

Fine-Tuning and Training LLMs:

  • Learning how to fine-tune (first time), I’ve provided links to tutorials I found, but would anybody else recommend further material. A user is trying to learn how to fine-tune models and has compiled reading material from Reddit and DuckDuckGo. They have questions about training models on specific topics like Cyberpunk 2077 and business data, and are looking for tips on using llama.cpp for fine-tuning. Link
  • Can LLM trained on a Dictionary? If yes, how to do it? A user wants to train a multi-language model like Gemma on a local language dictionary and is looking for steps for a non-tech layman. Link
  • How to generate large-scale synthetic data. A blog post on how to build large-scale synthetic datasets with 25B+ tokens like those used for training the Phi models from Microsoft, using a Mixtral model. Link

Retrieval-Augmented Generation (RAG) and Embeddings:

  • [question] Query in RAG returning no chunks and no results ? A user is trying to develop RAG based on a mistral 7b model, chroma DB and markdown texts as input data source. They are doing custom chunking and embedding, but when doing a general query, it does not return any chunks or response. They provide sample code and the markdown file. Link
  • Has anyone worked on generating embeddings on brain activity? A user is working with EEG data and wants to match similar EEG signal patterns. They reference a paper and are wondering if anyone has had success in this space. Link
  • Great video on understanding why your RAG/LLM combo isn’t working. A user recommends a highly researched video that discusses the reason why finetuning and RAG are better than RAG alone, the differences between larger and smaller parameter models, and how to contextualize biases in RAG queries. Link

Deploying and Optimizing LLMs:

  • hardware suggestion for llama 2 70b. A user’s boss is asking them to build a suitable workstation rack to run a llama model locally, aiming to get query time under 10s from the current 3 mins on a 7b model. They have a budget of under 15k euros and are looking for suggestions. Link
  • a script to measure tokens per second of your ollama models (measured 80t/s on llama2:13b on Nvidia 4090). A user shares a script they made to measure tokens per second of ollama models. On an Nvidia 4090, they got 80t/s on llama2:13b and 127t/s on llama2:7b. Link
  • Speed and Memory Benchmarks for Qwen1.5 models. A link to benchmarks for Qwen1.5 models in terms of speed and memory usage. Link
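
As an aside on measuring tokens per second: Ollama's `/api/generate` response object reports `eval_count` and `eval_duration` (in nanoseconds), so throughput can be computed directly without extra instrumentation. A minimal sketch of the calculation (the server call itself is omitted):

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's eval_count / eval_duration (nanoseconds) into tokens/sec."""
    return eval_count / (eval_duration_ns / 1e9)

# Example: 426 tokens generated in 5.325 seconds,
# in line with the ~80 t/s llama2:13b number reported above.
print(round(tokens_per_second(426, 5_325_000_000)))  # → 80
```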

Extending LLMs:

  • Is it possible to turn LLaMA into LLaVA. A user has fine-tuned a LLaMA 2 7B model and is wondering if it’s possible to add vision to it without needing to fine-tune LLaVA separately. Link
  • Model “memory”. A user is asking if it’s possible to improve the “memory” of a model so it can remember what it wrote at least 5 messages back. They know context size matters but are wondering if there’s anything else. They also ask if there are any 13b models that support an 8K context size. Link
  • Depth upscaling at inference time. A user shares an experiment that implements depth upscaling at inference time, without actually making the model bigger, so it’s GPU-poor friendly. It needs fine-tuning as the model is currently a bit repetitive. Link

Applications and Use Cases:

  • Let’s get real: is there anybody who’s running agents that actually make money? A user is asking if anyone runs LLM agents that make money autonomously, even if it’s just a few dollars a day. They are looking for vague information about the architecture and models used if people are willing to share. Link
  • What is an efficient way to create your own writing assistant with LLM and training from your own words writing style? A user is asking for a quick yet efficient way to train an installed LLM or chat.ml to write like the user, as prompting alone still results in writing like chatGPT. Link
  • interacting with a large PDF library. A user has thousands of scientific papers stored as PDFs and would like a chatbot that could answer questions about the content of the whole library, retrieving info from multiple PDFs without the user having to specify which ones. They are asking if such a tool exists. Link

PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

Open Source Models & Frameworks

  • Open-Sora 1.0: Open-source text-to-video model, full training process, data, and checkpoints available (100k views)
  • Thunder: New open source compiler for PyTorch, achieves 40% speedup over regular PyTorch in LLM training tasks like Llama 2 7B (87k views)
  • Jan: Open-source ChatGPT alternative that runs locally on your computer, supports multiple architectures (51k views)
  • LLaVA-NeXT (LLaVA-1.6): Powerful open source Vision-Language model, now added to Hugging Face Transformers library (1 retweet)
  • Transformers 4.39: New release packed with model updates like Mamba, Command-R, LLaVA-NeXT, MusicGen Melody, StarCoder2, SegGPT and more (11k views)

Compute Trends & Hardware

Evolutionary Model Merging

Retrieval Augmented Generation (RAG)

Emerging Trends & Applications

Prompt Engineering as a Career


PART 0: Summary of Summaries of Summaries

We are concluding that Claude Opus is simply the best model for top-level summaries, so we’re discontinuing the A/B/C tests (see archives for our struggles/record). We’ll be exposing parallel runs for all 3 models + more (incl Gemini 1.5!!), since this problem is topologically similar to the personalization app we’ll be launching.

  • Stable Diffusion 3 Anticipation Builds: The Stability.ai community eagerly awaits the release of Stable Diffusion 3 (SD3), discussing optimal control nets for art generation, AMD GPU compatibility, and cloud GPU services for those with limited hardware. Troubleshooting tips were shared, like using lshqqytiger’s fork for AMD support.

  • Unsloth AI’s Upcoming Features: Unsloth AI is working on integrating multi-GPU support and a platform UI for automatic data curation. The community also debated evaluation frameworks, data quality, and the importance of transparency in benchmarks like correcting the MMLU dataset where 25% of examples had incorrect reference solutions.

  • OpenInterpreter’s 01 Light Launch: The 01 Developer Preview launch, a portable AI device that controls computers via voice, generated buzz. The community shared assembly instructions, the Bill of Materials, and 3D print designs, while also discussing shipping and software features.

  • LM Studio Updates Spark Discussions: LM Studio’s new features like multi-model support and ROCm 0.2.17 Beta v3 release led to troubleshooting discussions around ejecting models, GPU offloading, ZLUDA interference, and high CPU usage. The community also recommended the Instructor library for structured LLM outputs.

  • AI Ethics and Security Concerns: Conversations in Perplexity AI and HuggingFace touched on the ethics of AI accessing sensitive information, like in the ‘Guardrails Arena’ experiment, and security vulnerabilities allowing interception of encrypted AI chatbot tokens (detailed explanation).

  • Emerging Techniques and Datasets: Several channels discussed new AI techniques and datasets, such as:

    • DenseFormer proposing Depth-Weighted-Average to improve transformer models
    • Quiet-STaR for generating rationales per token to enhance LM text interpretation
    • Cosmopedia, a large synthetic dataset for LLM pre-training
    • ASCII Art Dataset sparking interest in diffusion models for ASCII art
  • Optimizing AI Performance: Discussions covered various optimization techniques, including:

    • 1-bit LLMs like BitNet b1.58 matching full-precision models with better efficiency
    • Galore optimizer for memory-efficient tuning of large models
    • Fusing GaLore’s Adam optimizer with Triton for faster pre-training and fine-tuning
    • Guidelines for maximizing GPU performance of transformer models (paper)
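
On the BitNet b1.58 item above: the paper's ternary weights come from absmean quantization, which scales a weight matrix by its mean absolute value and rounds the result into {-1, 0, +1}. A rough numpy sketch of that step (illustrative only, not the paper's training-time implementation):

```python
import numpy as np

def absmean_ternary(w, eps=1e-8):
    """Quantize a weight matrix to {-1, 0, +1} via the absmean scheme
    described in the BitNet b1.58 paper."""
    scale = np.abs(w).mean() + eps
    return np.clip(np.round(w / scale), -1, 1)

w = np.array([[0.9, -0.04, 0.5], [-1.2, 0.02, -0.6]])
print(absmean_ternary(w))
```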

PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord

  • Stable Diffusion 3 Fever Rises: The community is eagerly awaiting the release of Stable Diffusion 3 (SD3), with conversations focused on selecting the best control nets for art generation. There’s also a vibrant exchange of insights on AMD GPU compatibility and recommendations for cloud GPU services to empower those with limited hardware capabilities.

  • Diving Into AMD Waters: A user facing a RuntimeError with NVIDIA drivers on an AMD system received help by being steered towards lshqqytiger’s fork that supports AMD GPUs, along with a thorough installation guide.

  • VRAM-Gate: Technical discussions are unfolding on the anticipated VRAM requirements for the soon-to-drop SD3, fueling speculation about the feasibility of running resource-intensive models on local machines.

  • Prompt Engineering-as-a-Service: Community members are sharing techniques to refine their prompting skills for creations ranging from “tribal videos” to D&D campaign visuals, searching for specific models fine-tuned to understand elaborate prompts for complex character and scenery art.

  • AI Tools: Bane or Boon?: Debates are sparking around the impact of AI on employment and creativity, with opinions ranging from caution to optimism about AI’s role as an evolutionary tool in augmenting human effort.


Unsloth AI (Daniel Han) Discord

  • Multi-GPU Support and Data Curation Heading to Unsloth AI: Unsloth AI is actively working on integrating multi-GPU support as an open-source feature, aiming for compatibility with platforms like Kaggle. Additionally, they’re developing a platform UI for automatic data curation to simplify the data preparation steps in model fine-tuning.

  • Exploring Solutions for Unsloth AI Installation Issues: Users reported problems installing Unsloth AI, including ‘no matching distribution’ errors and RuntimeError on single-GPU restriction setups. There’s also discussion on potential CUDA level changes for 4-bit quantized models and VRAM constraints exceeding 15GB in quantized models, possibly causing out-of-memory errors.

  • Unsloth AI Community Tackles Diverse Issues: Discussions around configuring LoRA settings, handling out-of-memory errors by adjusting training parameters, and tips on saving/loading models and tokenizers locally. Concerns about missing dependencies like protobuf and confusion about the best models for certain technical domains were also notable.

  • Community Spotlight on Samantha Mistral Instruct 7B: Community member cognitivetech showcased their work with Samantha Mistral Instruct 7B, specifically for summarizing psychology books. Troubles with model quantization and the promise of a working upload to Hugging Face were shared.

  • Lightning Thunder Causes a Stir with Unsloth AI: Community members highlighted potential missteps in the integration of Unsloth AI with Lightning Thunder, pointing out performance issues and incorrectly implemented kernels. There’s a call for collaboration and accurate representation of Unsloth’s capabilities in benchmarks and some expressed frustration over misleading performance comparisons on Twitter.


OpenInterpreter Discord

  • Global Launch Party for 01 Light: Engineers are excited about the 01 Developer Preview launch, a portable voice interface device for computers with capabilities to see the screen and use apps. The community is sharing assembly instructions and the Bill of Materials, and has concerns about shipment to regions like India and the EU.

  • Hardware Enthusiasts Get Crafty: DIY community members are discussing 3D-printing their own versions of 01 Light, with design files available on Printables and source code found on the OpenInterpreter GitHub.

  • Troubleshooting Across Time Zones: Wide-ranging troubleshooting topics include setting up 01 on various operating systems and addressing international shipping concerns. A suggested workaround for Windows compatibility: `poetry run 01 --client --client-type linux --server-host localhost`.

  • Curiosity and Concerns Around Software Features: Members probe into the software aspects of OpenInterpreter, discussing local versus cloud operation, API keys, language compatibility, and battery life, highlighting key aspects for an AI engineer’s understanding of product usability and technical specifications.

  • Serving Up a Teaser: The #ai-content channel held a single message: a YouTube link related to OpenInterpreter, shared without any context or details.


LM Studio Discord

Hermes 2.5 Holds the Crown: After the addition of code instruction examples, Hermes 2.5 has shown superior performance over Hermes 2 across various benchmarks, with users discussing the impact of different models and configurations on LMStudio’s performance.

Tackling LM Studio Quirks and Quibbles: Members report issues with LM Studio version 0.2.17, including symlinks failing to be recognized and errors stating “Model with key Mistral/Hermes… not found.” Additionally, performance discussions include abnormal CPU usage and compatibility with AMD Rocm and RX 570 graphics cards.

AI Ethics and Security - A Hot Debate: The community delved into the ethics and security of AI through discussions about interacting with models in Hugging Face’s ‘Guardrails Arena’, and security exploits allowing the interception of encrypted AI chatbot tokens (detailed explanation here).

Model Mastery and Multitasking: Users exchanged knowledge on optimizing the functionality of multimodal models in LM Studio, dealing with issues of VRAM limitations, and using multi-model setups to improve complex tasks. The conversation also included advice on models that facilitate “Full GPU Offload Possible” on personal machines with specific capacities.

AMD ROCm - Going for Stability or Stirring Up Storms?: The release of ROCm 0.2.17 Beta v3 generated mixed feedback, with members reporting issues related to ejecting models, GPU offloading, ZLUDA interference, and high CPU utilization. Despite these challenges, several reported stable performance on AMD GPUs, suggesting potential improvements in the latest ROCm beta version.

Streamlining AI Workflows: Engineers recommend exploring the Instructor library for structured outputs in language model workflows and sharing successful integrations of special fine-tuned versions of OpenChat with the dolphin mistral fine-tune to enhance language modeling efficiency.


Perplexity AI Discord

  • Model Showdown: Claude 3 Opus vs. Gemini: Users debated the performance nuances between Claude 3 Opus and Gemini, discussing which AI feels more humanlike. The discussion also extended to personal AI models like Inflection-2.5 and Pi.AI, highlighting their conversational strengths and concerns about their platforms’ futures.

  • Navigating AI with Perplexity: Queries about how Perplexity AI conducts web searches and image generation were prominent, indicating user interest in mobile accessibility of features like Unlimited Claude3 Opus. Inquiries also involved the use of Perplexity AI for topics ranging from the largest planet to GPT-5 release rumors.

  • Community Cries for Darker iOS Themes: Tech-savvy discordians shared frustrations about the lack of a darker midnight/black theme in iOS app updates, citing the need for visual comfort in their digital environments.

  • Token Limit Reached! Learning the Hard Way: An API user’s BadRequestError, caused by exceeding Perplexity’s 16384-token limit with a 6621-token prompt and a 9900-token output request, highlighted the importance of accurate token counting in API requests.

  • Frustration Over Cloudflare’s Overzealous CAPTCHA: A user lamented the intrusive nature of Cloudflare’s CAPTCHA challenges, especially when using VPNs, suggesting that even regular browsing could trigger these defenses.
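
The token-limit error above is simple arithmetic: prompt tokens plus requested output tokens must fit within the model's limit. A trivial pre-flight check (the 16384 default reflects the Perplexity limit mentioned above):

```python
def fits_context(prompt_tokens: int, max_output_tokens: int, limit: int = 16384) -> bool:
    """Return True if a request will fit within the token limit."""
    return prompt_tokens + max_output_tokens <= limit

# The failing request from the discussion: 6621 + 9900 = 16521 > 16384
print(fits_context(6621, 9900))  # → False
```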

The sources cited for technical reference included Inflection-2.5, Neuralink’s first human trial patient insights, and Perplexity’s nature as a possible Google Search wrapper according to Analytics India Magazine. The Perplexity documentation was noted for clarifying token counts.


LAION Discord

  • Missing Personality Predictions: Conversations revealed interest in the “mypersonality” dataset, notably for its applications in author personality prediction from text, but there were concerns over its accessibility.

  • Pooling for AI Excellence: The Hugging Face diffusers library was critiqued for its embedding implementations, with suggestions to revise the pooling method for text embeddings to boost model performance.

  • Dataset Future Uncertain: The LAION-5B dataset’s removal has led to the exploration of alternative datasets like Datacomp and DFN amid new EU regulations, casting doubt on whether LAION can overcome legal barriers to republish their datasets.

  • Calls for Transparency on SD3: The guild anticipates that Stability AI might open-source the training code of upcoming models like SD3 despite previous hesitancy, an important topic for those pursuing progress in open AI.

  • AI Sabotage or Safety?: Members were skeptical of the intentions behind researchers’ concerns over datasets containing sensitive material, pondering whether such actions serve as unnecessary hindrances to AI advancements or are genuine efforts to address safety.

  • Innovative Image Scaling Suggested: A study on arXiv proposed using multiple scales of an image to enhance model outcomes, indicating a potential path for visual AI engineering.

  • Time Tricks for Image Encoding: An intriguing approach introduced via an arXiv paper employs encoding images with six times as many timesteps, though some community members consider this to be more of a workaround.

  • Cryptic Tweet Teases Tech Trends: A tweet from Bindu Reddy was mentioned as potentially hinting at future developments, sparking curiosity among the members about its implications.


Nous Research AI Discord

  • Exploring “Extended Mind” in AI: Engineers discussed the “Extended Mind” concept, which involves storing vectors for associative memory and fetching the top k during forward pass, enhancing reasoning and memory in models. The debate was based on Phoebe Klett’s tweet and the integration with Mistral was seen as a promising future experiment.

  • Fine-Tuning Challenges & AI Devices Buzz: A new YouTube tutorial offers guidance on fine-tuning the LLaVA model, while discussions also centered around the latest open source AI device, 01 Light, aiming to control computers via voice, shared in a tweet by OpenInterpreter.

  • Cosmopedia and Quiet-STaR Make Waves: The Hugging Face blog’s post on Cosmopedia showcases creating synthetic datasets for AI, and the paper on Quiet-STaR suggests LMs can generate explanations per token, enhancing text interpretation.

  • AI Model Improvement Efforts Gather Steam: Engineers faced difficulties with BatchAllTripletLoss performance in embedding models and shared progress on projects such as an open-source Rainfall API (RAG) platform. Discussions also entertained the possibility of AI interaction using gestures or even direct brain interfaces.

  • Quantization Enquiries and Collaborative Advances: Members shared information on model quantization, including AutoAWQ, a somewhat outdated repository for 4-bit quantization (GitHub link), and pondered the theoretical underpinnings of causal masking in attention mechanisms.

  • Data Tools and Technology March Forward: Users rallied around LanceDB for its hybrid search capabilities with generative interfaces, while integration technologies like Polars and a shared GitHub repository (Neural-Dragon-AI/Cynde) exhibited potential for combining semantic wisdom with predictive machine learning models.
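
On the causal-masking question above: the mechanism just adds -inf to attention scores above the diagonal before the softmax, so each position can only attend to itself and earlier positions. A minimal numpy sketch:

```python
import numpy as np

def causal_attention_weights(scores):
    """Apply a causal mask to a [seq, seq] score matrix, then softmax
    each row so position i only attends to positions <= i."""
    seq = scores.shape[0]
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # strictly upper triangle
    masked = np.where(mask, -np.inf, scores)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

w = causal_attention_weights(np.zeros((3, 3)))
print(w[0])  # first token can only attend to itself → [1. 0. 0.]
```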


OpenAccess AI Collective (axolotl) Discord

  • RAG Debate Heats Up: A lively debate ensued over Retrieval-Augmented Generation (RAG) versus agent-based models in AI, with some arguing that RAG is merely a stopgap for missing knowledge, while others champion the complexity and robustness of agent-based models.

  • FastChat’s Formatting Fiasco: FastChat’s alpaca model was flagged for inconsistent formatting when compared to Stanford’s alpaca format, prompting suggestions for a pull request to unify them for consistency, as seen in the FastChat GitHub repository.

  • Galore’s Graceful Integration: Buzz surrounds the Galore optimizer, notable for VRAM efficiency in tuning large models, recently merged with Hugging Face Transformers, and its capability to manage full parameter tuning with less memory usage, as highlighted in a benchmark issue.

  • GPT-3.5 Inquiry Ignites Interest: Questions about GPT-3.5 performance and inference times sparked discussions amid concerns over slower local inference speeds on Macs due to privacy constraint workarounds for sensitive data such as patient information.

  • Text Classification Contemplations: In the realm of text classification, the strategy of fine-tuning Language Models (LLMs) to generate class names as outputs, rather than adding a classifier head, was debated for its flexibility and the benefits of encouraging a model to follow a chain of thoughts.


Latent Space Discord

  • Evolving AI Merging Methods Unveiled: A recent paper by Hardmaru introduced an automated evolutionary algorithm for foundation model merging, sparking debate on its potential to combine open-source models and boost performance without heavy training.

  • AI Community Thrives in Paris: Members actively shared their experiences and plans related to AI meetups in Paris, with particular excitement about the Paris Retrieval Augmented Generation group, highlighting a robust digital tech scene.

  • Zoom Saves the Paper Club: Creating a Zoom room was suggested to overcome speaker-rights issues in the Discord channel, demonstrating resourcefulness in the face of technical limitations.

  • Innovations and Discussions in AI Utility: The group dove into llama.cpp’s potential GPU use, “pad and pray” tensor dimension solutions, and a visualization by bbycroft.net for transformer model understanding. Additionally, there’s a look forward to discussions on music generation models and navigating large codebases.

  • Podcast Sheds Light on AI Giants: A new podcast with insights into companies like OpenAI, Google, and Adept gained attention, which was complemented by a Twitter post. An AI event named AI In Action spotlighted Llama.cpp, with an invitation to join via a Discord channel.


LlamaIndex Discord

Sensitive Data Meets AI Safely: The LlamaIndex blog highlighted the risks of training LLM/RAG apps on sensitive data such as patient clinical reports and proposed using differential privacy to protect individual information, with insights shared via a blog post and accompanying tweet.

Navarasa 2.0 Embraces Diversity: The blog introduced Navarasa 2.0, the upgraded Google Gemma 7B fine-tuned for 15 Indian languages, emphasizing the value of local language support in AI, highlighted through a release tweet.

UX Gets Smarter: A new UX template featured on LlamaIndex aims to enhance agent-human interactions by limiting agent requests for human input to necessary instances, with more information available in the associated tweet.

Integration Headaches!: Discord members discussed the complexities of integrating various tools with a chatbot and encountered issues like “BadRequestError,” with documentation suggestions and troubleshooting advice shared in the heated conversation.

Documentation Drama: Users wrestled with accessing the LlamaIndex documentation amidst an update to MKDocs, shared links to the new documentation format, and offered clarification on a query pipeline DAG confusion detailed here.


Eleuther Discord

Quest for Compact Code Datasets: The CodeSearchNet corpus was considered as a pretraining dataset but ran into context-length issues; The MiniPile, a 1M-document corpus, was suggested instead for its diversity and compact size, suitable for pre-training with minimal performance loss.

Under the Hood of Closed-Source Models: The community discussed the lack of access to log-probabilities and tokenizers in closed-source models like Claude and Gemini, in contrast to platforms like OpenAI that readily provide them, and speculated about the proprietary reasons behind the restriction.

Maximize Your Model’s GPU Potential: Guidelines from a recent paper on maximizing GPU runtime performance for transformer models included hyperparameter tuning and efficient model shapes, potentially increasing throughput by up to 39%.

AI Venturing into Biotechnology: An Ars Technica article on AI in antibody design sparked discussions, revealing both excitement for the promise of diffusion models and skepticism regarding their practical economic applications.

Easing the Debugging Headache: Participants faced issues when using megatron-deepspeed with lm-eval 0.3.0 and proposed workarounds like loading from an older version of cais/mmlu, which was still problematic due to auxiliary train split relocations, as indicated by a Gist traceback.


HuggingFace Discord

ASCII Art Gets a Dataset and Develops in Diffusion: Engineers shared excitement over ASCII Art with the unveiling of an ASCII Art dataset, and discussions on fine-tuning LLMs and diffusion models to generate ASCII art. A particular challenge is fine-tuning a language model to generate intricate designs, prompting a search for efficient training methods and the idea of an ASCII-adaptive diffusion model.

SMIT Brings Audio to Language Models: A new modality integration tool named SMIT was introduced, making it easier to include audio in language models. A YouTube demonstration of SMIT for music generation models piqued the interest for its potential applications. Meanwhile, Fluently-v4 was globally released, offering a single model solution for multiple tasks.

1-bit LLMs Promise Efficiency: The paper on the 1-bit LLM BitNet b1.58 suggests it matches the performance of full-precision models while being significantly more cost-efficient, which could spur the development of hardware optimized for 1-bit LLMs.

New Approaches and Tools in Various AI Domains: SegGPT’s introduction adds to the toolset for image segmentation tasks, promising one-shot results. The UniProt project’s 1024-dimensional embeddings are poised for retraining with Matryoshka embeddings for better searchability in protein databases. A profound exploration of obesity trends using data analysis sets a new precedent for health-related AI research.

Community Collaborations Flourish in Model Development and Federated Learning: Members sought collaborators on projects ranging from federated learning for load forecasting to deep code generation with the 6TB “The Stack” dataset, and modernized topic modeling with BERTopic. Concerns over quantizing fine-tuned models and issues with Hugging Face’s Trainer class were also discussed, reflecting a shared commitment to overcoming technical hurdles together.


OpenAI Discord

  • Conversations on the Cost of AI and its Applications: Members discussed the cost of adding ChatGPT bots to Discord and the pain points around not receiving responses in Postman despite a correct setup. The buzz around Perplexity’s AI as a Google Search wrapper fueled discussions, with a reference to Mohit Pandey’s article suggesting it summarizes top Google Search results. A comparison was drawn between AI’s potential in video compression and deep learning super sampling (DLSS), with an existing blog post as a reference point. On efficiency, a member claimed an 80% storage cost reduction by converting Float32 embeddings to Int8 for their vector database (see the Deep Compression with AI and Perplexity articles).

  • GPT-4 Custom Models and Usability Queries: Inquiry into connecting to custom GPT-3 models via API led to a shared Assistants API Guide. Feedback was sought for a GPT that assigns animal alter-egos, with a prompt example provided. A sudden reduction to pinning only 4 Custom GPTs perplexed a user, signaling a possible undocumented change. Conversations also covered the productivity of distributing knowledge files across multiple GPTs versus consolidating them in a single GPT for diverse parts of a prompt.

  • Server Rules and Product Descriptions Dominate Prompt Engineering Talk: Rule 7 came into highlight, reinforcing guidelines against self-promotion after a user’s post on prompt engineering jobs, and a user’s attempt to advertise a prompt chaining/prompt engineering toolkit. Frustration arose over GPT-4 Vision’s inability to assist with disabilities, whilst another member sought to challenge ChatGPT with generating natural product descriptions, suggesting to split the task into generating specific sections may be more effective.

  • API Channel Echoes Rule Reinforcements and Model Limitations: Similar to discussions in the prompt engineering channel, the API discussions highlighted Rule 7, with an apology issued for a previous violation. The limitations of GPT-4 Vision in recognizing disabled individuals catalyzed a conversation on AI inclusion. The challenge of using ChatGPT for automated product descriptions without human oversight was raised, questioning the preciseness of AI-generated content.
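
On the Float32-to-Int8 embedding compression mentioned in the cost discussion above: a common scheme stores one scale per vector and rounds values into int8, cutting raw storage per value from 4 bytes to 1. A hedged sketch (assumes nonzero vectors; real vector databases may use different quantizers):

```python
import numpy as np

def quantize_int8(vec):
    """Scalar-quantize a float32 embedding to int8 plus a per-vector scale.
    Assumes the vector is nonzero."""
    scale = np.abs(vec).max() / 127.0
    q = np.round(vec / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

v = np.array([0.1, -0.5, 0.25], dtype=np.float32)
q, s = quantize_int8(v)
print(q.dtype, np.abs(dequantize(q, s) - v).max() < 0.01)
```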


LangChain AI Discord

  • Python Dependency Puzzles Pester Langchain Enthusiasts: Python version conflicts and dependency issues in langchain-ai/weblangchain cause headaches, with errors like TypeError: Type is not JSON serializable: numpy.float64 leading to crashes. A related issue is being tracked on GitHub as TypeError: Type is not JSON serializable: numpy.float64.

  • Scribe Seeking Serenity in Serialization: The numpy serialization problem persists despite using Poetry and pinning older versions of Starlette, culminating in a new GitHub issue titled TypeError: Type is not JSON serializable: numpy.float64 to resolve the Langchain/Langserve incompatibilities.

  • Trouble with Token Limits Triggers Tech Talk: Langchain users are exploring features to handle large outputs that exceed a model’s token limitation, such as OpenAI’s GPT-4-Turbo’s 4k output tokens, considering methods for chains to continue generating output by sending additional requests.

  • Promptsage Aims to Sweeten the Prompt-Empire: A new project, Promptsage, offers a simplified approach to prompt building and sanitization for Large Language Models alongside security and privacy guardrails, designed for compatibility with langchain.

  • Data Analysts Delight in AI-Driven Evolution: An article titled “Harnessing Langchain, Instructor, and Pydantic: Redefining Data Analysis with AI” applauds the integration of various tools to enhance data analysis. The insights can be read on Medium.
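
The continuation strategy for token-limited models discussed above can be sketched model-agnostically: call the model, and while it stops due to hitting the length limit, feed the partial output back and ask it to continue. Here `generate` is a hypothetical callable standing in for any chat-completion API, not a specific LangChain interface:

```python
def generate_full(generate, prompt, max_rounds=5):
    """Keep requesting continuations until the model stops naturally.

    `generate(prompt)` is a hypothetical callable returning
    (text, finish_reason), where finish_reason is "length" when the
    output was truncated by the token limit.
    """
    text, reason = generate(prompt)
    out = text
    for _ in range(max_rounds - 1):
        if reason != "length":
            break
        text, reason = generate(prompt + out + "\n\nContinue exactly where you left off:")
        out += text
    return out

# Fake model that truncates once, to show the loop stitches output together:
chunks = iter([("Part 1 ", "length"), ("Part 2", "stop")])
full = generate_full(lambda p: next(chunks), "write a long essay")
print(full)  # → Part 1 Part 2
```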


OpenRouter (Alex Atallah) Discord

  • West Coast Users Struggle with Latency: Users on the West Coast are facing slow requests suspected to be related to a cloud services issue; an ongoing investigation is underway.

  • Gemini 1.5 Pro Sparks Interest and Inquiry: Despite official documentation making no mention beyond version 1.0, discussion buzzed around Google’s Gemini 1.5 Pro and its impressive 1-million-token context window, with some members already reaching out to Google for access.

  • Model Showdown: C3 vs. Claude 3 vs. GPT-4: Engineers debated models with C3 Model under fire for its inconsistency, while a self-moderated variant of Claude 3 received favorable comparison to GPT-4 for content moderation.

  • Divided Opinions on Grok AI’s Performance: A split emerged in opinions on Grok AI, with criticisms of it being potentially undertrained and costly, while others defended its capability as a base model not directly comparable to chat-tuned models like Mixtral.

  • Grok’s Benchmarks and Public Testing Spark Debate: Engineers debated the value of Grok AI benchmarks and shared a link to trial the model, highlighting its accessibility through the xAI platform possibly without the need for Twitter Premium+. Discussion also included what content would be best to evaluate Grok’s performance.


CUDA MODE Discord

  • Nanobinding for Machine Learning Acceleration: In discussions, nanobind was recommended for efficiency improvements in machine learning, particularly for MLX. Concurrently, members encountered difficulties during a GTC event with Discord’s stage channel, suggesting a pivot to voice channels to avoid similar issues in the future.

  • Optimizers and Compilers on the Leading Edge: A member revealed success in fusing GaLore’s Adam optimizer with Triton to enhance memory efficiency in models, supported by a GitHub pull request. Separately, the micrograd-cuda library was introduced to CUDA-accelerate Python-based micrograd extensions, and Lightning Thunder, a compiler for PyTorch, drew attention for promising performance improvements on accelerators.

  • Matrix Multiplication, Summation, and Standards Enlightened: The community analyzed the Ozaki scheme for enhancing matrix multiplication, with a nod from Jeremy Howard, and discussed the Kahan summation algorithm for reducing computation errors. Additionally, the IEEE 754 floating-point standards were noted as crucial, citing an ITU paper on the topic.
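For reference, the Kahan algorithm discussed above keeps a running compensation term for the low-order bits lost in each floating-point addition; a textbook sketch:

```python
def kahan_sum(values):
    """Compensated summation (Kahan): error stays O(eps) instead of O(n*eps)."""
    total = 0.0
    c = 0.0                    # running compensation for lost low-order bits
    for x in values:
        y = x - c              # apply the correction from the previous step
        t = total + y          # big + small: low-order bits of y may be lost
        c = (t - total) - y    # algebraically zero; numerically, the lost bits
        total = t
    return total


print(kahan_sum([0.1] * 10), sum([0.1] * 10))
```

Note the compensation line `(t - total) - y` must not be algebraically simplified away; in aggressively optimizing compiled languages it needs fast-math disabled, though plain Python evaluates it as written.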

  • Virtual Conversation Conduct and Knowledge Incubation: A member proposed using structured messages for improved clarity in conversations, with a hat tip to <@272654283919458306> for exemplifying this on another server. On the educational front, a Springer book link for PPAM 2022 was shared, offering a gateway to contemporary proceedings in parallel processing.

  • Seekers of CUDA Knowledge Engage in Sharing and Humor: A member looking for confirmation on Chapter 2 exercises from a ‘pmpp-book’ suggested private messages for answer verification. Engaging the lighter side, a new Zero to Thunder tutorial targeting Python and PyTorch users was unveiled at GTC, alongside observations that new Blackwell GPUs sport designs resembling smiley faces, sparking light-hearted exchanges via Twitter.

  • Triton’s Tenacious Troubleshooting: In the triton-puzzles channel, the community decoded tensor color coding and discussed potential misrepresentation in out-of-bounds indicators. Issues with the tl.exp operator ignited conversations about a NotImplementedError in interpreter mode, and efforts on Triton puzzles progressed, marking the completion of Puzzle 3 and collaborative debugging on Puzzle 4.


LLM Perf Enthusiasts AI Discord

  • Matchmaking Malfunction with GPT-4-Turbo: Uniti AI is grappling with GPT-4-Turbo inaccurately suggesting property spaces, with mismatches as glaring as offering 17,000 sq. ft for requests of 2,000 - 4,000 sq. ft. The challenge is amplified when trying to adhere to a specified percentage range for property sizes, encouraging suggestions for simplified solutions such as direct SQL queries.
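The kind of check a direct SQL query would express can be sketched in a few lines; the ±25% tolerance below is an illustrative assumption, not Uniti AI’s actual rule:

```python
def within_requested_range(size_sqft, target_min, target_max, tolerance=0.25):
    """Keep listings inside the requested range, padded by a percentage margin."""
    lo = target_min * (1 - tolerance)
    hi = target_max * (1 + tolerance)
    return lo <= size_sqft <= hi


# A 17,000 sq. ft listing should never survive a 2,000-4,000 sq. ft request.
listings = [1800, 3500, 17000]
matches = [s for s in listings if within_requested_range(s, 2000, 4000)]
```

Doing this filter deterministically (in code or SQL) and letting the LLM rank only the survivors sidesteps the mismatch problem entirely.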

  • Beware the “Common LLM Trap”: Engineers discussed the potential overuse of LLMs for tasks that might be more efficiently tackled with basic database queries. A blog post on Retrieval Augmented Generation (RAG) by Jason Liu was shared to highlight how pairing LLMs with standard database interactions can improve tasks like date range extraction.

  • Direct Integration Trumps Bedrock for Claude: In the realm of AI interfacing, a user reported that direct integration with the AI model Claude is preferable over using frameworks like Bedrock, citing better reliability and uptime. Even a user with priority rate limits, bypassing a hefty 200k+ waitlist, chose a direct connection with Claude.

  • Jeffreyw128 and ibash Leave Cryptic Remarks: Within the discourse, succinct messages such as “lol wut” from jeffreyw128 and ibash’s one-word critique, “Damn,” in response to high quality code writing, punctuated the conversations but provided limited context or actionable discussion points.

  • Is Basic Prompting Insufficient?: Questioning the effectiveness of basic prompting emerged in a solitary message, implying a need for more advanced or nuanced techniques when engaging with AI, particularly for those in technical fields.


Interconnects (Nathan Lambert) Discord

  • Hunt for Synthetic Benchmarks Heats Up: Engineers are looking into fully synthetic benchmarks to study language model capabilities, with startups generating data to support this research. The goal is to better understand LLM capabilities by manipulating factors like diversity and reasoning in the training data.

  • Engineers Buzzing About Synthetic Data and Open Curation: Interest has been piqued in the realm of synthetic data and worlds; one engineer is even considering authoring a paper on it. Additionally, a systematic approach to open-source data curation for model pretraining has been suggested to improve collective efforts in the field.

  • ChatGPT: The Academic’s New Assistant: Discussion highlighted the utilization of ChatGPT for rewriting content in academic projects to push for state-of-the-art results, with a side project underway to explore further applications, indicating rewriting tasks are now a mainstream strategy.

  • Chess, Go, and Human Psyche: Tech Giants in an AI-Infused World: Members muse over the psychological impact of AI advancements, citing historical events like Kasparov’s defeat to Deep Blue and reflecting on an individual’s attitude towards AI. A philosophical discussion on the potential for creating generalist agents in reinforcement learning was highlighted, featuring insights by Minqi Jiang and Marc Rigter and shared via MLStreetTalk Tweet.


Alignment Lab AI Discord

  • Calling All Open Source Enthusiasts: A community member is seeking collaboration on the 01, a fully open source hardware device, and has shared details in a public tweet.


PART 2: Detailed by-Channel summaries and links

Stability.ai (Stable Diffusion) ▷ #general-chat (884 messages🔥🔥🔥):

  • Model-mania Begins: Members are fervently discussing Stable Diffusion models, specifically their anticipation for Stable Diffusion 3 (SD3) and the careful selection of control nets and addons for generating art. Questions about AMD GPU compatibility and advice on cloud GPU services for those with less powerful hardware are prevalent.
  • Tech Troubleshooting in Action: One member needed assistance with a RuntimeError about NVIDIA drivers on an AMD GPU system when trying to use Stable Diffusion WebUI. They were directed to lshqqytiger’s fork for AMD support and given a step-by-step guide for installation.
  • The Hype for Higher Quality: The conversation turned technical, discussing VRAM requirements for different Stable Diffusion models. With the upcoming SD3 believed to demand high VRAM, members speculate about the practicalities of running such large models locally.
  • Prompt Crafting and Art Creation: Users are sharing prompting techniques and AI results for various creative projects, like generating images for “tribal videos” and D&D campaigns, with some seeking specific models that can comprehend detailed prompts for generating character art and scenery.
  • The Spectrum of Community Opinions: Debates emerge around the benefits and drawbacks of AI, with some expressing skepticism regarding AI’s impact on jobs and creativity. Meanwhile, others stress the evolutionary nature of AI tools and their potential to augment human workflows.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (696 messages🔥🔥🔥):

  • Unsloth AI Gearing Up for Multi-GPU Support: The Unsloth AI team confirmed that multi-GPU support will eventually be available as an open-source feature, intending to allow for free use of Mixtral on platforms like Kaggle. The focus currently remains on launching Unsloth Studio (Beta).

  • Improving Data Curation for Fine-Tuning: Unsloth AI is exploring the creation of an efficient platform UI for automatic data curation, targeting users who find data preparation for model fine-tuning challenging. This platform aims to address the data formatting and question-answer preparation steps.

  • Debate on Evaluation Frameworks and Data Quality: There was a lengthy discussion about the importance of creating robust evaluation frameworks and the challenges of defining and obtaining high-quality data for model training. An important part is ensuring transparency and accuracy in benchmarks, like correcting datasets used, such as MMLU where 25% of examples had incorrect reference solutions.

  • Unwavering Community Support Despite Setbacks: Despite previous instances of misinformation spreading in the community, Unsloth AI has gained notable traction and support, and their VRAM reduction technique has been acknowledged widely. Enthusiastic community members are expressing eagerness for upcoming multi-GPU support and other features.

  • Collaboration and Open Source Contributions Celebrated: There was mention of projects such as OpenInterpreter’s batch one selling out and their profits being redistributed to open-source contributors. This highlights a positive trend towards collaboration and reinvestment within the AI tools community.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (35 messages🔥):

  • Installation Troubles with Unsloth AI: A user experienced issues installing Unsloth AI from a nightly build using pip, encountering an error that there was no matching distribution found for the specified requirement. The problem referenced a specific extra named “kaggle-new”.

  • Training on a Single GPU Card Failure: Another user encountered an error when restricting training to a single GPU card. The error message indicated a mix of devices, causing a crash: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

  • CUDA Level Changes for Quantized Models?: A user questioned if there had been changes at the CUDA level for 4bits quantized models after running into issues with Unsloth’s solar-10.7b-bnb-4bit, which had previously worked on their machine.

  • VRAM Constraints with Solar Model: It was observed that despite being a quantized model, which should require less VRAM, Unsloth’s solar-10.7b-bnb-4bit was still possibly exceeding the available VRAM on a user’s 15GB A4000 GPU, potentially causing out-of-memory issues.

  • Kernel Restarts Required to Avoid 32-bit Warnings: A repeated requirement for kernel restarts to avoid warnings about 32-bit processing was noted, despite expectations that the referenced quantized models should not trigger such warnings. There is speculation that the machine in question might be running out of memory.

Link mentioned: Quantization: no description found


Unsloth AI (Daniel Han) ▷ #help (92 messages🔥🔥):

  • Switching to 16-bit LoRA Configured: Members discussed changing LoRA settings, suggesting setting load_in_4bit to false or using parameters like load_in_8bit or load_in_16bit.

  • VRAM Consumption and Out-of-Memory Issues During Training: One member reported Out-Of-Memory (OOM) errors during evaluation but not during training and was advised to try changing “adamw_8bit” to “paged_adamw_8bit” and reducing batch size to lower VRAM usage.
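As a sketch of that advice, the optimizer swap is a one-string change in Hugging Face’s TrainingArguments; the other fields here are illustrative placeholders, not the member’s actual settings:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",              # placeholder path
    optim="paged_adamw_8bit",          # was "adamw_8bit"; pages optimizer state out of VRAM
    per_device_train_batch_size=1,     # smaller batches further reduce VRAM use
    per_device_eval_batch_size=1,      # eval-time OOM often traces to a larger eval batch
)
```

The eval batch size is the usual culprit when training fits but evaluation OOMs, since it defaults independently of the train batch size.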

  • Saving and Loading Models and Tokenizers Locally: A member figured out that to use FastLanguageModel.from_pretrained() effectively, both the model and tokenizer need to be saved in the same folder.

  • Potential Missing protobuf Dependency in Unsloth: A member raised a concern that protobuf might be missing in a particular version of Unsloth, which was acknowledged but with uncertainty as to whether it was the case.

  • Unclear Model Choice for Physics, Math, and Engineering: A member asked for advice on AI model selection suitable for high-level physics, mathematics, engineering, and Python, with recommendations to look at available resources like YouTube videos and articles such as this one on Towards Data Science.

  • Challenges with Unsloth Library Updates and Environment Management: Multiple members experienced issues related to Unsloth updates, with suggestions to upgrade necessary libraries, while one described difficulties in environment management and the need for extensive dependency overhauls.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (30 messages🔥):

  • Showcasing Samantha Mistral Instruct 7B: cognitivetech highlighted their work on Samantha Mistral Instruct 7B, a model designed to summarize psychology books 2000 tokens at a time, and thanked the community for their support.
  • Gratitude for the Community’s Guidance: Acknowledgement was given for the help received in utilizing Unsloth notebooks and the community’s assistance in answering questions related to fine-tuning models.
  • Troubleshooting Model Issues: cognitivetech discussed experiencing issues with q4 quantization, suggesting it produced “garbage” results unlike the model model-unsloth.Q8_0.gguf, which worked flawlessly when summarizing books.
  • Upload of Working Quant Model: After some discussion about troubleshooting the model, cognitivetech informed they would be uploading a working version of q8 to Hugging Face for others to check in about 20 minutes.
  • Community Collaboration and Testing: There was a collaborative effort between cognitivetech and solobsd to test and run the models on platforms like GPT4All, sharing Ollama templates and discussing potential causes for issues encountered.

Link mentioned: Samantha Mistral Instruct 7b - Comprehensive Bulleted Notes: no description found


Unsloth AI (Daniel Han) ▷ #suggestions (14 messages🔥):

  • Lightning Thunder Piques Interest: A member shared a link to Lightning Thunder, highlighting its potential to make PyTorch models faster by leveraging different hardware executors. They noted, however, it may not be directly helpful for Unsloth AI since it is built on Triton.

  • Confusion Over Unsloth Implementation: Some members expressed concern that Lightning Thunder did not properly implement Unsloth, suggesting they could have consulted the Unsloth team for better integration.

  • Potential Misuse of Unsloth Kernels: A member pointed out issues with Lightning Thunder’s use of Unsloth kernels, like unnecessary copies and transpositions, highlighting that a consultation could have prevented this mishandling.

  • Call for Collaboration and Clarification: Suggestions were made to reach out to the Lightning Thunder team to rectify mistakes and clarify the use of Unsloth in their presentations, emphasizing the importance of accurate comparisons in benchmarks.

  • Frustration Over Performance Comparison: One member shared frustration through a Twitter link regarding the inaccurate comparison that made Unsloth kernels look underperforming, urging for the presentation to reflect the correct implementation.

Link mentioned: GitHub - Lightning-AI/lightning-thunder: Source to source compiler for PyTorch. It makes PyTorch programs faster on single accelerators and distributed.: Source to source compiler for PyTorch. It makes PyTorch programs faster on single accelerators and distributed. - Lightning-AI/lightning-thunder


OpenInterpreter ▷ #general (254 messages🔥🔥):

  • OpenInterpreter Discord Channel Buzzing with Activity: There is significant excitement as members discuss the OpenInterpreter Discord chatbot messages, with various time zones making it a challenge for some to stay awake for the ongoing discussions.
  • Launch Anticipation and Pre-Order Queries: Members are sharing their enthusiasm for the 01 Light launch, asking about pre-orders and expressing hope for international shipping options beyond the current US-only availability.
  • Tech Enthusiast Community Rallies Behind Hardware Innovations: Links to 3D print designs for the 01 Light are shared, encouraging DIY enthusiasts to build their own language model computers, with design files found at Printables and GitHub for more info at GitHub - OpenInterpreter/01.
  • Development and Safety Discussions Heat Up: There’s chatter about the OpenInterpreter development process and safety measures, with members curious about red-teaming initiatives and safeguards, directing others to the OpenInterpreter/01 GitHub repository for more details.
  • Community Collaboration and Questions Surrounding Windows Support: Users are querying about running OpenInterpreter on Windows and if there will be any official Windows support, with no direct responses confirming such support provided in the conversation.

Links mentioned:


OpenInterpreter ▷ #O1 (286 messages🔥🔥):

  • Launch Party Anticipation: Members express excitement for the 01 Developer Preview. The 01 Light is a portable voice interface device for controlling a home computer, equipped with capabilities to see the screen, use apps, and learn new skills.
  • Build Your Own 01: Community members discuss assembling their own 01 devices from a Bill of Materials and troubleshoot potential shipment issues to regions like India and the EU, suggesting the potential for localized community collaborations.
  • Setup Queries and Troubleshooting: The chat addresses setup concerns for using 01 on various operating systems. One key solution for Windows users is to run with poetry run 01 --client --client-type linux --server-host localhost, indicating compatibility with Windows when using Linux client type settings.
  • Batch Updates and Shipping Concerns: The community is informed that pre-order batches have filled up, and members shared curiosity around shipment times. People inquire when batch 2 and subsequent batches will ship, with no committed date given.
  • Evolving Software Discussions: Members ask about various features and usability of the software, including local vs. cloud operation, non-developer accessibility, API keys, and compatibility with languages like German. Concerns about software updates and battery life are also questioned.

Links mentioned:


OpenInterpreter ▷ #ai-content (1 messages):

cyanidebyte: https://www.youtube.com/watch?v=Q_p82HtBqoc


LM Studio ▷ #💬-general (305 messages🔥🔥):

  • LM Studio Local Document Support Inquiry: A user inquired about the possibility of using local documents for Retrieval-Augmented Generation (RAG) with Anything-LLM and LMStudio support, but no subsequent answers were provided regarding the functionality.
  • Concerns Over LM Studio Multi-Model Issues: Users reported issues with the new beta release of LM Studio failing to add multiple models, and one user experienced excessive CPU usage despite offloading all layers to the GPU, which was resolved by rebooting their system.
  • LM Studio Model Loading Errors After Update: One user described a problem with LM Studio failing to recognize non-local model names, resulting in an error, with another user suggesting that loaded models now generate a static key name visible through the GET endpoint, which wasn’t directly resolved within the shared messages.
  • Disappearing Icons in LM Studio’s Playgrounds: Another user experienced interface behaviour where model names were ejected from the UI when navigating between sections in LM Studio’s playground. A workaround was mentioned, suggesting clicking the yellow “Reload” box only once when wanting to reload a model.
  • Image Analysis with Llava in LM Studio: A user queried how to feed images for analysis to a llava model within LM Studio Chat AI, with the response indicating the necessity to drag and drop the image into the input box for the model to “see” it.

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (29 messages🔥):

  • Exploring AI’s Ethical Boundaries in Bank Security: Members discussed a Hugging Face space called ‘Guardrails Arena’, where users interact with models to assess fictional bank security measures, revealing that some models consistently refuse sensitive information whereas others are more forthcoming.
  • Under the Hood of Guardrails: For those interested in the technical details of the ‘Guardrails Arena’, links to the model’s Python script and configuration settings are provided at Guardrails Models Python Script and Prompts Configuration, offering insight into the AI’s decision-making policies.
  • New Reasoning Model on the Horizon: A new paper discussing ‘Quiet-STaR’ as a generalization to improve language models by generating rationales at each token is referenced, which has been translated into model form and can be viewed at the Hugging Face repository for quietstar-8-ahead-GGUF and in a YouTube video.
  • Human Oversight in AI-Driven Architecture: A conversation highlighted that while AI might design a structurally sound building, human oversight is legally and ethically necessary due to the inherent risk of structural mistakes.
  • Complementary AI Workflow, not Replacement: Members posit that AI should be used as a workflow accelerator, aiding in tasks like design and testing, rather than an outright replacement, emphasizing the constant need for human involvement and verification.

LM Studio ▷ #🧠-feedback (26 messages🔥):

  • Symlink Troubles with LM Studio: An update to LM Studio version 0.2.17 caused models to stop loading, with symlinks that worked in previous versions no longer being recognized. Despite attempts at regenerating symlinks, users experienced “Model with key Mistral/Hermes/Hermes-2-Pro-Mistral-7B.Q4_0.gguf not found.” errors.

  • Language Reminder: Discord members were reminded that English is the primary language of the server after a user posted a message in Chinese.

  • Channel Confusion: There’s been discussion suggesting a need for clearer guidance on where to post certain topics, as feedback or help-related questions are often posted in incorrect channels.

  • Feature Request for File Interaction: A user expressed a desire to chat with files such as PDF, DOCX, or PNG, and was informed that chatting with PNG images is supported using a Llava model.

  • Summarizing Multiple PDF Documents: In response to an inquiry about summarizing multiple PDFs, a member was directed towards a specific channel for model suggestions.

  • Download Speed Limiter Request: A download speed limiter feature was requested to prevent large model downloads from monopolizing bandwidth. Discussion ensued about whether OS-level settings might be a better solution for bandwidth management.


LM Studio ▷ #🎛-hardware-discussion (98 messages🔥🔥):

  • Cloud vs. Local AI Hardware: A member expressed their preference for local hardware over cloud services for machine learning, citing cost-effectiveness and learning opportunities. Experimenting with AI on personal hardware allows companies to understand AI without hefty cloud service expenses.

  • Shifting IT Paradigms: Members discussed the cyclical nature of centralized and decentralized computing, predicting that, following trends, on-premise AI servers may become preferred before shifting back to powerful decentralized AI PCs.

  • Security Concerns with AI Chatbots: A discussion ensued about a security exploit that allows interception of encrypted AI chatbot tokens through a side channel attack. Despite the encryption, attackers can infer information about messages sent to users, indicating a potential vulnerability for services like OpenAI (more detailed explanation here).

  • The Quest for Efficient AI Development: Conversation pivoted around the increasing complexities and expenses of AI development infrastructure. High-end hardware like GPUs, infiniband switching, and the need for massive power become economic and environmental concerns, with predictions about the future leaning towards SaaS solutions.

  • Choosing the Right Model and Specs for Personal Machines: A member sought advice on which AI models to run on their M3 Pro with 18GB of RAM. It was advised to look for models that enable “Full GPU Offload Possible” and to expect limitations when working with higher-capacity models due to hardware constraints.

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (10 messages🔥):

  • Seeking Clarity on Multimodal Usage: A user expressed they’ve been too focused on getting multimodal LM Studio to work and lost sight of their original purpose for learning to use it.
  • Model Recommendations Passed Around: Users discussed their experiences with different versions of Command-R models. A recommendation was made for second-state’s q8 model.
  • Looking for Models with External Storage Capabilities: A user expressed interest in a multi-model setup that allows a model to interact with external storage, like a text file or local redis instance, to improve model performance on complex tasks involving Golang and Hugo.
  • Technical Issues and Troubleshooting in LMStudio: An individual shared their configuration settings for an unspecified model that led to abnormal behavior and another user suggested restarting LMStudio as a possible solution to similar issues they encountered.
  • Hardware Compatibility Queries Solved: A user encountered an error when trying to run LM Studio on an AMD Rocm with an RX 570. Another user clarified that the RX 570 graphics card is too old to work with the ROCM build.

LM Studio ▷ #autogen (10 messages🔥):

  • Clarification on LM Studio’s New Features: LM Studio has launched a capability that supports multi-model use, which allows having several models in VRAM at once to compare and utilize the best model for a specific task through the LMS console or tools like autogen.

  • Assistance Request for Autogen Issue: A user encountered a TypeError in their Autogen script indicating that Completions.create() got an unexpected keyword argument ‘request_timeout’. They posted the error traceback and sought assistance in resolving the issue.
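For context, newer autogen releases (built on openai>=1.0) renamed request_timeout to timeout in llm_config, which is a likely cause of this TypeError; a hedged sketch with placeholder values for a local LM Studio endpoint:

```python
# Sketch, assuming autogen 0.2+ / openai>=1.0: `request_timeout` became `timeout`.
# All values below are placeholders for a local LM Studio server.
llm_config = {
    "config_list": [{
        "model": "local-model",
        "base_url": "http://localhost:1234/v1",   # LM Studio's default server URL
        "api_key": "not-needed",                  # placeholder; never post real keys
    }],
    "timeout": 120,   # seconds; passing `request_timeout` raises the TypeError above
}
```

If pinning an older autogen/openai pair, the reverse rename applies; either way the two names cannot be mixed.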

  • Code Review and Sensitive Data Caution: Another user advised removing the API key from the user’s config_list file to prevent potential scraping by bots, despite being told it’s not an actual key. This is shared as a reminder to practice good security habits by not posting sensitive data in public forums.

  • Seeking Advice on VRAM Limitations: A member inquired if 8GB of VRAM is considered low when they report that only one Language Model (LM) can run before exceeding the limit. They wondered if there were options to remove or increase this limit.


LM Studio ▷ #langchain (2 messages):

  • Ease Language Modeling with Instructor: Members are advised to check out the Instructor library on GitHub, which is designed to facilitate structured outputs for language model workflows. The library is highlighted as something that could simplify processes for users.
  • Special Fine-tuned OpenChat Success: A member mentioned they have a special fine-tuned version of OpenChat that performs well and have successfully integrated it with the dolphin mistral fine-tune.
  • Just a Quick DM: A brief note indicating a private message was sent to follow up on the conversation, presumably on one of the mentioned topics or tools.

Links mentioned:


LM Studio ▷ #amd-rocm-tech-preview (23 messages🔥):

  • Eject and Context Size Tweaks Required: A member reported needing to engage “Eject” during loading, and then minimize context size to prevent out-of-memory issues while using model versions like Command-R, despite its support for large context.

  • Feedback on ROCm 0.2.17 Beta v3: A link to the new ROCm 0.2.17 Beta v3 was shared, including a change log mentioning a potential fix for issues around GPU offloading.

  • Mixed Experiences with GPU Offloading: Members discussed various experiences with GPU offloading on ROCm, noting issues like 100% CPU utilization and confusion possibly caused by ZLUDA taking precedence in system paths.

  • ZLUDA Interference With ROCm: It was noted by the members that having ZLUDA installed and in the PATH may interfere with ROCm operations, potentially explaining high CPU utilization issues.

  • Stable Performance on AMD Hardware: Several users reported successful and stable use of ROCm 0.2.17 Beta v3 on various AMD GPUs, with feedback ranging from “working fine” to observing substantial GPU activity.

Link mentioned: no title found: no description found


Perplexity AI ▷ #general (340 messages🔥🔥):

  • Discord Users Discuss Perplexity and AI Differences: Users engaged in discussions on the performance of different AI models, with comparisons between Claude 3 Opus and Gemini being frequent. Some stated Gemini sounds more human, while others expressed a preference for Opus, Anthropic’s top model.

  • Tech Updates and Troubleshooting: Participants discussed updates to various apps, challenges with the text input fields, and provided mutual aid for use on mobile versus PC. Some shared frustrations related to iOS app updates and features, such as desiring a darker midnight/black theme for better visual comfort.

  • Cloudflare Critique: A user expressed dissatisfaction with Cloudflare’s CAPTCHA challenges, especially when using VPNs for privacy, indicating it even affects users without privacy settings enabled.

  • Perplexity’s Web Search and Image Generation Queries: Queries on how Perplexity AI conducts web searches and image generation featured prominently. Users clarified that while on mobile, certain features like unlimited Claude 3 Opus might not be accessible, but images could be generated following specific instructions.

  • Discussions and Comparisons of Personal AI Models: Users shared thoughts on various personal AI models such as Inflection-2.5 and Pi.AI, highlighting their strengths in conversational usage and voice models. Concerns about the future of these platforms, in light of talent loss, also surfaced.

Links mentioned:


Perplexity AI ▷ #sharing (19 messages🔥):

  • Largest Planet Knowledge Inquiry: A post shared a Perplexity AI search link relating to the largest planet, indicating research or a discussion may have taken place about this topic.
  • Boosting Japanese LLM Development: A user pointed to Japanese language model development, suggesting a focus or interest in this area.
  • A Question of Time: A member shared a Perplexity AI search link concerned with the French phrase “Combien de temps,” potentially signifying a language-related query.
  • GPT-5 Release Rumor Mill: Curiosity and rumors about the release of GPT-5 were indicated through a shared Perplexity AI link.
  • Embracing Linux as a macOS Vet: One user narrated their switch from macOS to Linux for AI/ML studies, utilizing Perplexity AI as an aid in this learning journey, expressing satisfaction with MX Linux.

Perplexity AI ▷ #pplx-api (26 messages🔥):

  • Perplexity API vs UI Citations: A member inquired about why the API does not provide sources and citations like the Perplexity UI, hinting at a potential feature addition.
  • Token Limits Gaffe: A user faced a BadRequestError when attempting a request with a prompt of 6621 tokens and an output of 9900 tokens, exceeding Perplexity’s 16384 token limit. They were puzzled by how to adjust their API calls accordingly.
  • Resume Analyzer Challenge: The member encountering token limits is building a resume analyzer/builder as an AI practice project, indicating they are fairly new to the field.
  • Token Counting Tricks: Another community member pointed the user to the Perplexity documentation to check token counts accurately for their AI queries.
  • Seeking Clarification on Usage: The user explored how to limit user prompts and was informed that it entirely depends on the content length, which was helpful advice for the resume project they are working on.
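Budgeting output tokens against the window is simple arithmetic: with a 16384-token limit, a 6621-token prompt leaves at most 9763 tokens for the completion. A small helper (the function name is ours, not part of the Perplexity API):

```python
def clamp_max_tokens(prompt_tokens, requested_output, limit=16384):
    """Shrink the requested completion so prompt + completion fits the window."""
    available = limit - prompt_tokens
    if available <= 0:
        raise ValueError("Prompt alone exceeds the context window; shorten it.")
    return min(requested_output, available)


print(clamp_max_tokens(6621, 9900))
```

Counting `prompt_tokens` accurately requires the provider’s tokenizer (or its usage metadata), per the documentation pointed to in the thread.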

Link mentioned: Chat Completions: no description found


LAION ▷ #general (369 messages🔥🔥):

  • Dataset Dilemma - Seeking the “mypersonality” Dataset: Members were discussing the “mypersonality” dataset from Facebook and its utility in predicting the personality of authors based on text. The dataset’s accessibility was in question, with one member indicating knowledge about it due to research requirements.

  • Hugging Face Implementations Under Scrutiny: Extensive conversation unfolded over potential issues with embedding implementations in the Hugging Face diffusers library, with code snippets and corrections shared among members. Concerns were raised about the correct pooling method for text embeddings, with suggestions to correct the existing code and improve model performance.

  • Dataset Recovery and Alternatives Explored: The status and future of the LAION-5B dataset post-removal was a topic of discussion, with new datasets like Datacomp and DFN touted as alternatives given EU legislation. Skepticism regarding LAION’s ability to fully clear legal hurdles and republish their datasets was expressed, implying that the datasets may remain unpublished.

  • Push for OpenAI Code Transparency: Members discussed the importance of open-sourcing training code for the advancement of AI and expressed anticipation for potential openness with future models like SD3, despite the setbacks in previous versions.

  • Training Technique Discussions and Improvements: Debates about finetuning the text encoder in models like SD2.x revealed a consensus that drastic text encoder modifications might not be necessary. The pivotal tuning method, once a topic of debate, was acknowledged as an “advanced” method now incorporated into Diffusers’ official training scripts.

  • Skepticism Toward AI Sabotage Claims: One member expressed doubt that researchers who raise concerns about datasets containing sensitive material, like CSAM, are sincere in their intentions, speculating instead that their actions could be aimed at hindering AI progress.
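
On the pooling point raised in the Hugging Face discussion: mean pooling over token embeddings must exclude padding positions, or padded sequences skew the sentence vector. A minimal illustrative sketch (plain Python lists stand in for batched tensors; this is not the diffusers code under discussion):

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, counting only non-padding positions (mask == 1)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for emb, m in zip(token_embeddings, attention_mask):
        if m:
            count += 1
            for d in range(dim):
                total[d] += emb[d]
    return [t / count for t in total]

emb = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last position is padding
assert mean_pool(emb, [1, 1, 0]) == [2.0, 3.0]  # padding excluded from the average
```

Pooling over all positions including padding would instead yield [13/3, 5.0], which is the kind of subtle bug the thread was concerned about.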

Links mentioned:


LAION ▷ #research (4 messages):

  • Scaling up Image Scales: A paper discussed at arXiv suggests improving model performance by using multiple scales of an image.

  • Using Time to Encode Images: Another paper from arXiv introduces a method of encoding images with 6 times the number of timestamps, employing different zig-zags, which some believe might be a workaround rather than an elegant solution.

  • Fractals in the Spot: Continuous fractal space-filling curves were humorously referenced in a discussion, implying their potential in addressing current encoding methods.

  • Peering into the Future: A tweet from Bindu Reddy was shared as an indicator of a forward-looking development, though the specific content of the tweet wasn’t disclosed in the message.


Nous Research AI ▷ #ctx-length-research (21 messages🔥):

  • Debating “Extended Mind” Concept: Members discussed the concept of “Extended Mind,” referencing a tweet by Phoebe Klett. Interest was expressed in porting it to Mistral for easier accessibility.
  • Understanding the Depth of Extended Mind: One member mentioned that Extended Mind seemed akin to an associative memory, where a separate database holds information that attention can call upon, similar to memory and tool use.
  • Clarifying the Mechanism of Extended Mind: Discussion clarified that the Extended Mind involves storing vectors and fetching the top k during the forward pass, emphasizing that it’s more about selecting aspects of associative memory than tools.
  • Speculating on the Integration of Tools through Extended Mind: There was talk about future experimentation with Extended Mind, speculating on integrating different tools more deeply and exploring its potential for impacting memory and reasoning.
  • Identifying Extended Mind’s Potential and Challenges: Members discussed the need for weighed attention in the Extended Mind approach, where the system must learn when to focus on memories. The concept’s relationship with memorizing versus reasoning was also mentioned as a point of interest.
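
The mechanism described above, storing vectors externally and fetching the top k during the forward pass, can be sketched as a similarity lookup (toy vectors and names are assumptions; the real method operates on attention keys and values inside the model):

```python
def retrieve_top_k(query, memory_keys, memory_values, k=2):
    """Score stored key vectors against the query and return the top-k associated values."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    ranked = sorted(range(len(memory_keys)),
                    key=lambda i: dot(query, memory_keys[i]),
                    reverse=True)
    return [memory_values[i] for i in ranked[:k]]

memory_keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
memory_values = ["fact A", "fact B", "fact C"]
# query aligned with key 0; key 2 is the runner-up
assert retrieve_top_k([1.0, 0.1], memory_keys, memory_values, k=2) == ["fact A", "fact C"]
```

The "weighed attention" question in the last bullet maps onto how strongly these retrieved entries are mixed into the regular attention scores.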

Nous Research AI ▷ #off-topic (5 messages):

  • Greetings and Salutations: A new member was welcomed with open arms into the community.
  • Hardware Struggles to Keep Up With AI Software: A member commented on the challenge of running oversized param models locally without quantization and speculated that hardware would eventually catch up with the advancements in software.
  • LLaVA Model Fine-tuning Tutorial Shared: A link to a YouTube video was shared, which provides instructions on how to fine-tune the LLaVA model and includes various topics such as multimodal learning and deep learning.
  • New Open Source AI Device Introduced: A member was excited to share a twitter post about a new open source AI device, expressing anticipation to see it powered by Nous Research models.
  • Appreciation for the Open Source AI Device Project: Recognition and appreciation were shown for Killian and his team’s progress on the open source AI device mentioned in the previous tweet.

Link mentioned: Finetune MultiModal LLaVA: This video explains how to fine-tune llava model#llm #ml #ai #deeplearning #largelanguagemodels #python https://wandb.ai/byyoung3/ml-news/reports/How-to-Fine


Nous Research AI ▷ #interesting-links (23 messages🔥):

  • Unlocking the Secrets of Synthetic Datasets: The Hugging Face blog outlines generating a vast synthetic dataset, Cosmopedia, to mirror Phi-1.5. The post highlights the shift from costly human-annotated data to synthetic datasets, with Cosmopedia standing as a testament to this trend in machine learning.

  • The Devil is in the Detail of Prompt Engineering: Generating Cosmopedia didn’t rely on heavy-duty GPUs but instead on detailed prompt engineering. The blog post reveals that time investment in prompt crafting was a significant part of the task.

  • Quiet-STaR Claims to Enhance Text Interpretation: Quiet-STaR, an extension of STaR, is proposed as a technique where language models learn to generate explanations for each token, thereby improving their text predictions. The abstract of the paper points to the potential for LMs to infer unstated rationales in arbitrary texts.

  • OpenInterpreter’s Vision for AI Devices: A new device called 01 Light was introduced via a tweet and promises to be a portable voice interface that can control a computer and its applications. The creators emphasize its open-source nature and the potential for users to build their own or utilize an upcoming app for remote control.

  • Debate on Necessity of Hardware for OpenInterpreter’s 01: Conversations surround the 01 Light’s function as merely a “glorified microphone” with some members noting the hardware is optional and that one can use the system on a computer for free. Despite initial skepticism, there is recognition for the benefits of the open-source project and its software.

  • Nous Models on Kubernetes?: A user questioned the integration of Nous models within Kubernetes and shared a desire for an easy installation process akin to an example provided in a SideroLabs tweet. No further information was provided about the Nous models or their Kubernetes compatibility.

Links mentioned:


Nous Research AI ▷ #general (126 messages🔥🔥):

  • Struggling with Embedding Models: A user experienced issues when finetuning an embeddings model, citing problems with BatchAllTripletLoss from Sentence Transformers where eval scores weren’t changing, indicating that the model wasn’t learning. Additionally, using Angle loss pushed both positive and negative samples further from the query.

  • In the Quest for Summarization: One user sought research papers to test a new summarizer, requesting others to provide documents. This led to a discussion involving a paper on Quiet-STaR (a generalization of STaR) where LMs generate rationales at each token to explain future text.

  • Chat about Chatbots: Members discussed the integration of an existing repository, llm_steer, with an interface for interacting with LMs through “activation vectors”. There was also a debate over the effectiveness of logical reasoning and planning in LMs, particularly when employing various methods like cursed model merging.

  • Open Source RAG Platforms and Benchmarks: Users shared links to their projects, such as an open-source platform for RAG applications, and discussed benchmarks of models like Mistral-Evolved-11b-v0.1, commenting on their performance improvements.

  • Exploring AI and Hardware: Some members questioned the practicality of AI-related hardware releases like the ‘Open Interpreter’s 01 Lite’, while others hinted at the potential for “mind to text” technology that can interpret internal speech through EMG sensors on the neck. Some users imagine future possibilities like interacting with AI using gestures or direct brain interfaces.
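
For context on the embedding-finetuning issue in the first item: triplet-style losses such as BatchAllTripletLoss only reach zero once the anchor–positive distance is smaller than the anchor–negative distance by a margin, so flat eval scores usually mean that gap never opens. A toy version of the objective (margin value and vectors are illustrative, not the member's setup):

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(dist(anchor, positive) - dist(anchor, negative) + margin, 0.0)

# satisfied triplet: positive is much closer than negative, loss is zero
assert triplet_loss([0, 0], [0, 1], [5, 0]) == 0.0
# violated triplet: 2 - 1 + 1 = 2
assert triplet_loss([0, 0], [0, 2], [0, 1]) == 2.0
```

If both positives and negatives drift away from the query, as reported with the Angle loss, the loss can stay large while the embedding space degrades, which matches the "model isn't learning" symptom.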

Links mentioned:


Nous Research AI ▷ #ask-about-llms (21 messages🔥):

  • In Search of Quantization Support: A member mentioned quantization for a language model and suggested that another member might have insights. This second member acknowledged trying quantization and noted more research was needed to make it work.
  • Collaboration on Quantization: Upon request, a member shared a repository named AutoAWQ for 4-bit quantization. However, they specified it was an outdated version and invited others to attempt fixing it: GitHub - casper-hansen/AutoAWQ.
  • Anticipation for NousForge: After a member who found the chat through Google inquired about NousForge’s release, others indicated that it is not yet available.
  • Debating Few-Shot Prompts in Instruction SFT: A member questioned how common and beneficial it is to include few-shot prompts in instruction SFT (supervised fine-tuning) datasets; another responded that it is uncommon, and the original member later found a related discussion thread with no results reported yet.
  • Theoretical Grounds for Causal Masking Questioned: A member asked if causal masking in attention mechanisms had theoretical justification or was merely an engineering convenience. Another participant highlighted the necessity of masking for the model to learn next token prediction.
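
The causal-masking point can be made concrete: during training, a lower-triangular mask prevents each position from attending to future tokens, which is exactly what forces the model to learn next-token prediction rather than copying the answer from later context. A minimal sketch:

```python
def causal_mask(seq_len):
    """Lower-triangular attention mask: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

mask = causal_mask(4)
assert mask[0] == [1, 0, 0, 0]  # the first token sees only itself
assert mask[2] == [1, 1, 1, 0]  # position 2 cannot see position 3
assert mask[3] == [1, 1, 1, 1]  # the last token sees the whole prefix
```

In practice the zeros are applied as -inf before the softmax, but the triangular structure is the same.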

Link mentioned: GitHub - casper-hansen/AutoAWQ at striped_hyena: AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
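
As a rough illustration of what 4-bit weight quantization does (a symmetric round-to-nearest sketch, not the AWQ algorithm itself, which additionally rescales salient channels based on activation statistics):

```python
def quantize_4bit(weights):
    """Symmetric 4-bit quantization sketch: map floats to integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.28, -0.7, 0.04, 0.7]
q, s = quantize_4bit(w)
assert q == [3, -7, 0, 7]        # each weight stored as a 4-bit integer
w_hat = dequantize(q, s)         # lossy reconstruction used at inference time
```

Each weight now costs 4 bits plus a shared scale, which is where the memory and bandwidth savings come from; the reconstruction error is what schemes like AWQ work to minimize on the weights that matter most.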


Nous Research AI ▷ #project-obsidian (3 messages):

  • Potential Improvement for Obsidian: A link to a Twitter post by Baifeng Shi was shared, suggesting it could improve Obsidian.
  • Affirmation of Obsidian Enhancement: A member acknowledged that the content in the shared link would indeed Enhance Obsidian.
  • Exploration of Implementation: A member expressed intent to attempt the implementation suggested for Obsidian improvement.

Nous Research AI ▷ #rag-dataset (38 messages🔥):

  • LanceDB Gaining Traction with Developers: Gabriel_syme expressed enthusiasm for LanceDB, highlighting its speed, ease of use, and the ability to perform hybrid search queries with generative interfaces like SQL-type queries. In contrast, Iriden discussed using Polars for traditional queries despite its challenging syntax when used with language models.

  • Awaiting Better Integration for Data Tools: There was a mention of Polars, awaiting better integration, and noting that LanceDB and Polars can swap data, but developers need to do the integration manually. Furthermore, gabriel_syme considered potential cloud-native capabilities of Polars.

  • Managed Cloud Solutions a Possibility: Iriden highlighted work on a FastAPI/Streamlit app that allows parquet file uploads and Polars expressions, mentioning that once deployed to modal.com, it could serve as a managed cloud solution.

  • Sharing Development Work on GitHub: Iriden shared a GitHub repository featuring code for running GPT asynchronously over Polars frames and machine learning models that use embeddings. This repo aims to integrate Semantic Wisdom with Predictive Models.

  • Parenting Discussions Amongst Developers: There was a brief, light-hearted exchange about parenthood, with triggerhappygandhi offering congrats and talking about the scarcity of parents on Discord, and denovich responding with personal experience about the biological effects of becoming a parent.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (213 messages🔥🔥):

  • RAG vs. Agent Approaches: Some members debated the efficacy of Retrieval-Augmented Generation (RAG) versus agent-based models. It was argued that RAG is a less capable approach, merely a band-aid for missing knowledge, whereas agent-based models with tool use and reflection are more robust, even though RAG may appear simpler to implement.

  • FastChat Format Frictions: There’s a discrepancy in the formatting of FastChat’s alpaca model prompts, with concerns about how it diverges from Stanford’s alpaca format. One member highlighted the difference and the potential need for a pull request to correct FastChat’s format for consistency.

  • Galore Optimizer Gains: Discussion about the new Galore optimizer was positive, noting its smooth setup and significant VRAM savings for fine-tuning large language models. It utilizes low-rank matrices of the gradient for each full model parameter and allows for full parameter tuning with considerably less memory usage.

  • GPT-3.5 Performance Persistence: Participants in the channel expressed interest in GPT-3.5, asking about the performance and inference times of various model sizes and configurations. One user noted suboptimal inference speeds when running locally on a Mac due to the privacy constraints of handling patient data.

  • Dataset Discussions: There was a brief exchange about dataset curation and formatting. It covered different types of datasets and their respective formats for model training, specifically between sharegpt and chatml formats, with confirmations on how Axolotl interprets and processes these datasets for model consumption.
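
The mechanism behind the Galore item can be sketched in a few lines: project each full-shape gradient into a low-rank subspace, keep optimizer state there, and project back before the weight update. This toy version uses a random stand-in projector and naive matrix helpers purely to show the shapes involved; GaLore itself derives the projector from a periodic SVD of the gradient, so convergence behaviour is not represented here:

```python
import random

def matmul(A, B):
    # naive dense matrix multiply, for illustration only
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

def transpose(A):
    return [list(row) for row in zip(*A)]

random.seed(0)
m, n, r = 4, 6, 2  # full gradient is m x n; optimizer state lives at rank r
G = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
P = [[random.gauss(0, 1) for _ in range(r)] for _ in range(m)]  # stand-in projector

R = matmul(transpose(P), G)   # r x n low-rank gradient: Adam moments kept at this size
# ... optimizer statistics would be updated on R here ...
G_back = matmul(P, R)         # projected back to m x n before applying to the weights

assert len(R) == r and len(R[0]) == n          # state shrinks from m*n to r*n
assert len(G_back) == m and len(G_back[0]) == n
```

The VRAM savings come from the first assertion: optimizer moments are stored at r×n rather than m×n, while the weights themselves remain full rank, so full-parameter tuning is still possible.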

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (6 messages):

  • Galore merges with the Transformers: Galore optimizer has been merged into the Hugging Face Transformers library, exciting members who anticipate its integration.
  • Technical Assistance Required: A member reported a TypeError when running an example from “examples/openllama-3b/qlora.yml” related to an unexpected keyword argument ‘seq_len.’ Another member redirected the request for help to a specific channel for better assistance.

Link mentioned: FEAT / Optim: Add GaLore optimizer by younesbelkada · Pull Request #29588 · huggingface/transformers: What does this PR do? As per title, adds the GaLore optimizer from https://github.com/jiaweizzhao/GaLore Fixes: #29512 This is how I am currently testing the API: import torch import datasets from …


OpenAccess AI Collective (axolotl) ▷ #general-help (10 messages🔥):

  • The Text Classification Debate: A member questioned the common practice of fine-tuning a Language Model (LLM) for text classification by teaching it to output class names as text rather than adding a classifier head on top. A possible reason given for this approach was the flexibility it offers, such as enabling the training on chain of thoughts.

  • Tinkering with Model Parameters: There was a query on how to adjust all parameters in galore, with particular mentions of -mlp and self_attn. It wasn’t clear from the messages whether the member resolved their issue.

  • Training Mixtral for Coding Assistance: A user asked for guidance on training and fine-tuning a Mixtral-7B model to be a coding assistant with documentation for tools like runpod and python. They inquired about the necessary tools, IDEs, and concepts for training the model on personal hardware.

  • PyTorch and Gemma: One member inquired whether Gemma is still not recommended (a no-no) on PyTorch.

  • Troubleshooting Preprocessing Error: A member reported an error related to KeyError: 'instruction' while preprocessing data using an axolotl preprocessing script, and shared a snippet from their configuration file and data. No solution was provided in the message history.
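
The KeyError: 'instruction' in the last item is typically a mismatch between the dataset's actual field names and the ones the configured prompt format expects. A hypothetical pre-flight check (the required key names here are assumed alpaca-style, not taken from the member's config):

```python
import json

REQUIRED_KEYS = {"instruction", "output"}  # assumed alpaca-style fields; adjust to your format

def find_bad_records(jsonl_lines):
    """Return (line_number, missing_keys) for records lacking any required field."""
    bad = []
    for i, line in enumerate(jsonl_lines, start=1):
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            bad.append((i, sorted(missing)))
    return bad

sample = [
    '{"instruction": "Say hi", "output": "hi"}',
    '{"prompt": "Say hi", "output": "hi"}',  # wrong key name -> KeyError downstream
]
assert find_bad_records(sample) == [(2, ["instruction"])]
```

Running a check like this over the dataset before preprocessing points directly at the offending records instead of failing mid-run.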


Latent Space ▷ #ai-general-chat (120 messages🔥🔥):

  • Automatic Model Merging Breakthrough: A member shared Hardmaru’s new paper on automated evolutionary algorithms for foundation model merging. They discussed it as a unique way of combining diverse open-source models to enhance model performance without extensive training.

  • Paris AI Community’s Buzz Over Meetups: There was a lively conversation among members about the AI community in Paris, France. Some shared experiences of recent meetups, while others expressed interest in attending future gatherings like the Paris Retrieval Augmented Generation group meeting, emphasizing the region’s active tech community.

  • Model Scaling Clarified: One member inquired about how models on Hugging Face, such as cosmo-1b, are downscaled from larger models like llama. Another member explained that smaller models are not fine-tuned but are instead separate architectures trained from scratch with scaled-down parameters.

  • Video Understanding AI Tools Spotlight: A discussion about tools for video analysis led to several recommendations, including Video Mamba and Twelve Labs, which enable advanced video understanding with foundational models.

  • Growing Interest in Open Source AI Platforms: A member pointed out the project JanAI, an open-source alternative to LM Studio that has gained attention on Reddit. Another clarified details about the partial open-sourcing plans for LM Studio after a discussion on the transparency of AI platforms emerged.

Links mentioned:


Latent Space ▷ #ai-announcements (4 messages):

  • Podcast Release with Adept Insights: A new podcast episode is up featuring an essay with insights into OpenAI, Google, and Adept. The announcement was accompanied by a Twitter link.
  • Team Effort Acknowledged for Adept Podcast: In preparation for the Adept podcast, thanks were given to a member for their assistance, despite not covering all the questions due to time constraints.
  • AI In Action: Llama.cpp: An AI-focused event titled AI In Action was about to start, showcasing Llama.cpp with a Discord channel link provided for live participation.

Latent Space ▷ #llm-paper-club-west (10 messages🔥):

  • Speaker Rights Not Granted: Members noted that they do not have speaker rights in the Discord channel.

  • Zoom to the Rescue: One member mentioned creating a Zoom room as an alternative to communicate since Discord speaker rights were unavailable.

  • Tight Schedule: A participant communicated having a hard stop at 1245, indicating limited time availability for discussion.


Latent Space ▷ #ai-in-action-club (92 messages🔥🔥):

  • Discovering the Identity of Slono: A revelation came to light that a user known as “slono” might not actually go by that name. Despite the surprise, a Spotify link to slono’s music was shared with the group, showcasing a style meant to capture the elusive atmosphere of long nights coming to an end (Listen to slono’s music).

  • The Padding Philosophy: Members humorously discussed the “pad and pray” method as an approach when the “math doesn’t math” in tensor dimensions, suggesting that dimensions should perhaps be more strictly managed or enforced IDE-side, like types in Python.

  • Llama.cpp Capabilities and UI Challenges: One user suggested that llama.cpp could potentially utilize GPU processing. There was also feedback regarding the suboptimal Discord mobile UI, especially the inability to minimize the camera during use.

  • Understanding Transformative Models: A conceptual discussion unfolded about transformer models being seen as weighted tensors with adjustable weights and graph operations. This led to a member sharing a visualization link from bbycroft.net about how these models work (LLM Visualization).

  • Emergent Discussion on Codebases and Music Models: There was a brief touch on the learning curve involved with reading large codebases efficiently and anticipation for a future discussion about music generation models.

Links mentioned:


LlamaIndex ▷ #blog (4 messages):

  • Navigating Privacy in Learning with IFTTT: The LlamaIndex blog discussed the challenge of improving LLM/RAG apps via few-shot demonstrations without risking private data leaks, specifically referencing patient clinical reports. The concern was illustrated with a tweet linking to a blog post.

  • Navarasa 2.0 Breaks Language Barriers: An update in the LlamaIndex blog introduced Navarasa 2.0, which is a fine-tuned version of Google Gemma 7B by @ravithejads to support 15 Indian languages. This development emphasizes the importance of localizing general AI models to better serve regional language speakers, as highlighted in this tweet.

  • Differential Privacy in Healthcare Data: A new post on LlamaIndex discusses implementing differential privacy in LLMs/RAG systems to safely use sensitive data, like healthcare information, with the goal of enhancing research without compromising individual privacy. More insights into this can be found in the associated tweet.

  • UX in Agent-Human Interaction: The LlamaIndex blog introduced a new template that optimizes user experience by having the agent request human input only when necessary. This approach aims to balance autonomy and intervention, and further details can be seen in the shared tweet.


LlamaIndex ▷ #general (184 messages🔥🔥):

  • Struggling with Bot Tool Integration: Members discussed difficulties in creating a chatbot that integrates different tools like Google Search and a code interpreter. The documentation was referenced but members encountered errors such as “BadRequestError”, with suggestions including combining tools into a single list and troubleshooting.

  • API and Documentation Updates: Several users reported issues with accessing certain pages of the LlamaIndex documentation, likely due to the site being updated to MKDocs. Links to the newly formatted documentation were provided by members as a workaround.

  • Query Pipeline Confusion: A query pipeline DAG use case detailed here left a user confused about the decision-making process for path traversal. Clarification was offered explaining that each chain and link in the DAG specifically defines the path and interactions for the inputs and outputs, ensuring convergence to a single output.

  • Batch Evaluation Logic Inquiry: Members requested assistance understanding the evaluation logic applied in LlamaIndex, with specific requests for comments on the code flow for clarity. Direct answers were provided detailing the function of each code piece and the logic behind response evaluations to determine if LLM outputs matched expected results.

Links mentioned:


Eleuther ▷ #general (51 messages🔥):

  • In Search of a Compact Code Dataset: The user sought a small pretraining dataset and considered the CodeSearchNet corpus which includes 2 million comment/code pairs but noted potential issues related to context length.
  • The MiniPile - A Compact Alternative for Diverse Pre-training: The MiniPile was suggested as a suitable diverse text corpus of 1M documents for pre-training language models on smaller datasets, with minimal loss in performance.
  • APIs Holding Back on Logprobs?: Discussion highlighted that closed-source models like Claude and Gemini do not provide logprobabilities and tokenizers, which are typically provided by platforms like OpenAI, potentially for proprietary reasons.
  • Optimizing Models for GPU Performance: A paper provided guidelines for maximizing runtime performance of transformer models by considering impact of hyperparameters and efficient model shapes which possibly can give up to 39% higher throughput.
  • Shifting Fortunes in the Tech Scene: Conversation touched on MS reportedly paying $600m to Inflection for poaching employees and mentioned a valuable H100 cluster, while contrasting the public speaking styles of various tech figureheads.

Links mentioned:


Eleuther ▷ #research (59 messages🔥🔥):

  • AI Innovation in Antibody Design: A member shared an Ars Technica article discussing advancements in AI for creating therapeutic antibodies, revealing excitement about the potential of diffusion models in this field. However, another contended skepticism about actual economic use cases coming from this research area.

  • DenseFormer Sheds Light on Activation Patterns: The DenseFormer architecture proposes a simple yet effective method of using Depth-Weighted-Average (DWA) to improve large-scale models without significant parameter increase, spurring discussion about often overlooked simple ideas in machine learning.

  • Exploring Reinforcement Learning and Transformer Sensitivity: The publication discussed introduces the Catformer architecture, aspiring to address challenges in training transformer models by reducing sensitivity through concatenated layers, a method that could improve stability in training.

  • Deep Attention Methods Discussed: Community members engaged in a discussion about historical precedence and recent innovations in transformer architectures such as the OmniNet, highlighting the potential and the challenges of implementing extensive attention mechanisms with full receptive fields.

  • Novelty and Functionality in Architectural Changes: Within the discourse on modifying neural network architectures, such as densenet-inspired transformers, participants weighed the value of novelty against the practical benefits of getting model modifications to work effectively at scale.

Links mentioned:


Eleuther ▷ #lm-thunderdome (73 messages🔥🔥):

  • Compatibility Issues between Megatron-Deepspeed and lm-eval 0.3.0: A participant highlighted a bug with megatron-deepspeed evaluation compatibility. It was recommended to load from an old version of cais/mmlu to bypass the issue, but this still posed problems due to auxiliary train split being moved, as seen in the provided Gist traceback.

  • Internal Usage of Modified lm-evaluation-harness: An arXiv paper cited the use of an internal fork of EleutherAI’s lm-evaluation-harness for multimodal pre-training evaluations. Discussions ensued about the benefit of gaining access to their evaluation framework, with invitations to collaborate on extending the harness to multimodal models.

  • WandB Logging Challenges with lm-evaluation-harness: A user reported issues where WandB logs eight times when running with eight GPUs, and GSM8K scores are printed to the terminal but not logged. It was suggested to move a block of logging code to post_init() as a temporary fix, with additional coordination for testing required.

  • Quantized Activations Support Inquiry: A question was raised about whether the eval harness supports quantized activations like W8A8, leading to the clarification that quantization support is indirect through other libraries like Huggingface, which might offer some A8 methods.

  • Potential Numerical Discrepancies with Megatron-Deepspeed: Concerns about slight numerical differences between evaluations using Huggingface transformers and Megatron-Deepspeed were discussed. It was speculated that differences in fused KQV multiplications could be due to bfloat16 usage, and that flash attention was deterministic but an analysis of forward pass outputs was necessary.
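
On the bfloat16 speculation in the last item: a toy emulation makes it easy to see how bfloat16 can produce small numerical differences between two otherwise-equivalent stacks. This truncating emulation is an illustrative assumption; hardware bfloat16 conversion rounds to nearest rather than truncating:

```python
import struct

def to_bfloat16(x):
    """Emulate bfloat16 by zeroing the low 16 mantissa bits of a float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# bfloat16 keeps only 7 mantissa bits, so nearby float32 values collapse together
assert to_bfloat16(1.001) == 1.0   # 0.001 falls below bfloat16 resolution near 1.0
assert to_bfloat16(1.5) == 1.5     # exactly representable values survive
```

Differences like this, compounded across a forward pass with differently fused KQV multiplications, are enough to explain slight eval-score deltas between Huggingface transformers and Megatron-Deepspeed even when both are "correct".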

Links mentioned:


HuggingFace ▷ #announcements (1 messages):

  • ASCII Art Dataset Unveiled: Discover the art of ASCII with the new dataset by a community member, containing text files like andreas_who_is_who.txt and ascii_history_jgs.gmi. Explore the dataset and various ASCII artist resources provided here.

  • Melody Meets Model: Integrate audio into your language models with SMIT, a modality integration tool available on GitHub. Watch a demonstration of a music generation model fine-tuning process on YouTube.

  • One Model to Rule Them All: Fluently-v4 gets a global release, promoting a single-model solution for multiple tasks. Details about the model and its creation, involving checkpoints and LoRAs, are showcased on Hugging Face.

  • AI Aids Open Governance: A blog post discusses the potential of AI, particularly LLMs, to improve government transparency and accessibility of public records. The use of AI technology like GPT-4 and Claude 3 in this domain is reviewed on kyopengov.org.

  • Imagining with SVGDreamer: The blog unveils SVGDreamer, a new text-guided vector graphics generation tool using a diffusion model. Published for CVPR2024, the tool allows for the creation of editable vector graphics from text prompts—more info provided on Hugging Face blog.

Links mentioned:


HuggingFace ▷ #general (76 messages🔥🔥):

  • Curiosity around Cookbooks: A member inquired about the term “cookbook” in the HuggingFace learn section, but specifics were not provided in the responses.
  • Choosing between SDXL 1.0 and Stable Cascade: Discussion highlighted that SDXL 1.0 or Stable Cascade could be the best models overall, with specialized finetuning offering improvement in various areas.
  • Accelerate’s Quantization Techniques: Members touched upon the load_and_quantize_model functionality within Accelerate’s quantization document as a possible alternative to load_checkpoint_and_dispatch, with a simple test suggesting it is a viable option.
  • Gradio API Calls and Inactivity: A question about whether calling a Space via the Gradio Client API automatically restarts an inactive Space went unanswered.
  • Requests for Collaboration and Expertise: Multiple calls were put out for assistance or collaboration on various topics, including pretraining data challenges, project collaborations involving expertise in PyTorch, and understanding of model quantization.

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):

  • Protein Sequences Get a Vector Boost: The UniProt project has released 1024-dimensional embeddings for a large number of proteins in their database. A member is considering retraining these for better searchability using Matryoshka embeddings, as described in a recent HuggingFace blog post.
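
The Matryoshka idea referenced above trains embeddings so that prefixes of the vector remain useful on their own; at query time one simply truncates to the desired dimension and re-normalizes. A minimal sketch with toy numbers:

```python
def truncate_embedding(vec, dim):
    """Matryoshka-style truncation: keep the first `dim` coordinates, then re-normalize."""
    head = vec[:dim]
    norm = sum(x * x for x in head) ** 0.5
    return [x / norm for x in head]

full = [3.0, 4.0, 1.0, 0.5]          # e.g. a 1024-dim UniProt embedding, shrunk here
short = truncate_embedding(full, 2)   # cheap 2-dim version for fast search
assert short == [0.6, 0.8]            # first two coords, renormalized to unit length
```

For the UniProt use case this would mean indexing a short prefix for fast candidate retrieval, then re-ranking with the full 1024-dimensional vector; truncation only works well if the embeddings were trained with a Matryoshka objective.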

Links mentioned:


HuggingFace ▷ #cool-finds (10 messages🔥):

  • BitNet b1.58 Unveiled: A new 1-bit LLM named BitNet b1.58, detailed in a paper on arXiv, claims to match full-precision LLMs in performance while being more cost-effective in terms of latency, memory, throughput, and energy consumption. The work could spur the development of hardware optimized for 1-bit LLMs.

  • AI-Driven Data Analysis Techniques on the Rise: An article on Medium discusses the use of Langchain, Instructor, and Pydantic to redefine data analysis with AI, promising enhancements in efficiency and capability. The article is available here.

  • Study on Human-Robot Team Cohesion: The first PhD paper discussing a conceptual framework to study team cohesion in Human-Robot Teams (HRTs) within engineering contexts is accessible at Cambridge Core here.

  • PatchTST Breakthrough in Time Series Forecasting: A Towards Data Science article introduces PatchTST, a new method that promises advancements in time series forecasting. The article can be referenced here.

  • Measuring LLMs’ ASCII Art Skills: A study offering a measurable set of metrics for evaluating Large Language Models’ capability in generating ASCII art is presented in a paper found on arXiv.

  • Tutorial on Visual Processing Mechanisms: A YouTube video from CVPR 2022 titled “Understanding early visual processing mechanisms by the principle of efficient encoding” provides insight into how biological vision works. The lecture can be watched here.

  • Research intrigue without details: A member shared a potentially interesting link from IEEE Xplore, but no direct information was provided about the content or relevance of the document. No further description was offered in the consecutive messages.



HuggingFace ▷ #i-made-this (23 messages🔥):

  • The Quest for ASCII Mastery: A participant seeks collaborators to tackle the challenge of fine-tuning a language model to generate quality ASCII art, having made moderate improvements with custom GPTs. They shared the ASCII Art dataset and expressed a desire to develop an open-source LLM that could, for instance, create intricate ASCII art of impossible geometric illusions.

  • Telegram Bot Unleashed: An AI bot created using the Hugging Face Mistral AI was introduced, and feedback is requested following engagement with the bot at @mistralaichat_bot on Telegram. The developer is seeking collaboration for upscaling and future projects.

  • Chaiverse’s Beta Developer Platform: An engineer from Chai Research announced their beta developer platform, Chaiverse, which ranks community-produced LLMs and allows developers to submit their models for real-user feedback. Interested individuals are encouraged to read more about their mission in the Chaiverse white paper.

  • Promoting Federated Learning: A link was shared to a GitHub repository focusing on federated learning for load forecasting using clustering and sequential DNN methods.

  • Chat Experiments with ASCII Art: Participants discussed methods and challenges of generating ASCII art with LLMs, including the use of HTML and CSS for formatting and the mixed results when requesting complex ASCII art from the models. The consensus seems to be that models can sometimes produce simple representations like cats but struggle with more intricate designs.



HuggingFace ▷ #reading-group (2 messages):

  • Hurry, Mark Your Calendars!: Event details have been added with the event link provided; an announcement is also expected today.
  • Decoding Obesity with Data: Check out an in-depth EDA notebook on obesity trends, where statistical analysis and visualizations reveal the interplay of age, gender, and lifestyle choices on this critical health issue.

Link mentioned: Deciphering Obesity Trends 📉: An In-depth EDA 📊: Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources


HuggingFace ▷ #computer-vision (4 messages):

  • Beware of Unwanted DM Offers: A member warned that an individual might reach out in direct messages to ask for paid work, noting that the person has been previously kicked out of other Discord servers for similar behavior.
  • SegGPT Model Unveiled: A new SegGPT model has been added, capable of various image-to-image tasks with impressive one-shot segmentation results. The SegGPT model and its paper are accessible via the Hugging Face documentation.
  • Gratitude for SegGPT: A member expressed thanks and showed interest in trying out the newly introduced SegGPT model.

Link mentioned: SegGPT


HuggingFace ▷ #NLP (33 messages🔥):

  • Quest for Personality Prediction Data: A member is looking for datasets suited to text-based personality prediction research, as the myPersonality dataset is no longer available. The scarcity of public datasets for this application makes student-level research difficult due to limited access to large-scale data.

  • A Journey to Master ASCII Art with LLMs: A member is working to fine-tune large language models (LLMs) to excel at generating ASCII art, mentioning a specific dataset, THE.ASCII.ART.EMPORIUM, and asking for guidance on how to effectively embed ASCII art for LLM training.

  • Sharing Deep Code Generation Dataset - “The Stack”: A member shared “The Stack,” a 6TB dataset of source code spanning over 300 programming languages, potentially useful for code generation projects. Users must agree to terms, including original code licenses and data removal updates, here.

  • Modernizing Topic Modeling with BERT-based Algorithms: A recommendation was made to check out BERTopic, a technique for topic modeling using 🤗 transformers and contextually informed embeddings that offer various topic modeling methods, detailed here.

  • Solving Quantization Challenges for Fine-Tuned Models: A discussion on best practices for quantizing LoRA-adapted models highlighted the utility of merging and minimizing quantization loss, with related examples found in the PEFT documentation.

  • Troubleshooting Trainer Class Issues in Huggingface: Issues were reported with using the Trainer class in Huggingface, particularly around dependencies requiring updates and acceleration. Suggestions involved upgrading libraries, clearing cache, manipulating import orders, and considering a restart or reconfiguration.



HuggingFace ▷ #diffusion-discussions (28 messages🔥):

  • Corrupted State Dictionary Woes: A member encountered a ValueError indicating a corrupted state dictionary when loading a fine-tuned model for evaluation with model.eval(). It is unclear whether a solution was proposed or found for this issue.

  • Decoding the Diffusion Checkpoint Codes: A brief explanation was given that a checkpoint stores the learned information of a model, and the conversation transitioned towards searching on HuggingFace for checkpoints like sdxl 1.0 or stable diffusion 2.1.

  • ASCII Art with Diffusion Models: A discussion emerged around creating a diffusion-like model for an ASCII art dataset. The conversation explored converting ASCII to images, but the question of making a diffusion model that operates natively on ASCII remained open.

  • Financial AI Chatbot Construction: A user inquired about building an AI chatbot for financial data with multiple access levels and classifications. No specific model was proposed in the given messages, but another user mentioned the need to review the data first.

  • Inquiring Minds Want to Know: Users posed questions about joining a group named Zero-GPU-Explorers and assistance with using and training the all-MiniLM-L6-v2 model with their dataset, indicating a desire for community support.



OpenAI ▷ #ai-discussions (40 messages🔥):

  • Adding ChatGPT Bots to Discord & API Cost: To add ChatGPT bots to a Discord channel, one must obtain an OpenAI API key; the API is a paid service, not free.
  • Troubles Receiving Responses in Postman: A community member struggled with not receiving responses on Postman despite setting up an assistant, thread, and message and was advised to review the documentation and check the “content” parameter for responses.
  • Perplexity Allegedly a Search Wrapper: A member shared an article claiming that Perplexity likely works by summarizing the top 5-10 Google Search results. Perplexity is Most Likely a Google Search Wrapper was published on March 18, 2024, by Mohit Pandey here.
  • Pondering AI’s Role in Video Compression: Community discussion explored the idea of using AI in video compression, comparing potential uses similar to deep learning super sampling (DLSS) and Whisper for audio compression. An existing blog post discussed this in terms of audio compression here.
  • Conversion to Int8 Embeddings for Storage Efficiency: A member reported saving about 80% on storage costs when preconverting Float32 embeddings to Int8 before sending to their vector database. They expressed a wish for native Int8 support in embedding-v3 models to streamline the process and debated the potential use of pickle, sqlite, and another database for various tasks in multimodal prototypes.
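The Int8 preconversion described above can be done with simple symmetric scalar quantization. A minimal sketch assuming embeddings are plain Python float lists (real pipelines typically use NumPy and sometimes per-dimension calibration; the roughly 4x size reduction versus float32 is in line with the ~80% saving reported once database overhead is counted):

```python
def quantize_int8(vec):
    """Symmetric scalar quantization of a float vector to the int8 range.

    Stores one float scale per vector; components round into [-127, 127].
    Versus float32 this is roughly a 4x size reduction."""
    scale = max(abs(x) for x in vec) / 127.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for an all-zero vector
    q = [round(x / scale) for x in vec]
    return q, scale

def dequantize_int8(q, scale):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]

vec = [0.12, -0.56, 0.98, -0.33]
q, scale = quantize_int8(vec)
approx = dequantize_int8(q, scale)
# Each reconstructed value is within half a quantization step of the original.
```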



OpenAI ▷ #gpt-4-discussions (11 messages🔥):

  • Custom API Connection Clarification: A member asked how to connect to a custom GPT through the API. Solbus directed them to use the Assistants API, providing a link for further assistance: Assistants API Guide.

  • Seeking Feedback on Animal Alter-Ego GPT: A user named boouyaah shared a GPT creation that turns individuals into animal versions of themselves and sought feedback on the prompts: You, but as an animal.

  • Sudden Reduction in Pinned Custom GPTs: Jaredquek reported an issue with the number of Custom GPTs that can be pinned to the sidebar, stating that previously pinned GPTs vanished and there’s now a limit to pinning only 4, seeking an explanation or workaround.

  • Optimizing Knowledge File Distribution Across Multiple GPTs: Mikejeason posed a question about whether it’s more productive to distribute knowledge files across multiple GPTs tailored for different parts of a prompt, rather than combining everything into a single GPT.


OpenAI ▷ #prompt-engineering (41 messages🔥):

  • Rule Reminder Tightens Up: After a query about prompt engineering jobs, users were reminded of Rule 7 prohibiting self-promotion, soliciting, or advertising. The rules were further clarified with a direct link to the rule provided by a user.
  • Disabilities Get a Cold Shoulder in GPT-4 Vision: A user expressed frustration when GPT-4 Vision failed to provide assistance regarding disabled individuals, repeatedly responding with “Sorry, I can’t help with that.”
  • Toolkit Teaser Ignites Curiosity: Despite the rule against self-promotion, a user mentioned developing a prompt chaining/prompt engineering toolkit and looked for people to test a prototype.
  • Challenging the ChatGPT Product Description Generator: A detailed discussion took place regarding the feasibility of using ChatGPT to generate product descriptions for a catalog, focusing on natural and organic products. There was skepticism about the AI’s ability to accurately handle the task without manual intervention.
  • Seeking Clarification on Benefits and Applications: The discussion evolved towards simplifying the task for ChatGPT, with a user suggesting to focus on generating benefits and uses sections based on the product descriptions provided, which might be a more manageable approach for the AI.

OpenAI ▷ #api-discussions (41 messages🔥):

  • Rule 7 Reminder: A new user who inadvertently violated Rule 7 by asking about prompt engineering jobs was reminded to review the server rules, particularly those against self-promotion, soliciting, or advertising.
  • Apology for Misstep: Following the call to attention on server rules, the user apologized and promised to review the rules to ensure it does not happen again.
  • GPT-4 Vision Limitations Discussed: A member discussed difficulties in getting GPT-4 Vision to acknowledge disabled people, receiving standard unhelpful responses from the system.
  • Prompt Toolkit Promotion Violation: User quixoticquiche violated Rule 7 by advertising their prompt chaining toolkit and seeking feedback, leading to another reminder about the rules against soliciting in the server.
  • Challenges of Automating Product Descriptions: Members discussed the feasibility of using ChatGPT to automatically generate detailed product descriptions; concerns were raised about the preciseness and reliability of such generated content without human oversight.

LangChain AI ▷ #general (96 messages🔥🔥):

  • Understanding LangChain Tool Ingestion: A member inquired whether an array can be passed as input to a tool in LangChain, leading to an explanation that while general examples were provided, specific cases for array inputs were not available in the knowledge sources.
  • GraphCypherQAChain Use Case: A member sought advice on how to perform string comparisons in lower case within the GraphCypherQAChain, but no specific information was provided from the knowledge sources.
  • Learning Retrieval-Augmented Generation: A free resource, Intro to AI for Developers, was recommended for those looking to learn AI with a focus on Large Language Models in a project-based approach.
  • Humorous Take on AI Challenges: In a lighthearted discussion, members joked about the complexity of integrating various frameworks and technologies with LangChain, implying the difficulty can be as hard as solving the space-time continuum.
  • Dynamic Decision-Making for Database Queries: Discussion touched upon creating an agent capable of determining whether to query an SQL database or a vector database based on user questions, emphasizing the need for automatic decision-making in LangChain use cases.



LangChain AI ▷ #langserve (7 messages):

  • Python Version Hell Strikes Again!: A member is attempting to update langchain-ai/weblangchain and encounters issues with dependencies and Python versions. The error TypeError: Type is not JSON serializable: numpy.float64 is causing the application to crash, hinting at a serialization problem with numpy data types.

  • Potential Link to Existing Issue: The serialization problem may be related to a known issue discussed in TypeError: Type is not JSON serializable: numpy.float64 on the langchain-ai’s GitHub discussions.

  • Troubleshooting Other Components: Testing with LangSmith shows no issues, hence the problem could be tied to something on the TypeScript client side, as pinning Starlette to older versions did not resolve the issue.

  • Poetry Doesn’t Solve All Problems: A member suggested using Poetry to escape the Python version issues, but it’s revealed that Poetry is already in use and the problem persists with the latest versions of Langchain/Langserve.

  • Issue Raised on GitHub: The serialization issue led to the creation of a GitHub issue titled TypeError: Type is not JSON serializable: numpy.float64 to address the incompatibilities with the latest versions of Langchain/Langserve.
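For readers hitting the same error, one common workaround is a `default` hook for `json.dumps` that coerces NumPy scalars into native Python types. This is a hedged sketch only: the `FakeFloat64` stand-in replaces `numpy.float64` so the snippet runs without NumPy, and whether LangServe's serializer exposes such a hook is not confirmed here.

```python
import json

def to_native(obj):
    """`default` hook for json.dumps: coerce NumPy-style values.

    NumPy scalars (e.g. numpy.float64) expose .item() and arrays expose
    .tolist(); converting through them yields plain Python types the
    stdlib encoder accepts."""
    if hasattr(obj, "item"):
        return obj.item()
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")

# Stand-in for numpy.float64 so the sketch runs without NumPy installed.
class FakeFloat64:
    def __init__(self, value):
        self.value = value
    def item(self):
        return float(self.value)

payload = {"score": FakeFloat64(0.97)}
encoded = json.dumps(payload, default=to_native)
print(encoded)  # {"score": 0.97}
```

The other common fix is casting at the source, e.g. `float(score)` before the value ever enters the response payload.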



LangChain AI ▷ #share-your-work (5 messages):

  • AI-Powered Data Analysis Enhanced: An article titled “Harnessing Langchain, Instructor, and Pydantic: Redefining Data Analysis with AI” details how integrating several tools can transform data analysis. Read the full story on Medium.

  • Introducing Promptsage for Simplified Prompt Engineering: A new weekend project, Promptsage, aims to simplify prompt building and sanitization for LLMs, featuring security and privacy guardrails, and it’s compatible with langchain. Explore the tool on GitHub.

  • Exploring Chain Extensions for Large Outputs: A member inquires about a Langchain feature allowing chains to continue generating output beyond a model’s token limit by sending additional requests based on “Stop Reason” determinations. The question highlights a desire for effective handling of large outputs that exceed token restrictions like OpenAI’s GPT-4-Turbo’s 4k output tokens.

  • Python Meets Bedrock Anthropic Haiku: Due to a lack of support for functions in Bedrock, a comprehensive guide has been created, demonstrating how to leverage Bedrock Anthropic Haiku using Python. Interested readers can find the guide on Medium.
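The “Stop Reason” continuation idea raised above can be sketched as a simple loop. This is an illustration only: `call_model` is a hypothetical stub standing in for a real chat-completion client, which would return a stop reason such as "length" when the output hit the token cap.

```python
# Hypothetical stub standing in for a chat-completion call; a real client
# would return generated text plus a stop reason such as "length" or "stop".
CHUNKS = [("part one, ", "length"), ("part two, ", "length"), ("done.", "stop")]

def call_model(prompt, pieces):
    # Returns (text, stop_reason); here simply indexed by rounds so far.
    return CHUNKS[len(pieces)]

def generate_long(prompt, max_rounds=10):
    """Keep requesting output while the model stops due to its token cap."""
    pieces = []
    for _ in range(max_rounds):
        text, stop_reason = call_model(prompt, pieces)
        pieces.append(text)
        if stop_reason != "length":  # "length" means the cap cut us off
            break
        prompt += text  # feed the partial output back so the model continues
    return "".join(pieces)

print(generate_long("Write a long report: "))  # part one, part two, done.
```

In a real chain the continuation prompt usually also needs an explicit “continue exactly where you left off” instruction to avoid repeated or restarted output.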

Link mentioned: GitHub - alexmavr/promptsage: Promptsage is an LLM prompt builder, linter and sanitizer with built-in guardrails: Promptsage is an LLM prompt builder, linter and sanitizer with built-in guardrails - alexmavr/promptsage


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

  • Sluggish Requests on the West Coast: Users on the West Coast (US) are experiencing unusually slow requests. The cause is suspected to be a cloud issue and is currently under investigation.

OpenRouter (Alex Atallah) ▷ #general (53 messages🔥):

  • Curiosity About Google’s Gemini 1.5 Pro: Members discussed the release of Google’s Gemini 1.5 Pro with a 1 million token context window. Despite public documentation only mentioning version 1.0, a member mentioned being in contact with Google to gain access.

  • C3 Model’s Inconsistency Woes: Users expressed frustration with C3’s performance, with one recommending the self-moderated version of Claude 3, claiming it’s less likely to reject content incorrectly, and even surpasses GPT-4 in this regard.

  • Debating Grok AI’s Capabilities: The conversation pivoted to Grok, an open-source model, where opinions diverged on its quality compared to Mixtral, with some labeling it “shitty” due to its high cost and potential undertraining. However, others defended Grok’s capabilities as a pure base model, emphasizing that it’s unfair to compare it with chat-tuned models.

  • Grok Benchmarks Questioned: A debate occurred over the usefulness of benchmarks for Grok, with some challenging the comparison to models like Mixtral, while others pointed to Grok’s official benchmarks showing it as a knowledgeable conversational model with wit.

  • Exploring Grok’s Testing and Access: Members discussed how to test Grok, with one member providing a link to try it out, and it was clarified that Grok can be accessed through the xAI platform, potentially without needing Twitter Premium+. They also discussed the content that could be used for testing, such as asking about political opinions or IT-related questions.



CUDA MODE ▷ #general (4 messages):

  • Nanobind as a Solution: A member suggested looking into nanobind to increase efficiency for MLX, with a mention of it being potentially helpful based on their experience.
  • Acknowledging the Helpful Tip: A follow-up message from the same member expressed gratitude for the recommendation to check out nanobind.
  • GTC Event Discord Hiccups: During the GTC event, members faced issues with Discord’s stage channel regarding screen sharing not functioning correctly. It was resolved by moving to a voice channel, prompting the suggestion to use voice channels by default for future lectures.

CUDA MODE ▷ #triton (1 messages):

  • Fusing GaLore’s Adam with Triton: A member conducted a study and opened a pull request on GitHub detailing the process of fusing GaLore’s Adam optimizer with Triton. Best results were achieved with a hybrid kernel leveraging torch.matmul for the projection of gradients to low-rank, enhancing the memory efficiency during pre-training and fine-tuning of models.

Link mentioned: [WIP] Fused Adam Triton Kernels by jeromeku · Pull Request #29 · jiaweizzhao/GaLore: Fused GaLore Adam (WIP) Various fused implementations of Adam update step per Gradient Low-Rank Projection This is an initial attempt at optimizing the update step of the GaLore Adam optimizer. Ove…


CUDA MODE ▷ #cuda (1 messages):

  • micrograd gets a CUDA boost: A member shared a link to a library, micrograd-cuda, that extends Karpathy’s micrograd library with CUDA kernels and adds 2D tensor logic. The GitHub repository welcomes contributions to further develop this CUDA-accelerated version of micrograd.

Link mentioned: GitHub - mlecauchois/micrograd-cuda


CUDA MODE ▷ #torch (3 messages):

  • Lightning Strikes PyTorch: Lightning Thunder, a source-to-source compiler for PyTorch, was highlighted, aiming to speed up PyTorch programs on single accelerators and distributed systems.

  • GTC Session Announcement: Members were informed about an upcoming GTC talk and prompted to ask questions to specific individuals before the session is up in ~24 hours.

  • Link to NVIDIA GTC Session: An NVIDIA GTC talk related to Thunder was mentioned with a direct link to the session catalog, detailing the dates for workshops, AI conference and expo, and the keynote running from March 17-21, in San Jose, CA, and virtually.



CUDA MODE ▷ #algorithms (9 messages🔥):

  • Ozaki Scheme Enhances Matrix Multiplication: The Ozaki scheme, as explained in an arXiv paper, accelerates multiple-precision basic linear algebra computations and can run faster than existing methods for fixed- and arbitrary-precision matrix multiplication. It benefits from optimized low-precision operations and outperforms Strassen matrix multiplication up to a certain precision.
  • Exploring the Kahan Summation Algorithm: A link was shared to the Wikipedia article about the Kahan summation algorithm, an approach in numerical analysis that significantly reduces numerical error during summation by maintaining a running compensation.
  • IEEE 754 Standards: Reference was made to an ITU paper discussing the IEEE 754 standards, which are crucial for floating-point computation.
  • Jeremy’s Team Acknowledges Mention: Acknowledgement came from Jeremy Howard regarding the mention of their work related to Ozaki scheme implementation, with an expression of future improvements in the area.
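The Kahan algorithm mentioned above is short enough to sketch directly; this follows the standard formulation:

```python
import math

def kahan_sum(values):
    """Compensated summation: `c` tracks the low-order bits lost at each
    addition and folds them back in on the next step."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - c            # subtract the bits lost last time
        t = total + y        # low bits of y may be lost in this addition...
        c = (t - total) - y  # ...recover exactly what was lost
        total = t
    return total

vals = [0.1] * 10_000
naive = sum(vals)
compensated = kahan_sum(vals)
# `compensated` tracks the correctly rounded math.fsum(vals) far more
# closely than the naive left-to-right sum.
```

Note that the compensated error bound is independent of the number of terms, which is what makes the technique attractive for long reductions in low precision.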



CUDA MODE ▷ #suggestions (5 messages):

  • Recommendations for Organized Conversation: A member suggested using standard messages for new topics, replies for branching into conversations, and thread creation for focused subjects, underlining the benefits for message visibility and channel readability. They complimented a user named <@272654283919458306> for excellent thread management on another server called Latent Space.
  • Acknowledgement of Server Tips: A member expressed appreciation for the etiquette suggestions made regarding conversation organization on Discord, finding them especially helpful as a new user.
  • Presentation of a Resourceful Link: A member shared a Springer book link, highlighting details about conference proceedings for PPAM 2022, including contributors and a table of contents with access to papers.

Link mentioned: Parallel Processing and Applied Mathematics


CUDA MODE ▷ #pmpp-book (2 messages):

  • Seeking Solution Verification: A member has completed the Chapter 2 exercises from the PMPP book (Programming Massively Parallel Processors) and is looking for ways to verify their answers. They expressed interest in exchanging DMs to cross-check solutions with others.

CUDA MODE ▷ #off-topic (2 messages):

  • Lightning Strikes for CUDA Lovers: A member highlighted the launch of a new Zero to Thunder tutorial at GTC, targeting Python and PyTorch enthusiasts who want to deliver custom CUDA kernels for not-so-standard models. Although it’s still in its experimental stage, and some functionalities may be lacking, it’s an enticing venture for the adventurous.

  • Smiling GPUs Cause a Stir: An observation was shared via a Twitter link pointing out the amusing fact that the new Blackwell GPUs appear to have smiley faces. This quirky design feature caught the attention of tech enthusiasts, sparking humorous comments online.



CUDA MODE ▷ #triton-puzzles (12 messages🔥):

  • Tensor Color Code Deciphered: In triton-puzzles, color coding of tensors was clarified; color signifies the source tensor, with out-of-bounds access showing red. There was a discussion about a potential bug, suggesting that out-of-bounds loads might be incorrectly indicated, even if masking is correct.

  • The Draw Order of Tensors Explained: The draw order for tensors in Triton was specified to be depth, row, column, and 1D tensors are drawn as 1,1,col.

  • Triton tl.exp Operator Issue Reported: A new member encountered a NotImplementedError when using tl.exp(x) or x.exp() in Triton, stating this operation is unsupported in interpreter mode with no numpy implementation.

  • Member Submits PR for Exp2: The exp2 function was mentioned in a probable context of a fix or implementation in flash attention with a Pull Request submitted to Triton.

  • Puzzle 3 Completed, Debugging Puzzle 4: A member completed Puzzle 3 and shared their debugging process using print statements for Puzzle 4, where they compare their answer with the expected one by performing an outer sum operation with torch.
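For debugging puzzles like the outer-sum one, a plain-Python reference is handy to compare against kernel printouts. A minimal sketch, assuming the usual definition out[i][j] = x[i] + y[j]:

```python
def outer_sum(x, y):
    """Reference outer sum: out[i][j] = x[i] + y[j].

    Useful as ground truth when print-debugging a kernel: compare the
    kernel's output tile against this on small inputs."""
    return [[xi + yj for yj in y] for xi in x]

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0]
expected = outer_sum(x, y)
print(expected)  # [[11.0, 21.0], [12.0, 22.0], [13.0, 23.0]]
```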


LLM Perf Enthusiasts AI ▷ #general (21 messages🔥):

  • Struggling with Space Matching: Uniti AI is facing issues with their AI leasing agents where GPT-4 Turbo is not accurately matching property inventory to user requirements; for example, it suggested properties with 17,000 sq. ft. when asked for spaces between 2,000 and 4,000 sq. ft.
  • Complex Matching Logic Challenges: A nuanced challenge is to offer inventory within a 33% range for properties below 5,000 sq. ft, and a 20% range for those above 5,000 sq. ft, making the matching process increasingly complex.
  • Suggestions for a Simplified Approach: The current approach involves a detailed prompt to match broker inquiries, but the suggestion was made to use regular filters or have the LLM generate a SQL query to pull the right units instead.
  • Recognizing Over-Reliance on LLMs: The conversation highlights a “common llm trap”, suggesting that not all tasks require an LLM, and simpler database queries could be the solution. The idea is to use the LLM for generating the query rather than the filtering itself.
  • Reference to RAG for Efficiency: A blog post on Retrieval Augmented Generation (RAG) by Jason Liu was mentioned, illustrating the effectiveness of pairing LLMs with regular database queries for tasks such as date range extraction.
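The “generate the query, not the answer” approach discussed above can be sketched with stdlib sqlite3. The schema is hypothetical, and the tolerance interpretation (widening the requested range by 33% below 5,000 sq. ft. and 20% above) is an assumption for illustration:

```python
import sqlite3

# Hypothetical inventory table; in practice the LLM would only extract the
# requested square-footage range from the broker's email, and the database
# would do the filtering.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (id INTEGER, sqft INTEGER)")
conn.executemany("INSERT INTO units VALUES (?, ?)",
                 [(1, 2500), (2, 3900), (3, 6000), (4, 17000)])

def match_units(lo, hi):
    """Widen the requested range by the discussed tolerance: 33% for
    requests below 5,000 sq. ft., 20% for larger ones."""
    tol = 0.33 if hi < 5000 else 0.20
    return conn.execute(
        "SELECT id FROM units WHERE sqft BETWEEN ? AND ? ORDER BY id",
        (lo * (1 - tol), hi * (1 + tol)),
    ).fetchall()

print(match_units(2000, 4000))   # no 17,000 sq. ft. surprises
print(match_units(10000, 20000))
```

The point of the pattern is that the deterministic filter, not the LLM, guarantees the numeric constraints are respected.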

Link mentioned: RAG is more than just embedding search - Instructor


LLM Perf Enthusiasts AI ▷ #claude (5 messages):

  • Bedrock vs Direct Approach: A user mentioned that opting for a direct integration with Claude might be more efficient as Bedrock has shown to be somewhat cumbersome and less reliable in terms of uptime.
  • Frontline Access to Claude: One member revealed they have priority rate limits with Claude owing to a year-long development partnership, putting them ahead of a substantial 200k+ waitlist.
  • Choosing Direct Connection Over Bedrock: Despite having priority access, the same user confirmed they are employing a direct connection to interact with Claude, bypassing the Bedrock framework.

LLM Perf Enthusiasts AI ▷ #jobs (1 messages):

ibash: > write high quality code Damn.


LLM Perf Enthusiasts AI ▷ #openai (1 messages):

jeffreyw128: lol wut


LLM Perf Enthusiasts AI ▷ #prompting (1 messages):

emrgnt_cmplxty: Basic prompting isn’t getting it done for you?


Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (15 messages🔥):

  • Seeking Synthetic Benchmarks for LLM Study: A member inquired about the existence of fully synthetic benchmarks with controllable properties to study foundation model capabilities, specifically LLMs.
  • Startups Generate Data Through LLMs: It was mentioned that startups are creating synthetic benchmarks using an LLM to generate a lot of data based on the model they are studying.
  • Disentangling LLM Capabilities from Data Quality: A discussion arose about studying the origins of LLM capabilities by altering the diversity and reasoning presence in the training data to move beyond the general belief that “the capabilities are in the data.”
  • Synthetic Data and Worlds Garner Interest: One member expressed enthusiasm for synthetic data and worlds, contemplating writing a paper on the subject.
  • Organizing Open Source Data Curation: A member suggested that a public, systematic approach to constructing pretraining data could be beneficial for organizing open-source data curation efforts.

Interconnects (Nathan Lambert) ▷ #ml-questions (6 messages):

  • In Search of SOTA: ChatGPT was mentioned as a tool for rewriting content, suggesting that using language models for rewriting has become commonplace in pursuing state-of-the-art (SOTA) results.
  • Minor Tweaks in the Workflow: A member discusses using ChatGPT for rewriting content, making minor changes to tailor the output for their needs.
  • Project Work in Progress: A side project related to ChatGPT and rewriting was mentioned, with the member noting a lack of substantial insights due to limited involvement.
  • Academic Rush: ChatGPT’s rewriting capabilities are being used to expedite the completion of a class project.

Interconnects (Nathan Lambert) ▷ #random (5 messages):

  • Bot Beatdown Effect on Human Psyche: A member speculated about the impact of losing to AI on human players, indicating games like chess are unscathed despite superhuman AIs.
  • Historical AI Victories Resonate: One participant pointed out that Garry Kasparov’s loss to Deep Blue had significant impact on chess, similar to AI’s later triumph in Go.
  • The Individual Player’s Take: A user weighed in, suggesting that how affected a player is by AI might vary greatly depending on their personal demeanor.
  • Philosophical AI Discussion: Someone shared a link to a discussion with Minqi Jiang and Marc Rigter about the possibility of creating a generalist agent in reinforcement learning (MLStreetTalk Tweet).

Link mentioned: Tweet from Machine Learning Street Talk (@MLStreetTalk): We just dropped the show with @MinqiJiang and @MarcRigter and discuss the philosophy of whether it is possible, in principle and in practice to build a “generalist agent” in RL.


Alignment Lab AI ▷ #looking-for-collabs (1 messages):

  • Help Wanted for the 01 Open Source Hardware: A member introduced the new open source hardware device named the 01, asking for contributions from the community. The project’s hardware and software are fully open source, with details available in this tweet.

Alignment Lab AI ▷ #general-chat (1 messages):

venadore: life lesson


Skunkworks AI ▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=21Tc92g15pM