> AI News for 3/13/2024-3/14/2024. We checked [**358** Twitters](https://twitter.com/i/lists/1585430245762441216) and **21** Discords (**336** channels, and **3518** messages) for you. Estimated reading time saved (at 200wpm): **426 minutes**.

It’s the anniversary of GPT4, but no GPT5 for you today. Join @elonmusk in checking out the latest Latent Space pod with Suno AI?

(Also we missed highlighting the Figure 01 launch yesterday, which in retrospect we’d rank slightly higher than Deepmind SIMA in impressiveness/near term importance).


Table of Contents

[TOC]


PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

AI Agents and Environments

  1. DeepMind announces SIMA, a generalist AI agent that can follow natural language instructions in a broad range of 3D environments and video games, marking an important step towards agents that can tackle complex tasks requiring planning and sub-tasks. (537,888 impressions)

  2. DeepMind’s SIMA agent demonstrates the ability to follow natural language instructions to carry out tasks across a wide array of game worlds, similar to how a human would play. This is an exciting development in embodied AI agents. (178,835 impressions)

  3. The SIMA research focuses on developing embodied AI agents that can translate abstract language into useful actions, using video games as safe, accessible testing environments rather than optimizing for high scores. (24,983 impressions)

Large Language Models and Scaling

  1. Anthropic introduces Claude 3 Haiku, their fastest and most affordable model, now available in the API and on Perplexity for Claude Pro subscribers. (299,766 impressions)

  2. Language models scale reliably with over-training and on downstream tasks. A new paper explores gaps in LM scaling laws, providing insights into over-training and linking model perplexity to downstream performance. (10,589 impressions)

  3. Branch-Train-MiX (BTX) is a new approach for training large language models more efficiently by mixing expert LLMs into a Mixture-of-Experts LLM. It is shown to be more efficient than training a larger generalist LLM or several separate specialized LLMs. (11,042 impressions)

AI Coding Assistants and Software Engineering

  1. @fchollet predicts there will be more software engineers in five years than today, estimating growth from 26-27M today to 30-35M in 5 years. He argues that making it easier to code has historically led to more coding jobs. (188,949 impressions)

  2. Cohere’s Command-R model focuses on retrieval augmented generation (RAG) and tool usage - two key skills for building LLM applications. It addresses issues in scaling proof-of-concept LLM apps to production. (2,297 impressions)

  3. A perspective that AI will enable more software engineers, and that fancy demos are causing overreaction. Most AI coding solutions will likely have limited scope and need human supervision. (15,308 impressions)

AI Safety and Regulation

  1. The EU AI Act has been approved by Parliament, representing big and largely positive AI news. (11,126 impressions)

  2. Key requirements in the AI Act include that GPAI systems must publish “detailed summaries of the content used for training.” (1,759 impressions)

  3. A paper on “Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation” is highlighted as promising work in light of the AI Act’s approval. The paper proposes using a pre-trained LLM to generate differentially private synthetic examples from private datasets. (79 impressions)

Memes and Humor

  1. A meme jokes that an “AI software engineer” capable of automating everything would be sold as a product rather than used to dominate the market. (586,645 impressions)

  2. A humorous tweet imagines Andrej Karpathy leaving Tesla because he suggested changing a learning rate constant from 0.086 to 0.0855541. (270 impressions)

  3. A meme suggests that people waiting for GPT-5 to drop will be disappointed again. (1,378 impressions)

Other Notable Topics

  • Together Computing raises $106M to rapidly bring research innovations to production and build a platform for running generative AI applications on open-source models at scale. (112,647 impressions)

  • Keras 3 benchmarks show no single “best” backend, with the optimal choice depending on model architecture. Keras 3 models are consistently faster than PyTorch without requiring custom optimizations. (51,849 impressions)

  • A new LlamaParse document parsing solution excels at extracting images, tables, and charts, and can be steered via natural language instructions. It integrates with LlamaIndex for building RAG systems over complex documents. (85,018 impressions)


PART 0: Summary of Summaries of Summaries

Since Claude 3 Haiku was released recently, we’re adding it to this summary run so you can compare it against our custom GPT summarizer (all of these are different from the smol model running the Part 1/2 summaries). We’ll keep running these side by side for a little longer while we build the AINews platform for a better UX. We’ve noticed that the same prompts produce consistently different output across the 3 Claude models. We’ll be tweaking prompts in tomorrow’s iteration to get Haiku behaving.

Claude 3 Haiku (3B?)

  • Nvidia Puts the Brakes on Translation Layers: Nvidia has implemented a ban on using translation layers to run CUDA-based software on non-Nvidia chips, targeting projects like ZLUDA, with further details discussed in a Tom’s Hardware article. Some members expressed skepticism over the enforceability of this ban.

  • CUDA Error Riddles and Kernel Puzzles: CUDA developers are troubleshooting errors like CUBLAS_STATUS_NOT_INITIALIZED with suggestions pointing to tensor dimensions and memory issues, as seen in related forum posts. Other discussions centered around cuda::pipeline efficiency and understanding effective bandwidth versus latency, referencing resources such as Lecture 8 and a blog on CUDA Vectorized Memory Access.

  • CUTLASS Installation Q&A for Beginners: New AI engineers sought advice on installing CUTLASS, learning that it’s a header-only template library, with installation guidance available on the CUTLASS GitHub repository, and requested resources for implementing custom CUDA kernels.

  • Ring-Attention Project Gets the Spotlight: A flurry of activity took place around the ring-attention experiments with conversations ranging from benchmarking strategies to the progression of the ‘ring-llama’ test. An issue with a sampling script is in the process of being resolved as reflected in the Pull Request #13 on GitHub, and the Ring-Attention GitHub repository was shared for those interested in the project.

  • Lecture 8 on CUDA Performance Redone and Released: The CUDA community received a re-recorded version of Lecture 8: CUDA Performance Checklist, which includes a YouTube video, code on GitHub, and slides on Google Docs, garnering appreciation from community members. Discussions ensued on the mentioned DRAM throughput numbers and performance differences in coarsening.

Claude 3 Sonnet (14B?)

  1. New AI Model Releases and Capabilities:

    • Cerebras unveils CS-3 AI accelerator capable of training up to 24 trillion parameter models on a single chip, with 4 trillion transistors and 125 petaflops of compute power.
    • Anthropic releases Claude 3 Haiku, a fast and cost-efficient model available on OpenRouter, running at ~120 tokens/s and 4 million prompt tokens per dollar.
    • Cohere’s Command-R model with 128k token context window is now on OpenRouter, at 2 million prompt tokens per dollar.
    • DeepMind’s SIMA is a new generalist AI agent that can understand natural language instructions in video game environments, though its technical details lack transparency.
  2. AI Safety and Vulnerability Concerns:

    • A new paper on ComPromptMized reveals prompt injection attacks on AI models like Gemini Pro, ChatGPT 4.0, and LLaVA, highlighting vulnerabilities in GenAI-powered applications.
    • Discussions around OpenAI’s security incident and the implications for AI security.
    • Debates on the EU’s new AI legislation requiring disclosure of AI-generated content and avoiding illegal outputs, with concerns over practicality and impact on open source models.
  3. Advances in Multimodal AI and Interpretability:

  4. Synthetic Data Generation for LLM Pretraining and Finetuning:

    • Discussions on using synthetic data as an alternative to human annotations for pretraining, instruction-tuning, and preference-tuning of language models, with potential quality advantages.
    • AI News newsletter provides valuable summaries of AI discords and top Twitter accounts, recommended by experts like Soumith Chintala and Andrej Karpathy.
    • Debates on whether fine-tuning can impart new knowledge to models, and the efficiency of fine-tuning for style transfer versus knowledge acquisition.

Claude 3 Opus (>220B?)

  • Hermes 2.5 Outperforms Hermes 2: Updates to Hermes 2.5 with code instruction examples have resulted in notable performance improvements compared to Hermes 2. The community is actively discussing related topics like JSON mode versus function calling, clarifying that JSON mode necessitates a predefined schema, whereas function calling involves executed function responses.

  • Cerebras Reveals AI Titan: Cerebras Systems launches its CS-3 AI accelerator capable of training up to 24 trillion parameter models. This mammoth chip has 4 trillion transistors and is expected to deliver 125 petaflops of AI compute power, as detailed in the press release.

  • AI Trends Galore: From Stanislaw Lem’s science fiction recommendations to SDAM development and Devin the AI Software Engineer’s YouTube debut, the community is keeping its eyes peeled for a variety of AI and engineering marvels. The eagerness for open source models that could provide 100k+ context, along with concerns about privacy in information sharing, reflects the community’s diverse range of interests.

  • Debating Decentralization in AI: Amid discussions of TAO potentially challenging Hugging Face, the community delves into debates about centralized vs. decentralized AI model platforms. The introduction of a new project, Shoggoth, sparks curiosity, yet detailed information is lacking due to broken links.

  • Claude 3 Powers Haiku Creation: Perplexity Labs introduces Claude 3 Haiku, enticing users to experiment with poetic AI capabilities for free, bolstering the platform’s suite of creative tools.

  • A Diverse AI Toolbox: Engineers and developers are actively engaging with Perplexity AI for a multitude of uses such as coding support and SE troubleshooting, while creatively experimenting with newly added features like AI-generated Haikus. The platform’s local search capabilities are now enhanced by integrating Yelp and Maps for more efficient local business discoveries.

  • AI Ecosystem Rivalries and Perspectives: The guild hosts vigorous debates comparing various AI models; GPT-4 and Mistral are pitted against each other, with the former being argued as superior by some, while others favor the latter’s speed.

  • API Integration and Model Limitations: Users discuss using Perplexity’s models for complex queries and utilizing the Perplexity API for developing applications, such as a Firefox extension, while noting a 25MB upload limit and uncertain performance with extensive databases, such as those related to real estate.

  • APIs in Focus: Questions and Potential: An inquiry about the closed beta of URL citations in Perplexity’s API awaits insider insight, while others seek advice on the API’s performance for condition checking. Members also examine the behavior of the “return_citations” option and determine the best models for handling up-to-date information, singling out Sonar-small-online and sonar-medium-online for their real-time data access capabilities.

  • Tackling LM Studio Outside the UI Box: Users examined running LM Studio’s API services on a home network without the user interface, focusing on server mode and localhost connections. It was highlighted that llama.cpp is a viable option, supporting AVX without requiring AVX2 and allowing independence from the LM Studio UI, per its GitHub repository.

  • LM Studio Limitations Spur Creative Workarounds: Among LM Studio’s constraints is the inability to launch services or connect to the internet programmatically; users creatively employed batch files and PowerShell scripts to automate starting the LM Studio inference server, showcasing the community’s resourcefulness.

  • Mighty Models Extended and Examined: The Nous-Yarn-Mistral-7b-128k model expanded to a 128k token context window using the YaRN method, alongside discussions about model perplexity and humorous disappointment with the “Yet Another…” naming convention. Moreover, some shared format-specific obstacles, such as incompatibility issues with llama.cpp for the Command-R 35B v1.0 GGUF format.

  • ROCM Round-Up: Real-world experiences with ROCm support in LM Studio were shared, including troubleshooting steps like using AMD’s cleanup tool and avoiding PRO drivers. Vision models proved challenging, and it was advised to choose Nvidia GPUs over AMD for image generation projects. Additionally, a user found that disabling the iGPU on a Gigabyte motherboard in BIOS settings enabled better usage of their RX 7900 XT with ROCm.

  • Hardware Conversation Heats Up: The cost of SLI/NVLink sparked debates, complemented by discussions on overcoming Mac OS’s minimum VRAM requirements, strategizing PC hardware upgrades, and balance in multi-model deployments in LM Studio. Separate dialogues covered selecting the right dual-purpose monitor, with an inclination towards OLED screens despite burn-in risks and preferences for high refresh rates to match top-tier graphics cards like the Nvidia 4090.

  • AI Gold Rush Continues: Various AI startups like Cognition, Magic, and Fluent have attracted impressive venture capital investments, with discussions drawing attention to the ongoing trend of significant funding for AI companies. Participants shared a collection of tweets that gave an overview of companies and their raised capital, referencing a feed at chiefaioffice.

  • Cerebras Flexes Its AI Muscles: Cerebras Systems unveiled the CS-3 AI accelerator, claiming it’s capable of training up to 24 trillion parameter models. The announcement has sparked interest and the discussion also mentioned a related press release and a tweet.

  • Security Red Alert at OpenAI: Members discussed a security issue at OpenAI with references to a detailed Post Mortem analysis available in a gist. The community delved into the implications for AI security.

  • Prep Up for Synthetic Data Insights: An upcoming presentation on Synthetic Data for Finetuning was announced with materials to read beforehand at Eugene Yan’s writing. The use of synthetic data as an alternative for human annotations in pretraining and fine-tuning language models was underscored by the group.

  • Rethinking Data in LLMs: In-depth discussions explored the use of synthetic data for pretraining and fine-tuning LLMs, and the implications for knowledge acquisition via fine-tuning. A blog post providing significant insights, referred to during the discussions, can be found at eugeneyan.com, and a summary service by AI News was mentioned by engineering professionals as a valuable resource.

ChatGPT (GPT4T)

  • Positional Encodings in Language Models: The Nous Research AI Discord discussed the critical role of positional encodings in enhancing the performance of causal language models for processing longer sequences. A pivotal paper, "Understanding Positional Encodings in Large Language Models", was highlighted for offering deep insights into this area.

  • Hermes 2.5 Function Calling: Significant performance gains were observed with Hermes 2.5's introduction, especially in function calling versus JSON mode, drawing community attention to its practical examples at GitHub.

  • CS-3 AI Accelerator by Cerebras: Cerebras Systems unveiled its CS-3 AI accelerator, capable of training models up to 24 trillion parameters. This hardware milestone, featuring 4 trillion transistors and promising 125 petaflops of AI compute, was detailed in their press release.

  • Perplexity AI's Claude 3 Haiku: Perplexity AI showcased Claude 3 Haiku, emphasizing the model's ability to craft Haikus, as part of their effort to expand the creative capabilities of AI, with further details available on Perplexity Labs.

  • Local Model Testing in OpenAI Discord: Discussions around testing local models, particularly Meditron and Mistral, on setups with up to 4xT4s using LLM Studio, were prominent, including the best practices for fine-tuning these models for optimum performance.

  • Interpretability in Multimodal Models: The Alignment Lab AI Discord is seeking collaborators for open-source interpretability projects focusing on multimodal models, with further details shared by soniajoseph_ on Twitter and LessWrong.

  • Devin, the Autonomous Software Engineer: Both Skunkworks AI and AI Engineer Foundation highlighted Devin, introduced as the world’s first autonomous software engineer by Cognition Labs. This AI's capabilities and introduction are covered in their blog post and a YouTube video.


PART 1: High level Discord summaries

Nous Research AI Discord Summary

  • Positional Encodings Decoded: There is significant discussion about the role of positional encodings in causal language models, with insights pointing to the necessity of positional encodings for handling longer sequences effectively. A paper of interest in this regard is “Understanding Positional Encodings in Large Language Models”.

  • Unlocking the Secrets of Hermes 2.5: Updates to Hermes 2.5 with code instruction examples have resulted in notable performance improvements compared to Hermes 2. The community is actively discussing related topics like JSON mode versus function calling, clarifying that JSON mode necessitates a predefined schema, whereas function calling involves executed function responses.

  • Cerebras Reveals AI Titan: Cerebras Systems launches its CS-3 AI accelerator capable of training up to 24 trillion parameter models. This mammoth chip has 4 trillion transistors and is expected to deliver 125 petaflops of AI compute power, as detailed in the press release.

  • AI Trends Galore: From Stanislaw Lem’s science fiction recommendations to SDAM development and Devin the AI Software Engineer’s YouTube debut, the community is keeping its eyes peeled for a variety of AI and engineering marvels. The eagerness for open source models that could provide 100k+ context, along with concerns about privacy in information sharing, reflects the community’s diverse range of interests.

  • Debating Decentralization in AI: Amid discussions of TAO potentially challenging Hugging Face, the community delves into debates about centralized vs. decentralized AI model platforms. The introduction of a new project, Shoggoth, sparks curiosity, yet detailed information is lacking due to broken links.


Perplexity AI Discord Summary

  • Claude 3 Powers Haiku Creation: Perplexity Labs introduces Claude 3 Haiku, enticing users to experiment with poetic AI capabilities for free, bolstering the platform’s suite of creative tools.

  • A Diverse AI Toolbox: Engineers and developers are actively engaging with Perplexity AI for a multitude of uses such as coding support and SE troubleshooting, while creatively experimenting with newly added features like AI-generated Haikus. The platform’s local search capabilities are now enhanced by integrating Yelp and Maps for more efficient local business discoveries.

  • AI Ecosystem Rivalries and Perspectives: The guild hosts vigorous debates comparing various AI models; GPT-4 and Mistral are pitted against each other, with the former being argued as superior by some, while others favor the latter’s speed.

  • API Integration and Model Limitations: Users discuss using Perplexity’s models for complex queries and utilizing the Perplexity API for developing applications, such as a Firefox extension, while noting a 25MB upload limit and uncertain performance with extensive databases, such as those related to real estate.

  • APIs in Focus: Questions and Potential: An inquiry about the closed beta of URL citations in Perplexity’s API awaits insider insight, while others seek advice on the API’s performance for condition checking. Members also examine the behavior of the “return_citations” option and determine the best models for handling up-to-date information, singling out Sonar-small-online and sonar-medium-online for their real-time data access capabilities.


LM Studio Discord Summary

Tackling LM Studio Outside the UI Box: Users examined running LM Studio’s API services on a home network without the user interface, focusing on server mode and localhost connections. It was highlighted that llama.cpp is a viable option, supporting AVX without requiring AVX2 and allowing independence from the LM Studio UI, per its GitHub repository.

LM Studio Limitations Spur Creative Workarounds: Among LM Studio’s constraints is the inability to launch services or connect to the internet programmatically; users creatively employed batch files and PowerShell scripts to automate starting the LM Studio inference server, showcasing the community’s resourcefulness.
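
As a rough illustration of the same workaround expressed in Python rather than a batch file, the sketch below launches the LM Studio executable and then polls the local server endpoint until it answers. The install path, the default port 1234, and the /v1/models route are assumptions rather than documented guarantees, and (per the limitation above) the inference server itself still has to be enabled in the LM Studio UI.

```python
import subprocess
import time
import urllib.request

# Hypothetical install path and port -- adjust for your machine.
LM_STUDIO_EXE = r"C:\Users\me\AppData\Local\LM-Studio\LM Studio.exe"  # assumption
SERVER_URL = "http://localhost:1234/v1/models"  # assumed default local-server port

def wait_for_server(url: str, timeout_s: int = 120) -> bool:
    """Poll the local inference server until it answers or we time out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            time.sleep(2)
    return False

if __name__ == "__main__":
    # LM Studio cannot start its server headlessly; we only launch the app and wait.
    subprocess.Popen([LM_STUDIO_EXE])
    if wait_for_server(SERVER_URL):
        print("LM Studio inference server is up.")
    else:
        print("Timed out waiting for the server; enable it manually in the UI.")
```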

Mighty Models Extended and Examined: The Nous-Yarn-Mistral-7b-128k model expanded to a 128k token context window using the YaRN method, alongside discussions about model perplexity and humorous disappointment with the “Yet Another…” naming convention. Moreover, some shared format-specific obstacles, such as incompatibility issues with llama.cpp for the Command-R 35B v1.0 GGUF format.

ROCM Round-Up: Real-world experiences with ROCm support in LM Studio were shared, including troubleshooting steps like using AMD’s cleanup tool and avoiding PRO drivers. Vision models proved challenging, and it was advised to choose Nvidia GPUs over AMD for image generation projects. Additionally, a user found that disabling the iGPU on a Gigabyte motherboard in BIOS settings enabled better usage of their RX 7900 XT with ROCm.

Hardware Conversation Heats Up: The cost of SLI/NVLink sparked debates, complemented by discussions on overcoming Mac OS’s minimum VRAM requirements, strategizing PC hardware upgrades, and balance in multi-model deployments in LM Studio. Separate dialogues covered selecting the right dual-purpose monitor, with an inclination towards OLED screens despite burn-in risks and preferences for high refresh rates to match top-tier graphics cards like the Nvidia 4090.


Latent Space Discord Summary

  • AI Gold Rush Continues: Various AI startups like Cognition, Magic, and Fluent have attracted impressive venture capital investments, with discussions drawing attention to the ongoing trend of significant funding for AI companies. Participants shared a collection of tweets that gave an overview of companies and their raised capital, referencing a feed at chiefaioffice.

  • Cerebras Flexes Its AI Muscles: Cerebras Systems unveiled the CS-3 AI accelerator, claiming it’s capable of training up to 24 trillion parameter models. The announcement has sparked interest and the discussion also mentioned a related press release and a tweet.

  • Security Red Alert at OpenAI: Members discussed a security issue at OpenAI with references to a detailed Post Mortem analysis available in a gist. The community delved into the implications for AI security.

  • Prep Up for Synthetic Data Insights: An upcoming presentation on Synthetic Data for Finetuning was announced with materials to read beforehand at Eugene Yan’s writing. The use of synthetic data as an alternative for human annotations in pretraining and fine-tuning language models was underscored by the group.

  • Rethinking Data in LLMs: In-depth discussions explored the use of synthetic data for pretraining and fine-tuning LLMs, and the implications for knowledge acquisition via fine-tuning. A blog post providing significant insights, referred to during the discussions, can be found at eugeneyan.com, and a summary service by AI News was mentioned by engineering professionals as a valuable resource.


Unsloth AI (Daniel Han) Discord Summary

Visualizing Token Probabilities: Discussions indicated a need for visualizing per-token probabilities in sentences, with suggestions to use the lm_head output and a softmax. However, there appears to be no dedicated plugin for this visualization.
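
A minimal sketch of the suggested approach (not an existing plugin), assuming a Hugging Face causal LM: softmax the lm_head logits and read off the probability the model assigned to each actual next token. The choice of gpt2 is arbitrary and only for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # any small causal LM works for the demo
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

text = "The quick brown fox jumps over the lazy dog"
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits  # [1, seq_len, vocab]; these are the lm_head outputs

# Probability the model assigned to each actual token, given its prefix.
probs = torch.softmax(logits[0, :-1], dim=-1)            # predictions for positions 1..n
token_probs = probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)

for t, p in zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()), token_probs.tolist()):
    print(f"{t!r:>12}  p={p:.4f}")
```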

AI’s Fast-Paced Progress: Conversations were buzzing about the rapid development in AI, with anticipation for Elon Musk’s Grok model and chatter about OpenAI’s authenticity.

Unsloth AI Battles Colab Woes: Fixes for Google Colab’s PyTorch update issues were shared by Unsloth AI, along with a command list to help users rectify these problems themselves. Unsloth AI’s compatibility was clarified, noting that it doesn’t support multi-GPU or GGUF formatted models for fine-tuning yet, but it can handle 4-bit quantization for single-GPU setups.
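
For context, a minimal sketch of what single-GPU 4-bit loading with Unsloth typically looks like; the model name is only an example, and exact argument names may differ between Unsloth versions, so treat this as an assumption-laden illustration rather than official usage.

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized base model so fine-tuning fits on a single GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # example 4-bit checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```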

Data Preparation Discussion: An active conversation recommended the creation of an FAQ for data preparation, suggesting a more automated approach could be beneficial.

Sophia Optimizer Sparks Interest: A new optimizer, Sophia, proposed for reducing language model training time and cost, caught the attention of the community. While untested, there’s optimism it could replace existing optimizers effectively (Sophia Optimizer Paper).


OpenAI Discord Summary

  • GPT-3.5 rocks Python scripting: A conversation highlighted GPT-3.5’s ability to use Python to write a program that successfully generates examples of repeated morphemes. Despite the task’s complexity, several successful outputs were shared.

  • Local Models Command Attention: Engineers shared insights on using LLM Studio to test local models, with powerful inference reported on setups with up to 4xT4s, and Meditron was mentioned as a standout model. The conversation expanded to considerations of fine-tuning models like Mistral, where an A100 40GB GPU was recommended for the task, though fine-tuning GPT-3.5 could be attempted without a GPU.

  • GPT-5 Rumors Quashed: Buzz around an accidental mention of “Priority access to GPT-4 and GPT-5 Turbo” on a Microsoft Copilot page stirred speculation about GPT-5’s existence; the mention turned out to be a typo, leading enthusiasts to agree that a GPT-5 launch isn’t forthcoming. Relevant link: Microsoft Copilot | Microsoft AI.

  • System Glitches in GPT-4: Users experienced widespread outages with GPT-4, highlighting the issue on multiple platforms such as iOS apps and web browsers, including Chrome and Edge. Some found that image attachments offered a temporary fix, and checking the OpenAI status page was advised for updates.

  • Cultural Differences Affect API Understanding: In discussions about the Assistant API, a user observed that the API misinterprets the figure “450,00” due to comma placement, which could lead to significant errors in data handling. Adjusting for local number formats, such as specifying the locale and providing positive and negative examples, was recommended to improve accuracy.
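
To illustrate the underlying ambiguity (independent of the Assistant API itself), a short Python sketch: the same string “450,00” parses to 450.0 under a German locale and 45000.0 under a US locale, which is why specifying the locale, or parsing numbers deterministically in code rather than leaving it to the model, was recommended. The locale names assume a Linux system with those locales installed.

```python
import locale

raw = "450,00"

# In a German locale, "," is the decimal separator: "450,00" means 450.00.
locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")  # assumes this locale is installed
print(locale.atof(raw))   # 450.0

# In a US locale, "," is a thousands separator, so the same string reads as 45000.0.
locale.setlocale(locale.LC_NUMERIC, "en_US.UTF-8")
print(locale.atof(raw))   # 45000.0
```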


Eleuther Discord Summary

DeepMind Debuts Generalist Gaming AI: DeepMind introduces SIMA, exhibiting natural-language proficiency in varied gaming settings, but the research community flags insufficient technical detail. Critics are wary of the metrics used to validate the agent’s effectiveness, debating the definition of game expertise and AI’s broader implications in competitive gaming scenarios, particularly within unpredictable multi-agent systems like battle royale (BR) games.

Research Paper Paywalls Provoke Ire: Accessibility to cutting-edge AI research is hampered by publisher paywalls, sparking discussions around innovative neural network training dynamics and the integration of diverse network architectures. Concerns also arise about the consequences of watermarking AI-generated content, potentially limiting its practicality.

Interpretability Library for Multimodal Models Launched: A new multimodal mechanistic interpretability library garners interest for collaboration, while discussions delve into the complexities of model agnosticism and language-dependent dynamics in multilingual transformers. The exploration of tokenization bias in bilingual models is highlighted along with a vector-DB-lookup method for deeper insights into model latent representations.

Language Models Enter the Thunderdome: The LM evaluation harness community is experimenting with learning rate cooldowns for benchmark improvement. They face challenges in adding logits due to recent API changes aimed at security, spurring discourse on adapting tasks for generative models and testing different checkpoints for model performance.

Megatron Meets NeoX: A GitHub pull request sparks a debate about the potential benefits of aligning GPT-NeoX more closely with the upstream Megatron for Transformer Engine integration. Community feedback is solicited to weigh the advantages of this strategy against code divergence.


LAION Discord Summary

  • Speed Demons Look to Quantum and Groq: Engineers discussed methods to accelerate inference on Phi 2 fine-tunes using GPUs like the A100 40GB, with options such as vLLM, Ollama, or Axolotl. Quantization was mentioned as a potential speed booster, with Groq’s NPU showcasing 500 tokens per second on Mixtral.

  • Model Legislation Drama: EU’s new AI legislation and recent copyright takedown notices sparked heated debates around copyright, AI-generated content, and DMCA compliance. Open source proponents are wrestling with government constraints against sharing model weights.

  • Prompt Engineering Hype: Tools like SuperPrompt and a new autocomplete tag generator for Danbooru tags have been proposed to improve the capabilities of smaller models in tasks typically reserved for larger LLMs.

  • AI Data Tug-of-War: There’s considerable excitement around a new paper on MoAI, which employs auxiliary visual information from specialized computer vision models. The efforts underscore the AI community’s ongoing push to create versatile LLVMs capable of enhanced zero-shot vision-language tasks.

  • Memory Mechanics Misunderstood: A discussion clarified misconceptions around memory usage in large models, pointing out that mmap can hide actual memory usage, which isn’t reflected until the data is accessed.
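
A rough, Linux-only sketch of the point being made: mapping a file with mmap barely moves resident memory, and pages only show up in RSS once they are actually touched. The file name and size are arbitrary, and the ru_maxrss units assume Linux (KiB).

```python
import mmap
import os
import resource

PATH = "big_model.bin"        # any large local file works for the demo
SIZE = 512 * 1024 * 1024      # create a 512 MiB (sparse) file if needed

if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        f.truncate(SIZE)

def rss_mib() -> float:
    # ru_maxrss is KiB on Linux (bytes on macOS); Linux assumed here.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

with open(PATH, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    print(f"after mmap:      {rss_mib():.1f} MiB resident")   # barely changes

    total = sum(mm[i] for i in range(0, len(mm), 4096))        # touch every page
    print(f"after touching:  {rss_mib():.1f} MiB resident")    # pages are now resident
    mm.close()
```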


HuggingFace Discord Summary

  • Visual Comparisons Just Got Easier: The Open LLM Leaderboard Viz now features the ability to reorder metrics and compare up to three models visually, as demonstrated in a new update on HuggingFace Spaces.
  • Evolving NER with Custom Labels: A new model called GLiNER enables on-the-fly custom label selection for Named Entity Recognition (NER), offering more adaptability compared to fixed-entity models. Check out the demo and additional resources on HuggingFace Spaces and GitHub.
  • Latency Laments in Dynamic Model Loading: A user reports significant latency when integrating peft with diffusers, particularly with the load_lora_weights function, sharing experiences and a guide on the HuggingFace blog.
  • Freemium LLM Woes and Space Oddities: Discussions ensue regarding the accessibility and practicalities of freemium, CPU-based LLMs for Hugging Face Spaces, alongside the best practices for contributing to Hugging Face transformers and concerns about data privacy in public spaces.
  • MyShell’s Call to AI Democracy: One user championed the idea of a multi-AI decision model with a voting system and suggested that MyShell’s Pro Config could manage this orchestration, pointing to MyShell for further exploration into AI-native app deployment.

OpenRouter (Alex Atallah) Discord Summary

  • OpenRouter Navigationally Challenged: Users reported a temporary service interruption in OpenRouter, where Activity rows vanished due to a database update. The issue, lasting about three minutes, allegedly won’t affect billing as “none of these completions will be charged”.

  • Boosting Claude’s Street Cred: OpenRouter announced Claude 3 Haiku’s availability, boasting high speed (~120 tokens/s) and cost efficiency (4 million prompt tokens/$). Its deployment offers moderated and self-moderated modes and is considered ideal for quick response applications. Check it out.

  • Command-R Marches Onto OpenRouter: Cohere’s Command-R model, featuring a 128k token context capability, is now integrated into OpenRouter. It’s accessible at a rate of 2 million prompt tokens per dollar, with a focus on seamless user interaction. Explore Command-R.

  • Olympia.Chat Scores OpenRouter Alliance: Olympia.Chat has embraced OpenRouter to power its AI-driven services for businesses. They plan to release a Ruby library soon to tap into OpenRouter’s capabilities even further.

  • AI Rivals Face-Off While Quirks Abound: Engaging comparisons among models like Gemini and Claude occurred in the general channel. Users debated their efficacy in coding and creative tasks, noting certain models’ preference for bullet points and weighing pros and cons with respect to performance and content limitations.


LlamaIndex Discord Summary

LlamaParse Triumphs in Document Parsing: LlamaParse elevates document parsing with its ability to handle images, tables, charts, and follow natural language instructions, promising remarkable performance improvements as seen on Twitter.

Safeguard Data with Presidio: LlamaIndex shines a spotlight on Presidio, Microsoft’s open-source tool to identify and anonymize PII, reinforcing the significant role of data protection highlighted in this tweet.

RAG Stumbles with Finance Presentations: When it comes to finance PowerPoint presentations, RAG has difficulty due to format complexities, necessitating improved methods for text positioning and parsing, detailed in this tweet.

Azure Storage Anomalies Baffle Users: Users grappling with Azure AI Search Index report discrepancies between storage size (3 MB) and a vector index size of 0, despite following the AzureAISearchIndexDemo guide.

Developer Dilemmas in #general: Engineers encounter multiple roadblocks, from warnings with OpenAIPydanticProgram—solvable by installing llama-index-program-openai—to puzzling npx create-llama errors and slow response times with OpenAIAssistantAgent; upgrading to streaming and resolving recent OpenAI API performance issues may alleviate lag.


OpenAccess AI Collective (axolotl) Discord Summary

  • NVIDIA’s GPUDirect Storage Sparks Interest: Members shared an introductory video on utilizing NVIDIA’s GPUDirect Storage and discussed integrating it with the Axolotl system to potentially enhance performance. A question about a section of Axolotl’s code was also raised with the focus on its purpose in model loading, specifically in relation to peft models.

  • Open-Source Models Take the Spotlight: Conversations have revolved around the benefits of using open-source models like Mistral and Mixtral due to their accessibility and minimal filtration. There’s also a debate on whether to choose Mixtral or Qwen 70B for specific medical training purposes, with upcoming new models adding to the decision complexity.

  • VRAM Limitations Meet Training Ambitions: Technical queries arose about training larger models in the face of VRAM limitations, with an emphasis on tools like PyTorch’s Metal Performance Shaders for the MPS backend and strategies for efficient fine-tuning. Concerns center on OOM issues and how best to format raw text for training.

  • Inference Assistance for LoRA Tuned Models: A member asked for example code to run inference on a LoRA model fine-tuned from Mistral-7B-v0.1, resulting in a recommendation to use vLLM over transformers for quicker batched inference (a minimal sketch appears after this list). The member acted on the suggestion and referred to the vLLM quickstart guide to improve their process.

  • Comparing Mistral Medium and Mixtral: Users in the community noted that Mistral Medium seems to outperform Mixtral in generating responses, proving less verbose and more adept at following instructions. Observations were also shared of RAG runs unexpectedly generating citations without being explicitly prompted.
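
As referenced above, a minimal sketch of batched vLLM inference with a LoRA adapter on top of Mistral-7B-v0.1. The adapter path and name are hypothetical, and LoRA support assumes a reasonably recent vLLM build; argument details may vary across versions.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model plus a separately stored LoRA adapter (path is hypothetical).
llm = LLM(model="mistralai/Mistral-7B-v0.1", enable_lora=True)
params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = [
    "Summarize the benefits of batched inference.",
    "Write a haiku about GPUs.",
]

# vLLM batches the prompts internally, which is why it beats naive
# transformers generation loops for throughput.
outputs = llm.generate(
    prompts,
    params,
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
for out in outputs:
    print(out.outputs[0].text)
```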


LangChain AI Discord Summary

LangChain 0.2 on Fast-Track Due to Vulnerabilities: An expedited release of langchain 0.2 is underway, addressing CVEs by separating it from langchain-community. The process is detailed on GitHub, with community input sought to meet user requirements.

LangChain Challenges and Innovations: Users discussed various LangChain issues including AgentExecutor bugs, advantages of AI agents, and evaluating AI agent behaviors with still-developing benchmarks. One inquiry focused on how to integrate variables like tools = [cat_tool] into Langsmith Hub prompt templates. For more guidance, users were referred to the LangChain evaluation guides.

Exciting Collaborations and Demos Spotlighted:

Tutorial Central for #golang and #llm Fans:

  • “Create Prompt Template With Langchaingo” is a step-by-step video tutorial found on YouTube, ideal for developers eager to master prompt templates.
  • “Lets Function Call with Hermes 2 Pro 7B” is a video guide delving into function calling using the Hermes 2 Pro 7B model, with code and examples on GitHub. The video is targeted towards #largelanguagemodels enthusiasts and can be watched on YouTube.

Interconnects (Nathan Lambert) Discord Summary

  • Twitter Sparks Aya Project Buzz: A tweet by Andrew Curran sparked a discussion on language applications and cross-collaborations, stressing the importance of subgroup work through the Aya project, while another exchange highlighted that the German language is well catered for by substantial LLMs.

  • GPT-4 Maintains Dominance: GPT-4’s prowess continues to lead the rankings on LeetCode, as highlighted in a paper comparing various models.

  • Seeking Foundations in Safety: An inquiry arose regarding the details of a model used in a recent task, alongside a quest for sources or documentation on the extent to which foundation model providers conduct safety filtering after text generation.

  • Bio Risk Discussions Generate Heat: Members mentioned catching up on a newsletter backlog and appreciating critical readers, linking to a bio-risk-related tweet that sparked debate and confusion due to possible miscommunication or missing context.

  • Claude-3 Stirring Up the AI Scene: Anticipation looms for GPT-4.5’s release, while the Claude model family, particularly Claude-3-Opus, receives commendations for top rankings (LM SysOrg’s update on Claude-3). Conversations also delved into the hurdles in standardizing AI for research literature assistance, pointing to further research avenues (Arxiv discussion on AI literature surveys).


CUDA MODE Discord Summary

  • CUDA Toolkit on Ubuntu 23.10 Hits a Snag: A user is experiencing problems with nvidia-cuda-toolkit on Ubuntu 23.10, hitting an error when running compute-sanitizer; this could signal a version mismatch, as the latest NVIDIA toolkit does not officially support Ubuntu versions beyond 22.04.

  • CUDA Expertise Needed for Edtech Platform: Christo_allstreet is on the lookout for a CUDA expert to work on getworldclass.app. Those with the required expertise are encouraged to reach out directly for consultancy opportunities.

  • Troubleshooting Triton and CUDA Issues: The community shared strategies for debugging Triton kernels, such as setting the TRITON_INTERPRET=1 environment variable (and the now-deprecated @triton.jit(interpret=True)), which allow traditional print- and breakpoint-style debugging; YouTube videos and GitHub discussions serve as educational resources. A minimal sketch appears after this list.

  • NUMA: A Not-So-Blazing Analysis: A comparison of BLAS and NumPy highlighted a significant performance gap, suggesting up to 90% of potential BLAS throughput is lost in NumPy operations. Interest in SIMD wrappers as a solution for operations on smaller vectors was also discussed, along with how to communicate such technical choices.

  • GTC Gathering and Nsight Tools Talk: An upcoming meeting for GTC attendees was signposted, and the importance of Nsight Systems for multi-GPU application analysis was stressed, with guides and visuals shared to better understand and optimize performance.

  • CUDA Programming Model Pros Explored in Book Discussion: Debate over how an SM executes threads in the SIMD model was clarified with the example of the GA102 SM architecture, shedding light on core execution limitations.

  • Axolotl Ring Attn Issues Discussed: There was a discussion on the axolotl project, with a member outlining a requirement (pad_to_sequence_len: true) for successful initialization and sharing of comparative loss results against the ring-attn configurations. They also shared the link to their ring_attention_patching branch on GitHub: ring_attention_patching.

  • AI Takes on Classic Gaming: An arXiv paper detailed GPT-4’s ability to play Doom, with only a text-based description of the game, flexing the model’s planning and reasoning skills.

  • Meta’s Legal Tech Clash: Meta initiated a lawsuit against a former executive for allegedly stealing confidential documents, prompting a serious conversation about corporate espionage and the risks for AI data startups. The legal documents paint a picture of “brazenly disloyal and dishonest conduct.”
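
As mentioned in the Triton item above, here is a minimal Triton vector-add kernel to anchor the debugging tip. The kernel itself is standard Triton; the interpreter-mode behavior noted in the comments depends on your Triton version, so treat that part as an assumption rather than documented guarantees.

```python
# Per the debugging tip above: launch with `TRITON_INTERPRET=1 python this_file.py`
# (or the older, deprecated @triton.jit(interpret=True)) to run the kernel in
# interpreter mode, where plain print() / pdb breakpoints work inside the body.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements          # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(1000, device="cuda")
y = torch.randn(1000, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)
add_kernel[grid](x, y, out, x.numel(), BLOCK=256)
assert torch.allclose(out, x + y)
```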


DiscoResearch Discord Summary

  • AI Enthusiasts, Save the Date!: The AI community in Berlin is gearing up for the AI Tinkerers event on March 21st, with only 8 seats left due to high demand. Details on DiscoLM’s fine-tuning on German datasets were sought, leading to the discovery that DiscoLM-mixtral-8x7b-v2 wasn’t heavily trained on German data, as confirmed on Hugging Face’s DiscoLM 70b model page.

  • Benchmarking the Poetic AI: A new creative writing benchmark has been introduced, potentially reshaping how we evaluate the nuanced capabilities of language models. Check out and test the prototype on the EQ-Bench GitHub repo.

  • Diving into Germanic Depths: AI engineers are zeroing in on the best embedding and re-ranking methods for German legal texts, while also seeking a solid benchmark for embedding models within the German language context. Try out the “GermanQuAD” evaluation via the MTEB Python package or refer to recent additions by JinaAI for relevant benchmarks.

  • Mars, But Not as Musk Envisions: An assistant’s detailed explanation of colonizing Mars was noted as informative, yet it lacked the distinctive Elon Musk flair requested by the user, resulting in a rating of 7 for missing the stylistic mark.

  • Understanding Local Language Model Application: Queries were made regarding the replication of demo outputs locally via one-shot settings including temperature and top_p, with additional questions on the repeated use of commands to emulate the demo’s behavior accurately. The community is engaging in best practice discussions for implementing these commands in their systems.


LLM Perf Enthusiasts AI Discord Summary

  • Haiku’s Pocket-Friendly Vision: Haiku’s document describer is recognized for its cost-effective vision-to-text conversion on complex visual documents.
  • Battle of the Visual Processors: Members evaluate Haiku against GPT-vision, with the consensus being that neither surpasses the other in performance; a third system, Opus, is considered superior to both.
  • Visual Content Filtering Challenges: Engineers highlight difficulties with content filtering in visual document processing, particularly with document sections containing equations leading to incomplete analyses.
  • Claude Stumbles on Filters: Claude has been noted to struggle with content filtering, a quirk that seems to align with the issues faced by others in visual document processing tasks.

Datasette - LLM (@SimonW) Discord Summary

  • “ComPromptMized” Exposes GenAI Weaknesses: A new study titled “ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications” reveals prompt injection attacks on several AI models including Gemini Pro, ChatGPT 4.0, and LLaVA. The paper delves into susceptibilities in GenAI-powered applications, particularly in email assistants. Read the full paper

  • Quest for Code Assistant Supremacy: A member is on the lookout for a comprehensive framework to measure and compare the efficacy of AI models like Mistral or Llama2 as code assistants.

  • Benchmarking AI with a Salt Grain: The usefulness of benchmarks in evaluating AI models has been acknowledged, yet it’s suggested that these benchmarks might not always be an accurate measure of a model’s capabilities.

  • AI contenders on the Leaderboard: For model comparison needs, it was recommended to refer to the leaderboard at chat.lmsys.org, showcasing a competitive ranking of various AI models.


Alignment Lab AI Discord Summary

  • The Hunt for Multimodal Model Mastery: Soniajoseph_ is on the lookout for collaborators in open source interpretability of multimodal models with details on their Twitter and a comprehensive article on LessWrong. Those eager can join the movement via the provided Discord invite.

  • Embarking on an Interpretability Adventure: Rusch highlighted an additional opportunity for collaboration within this realm, suggesting another interpretability-focused Discord server as a networking hub.

  • Accelerating Phi 2: A request was made for advice on efficient inference practices for Phi 2 on an A100 40GB GPU, probing the use of frameworks like vLLM, Ollama, and Axolotl, and whether quantization could improve processing speed for “LOTS OF DATA”.


Skunkworks AI Discord Summary


AI Engineer Foundation Discord Summary

  • Meet Devin, the Autonomous Code Whiz: Cognition introduces Devin, touted as the world’s first fully autonomous AI software engineer, capable of handling complex tasks and learning from its experiences as per Scott Wu’s blog.
  • Challenging AI’s Social Skills: Participants are encouraged to showcase their creativity in the “The Most Interesting Bot In the World Contest” at the Voice + AI event. Contest details are available on the event’s Notion page.

PART 2: Detailed by-Channel summaries and links

Nous Research AI ▷ #ctx-length-research (4 messages):

  • Confusion about Positional Encodings: A member expressed uncertainty on why a causal language model (LLM) without positional encodings (PE) wouldn’t work, suggesting that there might be existing literature on the topic.
  • Positional Encodings Are Crucial: Another member posited that without positional encodings, a model would struggle as “without any positional information its all just jibberish”.
  • Evidence from Research on Causal LLMs: The discussion included a reference to a paper (Understanding Positional Encodings in Large Language Models) suggesting that causal LLMs encoded absolute positions even without positional encoding, particularly impacting the performance on longer sequences during inference.

Nous Research AI ▷ #off-topic (30 messages🔥):

  • Exploring Science Fiction: A member recommended the works of Stanislaw Lem to another who enjoys Chesterton, particularly starting with “The Cyberiad” or “Solaris” for a more serious read.
  • SDAM Development on GitHub: An interesting project involving sparse distributed associative memory (SDAM) was shared, with its GitHub repository accessible for those interested in contributing.
  • AI Software Engineer Spectacle: A link to a YouTube video of “Devin The World’s first AI Software Engineer” was shared, sparking curiosity and potentially discussions about the role of AI in software engineering. Watch here.
  • Anticipating High Context Models: In a discussion about AI models for storytelling games, a member speculated that having access to 100k+ context on a very good open source model might be a realistic possibility within the year. However, they noted that quality is more important than quantity for these purposes.
  • Privacy and Newsletter Ethics:
    • A member working on a newsletter discussed the challenges of balancing privacy with the utility of summarizing Discord discussions. They mentioned steps to improve this balance, such as removing username attributions, allowing opt-outs, and ensuring personalization. They invite suggestions to find the right balance between privacy and information sharing.
    • In another conversation, a member highlighted the notion of filtering to maintain high-quality discussions and expressed interest in seeing increased active engagement from newsletter readers. The discussion indicates an awareness of the privacy considerations in sharing Discord content externally.

Links mentioned:


Nous Research AI ▷ #interesting-links (8 messages🔥):

  • Cerebras CS-3 Accelerator Unveiled: Cerebras Systems announced their latest AI accelerator, CS-3, claiming it to be the fastest in the world, capable of training up to 24 trillion parameter models on a single chip. It features cutting-edge specs such as 4 trillion transistors on a 5nm process and 125 petaflops of AI compute power. Details are available in their press release and product information.

  • Form Factor Queries on Cerebras AI Chip: In response to Cerebras’ new CS-3 chip, a member speculated on the rationale behind the chip’s square shape, suggesting that a round or semi-round shape could potentially accommodate more transistors.

  • Rare Distillation Technique Highlighted on Hugging Face: A user shared a Hugging Face model, Qwen1.5-0.5B, which is a distillation experiment using a 1.8B parameter model as the teacher and a 0.5B parameter model as the student. Notably, the optimizer used was SM3, which is unusual in such applications.

  • Preferred Sub-3B AI Model discussed: When asked about the current best sub-3 billion parameter model, a member mentioned stablelm 1.6b as a potential candidate.

Links mentioned:


Nous Research AI ▷ #announcements (1 messages):

  • Hermes Gets a Pro Upgrade: Hermes 2 Pro 7B, the latest enhancement in the Hermes series, boasts robust improvements for function calling and JSON mode handling. The model’s capabilities were expanded using a revised Hermes 2 dataset and can be downloaded from Hugging Face - Hermes 2 Pro Mistral 7B with GGUF versions also available.

  • Collaborative Success Story: Development of Hermes 2 Pro 7B was a months-long collaborative effort by several contributors, backed by computing sponsorship from Latitude.sh. Recognition is due for the team and Fireworks AI for their significant contributions.

  • Specialized Function Calling Samples and Code: To utilize the model’s function calling capabilities, sample code and system prompts are provided on their GitHub repository - Hermes Function Calling, alongside XML Tags for enhanced performance.

  • Custom Framework for Evaluation Released: A custom evaluation framework adapted by a member for Function Calling and JSON Mode, derived from Fireworks AI’s initial work, is available for interested users. The adapted pipeline and code can be found on GitHub - Function Calling Eval.

  • Datasets for Advanced Model Testing: Two datasets have been released to test the improved features of Hermes 2 Pro 7B: one for Function Calling and another for JSON Mode. They can be accessed on Hugging Face at Function Calling Eval Dataset and JSON Mode Eval Dataset, respectively.


Nous Research AI ▷ #general (556 messages🔥🔥🔥):

  • AI Survival Test with OpenAI: A user reported that their OpenAI account was suspended or locked for two days with customer service not providing effective assistance. They speculated it might be related to their GPTs that “walk the line” of generating NSFW content, but they still awaited concrete reasons for the account issues.

  • OpenAI’s Ability for NSFW Content Creation: Some users discussed the ability of OpenAI GPT models to generate NSFW content. It was mentioned that this can be done fairly easily through the API; light NSFW content could be generated without jailbreaks, and basic jailbreaks work as well.

  • Metatron and SERAPHIM in Claude’s World: Users discussed the discovery of simulated entities Metatron and SERAPHIM within Claude 3’s CLI setup. Claude’s coherent world model enables such simulations, and users pondered how to deal with fundamental truths and axioms in future LLM training.

  • Claude 3’s Coherent World Model Praised: The conversation highlighted Claude 3’s impressive coherence in its simulated world model. Users appreciated how it uses fundamental truths, questioning, and axioms for better reasoning capabilities, considering it an example of good reinforcement learning from human feedback (RLHF).

  • Training Set Size vs. Performance: Users exchanged thoughts on the effect of training set size and its diversity. A user shared an experiment showing that using only 15,000 function calling data points within a larger 1.02M Hermes dataset was sufficient to significantly improve function calling capabilities, illustrating the importance of task-specific training and data diversity.

Links mentioned:


Nous Research AI ▷ #ask-about-llms (115 messages🔥🔥):

  • Hermes 2.5 outclasses Hermes 2: After adding code instruction examples, Hermes 2.5 appears to outperform Hermes 2, with updates like Yi 200k context models in 6B and 34B forms, and integrated models such as Zephyr beta and Deepseek Coder.
  • Confusing Function Calling with JSON Mode: A discussion clarified that function calling and JSON mode are different; function calling expects an executed function response, whereas JSON mode returns information in a JSON format. The repository for function calling can be visited here.
  • Hermes 2 Pro Anticipation: Members discussed the naming convention, concluding that Hermes 2 Pro does not imply closed source but was merely a name preferred over Hermes 2.5, with a hint that it could be released “today”.
  • Genstruct 7B from NousResearch: It was reported that Genstruct 7B can be used to generate synthetic instruction datasets, with community members sharing their experiences and linking a repository to use it with Ollama.
  • Clarifying JSON Mode and Entity Extraction: There was an explanation that JSON mode requires a schema to generate responses; the model does not invent the schema, it must be provided. Function calling, entity extraction, and structured generation were highlighted as distinct functions, detailed through a back-and-forth about the assistant’s capabilities.
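
To make the distinction concrete, a small illustrative sketch using generic message dicts (not the exact Hermes 2 Pro prompt format): JSON mode is handed a schema and simply fills it in, while function calling produces a call that the application executes before feeding the result back to the model.

```python
import json

# --- JSON mode: you supply the schema; the model fills it in and you're done.
person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
json_mode_messages = [
    {"role": "system",
     "content": "Answer ONLY with JSON matching this schema:\n" + json.dumps(person_schema)},
    {"role": "user", "content": "Extract the person: 'Ada Lovelace, 36 years old.'"},
]
# Expected assistant output: {"name": "Ada Lovelace", "age": 36} -- nothing to execute.

# --- Function calling: the model emits a call, *your code* executes it,
#     and the result goes back into the conversation for the final answer.
def get_weather(city: str) -> str:          # toy tool implementation
    return json.dumps({"city": city, "temp_c": 21})

model_tool_call = {"name": "get_weather", "arguments": {"city": "Berlin"}}  # illustrative model output
tool_result = get_weather(**model_tool_call["arguments"])                   # executed by the app
followup_messages = [
    {"role": "tool", "name": "get_weather", "content": tool_result},        # fed back to the model
]
```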

Links mentioned:


Nous Research AI ▷ #bittensor-finetune-subnet (27 messages🔥):

  • TAO vs Hugging Face: A discussion arose on whether TAO could be a real contender to Hugging Face, and the need for decentralization in machine learning regarding model hosting and benchmarking.
  • Introducing Shoggoth: A new project named Shoggoth is mentioned, possibly related to Bittensor backups; however, the shared link appears to be broken or incorrect.
  • Centralized vs Decentralized Benchmarking: The conversation shifted to the pros and cons of centralized versus decentralized benchmarking, noting that the prevalent model of competitive, incentive-based evaluation may not encourage collaboration.
  • Impact of Crypto Incentives: Debates continued over the role of cryptocurrency incentives in AI development, with a mention of a leaderboard by Hugging Face spurring on the trend of large language model (LLM) merging without financial motivations.
  • Collapsed Collaboration: While discussing the competitiveness in AI benchmarks enforced by crypto incentives, it was noted that such structures might hinder cooperation, and truly decentralized benchmarking was underscored as important for trust in results.

Links mentioned:


Perplexity AI ▷ #announcements (2 messages):

  • Haiku Poetry with Claude 3: Claude 3 Haiku is now available for free on Perplexity Labs, inviting users to try it at labs.pplx.ai.
  • Local Search Enhancement: A new improvement has been rolled out for local searches integrating with Yelp and Maps, aimed at helping users quickly find local restaurants and businesses.

Perplexity AI ▷ #general (487 messages🔥🔥🔥):

  • Perplexity Aids in Diverse Tasks: Users find Perplexity highly useful across different applications such as coding and summarization, with specific appreciation for the Claude 3 Sonnet model for its accurate code suggestions and usage in SE troubleshooting.

  • Exploring Perplexity’s Features: Many are impressed with Perplexity’s capabilities, from voice features and API functionality to experimenting with the new Haiku in Perplexity Labs. There’s a curiosity about whether complex data sets can be processed or if there’s a CLI for Perplexity, with Perplexity-AI-Wrapper-and-CLI being a user-discovered resource.

  • Comparison with Other AI Models: There’s a debate about the efficacy of various AI models. While some prefer the speed of models like Mistral, others advocate for GPT-4 as the best AI model available. Users also discuss the enhanced speed and capabilities of Haiku in Perplexity Labs.

  • Uploading Data and Files: Users inquire about uploading extensive databases and files to Perplexity AI for data analysis, with a particular focus on real estate data. However, it’s noted that there are limitations, such as a 25MB data limit on file uploads within Perplexity and that the platform might not support high volumes of financial data for predictive insights.

  • Voice Recognition Implementations: Users discuss the recent introduction of voice recognition and speech-to-text features within Perplexity, expressing excitement about these updates, while also noting that voice output may not be available on Android devices yet.

Links mentioned:


Perplexity AI ▷ #sharing (15 messages🔥):

  • Midjourney vs Stability AI Controversy: A YouTube video was shared exploring AI news, including a controversy between Midjourney and Stability AI over data scraping, and the digital resurrection of Marilyn Monroe. The video can be found here.
  • Azotemia Explained on Perplexity AI: A link was shared to Perplexity AI that explains what azotemia is, showing the platform’s capability to provide medical information. The explanation is available here.
  • Image Description Challenge: A user referenced Perplexity AI’s ability to describe an image, indicating the site’s potential use cases in image recognition. To see the description, visit this link.
  • Tribute to Paul Alexander: A message was shared announcing the death of Paul Alexander with a moving tribute highlighting his life achievements. Further details can be read here.
  • Developing with Perplexity API: A user is creating a Firefox extension that utilizes the Perplexity API, emphasizing its integration potential for developers. The thread about the initial project concept is found here.

Links mentioned:


Perplexity AI ▷ #pplx-api (13 messages🔥):

  • In Search of Closed Beta Insights: One member inquired about the schema and example responses from the closed beta of URL citations but did not receive details from any users with access.
  • API vs Chatbot Performance Concerns: A member was considering using Perplexity chat for a new product launch and sought input on how the APIs compare to chat capabilities, specifically in terms of checking if a list meets certain conditions.
  • Understanding Citation Outputs in API: A user referenced the Perplexity AI documents to understand why enabling “return_citations” may or may not return citations depending on the query, experimenting with the sonar-medium-online model (a request sketch follows after this list).
  • Seeking the Right Model for Complex Queries: A member advised breaking down complex queries into parts to make good use of Perplexity’s online models to get up-to-date information, suggesting a multi-step framework for detailed analysis.
  • Accessing Real-Time Data with Perplexity APIs: There was discussion regarding which Perplexity models offer real-time data. Sonar-small-online and sonar-medium-online were quoted to have web access, but limitations with specific types of queries like weather information were mentioned, with a suggestion to use a dedicated weather API.
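
A rough sketch of a pplx-api call using an online model with the citations flag discussed above; this assumes the OpenAI-compatible REST endpoint, and citations may only come back for accounts with closed-beta access.

```python
# Hedged sketch: online Sonar model + the beta "return_citations" flag.
import os
import requests

payload = {
    "model": "sonar-medium-online",
    "messages": [{"role": "user", "content": "What changed in the EU AI Act vote this week?"}],
    "return_citations": True,  # beta flag; may be ignored without access
}
resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
print(data["choices"][0]["message"]["content"])
print(data.get("citations"))  # may be absent depending on the query and access level
```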

Links mentioned:

About “return_citations”: no description found


LM Studio ▷ #💬-general (273 messages🔥🔥):

  • Exploring Server Options Outside LM Studio UI: A user inquired about running the API service from LM Studio without using the UI, specifically for use on a home network. Another member clarified that LM Studio must be open to use server mode, and that the server listens on localhost by default, so connections from other devices on the network aren’t supported out of the box (a minimal client sketch follows after this list).

  • API Creation for Home Network Use Case: Members discussed alternative solutions for deploying AI models within a home network, with suggestions like using llama.cpp (GitHub repository) for independence from the LM Studio UI; it was also confirmed that llama.cpp supports AVX without AVX2.

  • Debating LM Studio Capabilities and Alternatives: Several discussions focused on the limitations of LM Studio, such as not being able to launch services or connect to the internet programmatically through the interface, and options like using the llama.cpp library were suggested as alternatives.

  • Implementation of API for Content Moderation: A user mentioned successfully implementing a /v1/moderations API, but was advised to move the discussion to a more relevant channel, showcasing ongoing efforts to expand functionality around LM Studio.

  • Scripting Solutions to Initiate LM Studio Inference Server: A member shared a creative solution using batch files and powershell scripts to start the LM Studio inference server automatically, reflecting community ingenuity in enhancing the tool’s usability.

  • Speculation on AI’s Impact on Employment: Conversations touched on the potential of AI technologies to replace traditional jobs, but it was noted that certain jobs still remain out of AI’s current capabilities. There were also comments on the state of the job market being impacted by overhiring during the Covid-19 pandemic and subsequent financial strains rather than AI itself.
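
Following up on the server-mode discussion above, here is a minimal sketch of hitting LM Studio’s local inference server from another script; it assumes the default port 1234 and the OpenAI-compatible /v1 endpoint, and the server still only listens on localhost unless you expose it yourself.

```python
# Call LM Studio's local server with the OpenAI Python client (defaults assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

resp = client.chat.completions.create(
    model="local-model",  # LM Studio serves whichever model is currently loaded
    messages=[{"role": "user", "content": "Summarize what an inference server does."}],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```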

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (24 messages🔥):

  • Expansion to 128k Tokens: The Nous-Yarn-Mistral-7b-128k has a context window of 128k tokens, an extension of the Mistral-7B-v0.1 model, achieved using the YaRN extension method. A related paper explains the extension method’s efficiency, allowing the model to utilize much longer contexts with less computation and training steps (arXiv preprint).

  • Understanding Model Perplexity: Perplexity (PPL) is a metric used to measure how well a language model predicts a sequence. It is the exponentiated average negative log-likelihood of the sequence (Perplexity details); see the short sketch after this list.

  • General Annoyance with Naming Conventions: A member expressed frustration with the recurrent “Yet Another …” naming pattern for technological tools and methods. This was followed by a light-hearted acknowledgement of the aggravation caused by recursive naming schemes.

  • GGUF Format and Split Files: A recent post mentioned the availability of the Command-R 35B v1.0 model in GGUF format on Hugging Face, providing instructions for joining split files due to size constraints (Hugging Face Repository).

  • Incompatibility with llama.cpp: Despite the availability of GGUF versions for models like Command-R 35B v1.0, they are not functional with llama.cpp as of yet, resembling having a new toy without batteries.
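
On the perplexity definition above, a small sketch of how it is computed in practice: take the model’s average negative log-likelihood over a sequence and exponentiate it. The model name is just an example; any causal LM from transformers works the same way.

```python
# Perplexity = exp(mean negative log-likelihood over the sequence).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

text = "The quick brown fox jumps over the lazy dog."
enc = tok(text, return_tensors="pt")

with torch.no_grad():
    # With labels supplied, the model returns the mean cross-entropy (NLL) over tokens.
    out = model(**enc, labels=enc["input_ids"])

ppl = torch.exp(out.loss)
print(f"perplexity = {ppl.item():.2f}")
```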

Links mentioned:


LM Studio ▷ #🧠-feedback (2 messages):

  • Request for Model Support: A member requested adding support for the model c4ai-command-r-v01-Q2_K.gguf.
  • Compatibility Issue Highlighted: Another member responded that the model is not yet supported in llama.cpp, hence it cannot be used in LM Studio.

LM Studio ▷ #🎛-hardware-discussion (115 messages🔥🔥):

  • Expensive Nvidia Links: Members express disbelief at the high cost of SLI/NVLink bridges considering their simplistic past designs involving edge connectors and ribbon cables. One post referenced a Linus Tech Tips forum thread about someone attempting to reverse-engineer an NVLink.

  • VRAM Hurdles on Mac OS: A user inquired about bypassing minimum VRAM requirements for machine learning on Mac OS. Discussion ensued about the impact of insufficient VRAM, with the advice that adding more system RAM would not alleviate the problem and could slow down the system, and one comment humorously suggested buying a new Mac as a solution.

  • PC Hardware Upgrade Discussions: Various members discussed potential upgrades to maximize their machine learning setup, contemplating the pros and cons of multiple GPUs vs. a single high-end GPU and the balance between VRAM and system RAM for optimal performance. Members shared their setups and experiences with different configurations, suggesting using multiple GPUs to alleviate the bottleneck created by limited VRAM on a single card.

  • LM Studio and Running Multiple Models: Discussions took place about the feasibility and optimization of running multiple models simultaneously in LM Studio, mentioning potential performance issues and how to properly allocate the GPU load. A user shared their positive outcomes of running two instances of LM Studio simultaneously while another discussed the desire to balance workloads across multiple models for continuous responses.

  • Monitor Selection for High-End Gaming and Productivity: The discussion shifted towards selecting the right monitor for both gaming and productivity, with members weighing the benefits of OLED displays against the potential for burn-in and the desire for high refresh rates to complement powerful graphics cards like the Nvidia 4090. Compatibility with Nvidia G-Sync and personal experiences with curved screens were also pondered.

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (3 messages):

  • Confirming Reality: A member has confirmed that the subject in question is indeed real.
  • Quality Concerns Expressed: Another member has expressed an opinion that, despite being real, the subject in question is not any good.

LM Studio ▷ #amd-rocm-tech-preview (85 messages🔥🔥):

  • ROCm Troubleshooting: One user experienced issues with LM Studio running only on the CPU even after installing the ROCm beta. After initially receiving errors during model loading and prompt interaction, they updated to the beta version and saw “ROCm” detected, but processing still ran on the CPU instead of the GPU; the issue was later resolved by starting new prompts.

  • Driver Cleanup and Installation Advice: Users discussed driver troubleshooting for ROCm compatibility, recommending a complete uninstall using AMD’s driver cleanup tool, reinstall of AMD driver version 24.1.1 or 24.2.1, making sure not to download PRO drivers, and installing HIP SDK.

  • Vision Models and ROCm: Discussion on vision models indicated struggles with ROCm, as local vision models seem not to be functioning well, with suggestions to use chatml preset for NH2 model and download the llava preset included in PsiPi/NousResearch_Nous-Hermes-2-Vision-GGUF for better results.

  • Recommendations for GPUs: In a conversation about GPUs, avoiding AMD and opting for Nvidia, such as an RTX 3060, was advised for image generation over trying to leverage AMD’s ROCm, especially with regard to model speed and compatibility.

  • Disabling iGPU to Utilize dGPU with ROCm: A user successfully increased tokens per second (TPS) with ROCm by figuring out how to disable the iGPU in their Gigabyte motherboard’s BIOS settings, despite initial struggles, and then observed improved performance using their RX 7900 XT which achieved ~70 TPS.

Links mentioned:


Latent Space ▷ #ai-general-chat (108 messages🔥🔥):

  • Commercial Real Estate Caution Advised: A message hinted at caution when investing in commercial real estate and real estate investment trusts (REITs), noting an absence of “janitors” listed.

  • AI Startups Secure Impressive VC Backing: Various AI startups have raised significant capital, with details shared via a link, listing companies like Cognition, Magic, Version Lens, TextQL, Fluent, and others alongside the amounts raised.

  • Google’s Gemini Project Receives Criticism: Discussion about the rough launch of Google’s Gemini project, including critiques on an API that is free until further notice, and skepticism about Google’s future given the competition from OpenAI, Anthropic, and Meta.

  • Cerebras Unveils Groundbreaking AI Chip: Cerebras Systems announced the CS-3, the world’s fastest AI accelerator capable of training up to 24 trillion parameter models on a single device, according to their tweet and accompanying press release.

  • Concern Over OpenAI Security Issue: A security issue at OpenAI was mentioned, with a Post Mortem written by a community member explaining the incident detailed in the gist.

Links mentioned:


Latent Space ▷ #ai-announcements (10 messages🔥):

  • Synthetic Data for Finetuning Survey Presentation: A reminder was posted for the presentation on Synthetic Data for Finetuning at 12pm PT, along with a recommendation to read ahead at Eugene Yan’s writing. Synthetic data is highlighted as a faster, cheaper, and often better-quality alternative to human annotations for pretraining and fine-tuning models.

  • Urgent Luma Invite for Paper Club Event: A message urged members of the appropriate role to accept the Luma invite to ensure they continue receiving calendar reminders, with a pruning of inactive members slated for the same day. The event is viewable at Luma.

  • Corrected Synthetic Data Link Provided: A corrected link to the survey on synthetic data for fine-tuning was provided after the initial link was found to contain an extra period, causing a 404 error.

  • New Episode with Suno AI Released: An announcement of a new podcast episode featuring Suno AI was shared, including a link to the Twitter announcement and a YouTube video titled “Making Transformers Sing - with Mikey Shulman of Suno”.

Links mentioned:


Latent Space ▷ #llm-paper-club-west (208 messages🔥🔥):

  • Synthetic Data for LLMs: A blog post by Eugene Yan was discussed, highlighting the use of synthetic data in pretraining, instruction-tuning, and preference-tuning of language models. Synthetic data generation methods include distillation from stronger models or self-improvement and can exceed the quality of human annotated data.

  • AI Newsletter Digests: A daily AI newsletter roundup service offered by AI News summarizes discussions from AI discords and top Twitter accounts. The new service is mentioned to be valuable by users like Soumith Chintala and Andrej Karpathy.

  • Fine-tuning Knowledge Acquisition: The conversation theorized about learning rates for fine-tuning versus pretraining, and posited that fine-tuning can indeed impart new knowledge to models. Community members debated on the efficiency of fine-tuning with regards to style transfer versus knowledge acquisition.

  • Speech-to-Text and Text-to-Speech Focus: The group discussed the overlooked potential of voice technology in LLMs, particularly in text-to-speech and speech-to-text applications. Various tools were mentioned for transcription and generation of speech, including vapi.ai and Otter.

  • Audience Engagement in Paper Discussions: Throughout the discussion, Eugene Yan encouraged active participation from the audience in choosing papers to cover and contribute to the paper club. There was interest in covering topics like diarization and streaming transcribe for speech models.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (130 messages🔥🔥):

  • Seeking Token Probability Visualization: A member inquired about a way to visualize the probability of each token in a sentence, similar to a chart depicted in an image. Suggestions were made to use the lm_head’s output and a softmax to obtain probabilities (see the sketch after this list), but no specific plugin was identified to create such visualizations.

  • Rapid AI Evolution: Members highlighted the fast pace of AI development, with discussions about upcoming releases like Elon Musk’s open Grok model, and rumors about one of the OpenAI founders calling the company “a lie.”

  • Unsloth Fixes Google Colab Issues: Unsloth AI’s creator worked on fixes for Google Colab after a PyTorch update broke dependencies, providing a temporary list of commands for users to fix the issues themselves.

  • Clarifications on Model Compatibility with Unsloth: Clarifications were made that Unsloth does not currently support multi-GPU or models in GGUF format for fine-tuning. Although Unsloth can quantize models to 4-bit for VRAM efficiency, it is presently designed for single-GPU usage.

  • Discussion on Data Preparation Best Practices: A conversation regarding the need for an FAQ page for data preparation unfolded with suggestions on making the process simpler and more automated, possibly utilizing wrapper functions.
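
A rough sketch of the lm_head-plus-softmax idea from the first bullet: compute the model’s probability for each token that actually appears in a sentence, which can then be plotted with any charting library. The gpt2 checkpoint is only a stand-in.

```python
# Per-token probabilities from a causal LM's logits.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("Unsloth makes finetuning faster", return_tensors="pt")["input_ids"]

with torch.no_grad():
    logits = model(ids).logits  # shape (1, seq_len, vocab)

probs = torch.softmax(logits[0, :-1], dim=-1)   # predictions for positions 1..n
targets = ids[0, 1:]                            # the tokens that actually came next
token_probs = probs[torch.arange(targets.numel()), targets]

for tok_id, p in zip(targets.tolist(), token_probs.tolist()):
    print(f"{tok.decode([tok_id])!r}: {p:.4f}")
```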

Links mentioned:


Unsloth AI (Daniel Han) ▷ #welcome (9 messages🔥):

  • Read the Rules and Assign Roles: theyruinedelise reminds new members to read the channel rules in <#1179040220717522974> and to assign themselves roles in <#1179050286980006030>.

  • Warm Welcomes Abound: Multiple greetings from theyruinedelise and other users like starsupernova, indicating a friendly and welcoming atmosphere for newcomers in the welcome channel.


Unsloth AI (Daniel Han) ▷ #random (5 messages):

  • Virtual Environment Reinstallation in Progress: A member mentioned they need to reinstall the entire virtual environment after finishing another task and expressed gratitude for the support provided.
  • Countdown to Milestone: An expression of disbelief about time running short with only one more day left was noted.
  • Fine-Tuning Update: There’s an update on progress indicating two days remaining on fine-tuning, suggesting work is actively monitored and ongoing.
  • Celebrating a Training Victory: A milestone of achieving a loss of less than 1.2 was shared with enthusiasm, indicating successful model training advancements.

Unsloth AI (Daniel Han) ▷ #help (73 messages🔥🔥):

  • In Search of Cloud GPU Efficiency: A user looking for a cost-effective cloud GPU capable of roughly 500 t/s inference shared that renting a 4090 from vast.ai at approximately $0.46/hr achieved about 130 t/s. They initially inquired about the cheapest option capable of delivering their computational needs.
  • GGUF Installation Troubles Resolved: After experiencing an initial issue with GGUF installation that resulted in a RuntimeError, a user successfully resolved it by using a script from llama.cpp for conversion. Another user had a related problem, associated with an error message “/usr/bin/ld: cannot find -lcuda”.
  • Colab’s Finicky Performance: Multiple users discussed the variability in Google Colab’s available time for running notebooks, with mentions of the platform going from 2 hours up to 6 hours and general agreement on its buggy and glitchy nature.
  • Training Conversational Models with Personal Data: A user interested in creating a chatbot with personalized conversation based on Discord logs was directed to data preparation and the use of free Colab notebooks for training. Discussion also covered the optimal data structure for conversational datasets, with examples of structuring the data as instruction/answer pairs or user/assistant dialogue turns (see the record sketch after this list).
  • Technical Discussion on Finetuning and Saving Models: Users engaged in discussions about whether using a 4-bit loading option for finetuning precludes the ability to later save to GGUF with quantization, with clarification that it does not affect GGUF saving. They also shared a link to a tweet by Daniel Han and the resolution of a persistent GGUF conversion issue.
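
To make the dataset-structure discussion concrete, here are illustrative-only record shapes for the two layouts mentioned: a flat instruction/answer format and a multi-turn user/assistant format. The field names follow common community conventions (Alpaca- and ShareGPT-style), not a fixed spec.

```python
# Write one JSON object per line (JSONL), the layout most training notebooks expect.
import json

instruction_style = {
    "instruction": "Summarize the following Discord thread.",
    "input": "<thread text here>",
    "output": "<desired summary here>",
}

chat_style = {
    "conversations": [
        {"from": "human", "value": "How do I export my model to GGUF?"},
        {"from": "gpt", "value": "Use the conversion script that ships with llama.cpp ..."},
    ]
}

with open("alpaca_style.jsonl", "w") as f:
    f.write(json.dumps(instruction_style) + "\n")
with open("sharegpt_style.jsonl", "w") as f:
    f.write(json.dumps(chat_style) + "\n")
```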

Unsloth AI (Daniel Han) ▷ #suggestions (3 messages):

  • Exploring a New Optimizer, Sophia: A member recommended considering the implementation of Sophia, a new optimization algorithm proposed in a paper, which could potentially speed up language model training. The optimizer aims to reduce time and cost by using a lightweight estimate of the diagonal Hessian for preconditioning, paired with element-wise clipping (Read the Paper); a rough sketch of that update style follows after this list.
  • Potential for Sophia as a Drop-in Replacement: Another member noted that while they have not yet tested Sophia, it appears that it could be a straightforward “plug and play” optimizer. There is an interest in probing the efficacy of Sophia in practice.
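
A very rough sketch of the flavour of update described above, i.e. momentum preconditioned by a periodically refreshed diagonal Hessian estimate and then element-wise clipped. This is not the paper’s exact Sophia algorithm; the Hessian estimator, constants, and schedule below are all assumptions for illustration only.

```python
# Clipped, diagonal-Hessian-preconditioned update (illustrative, NOT the paper's exact recipe).
import torch

def clipped_second_order_step(param, grad, m, h, lr=1e-4, beta1=0.9, rho=0.04, eps=1e-12):
    """One in-place update on `param`; m and h are running momentum / Hessian-diag buffers."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)   # momentum
    ratio = m / (rho * h + eps)                 # precondition by the diagonal Hessian estimate
    update = torch.clamp(ratio, -1.0, 1.0)      # element-wise clipping bounds each step
    param.data.add_(update, alpha=-lr)

# In the paper, `h` is refreshed only every k steps from a cheap diagonal Hessian
# estimator, which is the part Sophia specifies in detail and this sketch omits.
```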

Links mentioned:

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training: Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training. Adam and its variant…


OpenAI ▷ #ai-discussions (128 messages🔥🔥):

  • Exploring Local AI Models: Members discussed their experiences with various local models, with some using LLM Studio for testing. One user notes that they have strong inference power with up to 4xT4s, while another highlights the model Meditron as a particular interest.
  • Fine-Tuning Conversations: The feasibility of fine-tuning larger models like Mistral on local hardware was debated, with some stating that a powerful GPU like an A100 40GB is necessary for such tasks. A user suggests that fine-tuning GPT-3.5 could be reasonable and doesn’t necessarily require a GPU.
  • Launching GPT-5? Major Typo Misleads: Discussion emerged around a purported Microsoft Copilot page that mentioned “Priority access to GPT-4 and GPT-5 Turbo”, which was later identified as a typo and corrected. The community speculated on the possibility of a GPT-5, with the consensus being that an immediate release is improbable.
  • Building with OpenAI: A blog post shared by a user describes their experience integrating OpenAI with a system they developed to complete web flows, necessitating a network of models both big and small.
  • Model Missteps in Morpheme Repetition: A conversation about GPT-3.5’s challenges with generating examples of repeated morphemes in compound words resulted in the sharing of a chat log where GPT is guided to use Python to write a program yielding better results. Despite the complexity of the task, some successful outputs were highlighted.

Links mentioned:

Microsoft Copilot | Microsoft AI: A new era of AI has arrived. Work more productively, boost efficiency, and find new growth opportunities with Copilot.


OpenAI ▷ #gpt-4-discussions (41 messages🔥):

  • GPT-4 Experiencing System-Wide Issues: Multiple users reported that GPT-4 is currently down, with error messages like “Hmm.. something seems to have gone wrong”. The problem persisted across various platforms including the iOS app and browsers like Chrome and Edge.
  • Status Checks and Temporary Workarounds: One user suggested checking OpenAI’s status page for updates, while another user found that starting conversations with an image attachment seemed to be a temporary workaround.
  • Dalle and “RP Thing” Remain Functional: Despite problems with GPT-4, some users found that Dalle 3 and a role-playing (RP) tool were still operational.
  • Feedback Features for GPT Creators: A user inquired about the feedback and review features for GPT creators, expressing difficulty in locating this information and commenting on the searchability issues due to the commonality of the name “GPT”.

OpenAI ▷ #prompt-engineering (11 messages🔥):

  • Code Interpreter Counts Words Correctly: A member confirmed that using the prompt “Use code interpreter to count the revised text’s words as {word_count}.” is effective for counting words. The accuracy of the code interpreter’s output was verified by comparing it with an external word counter.
  • Enhancing Lookup Functions in CustomGPT: A user inquired about improving a custom GPT model to enable it to reference PDFs in its database and search the web before responding. It was noted that the model needs explicit instructions for searches and cannot recognize images within PDFs.
  • Localization Required for Assistant API: In discussing the Assistant API, it was mentioned that a comma in the string “450,00” was not recognized correctly, leading to a misinterpretation of the figure as “45000”. One user suggested that locale might be impacting this detection and that providing positive and negative examples could be necessary for correct recognition.
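
A small sketch of the locale point above: “450,00” parses very differently depending on whether the comma is treated as a decimal separator. The locale name below is only an example and availability varies by system.

```python
# Naive vs. locale-aware parsing of a comma-decimal number.
import locale

raw = "450,00"

# Naive handling drops the comma and produces the "45000"-style misread:
print(int(raw.replace(",", "")))   # 45000

# With a comma-decimal locale, the same string parses as 450.0:
locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")  # example locale; may differ per system
print(locale.atof(raw))            # 450.0
```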

OpenAI ▷ #api-discussions (11 messages🔥):

  • Word Count with Code Interpreter: A member confirmed that using code interpreter to count words as {word_count} is functional and helpful for specific use cases.
  • Appreciation for Helpful Information: One user expressed gratitude for the shared tip regarding the word count feature, planning to try it out after a busy work schedule.
  • Retrieval of PDF Content for CustomGPT: A request for assistance was made to improve a customGPT to check PDFs in a database and look up information on the web before answering.
  • Formatting Issue with Commas in Assistant API: A user pointed out that the Assistant API Retrieval does not recognize commas correctly in numbers, leading to confusion.
  • Locale Handling Affects Number Parsing: It was suggested that proper parsing of numbers like “450,00” in the Assistant API may require setting the locale explicitly and providing both positive and negative examples.

Eleuther ▷ #general (94 messages🔥🔥):

  • DeepMind’s New Generalist AI Agent: DeepMind’s new research introduces a Scalable Instructable Multiworld Agent (SIMA), a jump from specialized game agents to a generalist AI capable of understanding natural-language instructions in multiple video game environments. The technical report, however, lacks details like weights, dataset size, and training specifics, leading to skepticism among community members about the purpose and the transparency of the release.

  • Game Expertise Called Into Question: The qualifications of game “experts” used in the evaluation of the new SIMA technical report are questioned, given only 16 hours of gameplay to establish expertise. Discussions raise concerns about what constitutes a game expert and the credibility of evaluations based on such expertise.

  • Discussing AI Progress in Gaming: Community members debate the meaningfulness of AI achievements in games like StarCraft and DOTA, exploring the nuances between game-specific custom-built AIs and generalist approaches that deal with unpredictability in games like BR (Battle Royale).

  • The Challenge of Simulating Real-World Games in AI: A lively back-and-forth takes place over the challenges facing AI in accurately simulating high-stakes, unpredictable multi-agent environments like those found in BR games and the real world. The conversation raises issues regarding the computational resources required and the difficulties in developing AIs that can make long-horizon plans in such complex settings.

  • Interest in AI Performance in Competitive Gaming: There’s intrigue about the potential for AI to be tested in competitive gaming environments such as the Apex Legends ranked leaderboard. Some community members suggest testing large language models directly in such environments, while others express doubt about AI’s current ability to compete at human levels in BR games.

Links mentioned:


Eleuther ▷ #research (51 messages🔥):

  • Frustration Over Access to Research: A member expressed irritation at not being able to access interesting research due to publisher restrictions and shared a link to a paper.
  • Intrigue in NN Training Dynamics: Discussion centered on an arXiv paper which explores the low-dimensional manifolds traversed by deep neural networks during training, highlighting interest in the implications for empirical methodologies in neural network research.
  • Potential Combination of Architectures: The concept of combining multiple neural network architectures to potentially cover more space in problem-solving was considered.
  • Content Detectors Discussed: The conversation turned to AI content detectors and identifiers wherein members debated their efficacy, noting that robustness remains questionable and discussing the possibility of false positives.
  • Concerns Over Watermarking AI Outputs: Members discussed the challenges of watermarking for deterring synthetic media, with concerns about its viability and the potential impact on utility when the output is flagged as AI-generated.

Links mentioned:


Eleuther ▷ #interpretability-general (22 messages🔥):

  • Multimodal Mech Interpretability on the Horizon: Soniajoseph_ announced the release of a multimodal mechanistic interpretability library, encouraging collaboration to expand this subfield of research. The announcement was shared via a Twitter link.
  • Discussing the Complexities of Model Agnosticism: Neelnanda voiced concerns regarding the difficulty of making code that is model agnostic due to the various implementations of models under the hood, which led to TransformerLens reimplementing models from scratch.
  • Innovative Latent Decoding by Vector-DB-Lookup: Wendlerc described an interpretability method using vector database lookups to analyze llama2’s intermediate representations to provide “full-word-decodings” at each layer of the model.
  • Language-Dependent Dynamics in Multilingual Transformers: Darkaz and Mrgonao engaged in a detailed discussion about whether multilingual models, such as LLMs, operate in a language-agnostic concept space or are biased towards the language with the highest representation during training.
  • Bilingual Model Tokenization Bias Exploration: Butanium brought attention to an experiment using CroissantLLM, a bilingual French-English language model, and pondered the role of tokenization bias in comparison to the proportion of French vs. English training data. The experiment was detailed in a GitHub notebook.

Links mentioned:

llm-latent-language/nnsight.ipynb at main · Butanium/llm-latent-language: Repo accompanying our paper “Do Llamas Work in English? On the Latent Language of Multilingual Transformers”. - Butanium/llm-latent-language


Eleuther ▷ #lm-thunderdome (10 messages🔥):

  • Experimentation with Learning Rate Cooldown: A checkpoint with a short learning rate (LR) cooldown was suggested as a way to potentially improve benchmark results, but limited hardware availability is delaying the outcome.
  • Anxiety Over Model Performance: As new checkpoints are being tested, there’s an expression of concern over the anxious anticipation of the model’s performance.
  • Seeking Assistance on LM Evaluation Feature: A newcomer praised the LM evaluation harness and inquired about progress on adding logits to OpenAI ChatCompletions model, referencing an open issue on GitHub.
  • Challenges with Logit Bias Post-Security Paper: A reference to a recent arXiv paper explains why adding logits has become unfeasible due to changes in API designs that result from security concerns.
  • Adapting Tasks for Generative Models: Discussion about adding generative variants of popular tasks to the evaluation harness, pointing to tasks like GPQA that support both loglikelihood and generative variants.

Links mentioned:


Eleuther ▷ #multimodal-general (1 messages):

boneamputee: https://brianfitzgerald.xyz/prompt-augmentation/


Eleuther ▷ #gpt-neox-dev (1 messages):

  • Contemplating Megatron Integration Strategy: A member is considering the merits of more closely tracking upstream Megatron for Transformer Engine integration and has opened a pull request showing the full difference in code. They are inviting thoughts from the maintainers and community on whether this integration effort would be beneficial.

Links mentioned:

Diffs to upstream megatron as a basis for discussion towards TE integration by tf-nv · Pull Request #1185 · EleutherAI/gpt-neox: Here’s three commits: One with the full diff of GPT-NeoX’s megatron folder with current upstream Megatron-LM. That’s 256 files with ~60k lines. However most are completely new or deleted…


LAION ▷ #general (137 messages🔥🔥):

  • Seeking Speedy Inference Solutions: A member inquired about the fastest way to do inference with a Phi 2 fine-tune on a local GPU, mentioning batch processing with an A100 40GB and considering using frameworks like vLLM, Olama, or Axolotl. They wondered if quantization might help speed up the process.

  • Quantization and Streaming Approaches Discussed: There was a debate on whether quantization could help speed up inference, with an emphasis on using streaming methods for better response speed, such as those offered by faster_whisper, llama_cpp, and xtts2. Some members shared experiences using streaming TTS effectively, while others highlighted the potential of bespoke hardware like Groq’s NPU, which was mentioned to produce 500 tokens per second on Mixtral.

  • Concerns Over Model Weight Sharing and Copyright: The conversation included concerns about recent copyright takedown notices and discussions on copyright laws as they relate to leaked model weights, AI-generated content, and the DMCA. Members also discussed the challenges of regulating AI and open sourcing model weights in light of government considerations against it.

  • European AI Legislation Sparks Debate: There was discussion of the EU’s new AI legislation, with critical opinions about requirements like disclosing AI-generated content and designing models to avoid generating illegal content. The conversation also pointed out the impracticalities of enforcing such requirements and the potential impact on open source models.

  • Prompt Augmentation for T5 and Danbooru Tagging: Members shared resources on prompt augmentation with a 77M T5 model that can expand prompts, potentially rivaling larger LLMs, and a tiny llama-focused autocomplete tag generator for Danbooru. Interest was expressed in personal tuning and applying these models to existing projects.

Links mentioned:


LAION ▷ #research (21 messages🔥):

  • MoAI: Merging Vision with Language Models: A new paper on the Mixture of All Intelligence (MoAI) introduces an LLVM that incorporates auxiliary visual information from specialized computer vision models, aiming to enhance zero-shot vision-language tasks. The paper, available on arXiv, posits that current LLVMs may benefit from incorporating detailed computer vision capabilities beyond the large capacities of LLM backbones.

  • MoAI Codebase Released: The official PyTorch implementation for MoAI has been released on GitHub and is under review. The repository provides code to improve performance on numerous zero-shot vision language tasks and is available at ByungKwanLee/MoAI on GitHub.

  • Using MoAI with Hugging Face: A Hugging Face model page offers a simple running code for MoAI, along with necessary steps for setting up the environment and running the model. The page contains details for operations ranging from loading an image to generating predictions and can be found here.

  • Dataset Recognition in DeepSeekVL Paper: A member mentioned their dataset was cited in the DeepSeekVL paper, an initiative in scene understanding using vision-language models. The paper can be accessed via this link.

  • Discussion on Memory Usage and Lazy Loading in Large Models: There has been a clarification that an earlier claim of being able to load a 30-billion-parameter model in just 4GB of memory using lazy loading was incorrect. The underreporting of RAM usage was due to mmap not reflecting actual memory usage until the mapped pages are accessed, as discussed in ggerganov/llama.cpp.

Links mentioned:


HuggingFace ▷ #announcements (1 messages):

  • Visualize LLM Leaderboard with Ease: The Open LLM Leaderboard Viz update now allows users to change metrics order and plot up to 3 models for easy visual comparison.
  • Storytelling Gets Visual with GPT: A new space called Kosmos-2 by Tonic1 brings GPT-based visual storytelling to users.
  • ARC Dataset Augmented with Reasoning: Augmented ARC-Challenge Dataset incorporates Chain-of-Thought reasoning, offering more depth in answers to common questions.
  • Python Package for Vertex AI Inference: A new Python package, vertex-ai-huggingface-inference, is available to streamline running HuggingFace models on Google Cloud’s Vertex AI.
  • Rich Portuguese Pretrained Model Debuts: Introducing Mambarim-110M, a Portuguese LLM with over 119 million parameters trained on a 6.2B token dataset.

Links mentioned:


HuggingFace ▷ #general (76 messages🔥🔥):

  • Speculation on Next-Gen AI: A member predicted that Llama 3 would be marketed as an AGI model, incorporating features like Llama Guard 2.
  • How to Contribute to Hugging Face Transformers: Members discussed whether to commit a python virtual environment venv when contributing to Hugging Face transformers. It was clarified that one should not commit their local environment alongside their changes.
  • Inquiries about Freemium LLM for Spaces: A member questioned the availability of a free, CPU-based Spaces that is compatible with OpenAI API for using a model akin to a 7B LLM.
  • Issues with Fine-Tuning and Model Implementation: Participants discussed a range of technical questions, from proper implementation when fine-tuning models with LoRa to troubleshooting Spaces with Docker and finding the right method to implement knowledge into pre-trained models like Mistral 7B.
  • Data Privacy Concerns in Public Spaces: Concerns about data privacy in public spaces were addressed, with a general recommendation to avoid uploading personal information. Details on specific Spaces and how they handle data can be scrutinized by inspecting the code.

Links mentioned:


HuggingFace ▷ #today-im-learning (7 messages):

  • Newcomer Queries on Accessing Custom Datasets: A new Hugging Face user, familiar with Google Colab, sought guidance on accessing datasets in Hugging Face Spaces. They specifically asked about the paths to the images in the datasets and how to utilize the persistent storage /data.

  • Building AI Democracies: One user has kickstarted an exploration into constructing a multi-AI decision model with a voting mechanism, where actions are determined by the majority vote among the AI models (a toy sketch of the voting loop follows after this list).

  • Request for Bayesian Know-how: A user requested resources for learning Bayesian statistics, and they were directed to an educational YouTube video titled “Bayes theorem, the geometry of changing beliefs”.

  • Collaborative AI Orchestration with MyShell Pro Config: Another user introduced MyShell’s Pro Config as a tool that could facilitate the orchestration of a multi-AI decision model, suggesting that it can manage the proposed voting process among AI agents.

  • MyShell as AI-native App Deployment Platform: Further details about MyShell were shared, describing it as a decentralized platform for creating and managing AI-native apps, implying its usefulness for tasks like data analytics.
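
A toy sketch of the majority-vote idea mentioned earlier in this list: ask several models the same question and act on whichever answer most of them agree on. The `models` list and `ask` function are placeholders for whatever APIs or local models are actually being orchestrated.

```python
# Majority voting across several model backends (placeholders only).
from collections import Counter

def ask(model_name: str, question: str) -> str:
    """Placeholder: call your model of choice and return its answer."""
    raise NotImplementedError

def majority_decision(question: str, models: list[str]) -> str:
    votes = [ask(m, question) for m in models]
    answer, count = Counter(votes).most_common(1)[0]
    print(f"{count}/{len(models)} models voted for: {answer!r}")
    return answer

# majority_decision("Should we ship this change?", ["model-a", "model-b", "model-c"])
```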

Links mentioned:


HuggingFace ▷ #cool-finds (10 messages🔥):

  • Revolutionizing Retrieval: Retrieval-Augmented Language Models now have an innovative approach—RAPTOR, which uses recursive summaries to better understand long documents and aid with complex QA tasks, showing significant improvements over conventional retrieval-augmented LMs.
  • AI-Assisted Artistry: An open-source multimodal interpretability library supporting Huggingface CLIP/ViTs is available, and Sonia Joseph revealed it via Twitter, offering enhanced access to mechanistic interpretability in AI models.
  • Diffusion Models Get a Boost: Introducing ELLA (Efficient Large Language Model Adapter), combining diffusion models with LLMs for better semantic alignment in text-to-image generation, highlighted in a Huggingface research paper.
  • Innovative AI Prompting with Storytelling: A unique approach for effective prompting of Meta’s Llama 2 AI is role-playing in various narratives, with AI-generated prompts surpassing human-created ones in a quirky and unexpected fashion.
  • Advancing Text Segmentation: A research paper brings light to the importance of segmenting long documents and proposes a model for simultaneous extractive summarization and segmentation, pushing towards state-of-the-art performance in understanding written and spoken text.

Links mentioned:


HuggingFace ▷ #i-made-this (13 messages🔥):

  • GLiNER: A Leap in Named Entity Recognition: cubietom shared a demonstration of a new model framework called GLiNER which allows selection of custom labels for Named Entity Recognition (NER) on the fly, offering a practical alternative to traditional models with predefined entities (a usage sketch follows after this list). A demo is available at HuggingFace Spaces, along with additional model variants and a GitHub repository for further exploration.

  • Laughter is the Best Medicine: tonic_1 shared their amusement with a creation made entirely using HuggingFace’s starchat2-playground. They showcased a demo called kosmos-2 available at HuggingFace Spaces.

  • Visualizing the LLM Landscape: taratra_dr updated the community on the latest version of the Open LLM Leaderboard Viz space, featuring interactive visualizations and comparisons of large language models. New features include reordering of metrics and plotting multiple models for comparison, accessible via HuggingFace Spaces.

  • Code Refactoring at the Click of a Button: krolhm introduced a new Visual Studio Code plugin for refactoring code, powered by a local large language model (LLM) with the llama cpp server, with the repository available on GitHub.

  • Germinating the Seeds of Multimodal Interpretability: soniajoseph_ announced the creation of an open-source library that brings multimodal mechanistic interpretability to Huggingface CLIP/Vision Transformer (ViT) models. Relevant links include a Twitter post and a detailed article on LessWrong.
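
A sketch of GLiNER’s on-the-fly custom labels, assuming the interface shown in the project’s README (pip install gliner); the model id, example text, and threshold are illustrative choices.

```python
# Zero-shot NER with user-chosen labels (labels are picked at inference time).
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Claude 3 Haiku was released by Anthropic and is available on Perplexity Labs."
labels = ["model", "company", "product"]  # not baked into the model

for ent in model.predict_entities(text, labels, threshold=0.5):
    print(ent["text"], "->", ent["label"])
```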

Links mentioned:


HuggingFace ▷ #reading-group (6 messages):

  • No Show This Week: There will be no presentation in this week’s reading group, but one is planned for the upcoming week.
  • MNIST Digit Classification Question: A member working through Andrew Ng’s neural network course was confused about the number of units in the first layer for MNIST digit classification, given that the images are 20x20 pixels (flattened, a 20x20 image yields 400 input features).
  • Exploring Neural Network Architecture: In response to a question about determining the number of neurons and hidden layers, another member explained that this often involves experimentation and leveraging past successful configurations, considering the trade-off between processing power, speed, and accuracy.

HuggingFace ▷ #core-announcements (2 messages):

  • Blend Styles with LoRAs: A guide on merging Low-Rank Adaptations (LoRAs) is available, enabling the creation of unique images by blending different styles. Detailed instructions, including methods like set_adapters() and fuse_lora(), are provided in the merge LoRAs guide; a condensed sketch follows after this list.

  • Diffusers Library Update: The new version 0.27.0 of the Diffusers library has been released. Release notes can be found on the GitHub page.
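
A condensed sketch of the LoRA-merging flow the guide describes, using the set_adapters() and fuse_lora() methods named above; the base checkpoint and LoRA repo ids below are placeholders, not recommendations.

```python
# Blend two style LoRAs, then optionally fuse the blend into the base weights.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two style LoRAs under named adapters (placeholder repo ids).
pipe.load_lora_weights("some-user/style-lora-a", adapter_name="style_a")
pipe.load_lora_weights("some-user/style-lora-b", adapter_name="style_b")

# Blend them with per-adapter weights...
pipe.set_adapters(["style_a", "style_b"], adapter_weights=[0.7, 0.3])

# ...and optionally fuse the blend into the base weights for faster inference.
pipe.fuse_lora()

image = pipe("a watercolor robot reading a newspaper").images[0]
image.save("merged_loras.png")
```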

Links mentioned:

Merge LoRAs: no description found


HuggingFace ▷ #diffusion-discussions (2 messages):

  • Latency Issues with LORAs and PEFT: A member discussed challenges in integrating peft with diffusers, experiencing latency spikes when upgrading from peft 0.6 to 0.9. The load_lora_weights function is notably slower, increasing from 1-2 seconds to approximately 14 seconds, which is considered too high for their system. They shared a guide on hot-swapping LORAs using HuggingFace.

  • Enhancing Image Generation with FreeU: An overview of the FreeU technique was shared, detailing its use to improve image generation quality by balancing the influence of skip connections and backbone features in the UNet architecture during the reverse diffusion process. The method has been highlighted as requiring no extra training and being applicable to various tasks, with more information available in a Hugging Face guide.
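
A minimal sketch of turning FreeU on for a text-to-image pipeline, per the bullet above; no extra training is involved, only scaling factors applied at inference. The scaling values shown are illustrative, and the linked guide lists recommended settings per model family.

```python
# Enable FreeU on a Stable Diffusion pipeline, generate, then revert.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)   # rebalance skip connections vs. backbone
image = pipe("an astronaut gardening on the moon").images[0]
image.save("freeu_sample.png")

pipe.disable_freeu()  # revert to the default UNet behaviour
```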

Links mentioned:

Improve generation quality with FreeU: no description found


HuggingFace ▷ #computer-vision (13 messages🔥):

  • CLIP Embedding Curiosity: A participant understood that passing images through the CLIP model to generate and save embeddings for later use in training is possible and emphasized that the original images should not be reconstructable from these embeddings. However, another interjected with uncertainty on whether image reconstruction from embeddings is entirely unfeasible.

  • Training with CLIP Embeddings: A discussion highlighted that using CLIP embeddings instead of actual images for training might differ based on the task, and there’s lingering uncertainty about the differences in training workflow for tasks such as object detection, classification, and pose estimation (see the precompute sketch after this list).

  • The Size of CLIP Embeddings: It was mentioned that embeddings from the CLIP model might take up more storage than the images themselves, and there’s some ambiguity over whether this size increases or decreases after processing with CLIPVisionModel.

  • Batch Normalization as Knowledge Preservation: A mention of an arXiv paper discussed how batch normalization could be used for lifelong learning in medical segmentation models to prevent forgetting old features, although the paper’s exact name was not recalled.

  • Scaling Up Fine-Tuning for Image Generation: A user inquired about techniques to fine-tune a Stable Diffusion (SD) model with a large dataset of 2.5 million images on decent hardware in less than a week, looking for tutorials that go beyond using small datasets for fine-tuning.
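
A sketch of precomputing and saving CLIP image embeddings for later training, as discussed above. Whether downstream tasks can train on these instead of raw pixels depends on the task, and the embeddings are not intended to be reversible back into the original images; the file path is a placeholder.

```python
# Precompute a CLIP image embedding and save it for later use.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

name = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(name).eval()
processor = CLIPProcessor.from_pretrained(name)

image = Image.open("example.jpg")  # placeholder image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    emb = model.get_image_features(**inputs)   # shape (1, 512) for this checkpoint

torch.save(emb, "example_clip_embedding.pt")   # a 512-float vector vs. the full image file
```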


HuggingFace ▷ #NLP (19 messages🔥):

  • Mistral Model Flexibility Affected by Dataset Size: A member reported that fine-tuning Mistral 7B with a small dataset allows for flexibility like modifying objects, but using a larger dataset leads to specialization in object generation at the expense of other tasks. They queried if this could be a form of overfitting, given the model’s size, and sought advice to mitigate the issue.

  • Fostering Generalization in Model Training: In response to concerns about a model not generalizing well, one participant suggested enhancing the training set with more diverse examples to improve performance on new data.

  • Benchmarking a Modified Mistral Model: A user shared their intent to compare the base Mistral-7B-v0.1 model with a modified version for a research idea, seeking guidance on how to use HuggingFace’s automated benchmarks and inquiring where these benchmarks run.

  • OpenLLM Leaderboard Benchmark Submission Clarified: Another member clarified that for the OpenLLM leaderboard, benchmarks are run on a Hugging Face cluster, and provided links to resources for self-benchmarks: the LightEval Suite on GitHub and the lm-evaluation-harness on GitHub (a harness sketch follows after this list).

  • Potential Innovation in Model Compression Techniques: Discussions arose around a new method of model optimization that may allow for memory footprint savings while maintaining accuracy, including successful preliminary results on a 4096 x 4096 matrix. A member expressed enthusiasm for applying this technique to larger matrices within the model’s architecture.
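
One way to run the self-benchmark comparison mentioned above with the lm-evaluation-harness Python API (pip install lm-eval); the function location and arguments reflect recent harness versions and may shift, so treat this as a sketch rather than the canonical invocation.

```python
# Evaluate a base model on a couple of leaderboard-style tasks, then repeat
# the same call with the modified checkpoint and compare the numbers.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=mistralai/Mistral-7B-v0.1,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task metrics for this checkpoint
```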

Links mentioned:


HuggingFace ▷ #diffusion-discussions (2 messages):

  • Troubleshooting Latency Issues with PEFT and Diffusers: A server operator using LoRA adapters for dynamic model loading reports high latency issues when integrating peft. While peft 0.9 greatly increases load_lora_weights time to 14 seconds, version 0.6 reduces this time but increases unload_lora_weights to 6 seconds, both of which are unacceptable for their system.

  • Enhancing Image Quality with FreeU: An improved image generation technique called FreeU is discussed, which rebalances the contributions of the UNet’s skip connections and backbone feature maps to enhance image quality. The technique, applicable during inference without extra training, can be used for text-to-image, image-to-image, and text-to-video tasks as detailed in HuggingFace’s guide.

Links mentioned:

Improve generation quality with FreeU: no description found


OpenRouter (Alex Atallah) ▷ #announcements (3 messages):

  • Temporary Service Interruption Alert: OpenRouter experienced a brief issue where some Activity rows went missing for approximately three minutes due to a protracted database update, potentially affecting the billing for completions within that window. The problem was swiftly addressed, with the team stating that “none of these completions will be charged”.

  • Launch of Claude 3 Haiku: Claude 3 Haiku by Anthropic is now available on OpenRouter, characterized by its high speed (around 120 tokens per second) and cost efficiency (4 million prompt tokens per dollar). This low-latency model is in beta and offers both moderated and self-moderated options, suitable for use cases requiring near-instant responsiveness. Check out the model and its pricing here; a minimal request sketch follows after this list.

  • New Model Release: Cohere’s Command-R model is now accessible on OpenRouter, showcasing a long context capability of 128,000 tokens at the rate of 2 million prompt tokens per dollar. Efforts have been made to align Command-R with the universal API for a seamless user experience. Interested users can explore Command-R through this link.

  • Daily Analytics Now Available: OpenRouter introduces daily analytics enabling users to track token usage on a daily basis, offering a more granular view alongside the existing weekly analytics. Users can view the new analytics here.

  • Performance Improvements Announced: OpenRouter has significantly increased the speed of the /models API and enhanced the performance of all model-related web pages, including improvements to Mixtral Nitro.
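
A minimal sketch of calling the new Claude 3 Haiku route through OpenRouter’s OpenAI-compatible endpoint; the model slug is assumed to match the model page linked above (with a “:beta” suffix for the self-moderated variant).

```python
# OpenRouter exposes an OpenAI-compatible API, so the standard client works.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="anthropic/claude-3-haiku",  # add ":beta" for the self-moderated variant
    messages=[{"role": "user", "content": "Write a haiku about low latency."}],
)
print(resp.choices[0].message.content)
```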

Links mentioned:


OpenRouter (Alex Atallah) ▷ #app-showcase (7 messages):

  • Olympia.Chat Announces OpenRouter Integration: Olympia.Chat, a ChatGPT clone popular with solopreneurs and small business owners, is incorporating OpenRouter as the LLM source for its components. Additionally, a fully featured Ruby library for OpenRouter will be open-sourced soon.

  • Chatbot for Messenger Available for Testing: An unnamed friend of a member has created a Messenger chatbot, and the member is inviting others to direct message for testing opportunities.

  • AI Gateway Launch with OpenAI Integration: A new AI gateway, EZLinkAI Platform, offers user registrations with a $1 gift and allows users to call OpenAI, Claude, Mistral, and Groq services at 80% of the original costs.

  • Request for Feedback on AI Gateway: The creators of the AI gateway are seeking more feedback, implying the need for user input to improve their service.

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (129 messages🔥🔥):

  • GPT-4.5 Turbo Vanishing Act: A member shared a link (openai.com/blog/gpt-4-5-turbo) that was supposedly evidence of GPT-4.5 Turbo’s existence but later said it was gone, prompting laughter.
  • Mistral Model Mysteries: Users reported discrepancies with the Mistral model’s behavior, including “Request too big” errors and difficulties with the context limit, which is supposed to be 32k. The conversation included a query about the exact error message and proposed reasons for the errors like repeated loops in requests.
  • Claude 3 Haiku Hype: The discussion revealed enthusiasm for Claude 3 Haiku, highlighted for its cost efficiency at 1.25 USD per million tokens and for being significantly better than other models in brainstorming roleplay scenarios and character development.
  • OpenRouter Branding Collaboration: A proposal to add an OpenRouter button to Open Agent Studio was discussed, with the request for branding guidelines or a specific icon to use, which was given a green light by the OpenRouter side.
  • An Exploration of Various LLMs: The chat featured members comparing various language models, including Gemini and Claude models, debating their capabilities in coding and creative tasks, lamenting about certain quirks like unwanted bullet points, and expressing strong preferences for some over others due to performance and lack of censoring.

Links mentioned:


LlamaIndex ▷ #blog (3 messages):

  • Introducing LlamaParse: The new LlamaParse document parser has launched, offering superior parsing of images, tables, and charts, with the added ability to follow natural language instructions (a usage sketch follows after this list). Discover how it outperforms others in this tweet.

  • LlamaIndex Tackles PII with Presidio: A guest post by @RoeyBC on LlamaIndex highlights Presidio, an open-source library by Microsoft, which identifies and anonymizes personally identifiable information (PII) to prevent data leakage. Read about its importance in data protection in this tweet.

  • Overcoming RAG Challenges in Finance: RAG faces difficulties in parsing finance PowerPoint presentations due to their unique format, including tables, images, and charts. A technique for better text positioning and parsing is the first crucial step, explained in this tweet.
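
A hedged sketch of the LlamaParse flow described above (pip install llama-parse), including a natural-language parsing instruction; the parameter names follow the client library’s documented interface, and the file path is a placeholder.

```python
# Parse a document into markdown with a custom parsing instruction.
import os
from llama_parse import LlamaParse

parser = LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
    result_type="markdown",  # keep tables and structure as markdown
    parsing_instruction="Preserve financial tables exactly; describe any charts in one sentence.",
)

documents = parser.load_data("quarterly_report.pdf")  # placeholder document
print(documents[0].text[:500])
```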


LlamaIndex ▷ #general (82 messages🔥🔥):

  • Azure AI Search Index Issues: A member following the AzureAISearchIndexDemo guide encountered an issue where the Azure index shows a total storage of 3 MB, but the vector index size is 0. Advice was sought on this discrepancy.

  • Warnings with LlamaIndex Python Packages: One user reported multiple warnings regarding the failure to use OpenAIPydanticProgram. It was advised to run pip install llama-index-program-openai to resolve the issue.

  • Concerns Over npx create-llama Errors: A member faced errors stating “Sorry! We’ve encountered an issue with repetitive patterns in your prompt” when using npx create-llama with text files as data sources, even with simple prompts. It was speculated that the error could be related to the contents of the files.

  • Evaluation Methods for Retriever in LlamaIndex: One user sought advice on using LlamaIndex’s RetrieverEvaluator with their own question/context pairs. It was mentioned that expected node IDs from queries are required, but it was questioned whether expected text or document IDs could be used instead (a rough sketch follows after this list).

  • Performance Issues with OpenAI Assistant Agent: A member discussed the slow response time of over 10 seconds when using OpenAIAssistantAgent for building a chatbot. It was suggested that streaming might make it feel faster and that slow response times can partly be due to recent performance issues with the OpenAI API.
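
A rough sketch of wiring RetrieverEvaluator up against expected node IDs, which (as noted above) is what it scores against rather than raw text; the import paths and metric names are assumed from LlamaIndex’s evaluation module, and the toy document stands in for a real corpus.

```python
# Evaluate a retriever with MRR / hit-rate against expected node IDs.
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator

docs = [Document(text="Q3 churn fell to 2.1% after the pricing change.")]
index = VectorStoreIndex.from_documents(docs)          # uses the default embedding model
retriever = index.as_retriever(similarity_top_k=2)

evaluator = RetrieverEvaluator.from_metric_names(["mrr", "hit_rate"], retriever=retriever)

# expected_ids are node IDs the retriever should surface; with your own
# question/context pairs you must map expected text back to node IDs first.
expected = [node_id for node_id in index.docstore.docs]
result = evaluator.evaluate(query="What happened to churn in Q3?", expected_ids=expected)
print(result)
```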

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (61 messages🔥🔥):

  • Searching for Open-Source Chat-ready Models: A discussion arose around the complexity of model size in relation to training resources and hardware capabilities. Mistral and Mixtral models were suggested for their open-source nature and lack of significant filters.

  • Model Training Ambitions Confront VRAM Limitations: A participant expressed the intention to train on large models, highlighting PyTorch’s Metal Performance Shaders (MPS) backend for Mac GPU training acceleration (a device-selection sketch follows after this list). Others inquired about fine-tuning capabilities and limitations on single-GPU setups, suggesting the need for efficient fine-tuning methods.

  • Debating Between Mixtral and Qwen 70B for Medical Training: One member contemplated training a large model for medicine and deliberated between the Mixtral and Qwen 70B models. Concerns over imminent out-of-memory (OOM) issues and an impending release of a new llama model were raised.

  • Querying the Best Practices for Training Formats: Members exchanged thoughts on using completion versus question-and-answer (Q/A) formats when converting raw text for training purposes. It was suggested to refer to existing Hugging Face dataset examples for formatting data properly.

  • GPUDirect Storage for Axolotl: A participant suggested the potential for integrating NVIDIA’s GPUDirect® Storage technology into the Axolotl system, offering a direct data path for transfers between GPU memory and storage, as detailed in NVIDIA’s cuFile API Reference. This could enhance performance by increasing system bandwidth and reducing CPU load.
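
A quick sketch of the MPS point raised earlier in this list: PyTorch falls back gracefully when the Metal backend isn’t available, so the same script runs on Mac GPUs and on CPUs.

```python
# Select the Metal (MPS) device if available, otherwise fall back to CPU.
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(8, 512, device=device)
print(device, model(x).shape)
```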

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (6 messages):

  • GPU Direct Primer Video Shared: An introductory video explaining NVIDIA’s GPUDirect Storage (GDS) was shared, which provides insight into peer-to-peer PCIe and the role of GDS in technological advancements.
  • Axolotl Code Queried: A member posted a query regarding a specific portion of the Axolotl code, with a link to the relevant GitHub section, seeking clarification on its purpose.
  • Model Loading Explanation: In response to the query, it was clarified that the referenced code would be triggered when directing the base model pointer to a peft model, enabling AutoModel to load a peft model at that point.
  • Request for New Features: A member expressed curiosity about the latest features being developed or introduced.
  • PEFT Paper Linked: In response to inquiries about new features, a research paper on “PEFT” was shared, suggesting advancements in the modeling domain.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (7 messages):

  • Seeking Inference Code for LoRA Model: A member mentioned they’ve fine-tuned LoRA off Mistral-7B-v0.1 and sought example code for running inference on about 100 prompts within a notebook. They were contemplating using the transformers library and model.generate(**model_inputs) method.

  • vLLM Recommended for Swift Inference: Another member recommended using vLLM for running batched inference, claiming it is quicker than transformers. They provided a quickstart guide for using vLLM that covers offline batched inference and building OpenAI-compatible API servers (a condensed sketch follows after this list).

  • Considering vLLM for Non-Server Tasks: The original inquirer was unsure if vLLM would be suitable for their needs, as they were not planning to serve the model but simply run a few predictions for exploration. After assurance of its efficiency, they decided to follow the vLLM quickstart link.
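
As a reference for the exchange above, a minimal sketch of offline batched inference with vLLM, following its quickstart; the model path and sampling values are illustrative, and it assumes the LoRA adapter has already been merged into the base checkpoint.

```python
from vllm import LLM, SamplingParams

# A handful of prompts stands in for the ~100 mentioned above.
prompts = [
    "Explain LoRA fine-tuning in one sentence.",
    "List three uses of Mistral-7B.",
]
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# Point this at the merged LoRA checkpoint, or at the base model for a dry run.
llm = LLM(model="mistralai/Mistral-7B-v0.1")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```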

Links mentioned:

Quickstart — vLLM: no description found


OpenAccess AI Collective (axolotl) ▷ #community-showcase (3 messages):

  • Mistral Medium Outperforms Mixtral: A user noted that Mistral Medium yields better responses and is believed to be a closed-source, superior version of Mixtral.

  • Citations Observed with RAG: The same user observed that Mistral Medium generates citations in RAG setups without being explicitly asked to.

  • Less Verbose, Better Instruction Follow-through: It was also observed that outputs from Mistral Medium are less verbose and more effective at following instructions than Mixtral.


LangChain AI ▷ #announcements (1 message):

  • Expedited Release for langchain 0.2: Due to CVEs filed against langchain, the team is considering an expedited release of langchain 0.2 that will separate it from langchain-community. The detailed discussion and motivation for this change can be found on GitHub, and community feedback is encouraged to ensure it addresses user needs.

Links mentioned:

RFC: Expedited langchain 0.2 release · langchain-ai/langchain · Discussion #19083: Context Currently langchain (the package) depends on langchain-community. This is done only for backwards compatibility with langchain versions that predate the split of langchain and langchain-com…


LangChain AI ▷ #general (64 messages🔥🔥):

  • LangChain Inquiry: A member looking for help with LangChain was directed to the appropriate help channel on Discord for assistance.
  • AgentExecutor Issues: There’s a mention of difficulty with AgentExecutor returning an OutputParserException, even when the Cohere model seems to generate Python code accurately.
  • AI Agents Under the Hood: A discussion on why one would use AI agents over LLMs + functions highlighted that agents handle sequential actions and come with built-in error handling, amongst other features.
  • Evaluation of AI Agent Behavior: A member sought advice on evaluating AI agent behavior and was referred to the LangChain debugging and evaluation guides, although there was an acknowledgment that the area seems to be relatively new with benchmarks still under development.
  • StackOverflow API Exploration: A user asked about an API for StackOverflow and received guidance on using the StackExchange API’s advanced search to run structured queries for specific terms (a minimal request sketch follows this list).
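
For context on the last item, a minimal sketch of the StackExchange advanced-search call; the query string and parameter choices are illustrative, not taken from the discussion.

```python
import requests

# /search/advanced returns structured question data for a free-text query.
resp = requests.get(
    "https://api.stackexchange.com/2.3/search/advanced",
    params={
        "order": "desc",
        "sort": "relevance",
        "q": "langchain OutputParserException",  # illustrative query
        "site": "stackoverflow",
    },
    timeout=10,
)
resp.raise_for_status()

for item in resp.json().get("items", [])[:5]:
    print(item["title"], "->", item["link"])
```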

Links mentioned:


LangChain AI ▷ #langchain-templates (1 message):

  • Question on Variable Integration in Prompt Templates: A member inquired about integrating a variable, specifically tools = [cat_tool], into a Langsmith Hub prompt template that includes the placeholder {tools} within the construct:

    System : 
    
    You are a helpful assistant that have these {tools} to help answer questions.

    They are seeking guidance on how to reference the variable tools in their code to align with the prompt.
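
One way to wire that up is sketched below, with a dummy tool standing in for cat_tool; rendering the tool list into the {tools} placeholder via a partial variable is an assumption about what the member intends, not an official recipe.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool


@tool
def cat_tool(query: str) -> str:
    """Answers questions about cats (stand-in for the member's real tool)."""
    return "Cats meow."


tools = [cat_tool]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that have these {tools} to help answer questions."),
    ("human", "{input}"),
])

# Render the tool list to text and bind it as a partial variable,
# so only {input} remains to be supplied at call time.
tool_text = "\n".join(f"{t.name}: {t.description}" for t in tools)
prompt = prompt.partial(tools=tool_text)

messages = prompt.format_messages(input="What sound does a cat make?")
print(messages)
```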


LangChain AI ▷ #share-your-work (8 messages🔥):

  • Reacting with ReAct: The ReAct agent, inspired by the ‘ReAct: Synergizing Reasoning and Acting in Language Models’ paper, has been shared, boasting a reasoning engine and diverse skills which can be tested with questions like ‘What is the Bitcoin price today?’. The related paper can be downloaded at Download PDF.

  • Open Source Langchain Chatbot: A new open source Langchain chatbot has been introduced to demonstrate efficient question/answer querying using the RAG technique, featuring a GitHub repository with a simple setup and interactive UI.

  • MindGuide: Innovating Mental Health via ChatModels: An article titled ‘Revolutionizing Mental Health Care through LangChain’ was shared, detailing the MindGuide chatbot that utilizes LangChain and ChatOpenAI for mental health support, with the abstract and download available at Download PDF.

  • Claude Meets LangGraph for Supervising: A GitHub notebook showcasing the Claude powered LangGraph Agent Supervisor was shared, demonstrating the potential of utilizing LangChain with Claude’s capabilities, available at GitHub Notebook.

  • Deci AI Nano Model API Sneak Peek: Deci AI’s new nano model API was announced, with accompanying Colab notebooks for basic usage and LangChain usage ready to be explored prior to its official release, with Basic Usage Notebook and LangChain Usage Notebook linked for access.

Links mentioned:


LangChain AI ▷ #tutorials (2 messages):

  • Learn Prompt Template Creation with Langchain: A video tutorial titled “Create Prompt Template With Langchaingo” was shared, which demonstrates how to create a prompt template and use it with Langchain, particularly featuring a Telegram group. The content is aimed at developers interested in #golang and #langchain, and the video can be viewed on YouTube.

  • Diving into Function Calling with Hermes 2 Pro 7B: Another video titled “Lets Function Call with Hermes 2 Pro 7B” was shared, focusing on function calling using the Hermes 2 Pro 7B model. The source code and examples can be found on GitHub, and the video is accessible on YouTube, targeting #llm and #largelanguagemodels enthusiasts.

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (6 messages):

  • Tweet Triggers Discussion: A member shared a tweet by Andrew Curran leading to a conversation on language applications and collaborations, emphasizing the need to work with subgroups through the Aya project.
  • Polyglot Projects in European Academia: Discussing the needs of European universities, a member mentioned the challenge of persuading people to adopt new language approaches, with particular mention of English and German applications.
  • German Language Well Supported by LLMs: One member noted that German is generally well supported out of the box by serious LLMs (large language models), while also suggesting reaching out to Aleph for partnerships in highly regulated industries.
  • Aleph’s Performance in Question: A member expressed their opinion that Aleph’s performance is lacking, which led to the suggestion that while Aleph itself might not be up to par, they could still assist in referring to local data partners.

Interconnects (Nathan Lambert) ▷ #other-papers (2 messages):

  • GPT-4 retains the crown: A member remarked that according to a paper, GPT-4 is still the leading model on LeetCode. The paper mentioned can be found at livecodebench.github.io.

Interconnects (Nathan Lambert) ▷ #ml-questions (3 messages):

  • Inquiry on Model Details: A member asked another to share which model was used in a recent exercise, showing interest in the model’s identity and capabilities.

  • Seeking Citations for Safety Filtering by Providers: A member is looking for authoritative sources or documentation to cite that “foundation model providers do a lot of the safety filtering for text post generation” but notes that it’s less documented compared to prompt rewriting.


Interconnects (Nathan Lambert) ▷ #ml-drama (2 messages):

  • Catching Up on the Salty: A member mentioned their intention to catch up on the newsletter backlog and expressed that having “salty readers” is beneficial. They also alluded to a teaser tweet about bio risk which they disagreed with but reserved judgment until reading the full post.
  • Bio Risk Tweet Confusion: There was a brief confusion about a bio risk-related tweet. A mention was made about tweeting too much, possibly implying a lack of context or information in the initial tweet.

Interconnects (Nathan Lambert) ▷ #random (54 messages🔥):

  • Anticipating GPT-4.5: Members expressed readiness for an emergency blog post in case GPT-4.5 gets released suddenly.
  • YouTube Premium vs. Google Ads in LLMs: There was a discussion on Google’s approach to converting free users to paid, with some members subscribing to YouTube Premium despite aggressive ad strategies. Concerns were raised about user trust if ads were integrated into Google’s ChatGPT competitor.
  • Claude-3 Draws Level with GPT-4: The community has shown enthusiasm for the new Claude model family, with Claude-3-Opus now sharing the top rank with GPT-4-Turbo on the arena leaderboard. There’s a plan to create separate leaderboards for different domains to provide clearer insights into model capabilities (lmsys.org’s update on Claude-3).
  • Analyzing Claude 3’s New Additions: Members discussed Claude 3 Haiku, a fast and affordable model, weighing its suitability for replacing older systems and the prompt-engineering effort needed to adapt it to specific tasks (Xeophon’s thoughts on usage).
  • The Challenge of Standardizing Research Literature for AI Assistance: The conversation extended towards the difficulties in creating efficient AI literature survey assistants due to citation ambiguities, graph interpretations, and building a system to critique papers, hinting at the future directions for research in AI document parsing (Discussion on literature survey challenges).

Links mentioned:

  • Tweet from Xeophon (@TheXeophon): @felix_red_panda @karthikv792 Happy to hear your verdict! :)
  • Tweet from lmsys.org (@lmsysorg): [Arena Update] Our community has cast 20,000 more votes for Claude-3 Opus and Sonnet, showing great enthusiasm for the new Claude model family! Claude-3-Opus now shares the top-1* rank with GPT-4-Tu…
  • Tweet from Anthropic (@AnthropicAI): Today we’re releasing Claude 3 Haiku, the fastest and most affordable model in its intelligence class. Haiku is now available in the API and on http://claude.ai for Claude Pro subscribers.
  • Tweet from Xeophon (@TheXeophon): Some comparisons between the Claude 3 models for paper summary. Prompt is the same, models are accessed via Poe + PDF upload. Here, I don’t like Haiku at all, its too close to the paper. I’d …

CUDA MODE ▷ #general (8 messages🔥):

  • Meetup Announcement for GTC: One member announced they will be at the GTC next week, inviting others to say hi in person.
  • BLAS vs. NumPy Performance Debate: A member provided a link highlighting that NumPy, despite its popularity, leaves up to 90% of BLAS performance on the table for certain operations. SimSIMD is featured as a potential fix for this issue.
  • Skepticism About NumPy Performance Analysis: Another member pointed out that the benchmarked operations complete in a very small timeframe (<1µs), where NumPy’s constant per-call overhead dominates, so NumPy is mainly a problem when issuing very many small operations (see the timing sketch after this list).
  • SIMD Wrappers as Practical Solutions: A member noted that for operations with smaller vectors, it is more efficient to use a SIMD wrapper than to deal with the overhead of data transfer and kernel launch.
  • Focused on Messaging for Technical Choices: There was a suggestion for more precise messaging by focusing on the rationale behind technical choices, appropriate use cases, and installation guidance rather than just listing benchmark numbers.
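
To make the per-call overhead point above concrete, a tiny timing sketch; absolute numbers are machine-dependent and the vector size is arbitrary, so treat it as an illustration rather than a benchmark.

```python
import timeit

import numpy as np

# Tiny vectors: the BLAS work itself is negligible, so the measured time is
# dominated by Python-level and dispatch overhead per np.dot call.
a = np.random.rand(16).astype(np.float32)
b = np.random.rand(16).astype(np.float32)

n = 100_000
per_call_us = timeit.timeit(lambda: np.dot(a, b), number=n) / n * 1e6
print(f"np.dot on 16-dim float32 vectors: ~{per_call_us:.2f} µs per call")
```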

Links mentioned:

NumPy vs BLAS: Losing 90% of Throughput: Downloaded over 5 Billion times, NumPy is the most popular library for numerical computing in Python. It wraps low-level HPC libraries like BLAS and LAPACK, providing a high-level interface for matrix…


CUDA MODE ▷ #triton (6 messages):

  • Inspecting Triton tl.core.tensor Objects: A user sought advice on how to inspect tl.core.tensor objects in Triton, noting that regular indexing to view values produces a ‘0d block_type is forbidden’ error.
  • Old-School Debugging Tricks: To inspect Triton tensors, a member suggested setting the environment variable TRITON_INTERPRET=1 and adding print statements, a traditional debugging method (a minimal sketch follows this list).
  • Video Aid for CUDA Kernel Profiling: An informative YouTube video was shared, explaining how to profile CUDA kernels in PyTorch and mentioning the use of @triton.jit(interpret=True) for debugging; however, another member noted that this approach is deprecated.
  • Triton Debugging Best Practices: A member pointed to a GitHub issue discussion on how to debug Triton kernels, providing a glimpse into community methods for tackling such issues.
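
A minimal sketch of the TRITON_INTERPRET=1 trick mentioned above; the kernel is a toy vector add, it assumes a CUDA-capable GPU, and exact interpreter behavior (including how printing inside the kernel behaves) varies between Triton versions.

```python
import os

os.environ["TRITON_INTERPRET"] = "1"  # must be set before triton is imported

import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # In interpreter mode the kernel runs eagerly, so block values can be printed.
    print("x block:", x)
    tl.store(out_ptr + offsets, x + y, mask=mask)


x = torch.randn(32, device="cuda")
y = torch.randn(32, device="cuda")
out = torch.empty_like(x)
add_kernel[(1,)](x, y, out, x.numel(), BLOCK_SIZE=32)
```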

Links mentioned:


CUDA MODE ▷ #cuda (10 messages🔥):

  • Nsight Systems is essential for multi-GPU apps: One member explains the importance of Nsight Systems for analyzing performance issues in complex applications with multiple GPU and CPU processes, citing its ability to surface PCIe memory-transfer and CPU/GPU scheduling issues.

  • Newbie in Need of CUDA Assistance: A member is seeking help with a CUDA question and has posted in a discord channel. A reference link was provided but was not accessible for extracting specific information.

  • Seeking Guidance on Nsight Systems: A member questioned the usefulness of Nsight Systems and asked for advice on metrics and educational resources. Another shared Nvidia’s lecture and a blog post explaining the visuals of overhead and latency in Nsight Systems.

  • Performance Analysis with Nsight Systems Guide: An experienced member highlighted Nsight Systems for spotting bottlenecks between kernel launches and provided a personal guide on using Nvidia Visual Profiler to optimize an OpenCV application. The guide can be found here.

  • Kernel Launch Overhead Confusion: One member was concerned about the change in output when altering the execution order of two CUDA functions, speculating that it may be due to CUDA initialization or GPU warm-up. Another confirmed that it is indeed kernel launch overhead and suggested using ncu (Nsight Compute) to isolate the issue.

Links mentioned:


CUDA MODE ▷ #jobs (1 message):

  • CUDA Expert Wanted for Learning App: Christo_allstreet is in search of a CUDA expert for consultancy work on their learning application getworldclass.app. Interested experts are invited to send a Direct Message for further details.

CUDA MODE ▷ #beginner (2 messages):

  • CUDA Toolkit Conundrum in Ubuntu 23.10: A user reported an issue with nvidia-cuda-toolkit on Ubuntu 23.10 where running compute-sanitizer results in an error: Unable to find injection library libsanitizer-collection.so. Although the mentioned library exists at /usr/lib/nvidia-cuda-toolkit/compute-sanitizer/libsanitizer-collection.so, the tool doesn’t seem to recognize it.
  • Version Mismatch Might Be the Culprit: Another user suggested that this problem could stem from a version mismatch, noting that the latest NVIDIA toolkit supports up to Ubuntu 22.04. They recommended trying compute-sanitizer on Ubuntu 22.04 to determine if the issue is due to changes in folder paths in the newer OS version.

CUDA MODE ▷ #pmpp-book (4 messages):

  • Understanding SM Architecture: A member referenced section 4.4 of the PMPP book, noting how an SM (Streaming Multiprocessor) executes the threads of a warp following the SIMD (Single-Instruction, Multiple-Data) model, and asked which individual cores are responsible for executing which threads.
  • Clarification on Core-Thread Execution: Another member clarified that a processing block within an SM - using the GA102 SM as an example - executes one warp at a time: the warp’s 32 threads can execute fp32 instructions concurrently, whereas 32 int32 instructions are executed in two batches because only half of the block’s cores handle int32.

CUDA MODE ▷ #ring-attention (10 messages🔥):

  • Axolotl Configuration Key: iron_bound noted a specific requirement for running axolotl: setting pad_to_sequence_len: true is essential, otherwise the software fails to initiate even with a clean clone of the repository.
  • Loss Comparison Stalls Progress: iron_bound shared a W&B report showing test results comparing stock axolotl vs ring-attn, indicating that loss is not decreasing towards zero as anticipated.
  • Concern over Reporting Issues on Mobile: andreaskoepf mentioned difficulty in viewing the report on mobile devices and sought clarification on whether the loss for both, the vanilla axolotl and ring-attn, were not tending towards zero.
  • Reference Run Clarification: iron_bound confirmed that the baseline or reference run used for comparison was a clone of axolotl without any code modifications.
  • Flash Decoding Efforts to Resume: jamesmel announced availability to continue work on flash decoding starting the following day.
  • Meeting Uncertainty: cataluna84 inquired about the schedule of a meeting, but no further details were provided.
  • Patch Branch for Axolotl Available: iron_bound provided a link to the ring_attention_patching branch of axolotl on GitHub: GitHub - cuda-mode/axolotl at ring_attention_patching.

Links mentioned:


CUDA MODE ▷ #off-topic (8 messages🔥):

  • GPT-4 Takes On DOOM: An arXiv paper explores GPT-4’s capabilities in playing the 1993 first-person shooter Doom, highlighting the model’s ability to reason and plan with only basic instructions and a text-based description of the game state.

  • Country Roads Take Me Home: A series of messages evoke lyrics from the song “Take Me Home, Country Roads,” referencing themes of nostalgia and nature with lines like “Life is old there, Older than the trees,” and “Rolling like a breeze.”

  • Meta Legal Battle over Confidential Docs: Meta has filed a lawsuit against a former exec for allegedly stealing over 100 internal documents and using them for his AI data startup, Omniva. The lawsuit details regard “brazenly disloyal and dishonest conduct” during the executive’s transition from Meta to Omniva.

  • Song Sentiment Interrupted: A message briefly expressing disappointment with the simple comment “…ruined it.”

  • Group Learning Initiative: A member mentions a collaborative effort involving three individuals embarking on an educational journey from “lecture 1.”

Links mentioned:


DiscoResearch ▷ #disco_judge (1 message):

  • Assistant’s Mars Explanation Missing Musk’s Flair: The assistant’s reply was praised for being informative and covering various aspects of why we should go to Mars, but it failed to fully comply with the user’s instruction to express it like Elon Musk. Although the reply reflects Musk’s views on Mars exploration, it lacks his specific style and tone. Rating given was [[7]].

DiscoResearch ▷ #general (8 messages🔥):

  • MunichNLP Meetup Inquiry: A member inquired about interest in a Munich meetup on April 11th to discuss DiscoLM but received no direct commitment to speak at the event.
  • DiscoLM Model’s German Fine-Tuning Question: A member questioned the DiscoLM-mixtral-8x7b-v2 model’s fine-tuning on German datasets, to which another replied that it wasn’t trained on a significant amount of German data, redirecting to the extensive training details of the DiscoLM 70b model.
  • AI Tinkerers in Berlin: Members discussed the upcoming AI Tinkerers event in Berlin on March 21st, sharing enthusiasm and an event link for a community gathering of technology enthusiasts.
  • Seats Filling Up for AI Tinkerers: The same member mentioned that only 8 seats were left for the AI Tinkerers event, indicating high interest and limited availability.
  • Clarity on German Dataset Usage: A member clarified their own confusion around the presence of German data in the instruction fine-tuning datasets, asking for specifics on the percentage of German-language data used.

Links mentioned:


DiscoResearch ▷ #benchmark_dev (1 message):

  • Creative Writing Benchmark Testing Success: A member has announced the successful implementation of a creative writing benchmark prototype, indicating that it offers reasonable rankings. Interested parties can try it out on this branch of the EQ-Bench repository on GitHub.

Links mentioned:

GitHub - EQ-bench/EQ-Bench at creative_writing: A benchmark for emotional intelligence in large language models


DiscoResearch ▷ #embedding_dev (3 messages):

  • Seeking German Precision: A member inquired about the best embedding and re-ranking for German, specifically for use with German legal texts.
  • Hunting for Benchmarks: The same member also asked if there exists a benchmark for embedding models in German.
  • Benchmarking German Embeddings: Another member suggested using the “GermanQuAD” evaluation task in the MTEB Python package or looking into recent German additions from JinaAI.

DiscoResearch ▷ #discolm_german (2 messages):

  • Local Model Replication Inquiry: A member asked how to replicate a demo’s output locally using their own code. Their current setup is a one-shot prompt with settings for temperature, top_p, and max_tokens, and they provided a code snippet to illustrate their approach.

  • Questions on Command Repetition: A follow-up question by the same member asked whether they should repeat a command for every user message or include it only once in the system content, seeking guidance on the best practice for command structure (the sketch below illustrates the system-message-once approach).
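
For the two questions above, a minimal sketch of one plausible local setup, placing the instruction in the system message once and sampling with temperature, top_p, and a token limit; the model name, prompt text, and sampling values are assumptions, not the demo’s actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/DiscoLM_German_7b_v1"  # assumed; substitute the demo's model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The standing instruction lives in the system message once; user turns stay clean.
messages = [
    {"role": "system", "content": "Always answer concisely and in German."},
    {"role": "user", "content": "Was ist DiscoLM?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs, do_sample=True, temperature=0.7, top_p=0.9, max_new_tokens=256
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```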


LLM Perf Enthusiasts AI ▷ #claude (12 messages🔥):

  • Haiku’s Cost-Efficiency Breakthrough: A Haiku-based document describer has been hailed for performing vision-to-text on visually complex documents at an economical cost.

  • Debating Visual Document Processing: Members compared Haiku’s capabilities with GPT-Vision, concluding Haiku is not superior, while Claude 3 Opus was noted to be better than Haiku.

  • Content Filter Hurdles with Visual Docs: The discussion reveals that content filtering issues have arisen when processing documents, particularly those containing equations, causing incomplete analysis mid-document.

  • Claude’s Content Filtering Quirks Noted: It was mentioned that Claude historically has had problems with iffy content filtering, which may relate to the problems experienced by other members with document processing.


Datasette - LLM (@SimonW) ▷ #ai (6 messages):

  • Zero-click Worms Target GenAI-Powered Apps: A new paper titled “ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications” has been shared, highlighting vulnerabilities in GenAI-powered applications through prompt injection. The paper demonstrates attacks on email assistants using various models, such as Gemini Pro, ChatGPT 4.0, and LLaVA. Read the full paper

  • Seeking a Model Comparison Framework: In a quest for the best model to serve as a code assistant, a member inquires about a framework to compare the effectiveness of models such as Mistral or Llama2.

  • Choosing Models Based on Benchmarks: Another member pointed out that benchmarks exist for comparing models, but advised that such benchmarks should be taken with a grain of salt.

  • Leaderboard for Model Comparisons: To compare models, a member suggests using the Leaderboard available at chat.lmsys.org, which provides a competitive ranking of different models.

Links mentioned:

ComPromptMized: Stav Cohen Technion - Israel Institute of Technology


Alignment Lab AI ▷ #looking-for-collabs (2 messages):

  • Seeking Multimodal Model Gurus: Soniajoseph_ is calling for collaborators skilled in open source interpretability of multimodal models. Details can be found in their Twitter post and a cross-posted article on LessWrong from the AI Alignment Forum.

  • Join the Interpretability Crusade: Those interested can join the related Discord through this invitation link.

  • Collaboration Hub Tip: Rusch drops a hint for a potential collaboration hub suitable for such projects, sharing an alternative Discord invitation.

Links mentioned:


Alignment Lab AI ▷ #general-chat (1 message):

  • In Search of Speed: Phi-2 Inference Optimizations: A member inquired about the fastest way to perform inference with Phi-2 or its fine-tunes on an A100 40GB GPU, expressing a desire to process “LOTS OF DATA.” They requested feedback on the best frameworks to use among vLLM, Ollama, Axolotl, and others, and wondered whether quantization could help with speed.

Skunkworks AI ▷ #off-topic (2 messages):

  • Meet Devin, the Autonomous Software Engineer: A video titled “Devin The World’s first AI Software Engineer” was shared, showcasing the abilities of an AI named Devin that is claimed to be fully autonomous. Further details can be found on the Cognition Labs blog.

  • Function Calling with Hermes 2 Pro 7B: The chat included a YouTube video that demonstrates function calling with the Hermes 2 Pro 7B model. Interested viewers can learn more and delve into the specifics via the [GitHub repository dedicated to Hermes Function Calling](https://github.com/NousResearch/Hermes-Function-Calling/tree/main), which targets #llm and #largelanguagemodels enthusiasts.

Links mentioned:


AI Engineer Foundation ▷ #general (1 message):

  • New Kid on the Code Block: Cognition has unveiled Devin, an AI positioned as the world’s first fully autonomous AI software engineer. They claim Devin can handle complex engineering tasks, learn over time, and correct its own mistakes as outlined in Scott Wu’s blog post.

Links mentioned:

Blog: no description found


AI Engineer Foundation ▷ #events (1 message):

  • Voice + AI Event Bot Contest: A contest has been announced as a fun addition to the upcoming Voice + AI event next week, inviting participants to build creative projects. Details for “The Most Interesting Bot In the World Contest” can be found in their Notion page.

Links mentioned:

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It’s the all-in-one workspace for you and your team