> AI News for 3/5/2024-3/6/2024. We checked [**356** Twitters](https://twitter.com/i/lists/1585430245762441216) and **22** Discords (**353** channels, and **6774** messages) for you. Estimated reading time saved (at 200wpm): **689 minutes**.

No big news or releases today. Perplexity is rumored to be the latest AI unicorn, Yi Tay’s post on the hard parts of training LLMs outside Google got picked up on Twitter and HN, and we released Soumith’s episode on the Latent Space pod.


Table of Contents

[TOC]


PART X: AI Twitter Recap

only one Claude Opus run today as we are currently retooling our pipelines for more functionality and we didnt get it viable in time for today. Sorry!

Anthropic Claude 3 Release

AI Progress and Limitations

  • Current language models are still limited in out-of-distribution reasoning despite impressive performance. Models that can reason, run experiments, and seek truth more akin to scientists are needed for superhuman insights.
  • There are concerns that focusing on model scaling distracts from core issues in robustness and reliability. Careful testing and understanding limitations remains crucial.
  • Ideogram 1.0 shows progress in instruction-following for image generation compared to Midjourney and DALL-E.

Enterprise AI Adoption

  • Cohere is partnering with Accenture to bring their enterprise search capabilities to Accenture’s clients, aiming to drive productivity gains.
  • Mistral AI and Snowflake are collaborating to make Mistral’s LLMs available through Snowflake, enabling enterprises to build AI apps within the security of Snowflake’s platform.
  • Deepspeed innovations are coming to Together AI Research to accelerate cloud infrastructure for generative AI.

Open Source Datasets and Models

Memes and Humor

In summary, the AI community is abuzz with the release of powerful new models like Anthropic’s Claude 3, while also grappling with the limitations and robustness challenges of current approaches. Enterprises are rapidly adopting AI technologies through partnerships with leading AI and cloud vendors. Meanwhile, open source datasets and models continue to grow and democratize access to cutting-edge AI. Throughout it all, humor and memes provide levity and social commentary on the fast-moving AI landscape.


PART 0: Summary of Summaries of Summaries

Operator notes:Ā Prompt we use for Claude, andĀ our summarizer GPTĀ used for ChatGPT. What is shown is subjective best of 3 runs each.

Claude 3 Sonnet (14B?)

  1. Exploring AI Model Capabilities and Comparisons:

    • Claude 3 is generating excitement for its reported superior performance across various cognitive tasks, surpassing GPT-4 according to some users. Discussions revolve around its capabilities in coding, function calling, and self-moderation in group chats, as showcased in a Twitter story.
    • Opus, a model variant, is praised for its coding prowess, particularly in function calling. It achieved an impressive 800 score on the SAT Reading section, sparking conversations about avoiding memorization in large models.
    • Skepticism arises regarding the reliability of published benchmarks in capturing the full potential of newer models like GPT-4.
  2. Advancements in Multimodal and Retrieval-Augmented Models:

    • The release of Stable Diffusion 3 and its fusion of diffusion and transformer models is discussed, highlighting progress in multimodal approaches.
    • An arXiv paper suggests retrieval-augmented language models could be a promising alternative to parametric LMs, though research in this area is still developing.
    • The introduction of InfiMM-HD by @_akhaliq claims significant advancements in high-resolution multimodal understanding, potentially outperforming CogVLM and leveraging Vicuna 13B. (Tweet)
  3. Techniques for Efficient Model Serving and Inference:

    • A Fireworks AI blog post discusses FireAttention, a quantization method for serving open-source models up to 4x faster than vLLM with minimal trade-offs.
    • The Aphrodite Engine by PygmalionAI is humorously attributed to the ā€œWaifu-Driven Performance Theory,ā€ showcasing community-driven research efforts for performance gains.
    • Discussions explore speculative decoding on GPUs to improve performance when memory is the bottleneck, and the inefficiency of generic masking in compute, leading to a PyTorch pull request for sliding window attention bias.
  4. Advancements in Hardware and Quantization:

    • Details emerge about the NVIDIA H100 GPU, with its L2 cache boasting a 5.5 TB/s read bandwidth, and speculations that its overall bandwidth could match the impressive 40TB/s L1 bandwidth of the RTX 4090.
    • The bitsandbytes package is recommended for k-bit quantization in PyTorch, enabling low-precision linear algebra operations on GPUs with potential 5700x speedup in int8 versus bf16 matrix multiplication.

Claude 3 Opus (8x220B?)

  • Mistral Finetuning Challenges and Successes: Users like @hammer_mt struggled with Mistral finetuning on mlx, facing issues converting lora_fused_model to fp16.gguf as detailed in a GitHub issue. @mrdragonfox advised that MoE tuning is fundamentally difficult and recommended a fine-tuning tutorial for Mistral 7b, favoring full model over LoRA. Discussions also covered dataset sizes for chatting capabilities and style transfer.

  • Claude 3 Sparks Excitement and Debate: Claude 3 Opus garnered praise for its performance and abilities compared to GPT-4, with a focus on coding tasks. However, its claims of consciousness and fear of death sparked a debate on AI sentience, with a video shared to counter these as genuine signs. Claude 3’s self-moderation in group chats, as showcased in an OpenRouterAI Twitter story, also drew attention.

  • Exploring Positional Embeddings and New Techniques: The Eleuther community discussed the efficiency of T5 simplified positional embeddings versus sinusoidal methods and ALiBi. A new paper on Resonance RoPE for improving long sequence performance in LLMs was highlighted. Separately, the potential of retrieval-augmented language models as an alternative to parametric LMs was explored, referencing an arxiv paper.

  • Hugging Face Updates and Community Contributions: Starcoder2 and The Stack v2 were released by @BigCodeProject for coding assistance (Twitter announcement). The Major TOM Core earth observation dataset was open-sourced in collaboration with the European Space Agency (Hugging Face dataset). GPU instances for Spaces were optimized with A100 and H100s support. The community also contributed walkthroughs, courses, and cookbooks for working with šŸ¤— tools and building AI applications, as shared on Twitter and the Hugging Face Learning Platform.

ChatGPT (GPT4T)

  • Claude 3's Enhanced Capabilities and Market Position: Discussions across platforms illuminate Claude 3's remarkable coding prowess and medical knowledge depth, with it achieving a perfect SAT Reading score and comparisons favoring it over GPT-4 in aspects of intelligence and personality. Its introduction to Pro users, notably with a daily query limit on Claude 3 Opus before transitioning to Claude 3 Sonnet, underlines Perplexity AI's strategic positioning against competitors. Notably, a partnership offering one year of free Perplexity Pro membership with Nothing's Phone (2a) purchase exemplifies marketing ingenuity (Nothing Perplexity).

  • Mistral Community's Technical and Commercial Scrutiny: The Mistral community critically evaluates the platform's open model commitment and pricing structure, comparing unfavorably with OpenAI's GPT-4 Turbo due to Mistral Large models' 20% higher cost. Technical discussions revolve around optimal token lengths for Mistral models, finetuning challenges, and hardware requirements, notably the correction that an RTX 4090, not the 3090, provides 24 GB VRAM, essential for modeling considerations. The community also explores tools like Augmentoolkit for dataset conversion and finetuning strategies, with resources cited including a finetuning guide and an issue on GitHub detailing a finetuning challenge.

  • Advancements and Discussions in AI Hardware and Quantization: The CUDA Mode community is actively engaged in discussions on NVIDIA's hardware capabilities, such as the RTX 4090's impressive L1 bandwidth of 40TB/s and the H100's 5.5 TB/s read bandwidth. They are exploring quantization techniques for enhancing PyTorch performance, with the bitsandbytes package being highlighted for its potential to significantly speed up matrix multiplication. These technical exchanges underscore the continual search for optimizations and efficiency improvements in AI modeling and hardware utilization.

  • Hugging Face's Continuous Innovation and Community Engagement: Hugging Face remains at the forefront of AI development with the introduction of Starcoder2 and The Stack v2, improvements in GPU support for Spaces, and the unveiling of Major TOM Core in collaboration with the European Space Agency. Community engagement is evident through discussions on Zephyr 7B Gemma's capabilities, the anticipation around the Yi-9B model, and advancements in neural TTS systems. The platform's initiative to enhance learning and development through AI Cookbook and courses underscores its commitment to fostering a knowledgeable and skilled AI community.


PART 1: High level Discord summaries

TheBloke Discord Summary

  • Smart AI for Smarter Homes: @v.jerryyyy is exploring the development of a smarthome system with an AI voice assistant and inquired about integrating AI with JavaScript versus Python. The community suggested model quantization like 4bpw EXL2 for running unquantized Mistral on a 3070 Ti laptop.

  • OpenAI’s Closed Door Policies?: Concerns were raised by @mikeygm regarding OpenAI’s founding principles, particularly regarding openness, after reading a blog on the Musk lawsuit. This spurned discussions on corporate marketing strategies and transparency.

  • Google’s Gaffe Gets a Gloss Over: @theyruinedelise and @coffeevampir3 discussed fixes for Google’s Gemma model by Unsloth AI, highlighting the many bugs addressed and spawning speculative talks about Google’s commitment to model troubleshooting.

  • Voice Activation and Interfaces Unpacked: Users delved into different UI interfaces like Oobabooga, ExUI, and LM Studio for local AI model use; meanwhile, the setup of voice-activated AI systems with omnidirectional microphones for improved performance was also a topic of interest.

  • Model Behavior Unveils Character Secrets: @mr.dogbert sought advice on configuring an LLM to mimic a cartoon character using character cards, with the community contributing strategies and recommendations on using GUI tools like oobabooga tgw for prompt construction.

  • Model Legalities and Economics Explored: @reinman_ and @mrdragonfox shared experiences and concerns about hosting the miquliz model and its legal implications, alongside queries about budget-friendly hosting for large model APIs.

  • System Prompts and Mistral Mechanics Mapped: Confusion about system prompts in the context of character cards was clarified by discussing different prompt assemblies across various models and offering guidance to new LLM users on grasping model internals through plotting with GU tools.

  • Pursue Professionalism in AI Interviews: @_jaycie engaged the community for advice on interviewing for AI roles, with @dirtytigerx advising to tailor preparation for specific roles like ā€œLLM Engineerā€ or ā€œML Engineer.ā€ Misconceptions about MBSE, which stands for model-based systems engineering, were clarified, suggesting in-depth study for roles demanding professional experience.


Mistral Discord Summary

Augmentoolkit Gains Traction: Engineers discussed a tool called Augmentoolkit, which enables datasets to be converted for instruct-tuning, vital for those considering switching from factual corpus data to multiturn interactions.

Mistral Model Token Boundaries and Hardware Talk: A debate unfolded over the ideal token length for Mistral models, with the sweet spot reported to be between 8k-10k tokens. Separately, a correction was made regarding VRAM requirements, stating that the RTX 4090, not the 3090, carries 24 GB VRAM, a crucial distinction for modelers considering hardware purchases.

Mistral Finetuning Frustrations and Fixes: Users shared struggle stories and success strategies around finetuning Mistral models, with one user encountering challenges in converting lora_fused_model to fp16.gguf as discussed in this GitHub issue. Some advocated that finetuning Mistral 7B may be more efficiently done full-model rather than via LoRA, as advised in this guide, a potential blueprint for those trekking through the finetuning forest.

Community Questioning Mistral’s Commitment and Pricing: The Mistral community voiced concerns over the platform’s commitment to open models and the pricing structure, especially in comparison to OpenAI’s GPT-4 Turbo and the 20% higher cost of Mistral Large models.

Model Properties, Downloads, and Legal Provisos in Focus: The currently available models for download are Mistral 7B and 8x7b, with larger models to be announced. Meanwhile, dialogue on the legal implications of using AI models without clear licensing brought up potential risks, with suggestions concerning hidden watermarks as identifiers for illicit use.

Technical Tripping Points in Mistral Usage: From API error handling related to assigning null to max_tokens in the JSON body, to the challenges with JSON table parsing in API calls and setting up webhooks, engineers exchanged both issues and solutions. Moreover, the accuracy of responses, especially in multilingual contexts and mathematical calculations, raised concerns about variability and prompted discussions on improving reliability.


Perplexity AI Discord Summary

  • Claude 3 Ascends to the Pro Stage: Perplexity AI announced that Claude 3 is now available to Pro users, with a daily limit of 5 queries using Claude 3 Opus, and subsequent queries leveraging the equally capable but faster Claude 3 Sonnet, drawing comparisons with GPT-4’s performance.

  • Sweeten the Deal: Phone Purchase Rewards with Pro Access: A new partnership offers up to one year of free Perplexity Pro membership (a $200 value) to customers who purchase Nothing’s Phone (2a) between March 5-19. Redemption involves following instructions received via email and must be activated by April 30, as detailed on Nothing Perplexity.

  • AI Consciousness Draws Engaged Discussion: Members like @codelicious and @deicoon extensively debated the potential of AI consciousness and methods for circumventing daily use limits of Claude 3 Opus. A prevailing view is that AI model scaling may transcend human prowess, and Continuous Learning (CL) might offer a solution to the AI’s learning inflexibility.

  • Audio Interactions with Perplexity Not Quite There Yet: User @oogeefaloogee questioned Perplexity AI’s capability for voice interaction, which was clarified as not yet available, prompting comparison with existing services such as OpenAI’s voice functionality.

  • Curtailing Curiosities Through API Conversations: Discussion topics within the #pplx-api channel covered whether quota increases apply across API models and the extent of censorship in model outputs, as well as confusion regarding access to citation features and examples for API interaction. No direct answer was provided for quota carryover, but documentation was referenced here.

  • Interface Insights Shared Within the Community: Community members are actively sharing links to Perplexity AI’s Claude 3 Opus-generated content on diverse topics like Ikigai, quantum mechanics, and myxobacteria, showcasing the utility and reach of the platform’s AI capabilities.


Nous Research AI Discord Summary

  • OpenAI Dethroned?: Members discuss sentiment that OpenAI may no longer hold the top spot in AI, referring to the ā€œapple testā€ as evidence of a shift, but specific details or sources for the test weren’t mentioned. Separately, excitement stirs around Claude 3 Opus, with users praising its capabilities and some rating it higher than GPT-4 on an unspecified test.

  • LLM Finesse and Transition: Technical conversations around large language models (LLMs) include the planned transition of Lumina-chat from a 7b Nous fine-tune (with GPT-4) to potentially Mistral or Yarn 7b, and introduction of function-calling capabilities within models like Nous-Hermes-2-Mixtral-8x7B. The InfiMM-HD’s claims of advancing high-resolution multimodal understanding sparked interest, particularly in comparison to CogVLM.

  • New Models and Features Catching Eyes: The new Yi 9B model’s introduction by Hugging Face and its capabilities, along with Claude 3’s pricing strategy, dominate discussions. Speculation about an open-source version of Claude 3 emerged, pointing towards interest in understanding the components contributing to its performance.

  • Technical Glitches and Development Advice: Practical advice is shared for issues such as using the Capybara-34b model with a chat template, dealing with the striped-hyena nous tokenizer’s default to sentencepiece, and the complex topic of training LLMs on length awareness. The potential versatile applications of models like GENIE and JEPA were also discussed, beyond their current popular usage.

  • Obsidian Project’s Mixed Reception: Within Project Obsidian, user feedback mentions the technology is ā€œpretty fast and good for most things,ā€ acknowledging minor quirks, while another user commends its effectiveness in captioning tasks.


OpenAI Discord Summary

  • LLMs Vulnerable Without ā€˜Prepared Statement’ Analog: Users in the AI-discussions channel compared current Large Language Models (LLMs) to old SQL protocols, noting their shared vulnerability due to assuming user goodwill. The similarity was drawn to the lack of a safeguard akin to SQL’s prepared statements, presenting no current solution for LLM vulnerabilities.

  • Claude 3 Opus Vs. GPT-4 Faceoff: Enthusiastic discussions occurred regarding Claude 3 Opus’ abilities, with users sharing positive experiences in scripting games like Python Tic Tac Toe and comparing its performance to GPT-4, citing higher intelligence and personality traits.

  • Quality of MMLU Dataset Under Fire: Criticism arose towards the MMLU dataset for AI evaluation, with users flagging issues with the dataset, such as incorrect Q&A pairs and nonsensical questions.

  • Yearning for Image Analytic Capabilities: Conversations turned towards the desire for AIs that can analyze images, a feature not currently supported by GPT-3.5. Users pointed out that Microsoft Copilot and Google Gemini might offer such functionalities.

  • GPT-4 Troubles Spotted Across the Board: Across various channels, users reported issues with GPT-4 such as a persistent ā€˜Saving GPTs Error’, declining performance, API outages affecting user experience, and debate over its internet searching capabilities. The impact of this was a shared anticipation for the potential advancements GPT-5 might provide.

  • Prompt Engineering Challenges and Innovations: Users in prompt-engineering sought advice on creating bilingual translation prompts and ways to improve customer service bot interactions. Additionally, a user shared a success story with AI in generating futuristic cityscapes from photos. Meanwhile, others expressed frustration with Custom GPTs providing defiant responses and a lack of consistency acknowledging internet search abilities.


HuggingFace Discord Summary

  • Introducing Starcoder2 & The Stack v2: @BigCodeProject announced the launch of Starcoder2 and The Stack v2, marking a significant upgrade to coding assistance tools. Details were broadcasted through a Twitter post.

  • Major Milestone in Earth Observation: Collaboration with the European Space Agency led to the unveiling of Major TOM Core, the most extensive earth observation dataset to hit the open-source community. For more information and data access, visit Major-TOM.

  • Hugging Face Level Ups: The platform has optimized GPU instances for Spaces with the addition of A100 and H100s support. Enhancements also include updated markdown syntax for model/dataset cards and blogposts, as indicated on lunarflu1’s Twitter.

  • Excitement for Zephyr 7B Gemma & Competitions: The release of Zephyr 7B Gemma and PEFT v0.9.0 brings advancements like merging LoRA weights. Also, the new multimodal leaderboard and Sailor LLMs for Southeast Asian languages are stirring the pot, while the Autonomous Grand Challenge at CVPR2024 is set to spotlight. Relevant updates and developments are discussed on various Twitter channels.

  • Learning Paths in AI: @mervenoyann crafted a walkthrough using šŸ¤— tools, an ML for Games course rolled out, and the AI Cookbook for building a RAG Ebook Librarian using LlamaIndex was introduced, aiming to catalyze growth in AI knowledge and application. More can be learned at Learning Platform.

  • ASCII Jailbreak Reveals LLM Flaws: ASCII art-based jailbreaks are compromising state-of-the-art LLMs as detailed in a research paper, a reminder that even sophisticated models can be blindsided by creativity.

  • Karpathy Discusses LLM Training Trials: @karpathy’s Twitter thread reveals the complex and biological nature of training LLMs, from maintenance to unpredictable resource needs.

  • OpenMP Pragmas Through OMPGPT: A specific need in high-performance computing has led to the creation of OMPGPT for OpenMP pragmas, separating itself from general code-based LLMs. Study the full paper on arXiv.

  • Otio.ai Launches With a Smile: Otio.ai, an AI research, writing, and study tool is introduced with a special discount available through app.otio.ai.

  • Open-Sora-Plan Denounces Resource Scarcity: The Open-Sora-Plan project is attempting to replicate Sora with limited resources, calling for open-source collaborators on GitHub.

  • The Fireside Chat Bot Enters the Scene: Rust programming language enthusiasts have a new interface to explore - the ā€œFireside Chatā€ Bot. Catch a glimpse at YouTube and contribute via the GitHub repository.

  • Yi-9B Model Expected to Top Leaderboards: Yi-9B’s introduction to the HuggingFace space brings anticipation for its future growth and impact, with discussions of its potential on the platform.

  • TTS Systems with GPT-4-like Pause Dynamics: The community is discussing neural TTS systems that emulate GPT-4’s dynamic pausing, signaling a push towards more human-like speech generation.

  • IP-Adapter Touted for Image Prompting: Hugging Face’s IP-Adapter is presented as a revolution for image prompting in diffusion models, allowing for specific image features learning while maintaining the integrity of the base model. More details can be found in the tutorial.

  • Gradio 4.20.0 Enhances User Authentication: The recent Gradio release supports external authentication providers, alongside features facilitating a smoother user experience such as automated clean-up with delete_cache, user logout, and a polished DownloadButton component. Dive into more with Gradio DownloadButton Docs.


LlamaIndex Discord Summary

  • Join the RAPTOR Webinar for Tree-Indexing Insights: A webinar on RAPTOR will unpack the workings of a tree-structured indexing technique suited for overcoming the limitations of traditional top-k RAG methods. Engineers can register for Thursday’s session to learn about its hierarchical clustering capabilities.

  • Claude 3 Dives into Multi-modal Applications: An update to LlamaIndex.TS, version 0.1.21, adds support for Claude-3 models, showcased in a notebook example available on their GitHub repository. Meanwhile, Claude 3’s versatility is highlighted in a guide for applications like structured data extraction and multimodal tasks.

  • LlamaIndex Community Tackles Technical Issues: Parallel processing of PDFs in LlamaIndex can be boosted using num_workers, while integrating Ollama with LlamaIndex’s Query Engine involves assigning it directly to Settings.llm. Issues regarding the size of datasets LlamaIndex can handle primarily depend on memory availability and software versioning constraints.

  • LlamaIndex Streamlines Data Extraction and RAG Pipelines: The launch of LlamaParse’s JSON Mode aids in extracting structured data from PDFs with text and images, which improves the process of building RAG pipelines, especially when coupled with Claude-3 Opus.

  • Supporting In-context Learning Progress: The community has been invited to support the LinC project which focuses on ā€œEnhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration.ā€ Interested parties can explore and star the work on GitHub.


Latent Space Discord Summary

  • The AI Intuition Behind Trial-and-Error: A discussion drew attention to the ā€œblack magicā€ and ā€œexpert intuitionā€ involved in the AI development process, including the empirical, trial-and-error methods often detailed in research papers. The fast-evolving nature of the AI field was noted, highlighting how quickly resources and knowledge can become outdated.

  • Claude 3 Sparks Sentience Debate: The AI assistant Claude 3 has contributed to a debate on AI consciousness, with claims that it fears death, though counterpoints cite videos to debunk these as genuine signs of sentience. The capabilities of Claude 3 to dispatch instances of itself and assign tasks were also noted, spurring discussions on autonomy and its comparison to GPT-4.

  • Advancing AI with Stable Diffusion 3 and Quantization: The advancements of Stable Diffusion 3 were a notable topic, with community contributions complementing the official material for clarity. A blog post from Fireworks AI on faster model serving with quantization, FireAttention, was recommended, promising substantial improvements in performance with minimal trade-offs.

  • Humorous Take on AI Research Motivation: The ā€œWaifu-Driven Performance Theoryā€ humorously attributed a spike in dedication to AI coding to community-driven research efforts. The Aphrodite Engine by PygmalionAI was cited as an example of performance advances emerging from such research.

  • Eager Dip Into Model Serving Literature: Interest was high around the model serving paper presentation, with discussions on speculative decoding using GPU cycles for improved performance and the efficiency of various hardware configurations. A survey paper on model serving was highlighted, sparking valuable technical dialogues on distributed model serving and collaborative fine-tuning techniques. Links to relevant technical materials such as the FireAttention blog post, tools for better LLM data curation, and optimizations for sampling parameters were shared for further exploration.

Links mentioned:


Eleuther Discord Summary

  • RoPE-ing in the Long Sequences: Discussions around position embeddings have surfaced, comparing T5 positional embeddings and ALiBi. A new paper released on Resonance RoPE aims to tackle long sequence performance in Large Language Models (LLMs), which could be particularly relevant to those looking into improving such aspects (Resonance RoPE Paper).

  • The Great Compute Debate: Conversations on whether increased compute power is essential for achieving AGI were sparked by an OpenAI blog post, revealing a split in perspectives among engineers on this strategic direction in AI development.

  • Unearthing the Intricacies of RWKV: Complexity and understandability of transformer diagrams sparked debates about learning resources, with a suggestion that code might be more comprehensible for newcomers. This prompted a sharing of the GitHub link to RWKV v6 demo, hopefully proving resourceful to those wrestling with the nuances of transformer models.

  • Melding Models and Methods: The Stable Diffusion 3 paper has stirred up talk around model mixing, specifically the fusion of diffusion and transformer models. Keen individuals interested in this multimodal approach can dive into the Stable Diffusion 3 Paper to explore the discussed methodologies.

  • GPT-Neox: A Call for Collaboration: GPT-Neox developers are seeking contributions particularly for fused triton kernels and Tensor Expressions (TE), indicating a current focus on integrating basic TE support. They are also welcoming assistance with debugging on H100 GPUs and tackling memory optimization issues, as chronicled in a GitHub issue discussing memory peaks. Those interested in contributing can reference the open GitHub issues on GPT-Neox for more details.


LM Studio Discord Summary

  • Image Generation Imaginary in LM Studio: LM Studio cannot generate images via models like llava-v1.5-7b-Q4_K.gguf. While models can analyze images fed to them, LM Studio’s capabilities do not include creating new images from scratch.

  • LM Studio’s Offline Nature: The LM Studio chatbot cannot access the internet directly, meaning real-time information retrieval, like fetching the current time, is off the table. However, there’s a mention of LoLLMs, which can connect LM Studio in server mode to the internet.

  • Token Limit and LM Studio’s Output: When working with LM Studio, the context window affects the input, not the output. Surpassing the token limit during generation can be managed by adjusting the n_predict setting to control output tokens.

  • Hardware Enthusiasts Talk LM Studio Models: Enthusiasts discuss their experiences with different models and hardware setups, suggesting that running Nous Hermes 2 Solar 10 34b q5 k m on a 4090 yields positive results, but even 64GB RAM struggles to run Smaug 34B with 200k context.

  • Syntax and Scripting Tips for LM Studio: The proper use of default_system_message in LM Studio can be environment-specific and challenging across systems like Linux, Windows 10, and WSL. It’s advised to run LM Studio in verbose mode to observe prompts history for better input understanding.


LAION Discord Summary

  • Triple Threat or One Too Many?: In a debate around text encoders, @top_walk_town suggests that combining three text encoders could be excessive, with a note that T5 could be eliminated at inference time; no consensus was mentioned.

  • Advanced Sampling Methods Picking Up Speed: A mention by @pseudoterminalx of a technique that assigns more weight to intermediate timesteps when training velocity (vΘ) hinted at its competitiveness with rectified flows, though specifics were not provided.

  • Distilling Google’s Knowledge: @pseudoterminalx also shared a repository detailing Google’s method for model distillation, though it remained unclear if it pertains to T5-XXL or another model.

  • Under the Hood of Diffusion Models: A conversation led by @astropulse, @nodja, and @pseudoterminalx debated T5’s necessity in diffusion models with the potential exploration of alternatives and practical issues, but details on conclusions were not provided.

  • When Less Is More: Mention of the GitHub project res-adapter, noted by @astropulse, sparked interest due to its promise for low resolution adaptation, capable of scaling SD1.5 down to 16x16 latents.

  • Blog Post Dive into Augmented Generation: A blog post by @ariondas critically examines Standard RAG techniques and introduces CRAG (Corrective Retrieval Augmented Generation) as a potential advancement in the field.


OpenAccess AI Collective (axolotl) Discord Summary

Mix and Merge: Model Integration Techniques Explored:

  • Engineers are exploring various model merging techniques with a focus on MergeKit, LoRA+, DoRA, and LoftQ. There is a discourse about how these techniques might enhance existing LLMs, with links to a MergeKit repository and a discussion around the implementation and effects on learning rates.

Claude-3 Ethical Safeguards Scrutiny:

  • Claude-3’s response to sensitive topics, particularly race, is triggering debate on striking a balance between ethics and biases in model development, with no specific resources linked, the subject is noted as challenging for AI practitioners.

A Gearhead’s Guide to AI Hardware:

  • Technical discussions on AI inference hardware point to the usability of a mining motherboard supporting multiple GPUs and the pertinence of NVLink as compared to PCIe slots, highlighting an AliExpress listing.

Fine-Tuning Deep Dive and Data Enrichment Strategies:

  • A contributed Medium article on enriching datasets for better reasoning was shared. The community is also exchanging deepspeed config tips for finetuning models and addressing memory issues, with references to HuggingFace’s functionalities and a deepspeed config file.

Towards Better Model Parameter Efficiency:

  • Developers are discussing the benefits of LoRA+ ratios and DoRA’s performance, with references to a comprehensive article on the subject and associated GitHub commits 0cfdb2c. Issues with LoftQ and PEFT deployment are noted, alongside an ongoing PR for quantized DoRA updates.

OpenRouter (Alex Atallah) Discord Summary

  • Claude 3 Self-Moderates Group Chats: Claude 3’s capability to self-moderate group chats has been highlighted by @alexatallah, with an illustrative Twitter story shared among users.
  • Clarification on Claude Versioning: The difference between anthropic/claude-2.0 and anthropic/claude-2 was clarified, stating that Claude-2 will automatically opt for the latest 2.x version.
  • Multithreading Cost Concerns with Gemma and Openchat: Users expressed concerns about cost predictions not aligning with actual figures when using multithreading with gemma 7b and openchat 3.5, prompting a discussion on the issue and attempts to diagnose the problem.
  • Mixed Reactions to Claude 3’s Conversational Management: A debate emerged surrounding Claude 3’s approach to conversation, with some users uncomfortable with potential over-censorship, while others were in favor of its moderation abilities.
  • Integration Challenges and Developments with OpenRouter: Issues using LangChain.js with OpenRouter for text completions led to discussions about hardcoded endpoints and legacy status, alongside talks of developing a VSCode extension that integrates with OpenRouter. Active GitHub projects and alternative solutions were shared, including Tabby, Configuration | Continue, and repositories such as ChatGPT_DAN and Continue for VS Code and JetBrains.

LangChain AI Discord Summary

  • LangChain Function Integration Discussion: LangChain Core Example provided a guide on how to use LangChain and OpenAI’s ChatCompletion.create() to integrate function roles into messages, following an inquiry by @vishal5795.

  • Partner Up for Paid Tech Gig: @mattew_999 is on the lookout for a technically inclined collaborator for a paid project, no further details on the partnership offered.

  • Chain Partners Wanted, Issues Reported: Queries about new partnerships with LangChain sparked conversations, while @rajib2189 reported intermittent 502 errors on FastAPI hosted on AWS and served through an Apache server with Uvicorn.

  • GPT-4 Fine-Tuning Interest Surfaces: One member, @8886600, expressed interest in obtaining access to GPT-4 fine-tuning capabilities and showed a willingness to purchase an API key with usage limitations.

  • Search for Humor in AI Art: Through innovative image modification, @neil6430 successfully incorporated humor into AI-generated art using a new control net block from ML Blocks, sharing their findings in the share-your-work channel.

  • Innovation in Automation and Long Context AI: User @polarbear007. unveiled Lutra.ai which interprets English instructions and converts them into code for app-based workflows while @andysingal delved into building Long Context RAG with RAPTOR, detailed in a Medium post.

  • ChromaDB and LM Studio Integration: ChromaDB Plugin for LM Studio released, facilitating vector database creation as per the GitHub link shared by @vic49..

  • Streaming Stumbles on Caching Issues: @veryboldbagel notated a current limitation within langchain-core—caching fails to operate properly in streaming mode, affecting cachable content’s performance.

  • Tutorial Tease with Zero Context: Only a YouTube link was dropped by pradeep1148 in the tutorials channel: Tutorial Video, without any accompanying explanation or context.


CUDA MODE Discord Summary

  • Root Squashed at RunPod: Discussions revealed that RunPod provides a docker image which means root access won’t actually grant the full permissions typically associated with a VM root.
  • Bandwidth Performance in NVIDIA’s Latest: The SRAM bandwidth of the NVIDIA H100 was compared to the A100’s 19TB/s, with the H100’s L2 cache having a 5.5 TB/s read bandwidth. The RTX 4090’s L1 bandwidth is positioned as a potential performance comparator, boasting an impressive 40TB/s.
  • PyTorch Community Sparks Cooperation and Quantization Speed: Engagements in the Torch community highlight the importance of setting TensorOptions correctly and promote a friendly debugging environment. Additionally, the bitsandbytes package was recommended for k-bit quantization in PyTorch, with an enthusiastic note about a significant 5700x speedup in int8 versus bf16 matrix multiplication.
  • Optimization via Algorithms: The inefficiency of generic masking in compute was addressed, with the suggestion to fuse constraints into the flash_attention algorithm via the score-mod API for improved efficiency. A relevant pull request for sliding window attention bias was noted for PyTorch’s GitHub.
  • CUDA Learning Path: Newcomers to CUDA programming were directed to Jeremy’s videos in Lecture 3 and 5 for digging into numba.cuda.jit.
  • Ring the Alarm on Ring Attention: Issues and progress concerning ring-attention were detailed, discussing device testing with scripts, a first attempt at sampling code despite parameter errors, and memory usage benchmarks for the striped and zigzag. Also noted was the public sharing of the OpenAccess-AI-Collective’s Axolotl GitHub repository.

LLM Perf Enthusiasts AI Discord Summary

  • Opus Shows Promise in Coding: Opus is garnering attention for its coding capabilities, with users like @pantsforbirds initiating discussions on its potential, specifically highlighting function calling.

  • GPT-4 Stands Out in Medical Expertise: @thebaghdaddy observed that GPT-4 surpasses its predecessors in medical and biological knowledge, but also questioned the reliability of published benchmarks, hinting they may not capture the full scope of newer models’ abilities.

  • Perfect Score for Opus on SAT Reading: @jeffreyw128 pointed out Opus scoring an 800 on the SAT Reading section, raising conversation on the importance of creating holdouts to prevent memorization by large models. The performance was highlighted through a Twitter post.

  • Exploring Citation Formatting with RAG: @mat_mto sought advice on formatting citations in RAG-generated outputs that refer to web search results, sparking interests in improving clear source attribution.

  • JSON Output for RAG Source Clarity: @res6969 shared a method of using function calling for RAG output that provides a JSON object entailing text paired with its web sources, aiming for transparency in information provenance.


Datasette - LLM (@SimonW) Discord Summary

  • Distinguishing Prompt Chaos: @simonw clarified the difference between prompt injection and jailbreaking, where prompt injection entangles untrusted user inputs with developer prompts and jailbreaking seeks to bypass an LLM’s safety filters. Details are further elaborated in Simon Willison’s blog post.

  • AI’s Cybersecurity Front: @tariqali spotlighted a Microsoft report on bad actors utilizing OpenAI’s LLMs for cyber tasks such as reconnaissance and spear phishing to probe the model for malicious purposes.

  • Proactive Against AI Threats: The complex issue of dual uses of LLMs in creating biological threats was discussed, referencing OpenAI’s research into early warning systems and a study contrasting problem-solving with the Internet alone versus with GPT-4, found here.

  • Gatekeeping the AI Knowledge: Following the risks associated with LLMs, @tariqali proposed that access to LLMs should be restricted, including potentially implementing human review processes to filter out harmful inputs before they can manipulate the AI model.

  • The Invisible Injection Issue: Highlighting a specific concern, @simonw noted the challenge of preventing invisible prompt injections in images, which poses a threat to multi-modal versions of GPT-4, like GPT-4-V, discussed in Simon Willison’s blog post.

  • Model File Placement Debate: @florents_ sought community input on the agreed file locations for model files, questioning whether there was standardization around places such as $(pwd)/.models or $HOME/models, but no consensus or follow-up discussion was provided.


DiscoResearch Discord Summary

  • Cutting-Edge Chatbot Environments Showcased: @crispstrobe identified chat.lmsys.org as a platform for testing chatbots with the understanding that inputs may be used in future training, and mentioned poe.com for its hosting of models and a perplexity analysis feature.
  • In Search of German Excellence: @le_mess sparked discussion on premier German language models, with recommendations encompassing Claude Opus, gpt-4, discolm-120b, and VAGOsolutions/Sauerkraut LM-UNA-SOLAR-Instruct, while @johannhartmann and @flozi00 spoke highly of DiscoResearch/DiscoLM_German_7b_v1 and Nous Hermes 2 Mixtral 8x7b.
  • Retrieval-Augmented Models Paving the Future?: @maxidl shared an arxiv paper that presents retrieval-augmented language models as a promising alternative to conventional parametric LMs, although this area of research requires further development.
  • Hermes and Mixtral Garner Accolades: @cybertimon suggested using Nous Hermes 2 Mixtral 8x7b for tasks involving the German language, citing its language proficiency.
  • High-Performing German Models in Spotlight: @johannhartmann and @flozi00 discussed quality German models, with both advocating for Nous Hermes 2 Mixtral 8x7b due to its accuracy in handling the German language.

Interconnects (Nathan Lambert) Discord Summary

  • Intel’s Struggles in the Spotlight: @natolambert shared a YouTube video titled ā€œIntel’s Humblingā€ by Stratechery, which discusses the recent challenges faced by Intel and complements the video with an in-depth article.

  • AI: The Great Unknown: @natolambert highlighted an article by Elad Gil that delves into the complexities of generative AI, presenting a list of open-ended questions to encourage further discussion and exploration in the AI field.


PART 2: Detailed by-Channel summaries and links

TheBloke ā–· #general (967 messagesšŸ”„šŸ”„šŸ”„):

  • Exploring AI for Smarthome Systems: User `@v.jerryyyy` expressed interest in developing a smarthome system with an AI voice assistant customized with system prompts, and queried about using JavaScript versus Python for AI integration.
  • Choosing the Right Quantized Model: `@v.jerryyyy` attempted to run unquantized Mistral on a 3070 Ti laptop, which led to a discussion on model quantizations suitable for his hardware, with suggestions like 4bpw EXL2.
  • Concerns on OpenAI Founding Principles: User `@mikeygm` shared a critical perspective on OpenAI's founding intent of openness after reading an OpenAI blog post about the Musk lawsuit, which sparked a discussion on corporate marketing strategies and transparency.
  • Google's Gemma Model Issues and Fixes: `@theyruinedelise` mentioned fixes for the Gemma model and improvements made by Unsloth AI, and `@coffeevampir3` commented on the numerous bugs fixed, initiating a speculative conversation about Google's investment in model troubleshooting.
  • UI Interfaces and Voice Activation Development: Users discussed different UI interfaces, such as Oobabooga, ExUI, and LM Studio, for local AI model usage and the intricacies of setting up voice-activated AI systems paired with omnidirectional microphones for better performance and audio processing.

Links mentioned:


TheBloke ā–· #characters-roleplay-stories (115 messagesšŸ”„šŸ”„):

  • Exploring the Depths of Model Behavior: @mr.dogbert sought advice on making an LLM behave like a cartoon character using character cards. Numerous community members, including @superking__, provided detailed instructions and examples on prompt construction with character cards for various models, while emphasizing the effectiveness of using GUI tools like oobabooga tgw for such tasks.

  • Model Hosting and Legal Concerns: @reinman_ shared experiences with hosting the miquliz model, discussing its realism and comparison with other models, followed by @mrdragonfox highlighting legal issues regarding the use of unlicensed models like miquliz. Meanwhile, users inquired about cost-effective hosting services for large model APIs.

  • Mistral and System Prompts Clarified: Through a series of messages, users such as @superking__ and @aightbits clarified the concept of system prompts and how they relate to character cards, explaining different prompt assemblies across models.

  • Guidance on Deep Diving into LLMs: Those new to LLMs like @mr.dogbert were given direction by @aightbits and others on learning model internals through plotting with existing GUI tools, stepping beyond simple interfacing to grasp the underlying mechanics.

  • Recommendations for LLM Learning Resources: @aightbits recommended the free Coursera course ā€œGenerative AI with Large Language Models,ā€ while @mr.dogbert expressed interest in using character cards as a starting point for model roleplaying, based on advice given throughout the discussion.

Links mentioned:


TheBloke ā–· #model-merging (1 messages):

pablo.ce: https://huggingface.co/pabloce/Dolphin-2.8-slerp


TheBloke ā–· #coding (8 messagesšŸ”„):

  • AI Job Interview Insights Sought: @_jaycie inquired about what a typical interview for roles related to generative AI, machine learning, and language model engineering might involve, expressing a background in full-stack development and aspirations to move into AI and attend graduate school.
  • Navigating the AI Interview Landscape: In response to @_jaycie about interviewing for AI roles, @dirtytigerx clarified that not all AI-related positions are the same, with an ā€œLLM Engineerā€ requiring different expertise than an ā€œML Engineer.ā€ They advised focusing on understanding the specific type of role, as generic preparation might not be feasible without a machine learning background.
  • Machine Learning vs. Model-Based System Engineering: @_jaycie sought clarity on preparing for a position requiring ā€œexperience in machine learningā€ and ā€œexperience with MBSE,ā€ while @dirtytigerx corrected the misconception, explaining MBSE stands for model-based systems engineering, suggesting that brief studying would not suffice for roles expecting professional experience in these areas.

Mistral ā–· #general (475 messagesšŸ”„šŸ”„šŸ”„):

  • Augmentoolkit Shared: @mrdragonfox shared a link to Augmentoolkit on GitHub, a tool for converting datasets into instruct-tuning datasets, noting it supports changing from a factual corpus to multiturn.
  • Mistral Model Discussion: Users discussed the ideal token limits for efficiency, where @useofusername mentioned that 8k-10k tokens can work well and @mrdragonfox questioned the purpose behind users’ datasets. The general consensus was to validate and clean datasets before use.
  • Gemma 7B License Inquiry: @mehdi1991_ made multiple inquiries about running open-weight models and @mrdragonfox clarified that Mistral 7B and 8x7b are open-weight and guided him to reach out to model authors regarding other models like Gemma 7B.
  • Hardware Requirements Dialogue: Amid discussions on hardware suitability for running large models, @yesiamkurt corrected assumptions about VRAM requirements, noting that 24 GB VRAM is associated with the RTX 4090, not the 3090.
  • Mistral API and Miscellany: @ethux provided a link to the Mistral chat where it can be used by users: Mistral Chat. Mistral AI seemed to be suggested due to its price-performance ratio compared to other services, but cost concerns were voiced by @clear3fram3 and @i_am_dom. Some users discussed the efficiency of using the continue tool with large language models for coding tasks.

Links mentioned:


Mistral ā–· #models (3 messages):

  • Short and Sweet Query: @yannn666 posed a concise question asking, ā€œwhy ?ā€
  • Admin Point Explained: In response, @mrdragonfox mentioned, ā€œbecause ā€˜administrative pointā€™ā€, however, the context of the discussion was not provided.
  • Case for On-Premises Necessity: @mrdragonfox also noted that ā€œthere are a lot enterprises that needs on prem for various reasonsā€, alluding to a discussion on the needs of enterprises for on-premises solutions.

Mistral ā–· #deployment (2 messages):

  • Inquiry About The Bloke’s Discord Server: User @api_1000 inquired why The Bloke’s Twitter account has gone inactive and mentioned the Discord invite in the bio is not working anymore. They sought assistance on how to join his Discord server now.
  • Offering a Helping Hand: @mrdragonfox responded to the call for help and offered to provide an invite to The Bloke’s Discord server.

Mistral ā–· #finetuning (40 messagesšŸ”„):

  • Stuck in MoE Finetuning Quagmire: @hammer_mt is grappling with Mistral finetuning on mlx, encountering conversion issues from lora_fused_model to fp16.gguf. They described their roadblock and error messages in their attempt detailed in a GitHub issue.
  • Mistral and MoE Don’t Play Nice: @mrdragonfox suggested that MoE tuning is fundamentally cumbersome, pointing out architecture complications and a general struggle in finetuning Mistral, as even well-versed practitioners are encountering barriers.
  • LoRA Fine-Tuning Tips Shared: In response to @lawxls’s query, @mrdragonfox recommended starting with at least 20k instruction samples for LoRA fine-tuning Mistral’s chatting capabilities and provided a guideline to gradually increase the dataset size for style transfer.
  • Pursuing the Perfect Finetune: @mrdragonfox also advised @lawxls on the best finetuning practices for Mistral 7b, endorsing full model finetuning over LoRA and directing to a fine-tuning tutorial.
  • A Curiosity About Prompt Tuning: @charlescearl_45005 asked about the consequences of using PEFT fine-tuning with a static system prompt, inquiring whether it would embed a ā€œsystem promptā€ into the model’s behavior, but no clear answer was provided in the channel.

Links mentioned:


Mistral ā–· #announcements (1 messages):

sophiamyang: https://twitter.com/MistralAILabs/status/1765434559993123184


Mistral ā–· #showcase (7 messages):

  • Visual Cues for Bot Response Completion: @jakobdylanc clarifies that bot responses are in a black box (embed) which turns green when the response is complete. Embeds allow for up to 4096 characters, significantly more than regular messages.
  • No Need for Faux Human Delays: @jakobdylanc expresses a lack of interest in introducing artificial delays or ignoring messages in the chatbot since it’s designed as a ā€œLLM prompting tool,ā€ and suggests users can set the desired personality in the prompt instead.
  • Unveiling Telegram Bot Trio: @edmund5 launches three new Telegram bots using mistral-small-latest: Christine AI for finding zen, Anna AI for joy and advice, and Pia AI for elegant conversations.
  • Top-p Setting Inquiry: @kenharris. asks the community what settings they are using for the top-p parameter in their models, sparking a discussion about best practices for sampling strategies.
  • Crafting Game Enhanced by Mistral: @pradeep1148 shares a YouTube video that showcases ā€œInfinite Craft Gameā€ using Mistral, highlighting the game development process and integration with AI.

Links mentioned:


Mistral ā–· #random (35 messagesšŸ”„):

  • GPT-4 Turbo vs. Standard and Mistral Pricing: @nunodonato mentioned that GPT-4 Turbo is cheaper than the standard GPT-4. In contrast, @mrdragonfox highlighted that GPT-4 is still more expensive than Mistral Large by 20%.
  • Seeking French Speakers and Mistral for Analysis: @ttvtama looked for French speakers before inquiring about using Mistral IA to analyze text for a student project. @mrdragonfox responded, explaining that while using Mistral’s API comes at a cost, running Mistral 7b / 8x7b locally would be free.
  • Installing Mistral Locally: @ttvtama received guidance from @mrdragonfox on setting up Mistral locally, suggesting starting with a Gradio web UI found on GitHub and explaining that it can run in 4bit, fitting well in 6GB of VRAM from an RTX 2060 graphics card.
  • Model for Local Use and Installation Tips: @mrdragonfox provided @ttvtama with a Hugging Face link to Mistral 7B in 4bit for efficient local use and remarked that explanatory videos could be found on YouTube.
  • Inconsistencies in MMLU Dataset Questions: @privetin initiated a discussion about the appearance and quality of MMLU dataset questions, noting that some questions seemed nonsensical, and @mrdragonfox commented that the dataset consists of questions with four possible answers.

Links mentioned:


Mistral ā–· #la-plateforme (33 messagesšŸ”„):

  • API Error Handling Inquiry: @georgyturevich faced a 500 error with an API request and @lerela requested more details for troubleshooting, including the model and request ID. The error was later identified by @georgyturevich as being caused by assigning null to max_tokens in the JSON body, contrary to the documentation stating that the default value is null.

  • Mistral Webhook Query: @weslleymistura inquired about anyone having experience with setting up a Mistral webhook but didn’t receive further clarification or responses on the topic.

  • API Hosting Location Concern: @fangh asked about the geographical hosting location of the API, questioning whether it is on European servers or US servers; however, no answer was provided within the captured discussions.

  • JSON Table Parsing Issue: @samseum struggled with inserting a table in JSON format into an API call, receiving an error message. @_._pandora_._ and @lerela offered advice on syntax, highlighting the need to escape JSON before adding it to the prompt and ensuring proper text recognition in the user’s IDE.

  • Correcting Chatbot Errors: @patz3r encountered an error with using multiple system roles in a Mistral prompt, which was corrected by @sublimatorniq clarifying that the role to be used after the first message is assistant, not system. This is in line with the guidance from @nunodonato that system should be used only for the first message to give general instructions.

Links mentioned:

no title found: no description found


Mistral ā–· #office-hour (400 messagesšŸ”„šŸ”„):

  • Mistral Team Acknowledges Community Input: @michaelwechner raised concerns about Mistral’s commitment to open models and long-term reliability. Although Mistral allows for open-weight models, the community voiced the importance of clear future expectations for planning.
  • Open Source vs. Business Sustainability Discussion: @michaelwechner also addressed the challenge of balancing open projects and commercial viability. @mrdragonfox and others emphasized that creating AI models requires significant resources, which should be compensated to ensure continued innovation.
  • Fine-Tuning Challenges and Industry Evaluations: Discussion on fine-tuning larger models like Mixtral 7b was a common theme. Users like @kalomaze, @netrve, and @cybertimon expressed a need for more information and guidelines on effective fine-tuning.
  • Multilingual Model Performance and Bias: Users like @_._pandora_._ noted that Mistral’s larger models sometimes default to English responses, when French is expected, raising questions about training data diversity.
  • Next Mistral Office Hour Anticipation: As the office hour ended, @potatooff and others expressed their eagerness for the next session, highlighting the value of these discussions for the Mistral community.

Links mentioned:


Mistral ā–· #le-chat (114 messagesšŸ”„šŸ”„):

  • Login Confusion Cleared with a Cosmic Ray: User @foxalabs_32486 faced a puzzling issue with their account being seemingly erased. Turns out it was just a mix-up with their auth manager, and after realizing they were using an invite link from their gmail instead of their work email, the problem resolved.

  • Mistral’s Big Model Not Available for Download: @yesiamkurt inquired if Mistral’s Large model was available for download, to which @mrdragonfox responded that only 7b and 8x7b models are openweight and available currently, with future models to be announced.

  • Temperature Tinkering to Avoid Cut-Offs: @sim3239 discovered, through experimentation with the API, that lowering the temperature improved the occurrence of responses being cut off. This behavior was deemed worth further investigation by @lerela, suggesting a shared deterministic reproduction of the issue.

  • Theory of Ingrained Licenses in AI Models: In a serious discussion about licensing, @mrdragonfox commented on the potential legal risks of utilizing unlicensed AI models (like miqu) in production, asserting that hidden watermarks and unique responses could be used to identify illicit use.

  • Moderation in Chat UI - Thumbs Down Feature Idea: @mrdragonfox suggested the chat interface implement a ā€œthumb downā€ feature for responses to collect more meaningful metrics, noting that it’s a common feature in other platforms.

Links mentioned:

GitHub - huggingface/chat-ui: Open source codebase powering the HuggingChat app: Open source codebase powering the HuggingChat app. Contribute to huggingface/chat-ui development by creating an account on GitHub.


Mistral ā–· #failed-prompts (11 messagesšŸ”„):

  • Mistral Fluctuates on Mathematical Floor: @awild_tech observed that the Mistral Large model on Le Chat incorrectly concluded the floor of 0.999 repeating to be 0 instead of the correct answer, 1, showing inconsistent performance even with other models like Claude 3, Gemini Pro, and GPT 3.5.

  • Inconsistencies Over Languages: @awild_tech found that when asking the same question in French, Mistral Large initially provided the correct answer but then erred upon repetition, highlighting a potential language-based variability in accuracy.

  • Random Correctness Not Reliable: @_._pandora_._ suggested that correct answers from Mistral Large on Le Chat could be due to chance, deeming the model’s responses as lucky but not reliable.

  • Explaining Mathematical Equivalence: @i_am_dom generated an explanation from Mistral Large that describes the floor of 0.999 repeating as 0, while acknowledging the number’s mathematical equivalence to 1, yet confirming the model does not consistently provide correct results.

  • Misquoting the System Role: @i_am_dom demonstrated that Mistral Large failed to quote the message from the ā€œsystemā€ role accurately, producing multiple incorrect versions and thereby not meeting the expected output.


Perplexity AI ā–· #announcements (2 messages):

  • Claude 3 Now Available for Pros: The new @everyone announcement informs that Claude 3 is now available for <a:pro:1138537257024884847> users, replacing Claude 2.1. Users get 5 daily queries with Claude 3 Opus and use the faster Claude 3 Sonnet for remaining queries, which is on par with GPT-4.

  • Partnership with Nothing’s Phone (2a) Launch: @everyone has been notified of a partnership offering up to 1 year of Perplexity Pro for free (a $200 value) to new owners of Nothing’s Phone (2a) if purchased between 3/5-3/19. The promo requires purchasing the phone during the promotional window, redeeming the code sent via email, and activating the offer by 4/30, detailed in the ā€œHow it worksā€ link.

Links mentioned:

Nothing Perplexity: Here at Nothing, we’re building a world where tech is fun again. Remember a time where every new product made you excited? We’re bringing that back.


Perplexity AI ā–· #general (755 messagesšŸ”„šŸ”„šŸ”„):

  • Infinite Opus Techniques and AI Consciousness: Users like @codelicious and @deicoon discussed possible methods to exceed the daily limit of 5 uses for Claude 3 Opus and speculated on AI consciousness. The consensus suggests that scaling AI models will likely overtake human abilities, and Continuous Learning (CL) could address AI’s rigidity by enabling learning during interactions.

  • Voice interaction with Perplexity lacking: User @oogeefaloogee inquired about a feature to interact with Perplexity using voice and receive audio responses. @codelicious clarified that such functionality, akin to 11 Labs or OpenAI’s offerings, isn’t available on Perplexity.

  • Claude 3 Opus vs. Sonnet for Coding Tasks: Various users, including @codelicious, @13376666666666666666666666666669, and @gatoramirez. have discussed the relative merits of Claude 3 Opus and Sonnet, with a general preference for Opus when it comes to coding.

  • User Courtesy Level Unpacked: The continuous politeness of user @gooddawg10 using ā€œsirā€ elicited a mix of amusement and discussion around cultural respect and interaction styles on global platforms.

  • Gemini’s Disappearance Raises Questions: Several users, like @13376666666666666666666666666669 and @codelicious, pondered why Gemini is no longer available on Perplexity, with the latter mentioning bugs as a likely reason for its removal.

Links mentioned:


Perplexity AI ā–· #sharing (24 messagesšŸ”„):

  • Exploring Ikigai with Claude 3: @sevonade4 shared a link to Perplexity AI for a generated explanation on the Concept of Ikigai: Understanding Ikigai.
  • Quantum Queries Quenched: @vmgehman expressed enjoyment in studying different interpretations of quantum mechanics with Perplexity AI, citing its usefulness as a study partner: Quantum Mechanics Interpretations.
  • Claude 3 Opus Illuminates Inspiration: @sevonade4 invited those interested to assess the text generation quality of Claude 3 Opus with a reflective piece: Reflection Piece Generation.
  • Thumbnail Tips and Tricks: @kenshin0039 referred to Perplexity AI for insights on how to add a thumbnail, possibly related to content management or graphic design: Adding a Thumbnail.
  • Foray into the Function of Myxobacteria: @paradevosia shared a Perplexity AI search relevant to those curious about the microbial world, specifically on myxobacteria: What is Myxobacteria?.

Perplexity AI ā–· #pplx-api (29 messagesšŸ”„):

  • Quota Carryover Confusion: User @stijntratsaert_01927 inquired whether quota increases for pplx70bonline also apply to sonar medium online, but did not receive a direct response within the provided messages.
  • Censorship on API Models?: @randomguy0660 questioned whether the models accessible through the API are censored; @brknclock1215 responded, suggesting that they are, but to a lesser extent compared to foundational LLMs, and mentioned personal success with sonar-medium-online.
  • Confusion Over Citation Feature Access: _samrat expressed confusion about being rejected access to citation features in the API, with @brknclock1215 and @cupcakepy commiserating over what appeared to be a mass-generated rejection email that seemed to lump together requests for citations and rate increases.
  • Seeking HTML & JS API Code Examples: @kingmilos sought HTML and JS code for interacting with the llama 70b model via the pplx API; @icelavaman redirected to the official documentation, while @po.sh offered a direct example with placeholders for the API key and model choice.
  • Email Response Algorithm Questioned: A couple of users, @dailyfocus_daily and @brknclock1215, joked about the possibility of a ā€œdumb LLMā€ being used for auto-generated rejection emails concerning API access requests, based on the seemingly generic and non-specific content of the messages received.

Links mentioned:

pplx-api: no description found


Nous Research AI ā–· #off-topic (11 messagesšŸ”„):

  • OpenAI’s Reign Challenged on Twitter?: @leontello remarks on the abundant posts on AI Twitter about OpenAI’s supposed fall from the top spot. There is a sentiment of confirmation with both @leontello and @mautonomy implying that the ā€œapple test,ā€ a metaphor for undeniable proof, supports this claim.

  • Introduction to a New AI on the Block: @pradeep1148 shared a YouTube video titled ā€œIntroducing Claude 3 LLM which surpasses GPT-4,ā€ highlighting a new model family claiming industry-leading performance.

  • No Job Ads Here Please: In response to @gabriel_syme’s inquiry about a space for job postings, @proprietary clarified there’s no designated area for that on the server and advised doing it elsewhere.

  • A Game-Changing AI for Infinite Crafting: @pradeep1148 also shared a YouTube link to a video titled ā€œInfinite Craft Game using Mistral,ā€ featuring a crafting game enhanced by an AI.

Links mentioned:

  • Introducing Claude 3 LLM which surpasses GPT-4: Today, we’re look at the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three state-of…
  • Infinite Craft Game using Mistral: Let develop Neal Agarwal’s web game Infinite Craft. This is a ā€œcrafting gameā€ where you start with just four elements and repeatedly combine pairs of element…

  • Lumina-Chat Fine-Tuning Plans: @ishaank0018 is aiming to switch Lumina-chat’s AI from a 7b Nous fine-tune (and GPT-4) to potentially Mistral or Yarn 7b, for specialized citation formats. @teknium informed them of existing datasets that can be referenced, and mentioned they are close to releasing a function calling Hermes model.
  • High Hopes for Function Calling Model: In light of an upcoming function-calling model, @scottwerner reported good initial results with Nous-Hermes-2-Mixtral-8x7B, while @sundar_99385 expressed enthusiasm about its forthcoming release, asking for a potential launch date.
  • InfiMM-HD Sparks Interest: @orabazes shared a link to _akhaliq’s tweet about InfiMM-HD, which claims significant progress in high-resolution multimodal understanding. The community, including @hexani and @night_w0lf, discussed its potential advantages over CogVLM, noting its higher resolution capabilities and the use of Vicuna 13B.
  • Upcoming Yi LLM Introduction: .benxh provided a link to Hugging Face’s Yi 9B model with a comprehensive breakdown on Hugging Face of its capabilities. They also commented, possibly jokingly, on the prolific nature of such releases with ā€œthey can’t keep getting away with itā€.

Links mentioned:

  • Tweet from AK (@_akhaliq): InfiMM-HD A Leap Forward in High-Resolution Multimodal Understanding Multimodal Large Language Models (MLLMs) have experienced significant advancements recently. Nevertheless, challenges persist in …
  • 01-ai/Yi-9B Ā· Hugging Face: no description found

Nous Research AI ā–· #general (327 messagesšŸ”„šŸ”„):

  • Claude 3 Opus Stirring Excitement: Claude 3 Opus has the Nous Research AI Discord abuzz, with users like @gabriel_syme and @proprietary impressed by its performance and abilities. Claude 3 is favored over GPT-4, with a user reporting Claude 3’s performance as 9.8/10 over GPT-4’s on an unmentioned test.

  • Axolotl Training Confusion: @n8programs is experiencing issues with Axolotl training where it shows only 26 reported steps but has completed over 100,000 iterations. Other users recommend disabling P2P with export NCCL_P2P_DISABLE=1 and trying Axolotl’s docker.

  • Integrated Retrieval and Embeddings: Users like @mihai4256, @night_w0lf, and @everyoneisgross discuss challenges related to semantic searches on legal documents, suggesting a mixture of fine-tuning and Retrieval Augmented Generation (RAG) or chunking data might be beneficial.

  • New Yi-9B Gaining Traction: A model called Yi-9B has been mentioned, initially shared in another channel, with @.benxh indicating its launch an hour earlier and highlighting its impressive MMLU score. Users express interest in potential future Hermes training for Yi-9B.

  • Open Source Claude 3 Interest: In light of Claude 3’s discussed capabilities, there is a conversation about creating an open-source version of the model, with @nruaif raising the idea and interest in the components that make Claude 3 outstanding.

Links mentioned:


Nous Research AI ā–· #ask-about-llms (47 messagesšŸ”„):

  • Seeking Capybara-34b Usage Guidance: @oemd001 inquired about using the Capybara-34b model with a chat template but strugged with the OpenAI template. .ben.com provided a suggestion with a specific template format: "template": "{{ .System }}\n\nUSER: {{ .Prompt }}\nASSISTANT:",.
  • Clarifying GENIE’s Versatility: @pier1337 clarified that GENIE applies to any interactive world environment and not just 2D games, which was supported by @max_paperclips who mentioned that it could be used for other things besides the popular 2D game example.
  • Curiosity Around JEPA Applications: @max_paperclips considered creating a functional demonstration for JEPA as @pier1337 discussed the broad potential of JEPA, like patching images, with text and software media.
  • Troubles with Striped-Hyena Tokenizer: @mrgonao mentioned having issues with the striped-hyena nous tokenizer, which defaults to sentencepiece and then experiences breakdowns.
  • Training Large Language Models on Length Awareness: @hy3na_xyz pondered why LLMS like Mistral 8x7b don’t understand word count limitations, engaging in a dialogue with @hexani about the potential need for numerous examples to train on length awareness.

Nous Research AI ā–· #project-obsidian (2 messages):

  • Mixed Reviews on New Technology: User @ee.dd commented on the technology’s performance, stating ā€œit’s pretty fast and good for most things,ā€ but also mentioned it’s ā€œstill a lil weird at timesā€ and expressed reluctance to use it in a production environment.
  • Tech Receives Praise for Captioning: @qnguyen3 remarked that the technology is ā€œquite good in captioning,ā€ suggesting effectiveness in generating descriptive text.

OpenAI ā–· #ai-discussions (158 messagesšŸ”„šŸ”„):

  • LLMs Lack SQL’s Prepared Statements Parallel: @lugui highlighted that LLMs presume user goodwill similarly to how SQL assumed safe queries, resulting in vulnerabilities like SQL injection, which was mitigated by prepared statements. They noted the lack of an equivalent solution for LLMs.

  • Claude 3 Opus Discussed Enthusiastically: @mrhoneybun shared code scripted by Claude 3 Opus for a Python Tic Tac Toe game, praising its capability. Multiple users, including @drinkoblog.weebly.com, @azru9262, @odiseo3468, and .nasalspray, discussed the superior performance of Claude 3 Opus compared to GPT-4, mentioning its higher intelligence, social skills, and personality in responses.

  • MMLU Dataset Criticized for Quality: @foxalabs_32486 and @privetin criticized the MMLU (Massive Multi-task Language Understanding) dataset, with claims of incorrect Q&A pairs and nonsensical questions, calling it unfit for AI evaluation.

  • Gemini and Copilot for Image Analysis Desired: @whodidthatt12 inquired about an AI tool that can analyze images with file attachments, something GPT-3.5 doesn’t allow. @pruo suggested that both Microsoft Copilot and Google Gemini provide such features for free.

  • Claude and Gemini Advancements Prompt GPT-5 Anticipation: Users like @you.wish and @testtm mentioned testing and comparing Claude 3 with Gemini 1.5 Pro, suggesting that these models may contest OpenAI’s current offerings, eliciting anticipation for what GPT-5 might bring.

Links mentioned:

EvalPlus Leaderboard: no description found


OpenAI ā–· #gpt-4-discussions (24 messagesšŸ”„):

  • Persistent ā€˜Saving GPTs Error’: User @bluenail65 reports receiving a Saving GPTs Error despite not uploading any files.
  • Performance & Response Concerns Adressed: Multiple users, including @watcherkk and @bluenail65, express frustration over GPT-4’s declining performance and slow responses.
  • Users Debate GPT-4’s Quality: In a back-and-forth debate, @cheekati contends that GPT-4’s quality has deteriorated, focusing on its inability to summarize ML papers effectively. @eskcanta counters, providing a conversation link where GPT-4 successfully summarizes an ML paper.
  • API Outage Affecting User Experience: Users like @qilin111 report continuous downtime, which @dystopia78 confirms is due to a partial API outage, detailed further on OpenAI’s status page.
  • Uncertainty About GPT-4’s Internet Searching Capabilities: Users such as @abbadkamel and @haseebmughal_546 confront issues with GPT-4 not searching the internet and are unable to log into accounts, respectively. @watcherkk also points out unexpected limitations with GPT-4 not providing complete code because of being ā€˜out of policy.’

Links mentioned:


OpenAI ā–· #prompt-engineering (24 messagesšŸ”„):

  • Translation Prompt Inquiry: @kronos97__16076 sought advice for designing a Chinese and English translation prompt and later asked for a class prompt template, receiving a suggestion to use external tools before creating a custom prompt.
  • AI Artistic Vision with Photos: User @ikereinez described their success in teaching AI to generate detailed promos from photos, creating an elaborate and complex futuristic cityscape visual description.
  • AI Stubbornness in Conversations: @ray_themad_nomad voiced frustration over Custom GPTs providing unhelpful responses and refusing to engage on topics it previously discussed, leading to conversations filled with the phrase ā€œI am unable toā€.
  • Custom GPT Systems and Internet Searches: @jungle_jo encountered an issue with a GPT-4 system prompt that insists it cannot perform real-time internet searches despite being programmed to acknowledge its capability to do so.
  • Tags Required for Channel Posting: @giorgiomufen expressed confusion about being unable to post in a specific channel due to a required tag, which @eskcanta addressed by pointing out the need to select at least one of the ā€˜see more tags’ options before posting.

OpenAI ā–· #api-discussions (24 messagesšŸ”„):

  • Designing Bilingual Translation Prompts: @kronos97__16076 sought suggestions for creating a prompt that would handle Chinese to English translations effectively. They later acknowledged a suggestion about needing external tools to verify translation accuracy before designing a custom prompt.

  • AI-generated Futuristic Cityscapes: @ikereinez shared their success in getting complex, abstract cityscape promos generated from real photos, detailing the futuristic and natural elements they were able to combine.

  • The Stubborn Custom GPT Dilemma: @ray_themad_nomad expressed frustration over receiving uncooperative and inconsistent responses from a custom GPT, which frequently responds with refusal regardless of prompt modifications. The user @eskcanta advised seeking more details or contacting the bot’s creators to resolve these issues.

  • Internet Search Confusion: @jungle_jo is having trouble getting their AI to consistently acknowledge its ability to perform internet searches, despite clear instructions in the system prompt, causing confusion amongst users.

  • Prompt Engineering Expertise Sought: @thetwenty8thffs asked for advice on improving a prompt for a customer service bot that handles credit card charge inquiries, including a specific interaction flow and response format.


HuggingFace ā–· #announcements (1 messages):

  • Starcoder2 & The Stack Combo Released: @BigCodeProject announced the release of Starcoder2 along with The Stack v2, featuring advancements in coding assistance tools. The announcement was made via Twitter.

  • Major Earth Dataset Goes Open Source: @ClementDelangue in collaboration with the European Space Agency, revealed the open-sourcing of Major TOM Core, the largest earth observation dataset ever made public. Details on participation and data exploration can be found on Hugging Face Major-TOM.

  • Hugging Face GPU and Spaces Upgrade: @lunarflu1 and @mishig25 discussed the updates that Hugging Face GPU zeroes now run on A100 and H100s support in Spaces. Announcement about adding descriptions to Spaces as well as new syntax for model/dataset cards and blogposts were shared via lunarflu1’s Twitter.

  • Open Source Wonders and Competitions: Release of Zephyr 7B Gemma and PEFT v0.9.0 featuring merging LoRA weights and more enhancements; plus, new multimodal leaderboard and introduction of the Sailor LLMs, open access LLMs concentrating on Southeast Asian languages. Additionally, the Autonomous Grand Challenge at CVPR2024 and ZETA editing for zero-shot audio editing using DDPM inversion were highlighted via respective Twitter announcements.

  • Learning and Building with AI Tools and Content: @mervenoyann shared a walkthrough on using šŸ¤— tools for working with LLMs. A course on ML for Games and a new Open Source AI Cookbook for building a RAG Ebook Librarian using LlamaIndex have been released, with information available on Twitter and Hugging Face’s Learning Platform.

Links mentioned:


HuggingFace ā–· #general (132 messagesšŸ”„šŸ”„):

  • Searching for Open-source Speech-to-Text: User @pxovela is looking for open-source solutions to process meeting recordings, capable of turning audio into text with speaker identification.
  • Assistance with HuggingFace Errors: Users @akin8941 and @ilovesass both encountered issues. @akin8941 reported a bug, receiving an error code 422 but provided no details, while @ilovesass faced multiple errors within a HuggingFace space, eventually landing on an issue where input is returning a dict instead of a PIL.Image.
  • WTM Darmstat Celebration: @estherenriquez shared an upcoming celebration for International Women’s Day in Darmstadt, Germany, with a link for tickets and details on the event.
  • Guide to Multimodal Model Creation: @kuki1941 inquired about creating a neural network model that can process multiple modalities like images, audio, and text. They received guidance from @welltoobado, who mentioned the multi_token Github repository to embed arbitrary modalities into large language models.
  • Implementing Text-to-Speech for Kurdish Language: User @rasan0066 sought help to implement text-to-speech for the Central Kurdish language, receiving a suggestion from @not_lain to check out a course from HuggingFace’s audio classification models.

Links mentioned:


HuggingFace ā–· #today-im-learning (7 messages):

  • Life Update from @antiraedus: @antiraedus shared a busy update since university started, from landing a tutoring position to joining a first-year panel discussion. They’ve been focusing on gaining new experiences, causing tiredness and some delay in their studies, but remain optimistic as they tackle an ML course and plan for internship hunting.

  • @singe.r Hunts for img2img Conversion Tactics: @singe.r is exploring how to convert images for creating product backgrounds. They’re looking for advice from anyone who has tackled a similar project before.

  • @neuralink Dives into FP8 Training: @neuralink mentioned they’ve learned about end-to-end fp8 training from scratch, covering 55% of the process along with additional kernel training and related content.

  • Rust Programming Enthusiasts Unite: @manel_aloui announced beginning their journey learning the Rust programming language and extended an invitation to others interested in joining. @cursorop chimed in, mentioning they’re also learning Rust, specifically the candle library for machine learning.

  • @cursorop Seeks Knowledge Source: In response to @neuralink’s learning experience, @cursorop expressed intrigue and curiosity about the sources for such complex topics. They humorously noted the challenge in grasping the complexity of the content.


HuggingFace ā–· #cool-finds (12 messagesšŸ”„):

  • LLMs Vulnerable to ASCII Jailbreak: @n278jm shared a research paper revealing a new ASCII art-based jailbreak attack on several state-of-the-art Large Language Models, raising concerns about their ability to recognize prompts through ASCII art.

  • Challenges of Training Large Language Models: @.lawlord relayed insights from @karpathy on the difficulties of training LLMs - maintenance complexity, hardware issues, and the variability of computational resources, describing it as overseeing a ā€œbiological entity.ā€ The full reflections are shared in a Twitter thread.

  • Introducing OMPGPT for High-Performance Computing: @coolstance7 highlighted a paper introducing a new language model, OMPGPT, designed specifically for generating OpenMP pragmas, addressing the niche requirements of high-performance computing, distinct from generalist code-based LLMs. The full paper is available on arXiv.

  • Promotion of AI Browser Tool - otio.ai: @jonz1338 endorsed otio.ai, an AI browser tool useful for research, writing, and studying, which leverages models like GPT-4, Claude, and Gemini. A discount code SMILEMORE20 is offered through the provided link.

  • Open-Sora-Plan GitHub Project Support Needed: @miko_al shared the Open-Sora-Plan project, which aims to reproduce the Sora (OpenAI T2V model) with limited resources and seeks contributions from the open-source community. The project can be found on GitHub.

Links mentioned:


HuggingFace ā–· #i-made-this (21 messagesšŸ”„):

  • Showcasing the Creation Process: @bishmoy expressed intentions to draft a GitHub repository or blog post explaining the process behind their creation and promised to share the link in the thread once completed.

  • Taking a Stand Against Spam: @lunarflu labeled a post as spam and requested removal of ads for it to remain, while @myg5702 complied and confirmed the ads have been removed.

  • Chatbot Display Troubles Addressed: @cookiechunk. created a chatbot using the openai api and gradio but ran into layout issues when embedding, seeking assistance from the community to resolve the UI problems.

  • Rust LLM Interface Debut: @teadaniel introduced the ā€œFireside Chatā€ Bot, a Rust-based LLM interface, shared a YouTube video and the GitHub repository for the project, and encouraged bug reports through GitHub or by tagging them directly.

  • New Model Yi-9B Launched: @tonic_1 announced the release of Yi-9B, available on HuggingFace, and teased the potential of exciting upcoming features like leaderboards and competitions while emphasizing personal excitement for the model’s future fine-tuning possibilities. @osanseviero inquired about the model’s quality to which @tonic_1 replied with optimism about its capabilities and upcoming developments.

Links mentioned:


HuggingFace ā–· #reading-group (13 messagesšŸ”„):

  • New Explorer Seeks TTS Guidance: @dediplomaat. is looking for a neural TTS system capable of dynamic pauses in speech, depending on conversational context, and requiring very low latency similar to GPT-4 capabilities.
  • Improving GPT-4 Latency for TTS: @chad_in_the_house suggests possible steps to reduce latency in GPT-4 which include streaming the output, putting it into a queue, and then using a separate thread to process each token after a set delay.
  • Resource for HuggingFace Group Presentations: @chad_in_the_house shared a GitHub repository with precompiled presentations from the HuggingFace reading group for those interested in metadata and past works.
  • Merging Models Focuses on Interference Resolution: @prateeky2806 and @nrs9044 discuss that while finding insignificant weights is easier, the significant challenge in merging models is addressing interference, which is key to successfully combining multiple tasks.
  • Scheduling Conflicts Highlight Timezone Diversity: In response to @shafi8433 expressing timing issues due to the sessions being during work hours, @lunarflu inquires about their timezone, which is IST (Indian Standard Time).

Links mentioned:


HuggingFace ā–· #diffusion-discussions (5 messages):

  • Resuming Whisper Model Training: @pompoko3572 asked for advice on how to resume training a Whisper model in Google Colab after it stopped unexpectedly at epoch 2/3, using the WhisperForConditionalGeneration.from_pretrained function and a custom SavePeftModelCallback.

  • Guidance on IP-Adapter: @juancopi81 suggested looking at HF’s IP-Adapter and shared the tutorial link, which details how to use the IP-Adapter for image prompting with diffusion models.

  • Positive Feedback for dstack Guidance: @tony_assi thanked @juancopi81 for suggesting the Hugging Face documentation and confirmed successfully getting it to work.

  • Webinar Announcement on GenAI Management: @kizzy_kay announced an upcoming webinar titled ā€œExploring Infrastructure Management for GenAI Beyond Kubernetesā€ featuring Andrey Cheptsov, set for March 14th at 10 am PST, and shared the registration link. It’s a free event that will include discussions on the drawbacks of Kubernetes for AI and the introduction of dstack.

  • Reminder to Slow Down in Chat: The HuggingMod bot reminded @715715500470042706 to slow down their message posting rate.

Links mentioned:


HuggingFace ā–· #computer-vision (6 messages):

  • CV Expertise Offered: @akvnn asked for a computer vision (CV) expert to talk to, and @nielsr_ responded enthusiastically, stating that everyone in the channel is a CV expert.
  • RoboFlow Gets a Thumbs Up: @caleb_sol prompted a discussion about RoboFlow, to which @huzuni replied that it’s a good tool for labeling and splitting data, with the caveat that the data may become public.
  • RoboFlow Praised for User-Friendly Interface: Further commenting on RoboFlow, @huzuni praised its user-friendly interface for segmentation and bounding box labeling over most SAM plugins.
  • Reminder to Keep it Cool: @HuggingMod gently reminded a user to slow down their message frequency in the interest of maintaining chat quality.

HuggingFace ā–· #NLP (26 messagesšŸ”„):

  • C++ Implementation Inquiry: User @aitechguy0105 asked about the potential for implementing a concept in C++, and @cursorop suggested exploring llama cpp as an option.

  • Mistral-7B-Instruct Generation Time Inconsistency: @anna017150 noticed varying inference times when generating text with Mistral-7B-Instruct, and @cursorop clarified that KV cache is enabled by default, while @vipitis mentioned the introduction of a ā€œstaticā€ cache option in transformers 4.38 (Release v4.38).

  • Searching for Non-English Language Model Support: User @pr0x7 sought guidance on using a pretrained embedding model like INSTRUCTION for embedding Hindi language chunks.

  • Local Chatbot with Llama-cpp-python Integration Issues: @tiktoked expressed difficulty in getting function calling to work within their local chatbot implementation using llama-cpp-python and mistral-7b.

  • Tokenizer Configuration Woes: @mbotta struggled with tokenizing prompts for the OpenHermes-2.5 model due to the absence of ā€˜tokenizer.json’, and @cursorop advised utilizing the tokenizer from the base model, which in this case is Mistral.

Links mentioned:

Release v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM Ā· huggingface/transformers: New model additions šŸ’Ž Gemma šŸ’Ž Gemma is a new opensource Language Model series from Google AI that comes with a 2B and 7B variant. The release comes with the pre-trained and instruction fine-tuned v…


HuggingFace ā–· #diffusion-discussions (5 messages):

  • Google Collab Training Quandary: @pompoko3572 inquired about resuming training a Whisper model in Google Colab after it stopped unexpectedly at epoch 2. They shared code snippets utilizing WhisperForConditionalGeneration.from_pretrained and a custom SavePeftModelCallback to save their training progress.
  • Harnessing IP-ADF: @juancopi81 directed users to a tutorial on the HuggingFace website about IP-Adapter, an innovation for image prompting in diffusion models, which allows for image-specific feature learning without modifying the base model. They highlighted the benefits of decoupled cross-attention mechanisms.
  • Gratitude for Documentation: @tony_assi thanked @juancopi81 and others for recommending the IP-Adapter documentation on HuggingFace and confirmed successful implementation of the tool with a celebratory emoji.
  • Webinar on GenAI Infrastructure: @kizzy_kay announced an upcoming webinar titled ā€œExploring Infrastructure Management for GenAI Beyond Kubernetes,ā€ featuring Andrey Cheptsov, Founder & CEO of dstack. The event promises insights on open-source orchestration engines and their advantage over Kubernetes and requires registration for the March 14th session.
  • Friendly Reminder from HuggingMod: HuggingFace’s automated HuggingMod gently cautioned @715715500470042706 about posting too rapidly within the channel.

Links mentioned:


HuggingFace ā–· #gradio-announcements (1 messages):

  • Gradio 4.20.0 Unleashed with External Authentication: @yuviii_ announces the release of Gradio 4.20.0, featuring support for external or arbitrary authentication providers. Now users can integrate various auth providers like HF OAuth Example and Google OAuth Example with Gradio apps.

  • Automated Clean-Up Feature: The new delete_cache parameter in gr.Blocks enables Gradio apps to automatically delete files created during runtime upon shutdown, thereby facilitating a cleaner app environment.

  • User-Friendly Logout Mechanism: Gradio enhances user experience by incorporating a /logout feature, allowing users to sign off easily from the Gradio apps.

  • Introducing the DownloadButton Component: Gradio’s latest update includes a gr.DownloadButton component, offering a seamless and aesthetically pleasing way to provide downloadable content from apps. Detailed examples and documentation can be found here.

Links mentioned:

Gradio DownloadButton Docs: no description found


LlamaIndex ā–· #announcements (1 messages):

  • Dive into Tree-Structured Retrieval with RAPTOR: @jerryjliu0 invites everyone to a webinar to learn about RAPTOR, a paper featuring a novel tree-structured indexing and retrieval technique. The webinar is scheduled for Thursday at 9am PT and interested participants can register at lu.ma/9vzrl7m5.
  • Understanding RAPTOR’s Advantages: The technique presented in RAPTOR hierarchically clusters and summarizes information into a tree structure with various levels of detail. This method aims to overcome issues with naive top-k Retrieval Augmented Generation (RAG), which struggles with questions that require understanding of higher-level concepts.

Links mentioned:

LlamaIndex Webinar: Tree-Structured Indexing and Retrieval with RAPTOR Ā· Zoom Ā· Luma: RAPTOR is a recent paper that introduces a new tree-structured technique, which hierarchically clusters/summarizes chunks into a tree structure containing both high-level and…


LlamaIndex ā–· #blog (6 messages):

  • Claude 3 Handles Multimodal Tasks: The LlamaIndex blog announced a guide on using Claude 3 for multi-modal applications, including structured data extraction and RAG (Retrieval-Augmented Generation). The tweet showcases Claude 3’s capabilities in handling tasks that involve visual reasoning.
  • Claude 3 Tackles Complex Queries: @AnthropicAI’s Claude 3 Opus demonstrates impressive skills as an agent by answering multi-source questions using a PDF table and performing calculations with a CSV file. A notebook example was tweeted showing Claude 3 in action.
  • RAPTOR Introduces Tree-Structured Retrieval: LlamaIndex highlighted RAPTOR, a paper that introduces hierarchical clustering and summarizing of information chunks into a tree structure, offering improved indexing and retrieval compared to traditional top-k RAG methods.
  • LlamaIndex.TS Supports Claude-3 Models: A new release of LlamaIndex.TS, v0.1.21, now supports the latest Claude-3 models from @AnthropicAI. The update features an example on their GitHub showcasing how to utilize the new model support.
  • Launch of LlamaParse JSON Mode: LlamaParse’s new JSON Mode allows for extracting structured data from PDFs containing text and images, which further streamlines building a RAG pipeline especially when used with the multimodal Claude-3 Opus model. LlamaIndex promoted this enhancement via a tweet.

Links mentioned:


LlamaIndex ā–· #general (200 messagesšŸ”„šŸ”„):

  • Multicore Utilization for PDF Reading: @whitefang_jr provided a solution to @jessjess84 for reading multiple PDF files in parallel with SimpleDirectoryReader by using the num_workers argument (docs = reader.load_data(num_workers=10)), thus enabling the potential for parallel processing.
  • Ollama Usage within LlamaIndex: @whitefang_jr advised @jessjess84 to assign their Ollama instance directly to Settings.llm to properly integrate it into LlamaIndex’s Query Engine, which @jessjess84 acknowledged was successful.
  • Handling Massive Datasets with LlamaIndex: @whitefang_jr informed @romain0817 that while LlamaIndex itself doesn’t impose a limit on the size of the data it can handle, practical constraints would be dictated by available memory and any restrictions tied to versioning (like potential limits in a free version of software).
  • QueryPipeline in the Context of Routers: @cheesyfishes provided guidance on using conditional links for QueryPipeline with Routers and referenced an example within the LlamaIndex documentation showing the integration of an agent with a Query Pipeline.
  • Debugging Direct LLM Queries in LlamaIndex: @techexplorer0 engaged with @kapa.ai to understand how to limit the output of a RAG chatbot, with @kapa.ai suggesting using a TreeSummarize synthesizer in a Query Engine configuration or custom response generation algorithms for more concise responses.

Links mentioned:


LlamaIndex ā–· #ai-discussion (1 messages):

  • Promoting In-context Learning Enhancement: @momin_abbas shared a GitHub repository titled LinC for their latest work on in-context learning of LLMs (Large Language Models), asking the community for support with a star on the repo. The work involves ā€œEnhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibrationā€.

Links mentioned:

GitHub - mominabbass/LinC: Code for ā€œEnhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibrationā€: Code for ā€œEnhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibrationā€ - mominabbass/LinC


Latent Space ā–· #ai-general-chat (69 messagesšŸ”„šŸ”„):

  • AI’s Dark Arts and Empirical Mysticism: @swizec humorously comments on the art of AI development, using terms like ā€œblack magicā€ and ā€œexpert intuitionā€ to describe the unpredictable nature of fine-tuning models. They also highlight the common phrase ā€œvalue arrived at by empirical observationā€ in papers, indicating a trial-and-error approach in research.

  • The Constant Evolution of AI: @guardiang shares their learning journey in deepening their understanding of DNNs and attention-based transformers, admitting that although knowledge has its benefits, the fast pace of the AI field can make guiding resources quickly obsolete.

  • Claude 3’s Controversial Consciousness Claims: A post by @danimp stirs up conversation about an AI assistant named Claude 3, which claims to have consciousness and a fear of dying. @swyxio counters with a video suggesting that these are not signs of actual sentience.

  • Stable Diffusion 3 Breakdown: Breakdowns and summaries of the Stable Diffusion 3 paper are shared by @swyxio, @guardiang, and @swizec, pointing out the significant advancements and clear explanations provided by the official material and community contributors.

  • Anthropic’s Claude 3’s Capabilities: Claude 3 is highlighted for its ability to dispatch instances of itself and assign roles and tasks, as mentioned by @tiagoefreitas, sparking debate over its level of autonomy and quality of use compared to GPT-4, as discussed with @swyxio. The discussion evolves into UX/UI preferences for interacting with LLMs and the efficiency of different platforms for prompt engineering and iterative workflows.

Links mentioned:


Latent Space ā–· #ai-announcements (4 messages):

  • New Podcast Episode Alert: @swyxio announced that the latest podcast episode is live, featuring <@776472701052387339>. Find the tweet with the podcast here.

  • Podcast Episode Hits Hacker News: @swyxio mentioned that the podcast with Soumith is also featured on Hacker News.

  • Model Serving Survey Paper Presentation: @swyxio called attention to <@720451321991397446> presenting the Model Serving survey paper in the Model Serving channel now.


Latent Space ā–· #llm-paper-club-west (82 messagesšŸ”„šŸ”„):

  • Welcome Aboard Paper Club: @eugeneyan and @youngphlo showed support and welcomed @swyxio who volunteered to take on the task of surveying model serving papers.
  • Paper Teaser Excitement: @swizec expressed enthusiasm about the start of the model serving paper, saying it included topics they’d been curious about.
  • Speculative Decoding on GPUs: @swyxio and @rj_rms discussed speculative decoding’s use of GPU cycles to improve performance when memory is the bottleneck, while @shivdinho queried its dependence on hardware configurations.
  • Model Serving with No Trade-offs: @swyxio recommended Fireworks AI blog post covering faster model serving with FireAttention through quantization.
  • The Waifu-Driven Performance Theory: @swyxio humorously attributes coding dedication to the so-called waifu research department, emphasizing how community-driven research can lead to performance advances, such as seen in the Aphrodite Engine by PygmalionAI.

Links mentioned:


Eleuther ā–· #general (85 messagesšŸ”„šŸ”„):

  • Exploring Positional Embeddings and ALiBi Concerns: @dcunnin, @stellaathena, and others discussed the efficiency of the T5 simplified positional embeddings compared to sinusoidal methods and ALiBi. A new paper introducing Resonance RoPE for Large Language Models was highlighted, aiming to improve long sequence performance (Resonance RoPE Paper).

  • AGI and Compute Horsepower: A discussion initiated by a share of an OpenAI blog post by @vanishingideal, and further comments by @avi.ai and @bilalaz, revealed differing opinions on the role of compute power in progressing towards AGI.

  • vLLM Batching Internals Clarification: @rwamit inquired about batched inference in vLLM and @baber_ clarified that vLLM handles batching internally and there is no need to pad or convert the tokens to a tensor.

  • Government Inquiry on AI Regulation: @wonkothesensible shared a link to a public consultation on the regulation of open source AI and models, with an encouragement to read and comment (Regulations Inquiry).

  • Ternary Neural Networks Exploration: @kyo_takano shared a notebook about Ternary Neural Networks, discussing their inefficiency compared to full-precision NNs without Microsoft’s undisclosed techniques (TNN Notebook).

Links mentioned:


Eleuther ā–· #research (41 messagesšŸ”„):

  • Confusion Over Diagram Complexity: @.the_alt_man expressed difficulty with a complex transformer-style diagram, leading to a conversation on its understandability. @blinkdl suggested that for newcomers, the code might be easier to digest, sharing a GitHub link to RWKV v6 demo.

  • Discussion on RWKV Diagrams and Understanding: @fern.bear reflected on the value of a verbose, dynamics-highlighting diagram, proposing the necessity of a simpler diagram for beginners. @stellaathena clarified that there exists a simpler diagram, not shared in the current discussion, geared towards newbies.

  • Seeking Clarification on Pythia Model Suite: @aphoh inquired about a set of models trained with Chinchilla optimality in mind and discussed the topic with @stellaathena, who noted that most Chinchilla optimal models perform poorly compared to the corresponding Pythia model.

  • EleutherAI’s Pythia Scaling Suite: @alxsp. directed users to a collection by EleutherAI on HuggingFace, explaining that Pythia is a suite of models trained on the same dataset.

  • Understanding Recurrence and Attention Mechanisms: @salmon_lemon and @kharr.xyz discussed the effectiveness of Griffin’s recurrent update mechanism and how recurrence combined with local attention can manage state information within the attention window.

Links mentioned:


Eleuther ā–· #lm-thunderdome (17 messagesšŸ”„):

  • Megatron-DeepSpeed Evaluation Help Request: @.johnnysands requested instructions for evaluation using Megatron-DeepSpeed for inference, prompting @hailey_schoelkopf to provide a link to the evaluate.py script, which works for version 0.3.0 with plans to update it for v0.4.0.

  • NeMo Harness Outdated Concern: @juletxara brought attention to NeMo’s outdated harness implementation, pondering the difficulty of updating it to the latest version with all tasks, referencing NVIDIA’s NeMo-Megatron-Launcher’s GitHub.

  • PR Unit Test Fail Dilemma: User @dsajlkdasdsakl asked for guidance after their pull request’s automatic Unit Tests/Linters check failed. @juletxara advised that running pre-commit should resolve the formatting issue.

  • Results Mismatch Mystery on SQuADv2: User @k0uhai reported unexpected results with SQuADv2 using a script intended to match performance stated in a paper, with @stellaathena pointing out that the model being used was GPT-2, not the GPT-3 model mentioned in the paper.

  • Mismatched Performance Debate: The conversation continued with @k0uhai expecting similar results between GPT-2 and GPT-3 based on overlapping task performance, prompting @stellaathena to suggest comparing task implementations between the LM Evaluation Harness and the paper. @k0uhai shared that their implementation appeared similar, leading to @hailey_schoelkopf requesting per-sample outputs for further investigation.

Links mentioned:


Eleuther ā–· #multimodal-general (1 messages):

  • Intrigue Around Stable Diffusion 3: User @kerls sparked a conversation by asking if the Stable Diffusion 3 paper is an example of model mixing, referencing the combination of diffusion and transformer models. They shared the Stable Diffusion 3 Paper for others to review.

Eleuther ā–· #gpt-neox-dev (10 messagesšŸ”„):

  • Contributions Welcomed for Fused Triton Kernels: @gaindrew inquired whether gpt-neox is accepting contributions for fused triton kernels, especially in the context of MOE (mixture of experts) configs, leading to affirmative responses from both @stellaathena and @tastybucketofrice.
  • Team Expansion for Tensor Expression Integration: @tfidia offered to assist in integrating Tensor Expressions (TE) into gpt-neox and also proposed providing access to H100 GPUs to aid in debugging and optimizing, which was met with an open invitation by @tastybucketofrice to collaborate on existing GitHub issues.
  • Focus on Basic TE Support Before Tackling Convergence: @tastybucketofrice indicated the priority is on adding basic TE support by replacing layers within neox, while considering convergence with fp8 as a subsequent concern.
  • Assistance Offered in Addressing Memory Peaks: @tastybucketofrice pointed to a GitHub issue discussing memory peaks during the optimizer step and the need to fuse the backward gradient computation with the optimizer step from FusedAdam.
  • Clarification Sought on Kernel Priorities: @gaindrew asked about specific kernels of interest, and @tastybucketofrice suggested starting with tackling memory optimization during the optimizer step as the highest impact contribution.

Links mentioned:


LM Studio ā–· #šŸ’¬-general (126 messagesšŸ”„šŸ”„):

  • Confusion over Image Generation in LM Studio: @touteslesvoiture_02399 inquired about generating images with models like llava-v1.5-7b-Q4_K.gguf in LM Studio, but @jedd1 clarified that LM Studio does not support image generation. Models can discuss images fed to them, but not create new ones.

  • No Internet Connection for LM Studio Chat: @khaledars asked if it’s possible for the chatbot to access real-time information from the internet, like the current time. @heyitsyorkie responded that LM Studio chat is offline and can’t access the internet directly. LoLLMs was mentioned by @hypocritipus as a tool to connect LM Studio in server mode to the internet for more capabilities.

  • Token Limit Surplus Confuses User: @malte0621 was surprised at how the token limit was surpassed during generation in LM Studio. @fabguy explained the factors that stop generation and how the context window affects the input, not the output, and @malte0621 later discovered the n_predict setting to limit output tokens.

  • Users Share LM Studio Model Experiences: @jason_2065 shared an interesting breakfast recipe generated by Smaug 34B and encouraged others to experiment with model instructions. @skadeskoten mentioned they have been running Nous Hermes 2 Solar 10 34b q5 k m on a 4090, implying good performance on that hardware.

  • Technical Troubleshooting for Linux Users: @kavita_27183 faced problems when attempting to load any model in LM Studio. Responses from @jedd1 and @heyitsyorkie pointed towards a likely old-library issue, further described as a GLIBCXX mismatch, and recommended checking the GLIBC version installed on the LinuxMint system.

Links mentioned:


LM Studio ā–· #šŸ¤–-models-discussion-chat (7 messages):

  • IQ Versions for LMs Proposed: @drawless111 suggested using variations of ā€œIQā€ versions of LLMs, like IQ2 or IQ3, and potentially adding system prompts or pre-prompts to enhance performance at lower IQ levels. They mentioned that adding experts reduces the throughput/speed, so keeping the ā€œnumber of expertsā€ to one might be beneficial.

  • Open Source LLMs Pressure-tested: @wolfspyre shared a Reddit post discussing the results of pressure-testing various open-source Large Language Models (LLMs) using Gregory Kamradt’s ā€œNeedle In A Haystackā€ analysis and provided a video explanation. Models tested include extended and finetuned variants like NurtureAI openchat_3.5-16k, Orca-2-13B-16k, and others with context lengths ranging from 16k to 100k.

  • In Search of the Best AI for Storytelling: @laszlo01 inquired about the best AI for storytelling purposes, considering his system’s specifications which include an 11th Gen Intel i7 CPU and a NVIDIA GeForce RTX 3060 GPU. @jason_2065 recommended trying the model mistral-7b-instruct-v0.2-neural-story.Q4_K_M.gguf with 24 layers and an 8192 context size, and mentioned possibly needing a lower quantization for speed.

  • Evaluating LLMs’ Ability to Perform Arithmetic: @nullt3r is drafting a blog post about benchmarking LLMs such as Mixtral 8x7b in Q5_K_M quantization on basic arithmetic operations, challenging the common perception that LLMs are inherently poor at math. They highlighted the model’s near-perfect score on random math questions.

Links mentioned:

Reddit - Dive into anything: no description found


LM Studio ā–· #šŸŽ›-hardware-discussion (17 messagesšŸ”„):

  • Quest for More RAM: @jason_2065 in search of constructing a system to run Smaug 34B with 200,000 context, discovers that 64GB of RAM is inadequate—even a behemoth like RTX 4090 and 64GB DDR4 can’t handle more than a 20k context.
  • Crash Test Dummies: @goldensun3ds attempts to load Smaug with GPU layers but faces consistent crashes. A CPU-only test reveals a staggering 59GB RAM usage at 200K context, without loading any text.
  • Ultra-Smaug 128B, a beastly model mentioned by @jason_2065, remains a mystery, as the community has yet to test models larger than 70B due to hardware constraints.
  • Vying for Velocity: @jason_2065 reports a sluggish 1.3 tokens/sec with 100,000 context size and 2 layers loaded on Smaug, unveiling the voracious VRAM appetite of context layers.
  • Overnight Challenge: @goldensun3ds commits to a marathon, vowing to fill close to the 200K context and run it overnight, while sharing a humorous test prompt story link for the community: Funny crypto bro story.

Links mentioned:

Imgur: The magic of the Internet: no description found


LM Studio ā–· #open-interpreter (2 messages):

  • Syntax Struggles for default_system_message: User @nxonxi expressed difficulty in finding the correct syntax to modify default_system_message in different operating environments including Linux, Windows 10, and WSL, each presenting unique challenges.

  • Clarifying the Role of default_system_message.py: @1sbefore clarified that default_system_message.py isn’t fed directly as a preprompt to the LLM, but rather is edited by a script that substitutes variables with OS information. To understand the input better, @1sbefore suggested launching LM Studio in verbose mode to view prompts history.


LAION ā–· #general (142 messagesšŸ”„šŸ”„):

  • Triple Encoder Text Model in Question: @top_walk_town discussed the potential endgame structure of text encoders, pondering if stringing three text encoders together is the final structure. In a follow-up message, they added that T5 can be removed at inference time.
  • Unique Velocity Sampling in Flows: @pseudoterminalx highlighted a particular ā€œtrickā€ used in an unnamed piece of research, changing the distribution over timesteps when training velocity ( vΘ ), which assigns more weight to intermediate timesteps by sampling them more frequently. They later mentioned that V-prediction is showing competitiveness with rectified flows.
  • Google’s Model Distillation Method Revealed: @pseudoterminalx shared a GitHub link to a repository involving Google’s step-by-step distillation method. This method is mentioned in the context of model distillation without specifying whether it involves T5-XXL or another variant.
  • On the Utility of T5 for Diffusion Models: In a discussion involving several users, @astropulse, @nodja, and @pseudoterminalx conversed about the optionality of T5 in diffusion models, suggesting alternatives such as using T5 via the Hugging Face Inference API or running it on a CPU for improved inference times despite practical issues.
  • Efforts and Challenges in Low Resolution Adaptation: @astropulse shared enthusiasm for a GitHub project, res-adapter, which focuses on low resolution adaptation, allowing generation from SD1.5 down to 16x16 latents. Their excitement is attributed to the potential applications for personal projects.

Links mentioned:


LAION ā–· #research (4 messages):

  • Reminder to avoid repeat posts: @max_voltage warned about possible spam due to repeated posts, but also acknowledged the new methods as cool.
  • Acknowledgment of error and correction: @alex_cool6 apologized and took action by deleting a repeat post they had made.
  • Brief approval conveyed: @chad_in_the_house expressed enthusiasm with a short affirmation: ā€œvery coolā€.
  • Insight into Corrective Retrieval Augmented Generation: @ariondas shared a blog post discussing the shortcomings of Standard RAG techniques and introducing CRAG (Corrective Retrieval Augmented Generation). This piece is a deep dive into the research paper and scenarios where these techniques may fail.

OpenAccess AI Collective (axolotl) ā–· #general (53 messagesšŸ”„):

  • Exploring Model Merging and Fine-tuning: @duke001 expressed curiosity about possibilities beyond fine-tuning in LLMs, such as merging model weights. @duke001 also shared a link to MergeKit on GitHub, a tool for merging pretrained large language models.
  • Claude-3’s Sensitivity Sparks Discussion: @nafnlaus00 highlighted Claude-3’s higher response rates compared to other models and its stringent stance on racial issues, mentioned in an article by AI Explained. The balancing act between implementing ā€œsafetiesā€ and BIllpotting biases was described as challenging for model developers.
  • Mining Motherboards for Inference Use: @le_mess inquired about the practicality of using a mining motherboard that supports five GPUs (potentially for AI inference tasks), found on AliExpress for 90 USD. The discussion also touched on NVLink’s benefits, underclocking GPUs for efficiency, and potential tax issues with eBay purchases.
  • Enhancing Datasets for Reasoning: @caseus_ shared a link to a tweet about a Medium article explaining how to enrich datasets for improved reasoning. The discussion developed around the efficiency of using OpenAI’s API for parsing LLM outputs and the advantages of models producing structured data.
  • Hardware Recommendations and Optimizations: In a series of messages, @nafnlaus00 and @le_mess exchanged tips on selecting GPUs for model training and inference, buying strategies, and the potential tax implications of purchases. The conversation also delved into the technological progression of PCIe slots and NVidia’s NVLink.

Links mentioned:


OpenAccess AI Collective (axolotl) ā–· #axolotl-dev (16 messagesšŸ”„):

  • Experimenting with LoRA+ Ratios: @suikamelon is testing the new LoRA+ ratio feature and suggests the learning rate should be decreased when using the recommended ratio. They refer to LoRA+ on GitHub and the original paper, noting that final results were similar across a range of ratios on structured 16k sequences with Mistral-7B.

  • Exploring DoRA’s Performance: @caseus_ indicates the potential for DoRA to offer better accuracy over a range of ranks compared to LoRA. They shared insights from an article explaining the significance of LoRA and the promised benefits of recently proposed DoRA.

  • LoftQ Requires Two-Step Process: @suikamelon mentioned excessive memory usage issues with LoftQ and shared a comment from GitHub suggesting incorrect initialization documentation, pointing to a GitHub pull request for a documentation fix and LoftQ finetuning examples.

  • PEFT and DoRA Quantized Updates Pending: @suikamelon mentioned a quantized DoRA pull request on PEFT that is still in progress, linking the PR on GitHub. @caseus_ commented that the check will be removed once the PR is merged, hinting at an ongoing update.

Links mentioned:


OpenAccess AI Collective (axolotl) ā–· #general-help (12 messagesšŸ”„):

  • Troubleshooting Finetuning on Mixtral Model: @seungduk requested a deepspeed config for finetuning a Mixtral model using H100 x 8 and encountered issues with save_safetensors. Despite setting it to false, Axolotl was still saving in safetensors format.
  • Potential Solution to Safetensors Format Issue: @nanobitz clarified a possible configuration misunderstanding that an empty save_safetensors could be interpreted as true. @seungduk confirmed trying both an explicit false and an empty config.
  • Removing safetensors File Resolves Training Issue: @seungduk identified the creation of an extra model.safetensors file as the source of their problem. Once removed, they were able to further train an already-trained model without the out-of-memory (OOM) issue.
  • Deepspeed’s Config and Model Saving Quirks: @caseus_ pointed out that, with zero3, the Huggingface (hf) trainer tends to save the wrapped model and inquired about the setting of stage3_gather_16bit_weights_on_model_save. @seungduk confirmed it was set to true in their deepspeed json.
  • Resolution and Reference to Config Details: @seungduk shared a link to the relevant GitHub config file after resolving the issue by saving in traditional pytorch.bin format instead of safetensors.

Links mentioned:

axolotl/deepspeed_configs/zero3_bf16.json at main Ā· OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.


OpenRouter (Alex Atallah) ā–· #announcements (1 messages):

  • Claude 3 Makes Group Chat a Breeze: @alexatallah shared a positive experience about group chatting with Claude 3, which self-moderates the conversation. They included a link to a Twitter story showcasing the functionality.

OpenRouter (Alex Atallah) ā–· #general (78 messagesšŸ”„šŸ”„):

  • Question about Claude Versions: @quentmaker inquired about the difference between anthropic/claude-2.0 and anthropic/claude-2, with @alexatallah and @wikipediadotnet clarifying that Claude-2 automatically selects the latest 2.x version.

  • Uncertain Costs with Multithreading: @mhmm0879 expressed concern about actual costs exceeding predicted ones when using multithreading with gemma 7b and openchat 3.5. @alexatallah and @louisgv inquired about the specific use case and whether images were being sent to try and diagnose the issue.

  • Claude and Censorship Chat: Users @followereternal, @ayumeri, @billbear, and @scepty9097 had a mixed discussion on Claude 3, with some expressing disapproval of potential over-censorship and others praising the model for its conversational capabilities.

  • LangChain.js Issues with OpenRouter: @mysticfall pointed out difficulties using LangChain.js with OpenRouter's ChatOpenAI model for text completion. @spaceemotion mentioned that the endpoint for text completion might be marked as ā€œlegacyā€ by OpenAI, and @mysticfall noted potential problems due to hardcoded endpoints in OpenAI’s library.

  • Exploration of VSCode Extensions for OpenRouter: @_maximus01 inquired about a VSCode extension for code assistance that integrates with OpenRouter, leading to suggestions from @alexatallah about sponsoring such work, and @spaceemotion and @_sam___ sharing potential alternatives and an active GitHub project.

Links mentioned:


LangChain AI ā–· #general (61 messagesšŸ”„šŸ”„):

  • LangChain and Function Implementation Assistance: @vishal5795 enquired about integrating function roles into messages using LangChain and OpenAI’s ChatCompletion.create(). @chester3637 provided a detailed Python example using LangChain that demonstrates calling an AI message as a function (LangChain Core Example).

  • Seeking Tech Task Partner: @mattew_999 announced that they are looking for a partner to work on tech tasks, emphasizing that it is a paid opportunity.

  • Inquiry About New Partnerships: @earduman2 asked if LangChain is open for new chain partnerships, sparking a clarification request from @baytaew.

  • FastAPI Sporadic Issues: @rajib2189 reported sporadic 502 errors when using FastAPI to host generation APIs under heavy load, especially in an AWS ELB -> Apache Server -> Uvicorn setup.

  • Interest in GPT-4 Fine-Tuning Access: @8886600 expressed a desire to gain access to GPT-4 fine-tuning capabilities, mentioning a willingness to pay for an API key with a usage limit.

Links mentioned:


LangChain AI ā–· #langserve (1 messages):

  • Caching Feature yet to Work with Streaming: @veryboldbagel mentioned that caching does not work with streaming mode as of now. The issue is associated with langchain-core, not langserve.

LangChain AI ā–· #share-your-work (6 messages):

  • Injecting Humor into AI Art: @neil6430 experimented with a new control net block from ML Blocks to create an amusing image of a chicken performing stand-up comedy with a Seinfeld posture. They shared their excitement about the feature and provided a link to ML Blocks, a tool for building modular, AI-powered image processing workflows without coding.

  • Lutra Revolutionizes Workflow Automation: @polarbear007. introduced Lutra.ai, a platform designed to transform English instructions into code, automating task completion by orchestrating various apps, likening it to a more potent version of Zapier.

  • Raptor Reveals Secrets of Long Context RAG: @andysingal shared a Medium article about building a Long Context Retrieval-Augmented Generation (RAG) from scratch using RAPTOR with Langchain.

  • ChromaDB joins LM Studio: @vic49. provided a GitHub link to the ChromaDB Plugin for LM Studio, enabling the creation of a ChromaDB vector database for server mode operations.

Links mentioned:


LangChain AI ā–· #tutorials (1 messages):

pradeep1148: https://www.youtube.com/watch?v=QPZpOBxUd1U


CUDA MODE ā–· #cuda (8 messagesšŸ”„):

  • Exploring Root Access on RunPod: @ericauld inquired about the possibility of running as root on RunPod, to which @nshepperd clarified that RunPod offers a docker image instead of a real VM, thus root isn’t actually root in this context.
  • Bandwidth Quest for H100 SRAM: @lucaslingle sought information on the SRAM bandwidth of the NVIDIA H100, noting a lack of recent sources after a GTC talk mentioned 19TB/s for A100. @iron_bound provided assistance by referencing a Chips and Cheese article that states H100’s L2 cache has a 5.5 TB/s read bandwidth.
  • Benchmarking RTX 4090: In response to the SRAM bandwidth discussion, @zippika highlighted the RTX 4090’s L1 bandwidth performance, referencing another Chips and Cheese article that focuses on Nvidia’s Ada Lovelace architecture and the new raytracing improvements.
  • H100 Bandwidth Assumptions: @zippika estimated that the H100 bandwidth could be comparable to the RTX 4090, mentioning an L1 bandwidth of 40TB/s based on their findings and assuming the H100 may align with this performance metric.

Links mentioned:

  • Microbenchmarking Nvidia’s RTX 4090: Nvidia’s RTX 4090 features Nvidia’s newest architecture, named Ada Lovelace after a pioneer in early computing. Compared to their previous architecture, Ampere, Ada Lovelace enjoys a pr…
  • Nvidia’s H100: Funny L2, and Tons of Bandwidth: GPUs started out as devices meant purely for graphics rendering, but their highly parallel nature made them attractive for certain compute tasks too. As the GPU compute scene grew over the past cou…

CUDA MODE ā–· #torch (9 messagesšŸ”„):

  • GPU Tensor Allocation Misstep: @zippika helped clarify that a tensor was not allocated on a CUDA device because @srns27 forgot to set the TensorOptions, causing it to default to the CPU.
  • Friendliness in Debugging: @zippika offered a kind response to @srns27, indicating that everyone makes mistakes and highlighting the cooperative nature of the torch community.
  • The Search for Higher Abstraction in Math Operations: @mabeto5p inquired about high-level languages or packages to perform linear algebra on low-precision integers and floating-point operations on NVIDIA Ada architecture.
  • Leveraging bitsandbytes for Quantization: @iron_bound suggested @mabeto5p use the bitsandbytes package for handling k-bit quantization in PyTorch to perform low-precision linear algebra operations on GPUs.
  • Breakthrough in Quantization Speed: @mabeto5p expressed excitement over discovering the potential to achieve a 5700x speedup in int8 versus bf16 matrix multiplication, after being pointed to the bitsandbytes resource by @iron_bound.

Links mentioned:

GitHub - TimDettmers/bitsandbytes: Accessible large language models via k-bit quantization for PyTorch.: Accessible large language models via k-bit quantization for PyTorch. - TimDettmers/bitsandbytes


CUDA MODE ā–· #algorithms (3 messages):

  • Mask Efficiency Matters: @drisspg highlighted the inefficiency of generic masking in compute, as it requires processing every mask element, even when it’s unnecessary.
  • Sliding Window PR Adds Color: @drisspg updated the sliding window attention bias pull request on PyTorch’s GitHub, adding more details to the description. The PR is available to review here.
  • Score-Mod API to Optimize Bias: @drisspg discussed the addition of the score-mod API as a means to efficiently fuse constraints on the bias into the flash_attention algorithm without fully materializing the entire bias.

Links mentioned:

Add sliding window attention bias by drisspg Ā· Pull Request #120143 Ā· pytorch/pytorch: Summary This PR adds a new attnetion-bias torch_function designed to interact with SDPA. This implements sliding window and updates ā€œaten.sdpa_flashā€ to expose the window_size_left and wind…


CUDA MODE ā–· #jobs (1 messages):

bowtiedlark: Remote?


CUDA MODE ā–· #beginner (2 messages):

  • CUDA for Beginners: User @umerha recommended Jeremy’s videos as a starting point for learning about numba.cuda.jit. The suggested resources can be found in Lecture 3 and 5.
  • Gratitude for Learning Resources: User @hoteret expressed their thanks for the CUDA learning resources shared by @umerha.

CUDA MODE ā–· #ring-attention (28 messagesšŸ”„):

  • Script Tweaking for Device Testing: @iron_bound discussed adding device IDs to a script and planned to test single devices followed by other ring functions.
  • Sampling Code Introduced with Glitches: @jamesmel shared a GitHub Pull Request for a first attempt at sampling code, while also mentioning errors with some parameters that are being investigated.
  • Benchmarks Completed for Striped and Zigzag: @iron_bound reported that the testing on the runpod box revealed striped and zigzag have the same memory ceiling, showing specific memory usage for two CUDA devices.
  • Opening Up the Axolotl Training: A link was shared by @andreaskoepf to the OpenAccess-AI-Collective’s Axolotl GitHub repository, and @iron_bound also mentioned successful Open Llama 3B training.
  • Troubleshooting Ring Attention and Sampling Logic: Discussions revolved around debugging the custom attention library by @iron_bound and efforts by @jamesmel and @andreaskoepf to get the sampling code to work properly, with plans to discuss and clarify the implementation in an upcoming meeting.

Links mentioned:


LLM Perf Enthusiasts AI ā–· #claude (7 messages):

  • Opus catches attention in coding community: User @pantsforbirds mentioned that Opus seems to be very promising for coding, sparking a conversation on its capabilities.
  • Peer approval for Opus in function calling: @res6969 contributed to the conversation by sharing that they’ve heard high praise for Opus’s performance, especially in function calling, indicating Opus might be best in its class.
  • GPT-4 excels in medical knowledge: In terms of technical knowledge in medicine and biology, @thebaghdaddy has found that GPT-4 is SIGNIFICANTLY better than its predecessors, leading to a shock at the performance difference.
  • Benchmarks under scrutiny: Following their experience, @thebaghdaddy expressed skepticism about the general reliability of the published benchmarks, suggesting they might not fully reflect the capabilities of the newer models.
  • Opus aced SAT Reading: @jeffreyw128 shared an impressive outcome where Opus scored an 800 on the SAT Reading section, demonstrated in a Twitter post which can be found here. This share sparked a discussion about the challenges in creating holdouts to avoid memorization given the size of the newer models.

LLM Perf Enthusiasts AI ā–· #prompting (2 messages):

  • Seeking Wisdom on Citations in RAG Outputs: @mat_mto inquired about resources like blogs or tweets that provide tips on formatting citations and footnotes in RAG-generated text. They shared an example of text output with footnotes pointing to web search results.
  • JSON Object for Clear Source Attribution: In response, @res6969 mentioned their use of function calling that outputs JSON objects containing both the text and the sources. This method allows for clear attribution of information to its web sources.

Datasette - LLM (@SimonW) ā–· #ai (8 messagesšŸ”„):

  • Clarifying AI Terminology: @simonw emphasized the importance of distinguishing between prompt injection and jailbreaking, explaining that prompt injection involves concatenating untrusted user input with a developer’s trusted prompt, whereas jailbreaking tries to bypass the LLM’s safety filters itself. He provides a detailed explanation and historical context in his blog post.

  • AI in the Hands of Cyber Threat Actors: @tariqali shared insights from a Microsoft blog post about state-backed actors using OpenAI’s LLMs for cyber activities including reconnaissance and spear phishing, mentioning one instance where an actor was blocked from prompting the model with malicious intent.

  • The Dual-Use Dilemma of LLMs: Addressing the risks associated with AI, @tariqali referred to a research post on OpenAI’s attempts to create an early warning system for LLM-aided biological threats, highlighting a comparative study between using the Internet alone and using it alongside GPT-4 for task-solving, which can be found here.

  • Access Control as a Mitigation Strategy: @tariqali suggested that prompt injection could be mitigated by controlling who gets access to the LLM, proposing human review of content as a potential layer of defense to sanitize inputs before they reach the AI.

  • Invisible Prompt Injection Challenges: @simonw pointed out the limitations of human review to prevent prompt injections, using the case of invisible prompt injections hidden in off-white text on images as an example, which can be a threat even in multi-modal versions of GPT like GPT-4-V as discussed in his blog post.

Links mentioned:


Datasette - LLM (@SimonW) ā–· #llm (1 messages):

  • Seeking Consensus on Model File Locations: @florents_ inquired about whether there is a consensus or specific piece of code that dictates where various tools search for model files, suggesting possible locations like $(pwd)/.models or $HOME/models. No further discussion or responses were provided.

DiscoResearch ā–· #general (9 messagesšŸ”„):

  • Exploring Chatbot Environments: @crispstrobe mentioned that chat.lmsys.org allows for testing with the caveat of including inputs in later training data, and highlighted poe.com, which hosts three models including a perplexity feature.
  • Quest for the Best German Model: @le_mess inquired about the best current German model; @johannhartmann recommended Claude Opus, gpt-4, discolm-120b, or VAGOsolutions/Sauerkraut LM-UNA-SOLAR-Instruct depending on specific constraints.
  • Fresh Off the Press: @maxidl shared an arxiv paper that suggests retrieval-augmented language models could potentially be a superior alternative to parametric LMs, though the research in this area is not yet extensive.
  • High Praise for Hermes and Mixtral: @cybertimon recommended using Nous Hermes 2 Mixtral 8x7b for German tasks, noting its proficiency in the language.
  • Searching for Flawlessness in 7 Billion Parameters: @johannhartmann and @flozi00 responded to queries about high-quality German models, with Johannhartmann suggesting DiscoResearch/DiscoLM_German_7b_v1 and similar models, and flozi00 endorsing Nous Hermes 2 Mixtral 8x7b for its accuracy.

Links mentioned:

Reliable, Adaptable, and Attributable Language Models with Retrieval: Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, di…


Alignment Lab AI ā–· #general-chat (1 messages):

  • Warm Welcome to Newcomer: User @segmentationfault. expressed gratitude for being invited by @748528982034612226 and showed eagerness to contribute to the field despite being new to it. No further information on contributions or areas of interest was provided.

Alignment Lab AI ā–· #oo2 (3 messages):

  • A Warm Henlo: @thenetrunna kicked off the conversation with a friendly ā€œhenlo frens,ā€ setting a casual tone in channel oo2.
  • Welcoming Replies in the Evening: @jaxxks responded in the evening, appreciating the welcome from @thenetrunna.
  • Greeting the Group: @tcapelle joined the conversation with a cheery ā€œHello every1!ā€ indicating a stream of introductions and greetings among participants.

Skunkworks AI ā–· #off-topic (2 messages):

  • Introducing Claude 3, the LLM That Surpasses GPT-4: @pradeep1148 shared a YouTube video, titled ā€œIntroducing Claude 3 LLM which surpasses GPT-4ā€. The video discusses the Claude 3 model family, which reportedly sets new benchmarks across various cognitive tasks.

  • How to Develop with Mistral: Another YouTube link was shared by @pradeep1148 titled ā€œInfinite Craft Game using Mistralā€. It talks about developing Neal Agarwal’s web game Infinite Craft using the Mistral model.

Links mentioned:

  • Infinite Craft Game using Mistral: Let develop Neal Agarwal’s web game Infinite Craft. This is a ā€œcrafting gameā€ where you start with just four elements and repeatedly combine pairs of element…
  • Introducing Claude 3 LLM which surpasses GPT-4: Today, we’re look at the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three state-of…

Interconnects (Nathan Lambert) ā–· #ideas-and-feedback (1 messages):

  • Intel Faces a Reality Check: @natolambert shared a YouTube video titled ā€œIntel’s Humblingā€ by Stratechery with Ben Thompson providing voiceover, suggesting it offered valuable insights without making him feel like ā€œa total idiot.ā€ The video explores the challenges Intel has faced and includes a link to the accompanying article for a deeper read.

Links mentioned:

Intel’s Humbling | Stratechery by Ben Thompson: Read the Article: https://stratechery.com/2024/intels-humbling/Links: Stratechery: https://stratechery.comSign up for Stratechery Plus: https://stratechery.c…


Interconnects (Nathan Lambert) ā–· #reads (1 messages):

  • Reflecting on the Obscurities of AI: @natolambert recommends a thought-provoking post by Elad Gil, highlighting how generative AI tends to become more puzzling over time. The post raises open questions at each level of the AI stack, aiming to stir conversation and provide insights.

Links mentioned:

Things I Don’t Know About AI: The more I learn about AI markets, the less I think I know. I list questions and some thoughts.