> AI Discords for 2/22/2024. We checked **20** guilds, **317** channels, and **8875** messages for you. Estimated reading time saved (at 200wpm): **835 minutes**.

Latent Space turned one today. It’s (of course) the #1 AI Engineering podcast, has hit #10 in the generalist U.S. Tech podcast charts, and is crossing 1 million unique readers on our Substack. Alessio wrote a great reflection, and we are hosting a hack/demo day that is in progress as we write.



Table of Contents

[TOC]

PART 0: Summary of Summaries of Summaries

  • AI Ethics and Bias Discussion: The Gemini Image Generator controversy on TheBloke Discord highlighted challenges in AI ethics and bias, specifically how Google’s Gemini 1.5 model failed to accurately represent white individuals and historic events. This sparked debates on internal biases vs. rushed implementation, as discussed in a YouTube video on Gemini’s diversity issue.
  • AI-Assisted Creativity and Development: AI’s role in creative industries, especially in game development, was emphasized across TheBloke and LM Studio Discords. Discussions revolved around using AI for artistic direction and the potential of text-to-3D tools for smaller developers, showcasing AI’s growing intersection with creativity.
  • Model Fine-Tuning and Performance Optimization: Several Discords, including Nous Research AI and Mistral, delved into fine-tuning challenges and performance optimization of models like Gemma 7B and Mistral-next. Issues ranged from high initial loss to API access queries, with solutions involving specific learning rates and leveraging open-source tools for superior results, such as a GitHub repository for large-scale finetuning.
  • Emerging Trends in AI Development and Deployment: Discussions on CUDA MODE and LangChain AI Discords underscored emerging trends in AI hardware optimization and application development. Critiques of Nvidia’s CUDA by Jim Keller and explorations in parallel function calls in LLMs reflect the technical community’s focus on improving AI model efficiency and deployment strategies. Notably, advancements in addressing AI hallucination were teased by Richard Socher, suggesting significant progress in enhancing AI’s factual accuracy, as hinted in a tweet.

PART 1: High level Discord summaries

TheBloke Discord Summary

  • Gemini Image Generator Sparks Bias Controversy: The community debated the apparent bias of Google’s Gemini 1.5 AI image generation model after it failed to represent white individuals and historic events accurately, prompting a shutdown; some argued it was due to internal biases while others suggested rushed implementation. The controversy was discussed with references to Gemini’s diversity issue video and articles.

  • AI-Assisted Creativity in Game Development: The potential for AI to assist in game development surfaced, with discussions on text-to-3D tools and the benefits for smaller developers using AI for artistic direction, showcasing the growing intersection of AI and creative industries.

  • Search Engine Market Share Discussion: Why Google continues to dominate search engine market share piqued interest; alternatives like Qwant were discussed alongside critiques of Google’s corporate ethos, underlining the competition and ethics in the tech industry.

  • Opus V1 and Other Models Take the Spotlight in Roleplay and Writing: Users in roleplay and writing channels explored model preferences, with attention on Opus V1’s role in story-writing and character cards’ influence on AI model performance in roleplaying scenarios, reflecting the significance of fine-tuning model settings for creative outputs.

  • Deep Dives into Model Merging and DPO: Conversations on model merging explored the challenges of hybridizing non-homologous models such as Orca-2-13b with Nous-Hermes-2-DPO-7b, covering complex merge techniques, the potential of Kahneman-Tversky Optimization (KTO), and community input on DPO usage; one member opted to use the trl library’s DPOTrainer as a starting point after reviewing its code on GitHub.

  • Code Curiosity and JetBrains’ Dotpeek Usage: In the coding channel, there was a distinct curiosity for communities focused on machine learning outside of GitHub and Twitter, as well as an exchange on the use of JetBrains’ Dotpeek for vulnerability research, indicative of the practical applications AI engineers seek from their tools.


LM Studio Discord Summary

  • Gemma Models on the Fritz: Users experience issues with Gemma models, particularly with lower quantizations breaking in llama.cpp. A Hugging Face Gemma model is suggested to avoid problems, while the Gemma 7B must be manually downloaded for LM Studio compatibility.

  • Stability and Updates in LM Studio: An urgent update to LM Studio v0.2.16 includes bug fixes for erratic behaviors. Users celebrate the UI improvements and fixed issues from version 0.2.15, but also critique the complexity and Comic Sans font.

  • A TESLA in Hand Worth Two in the Data Center?: Spare TESLA K40 cards are on the market, prompting discussions about their potential use with llama.cpp, despite being limited to CUDA 3.5. The conversation spans adding GPUs for speed and possible disruption by AMD’s MI300X in AI applications.

  • Local Models, No Internet: LM Studio local models like Gemma do not have internet access, which impacts their update and improvement capabilities. Despite the limitations, the AI-assisted teaching tools and Stable Diffusion Web UI are brought up for their functionalities.

  • Visualizing Technical Troubles: OLED monitors get a nod for their quality, affirming a trend in preference even amongst the engineer audience. On the hardware side, the Tesla K40’s cost efficiency is recognized, but with reservations due to its age and limitations.

  • Fixing the Unfixable with a Classic: When facing AutoGen package issues, a user successfully resolved them through the classic IT approach of uninstalling and reinstalling, accented by a nod to the famed “turning it off and on again” GIF humorously shared.

  • How Chunky is Your Data?: A discussion on chunk_size for text preprocessing for embeddings highlights its dependency on the model used. A recommended formula is shared from AI Stack Exchange for calculating num_embeddings when num_categories <= 1000.


OpenAI Discord Summary

  • HTML and CSS on AI’s Curriculum: There was talk about training ChatGPT with HTML and CSS, with @ls_chicha probing how to include programming languages in AI education. Subsequently, @thedreamakeem indicated a potential requirement for a .json database format when training.

  • AI Models in Crisis with PDFs and More: Users grappled with issues such as GPTs prematurely losing the ability to read PDFs (@arani1977) and slow model performance raised by @oleksandrshr, alongside a clarification from @darthgustav. that quantized model versions trade precision (and some performance) for speed.

  • Fine-Tuning Model Behaviors: Discourse extended to nuances of model responses, as @tawsif2781 and @darthgustav. discussed the looping glitch in ReAct prompting, and strategies for invoking improvisation even with zero temperature settings.

  • AI Conversations and Character Play: @link12313 proposed an app for interactions between GPT-4 and Google’s Gemini Ultra1.5, while @eskcanta exchanged methods and tips for managing roleplay and consistent character interactions within models, showcasing efficient Custom Instructions usage.

  • Following GPT-4’s Reality Check: There was skepticism about claims of dramatic changes in GPT-4’s abilities post-release, with @_jonpo and others debating the model’s context length and memory capabilities, while @lugui dispelled concerns that GPT-4 had been “powered down.”

External Resources Discussed:

  • A link was shared to Stability AI’s announcement about their most advanced text-to-image model, Stable Diffusion 3.
  • OpenAI’s foray into video generative models was highlighted through a research link.
  • Information on Google’s Gemini Pro model and its anti-bias measures appeared in a YouTube video.

LAION Discord Summary

  • Homemade Orchestration Over Enterprise Solutions: A custom system of “crappy scripts and a database” was mentioned for worker orchestration, hinting at a pragmatic approach over sophisticated enterprise-level solutions.

  • Stable Diffusion 3 and Hiring Practices at Stability AI: There is anticipation for the capabilities of Stable Diffusion 3, possibly featuring a base for medium resolution with an upscaling technique. Meanwhile, Stability AI appears to favor hiring systems administrators pivoting into machine learning roles, apparently for cost reasons, as well as individuals with significant YouTube followings.

  • Increased Secrecy in AI Development: Community members voiced concerns over a trend where companies, such as Stability AI, are moving model development away from public scrutiny and contributing to a decrease in observable diversity in AI generation outputs.

  • Open-Source Models and Fine-tuning: There is a discussion indicating the potential for open-source models like Mistral-7b, when fine-tuned, to provide superior performance compared to commercial offerings such as GPT-4, with an initiative like LoRA Land seen as leading this space.

  • Reevaluating LAION 5B’s Utility and Academic Contributions: The community contemplates whether to retire the LAION 5B dataset, while also exploring crowd-sourced captioning solutions and sharing insights into effective model training practices, such as mixed precision training with bfloat16 on TPUs. Academic contributions in the area include TinyLLaVA—on small-scale Large Multimodal Models—and INTRINSIC LoRA’s exploration of generative models’ capabilities.


Nous Research AI Discord Summary

  • Gemma 7B Puzzles Engineers: AI Engineers, like @interstellarninja, @teknium, and @gryphepadar, reported finetuning challenges with the Gemma 7B model, including high initial loss and suboptimal end results compared to existing models. @stoicbatman found a learning rate of 5e-5 optimal in their experiments.

  • Fine-Tuning Tools and Tips Swapped: @alvion427 applauded @n8programs’s fine-tuned Tinyllama model for advanced multi-turn conversation capabilities. Meanwhile, @qtnx rectified a naming typo for a Nous-Hermes-2-Mistral model on Huggingface, and @teknium provided a GitHub link to shell scripts for large-scale finetuning, albeit with a need for updates.

  • Cutting-Edge LLM Integration Discussed: Conversations spanned Microsoft’s JARVIS project with links to its GitHub repository, the OpenCodeInterpreter with a blend of generation, execution, and refinement, and @pramod8481 sharing critical analyses on human feedback and value biases in LLMs from Arxiv links.

  • AI Models Hog VRAM: @gryphepadar highlighted the considerable VRAM consumption during model finetuning, indicating the necessity for planning computational resources.

  • Ethics and Mechanics of AI Meet: Concerns were raised about how Large Language Models (LLMs) tend to favor high-value options and the possible implicit value function within, as suggested by a study discussed by @pramod8481.


Mistral Discord Summary

  • Mistral-Next Sparks Anticipation and API Queries: Engineering discussions have revealed that Mistral-next is outperforming previous models like Mistral-Medium, with users like @ethux confirming its existence but noting the absence of API access or model size details. Meanwhile, others like @buttercookie6265 and @louis2567 have been focusing on GPU selection for vLLMs and best practices for batch calls to vLLM servers.

  • Mistral’s Open-Source Commitment Questioned: Community concerns surfaced about Mistral potentially shifting away from open-source, but users like @casper_ai voiced confidence in Mistral’s open ethos, making parallels to Linux. With a variety of links mentioned, it’s clear that deployment methods and accessibility remain pivotal discussions.

  • Frosty Feedback for Mistral’s Fine-Tuning: Newcomers to fine-tuning like @4vis received recommendations such as starting with Unsloth, while others like @pteromaple grappled with the intricacies of data formats and model choices for precise tuning tasks. Users discussed the practicality of fine-tuning large models on limited hardware configurations, with @mrdragonfox suggesting that small parameter modifications might suffice for certain style transfers.

  • Mistral Data Handling Protocols Clarified: Inquiries about the privacy of data processed through the Mistral API led to assurances from @akshay_1 about non-utilization of such data in training. Additional confirmations from @tom_lrd and @ethux noted that Mistral’s data and platform are hosted in Sweden, as included in their privacy policy, which also mentions service providers like Azure, Cloudflare, and Stripe.

  • Mistral Community Ponders Performance and Pricing: Model performance, serving speeds, and attractive pricing structures brought attention, with @egalitaristen and @mrdragonfox expressing positivity about Mistral’s market presence. An ongoing feedback collection initiative for Mistral Next, supported by @egalitaristen and @mrdragonfox, indicates active community involvement in model improvements.


Perplexity AI Discord Summary

  • Perplexity Presents the Discover Daily Podcast: A partnership between Perplexity and ElevenLabs brings the Discover Daily podcast to life, featuring AI-powered voices from ElevenLabs narrating stories from Perplexity’s Discover feed. The podcast can be found on various platforms.

  • No Double Discounts on Pro Subscriptions: Clarification was offered on Perplexity Pro subscriptions; adding team members to a plan is possible, but no multi-subscription discounts are available as confirmed by a link to the billing and subscription FAQ.

  • Experimenting with Lightweight Gemma Models: Perplexity showcases the new lightweight Gemma 2B and 7B models (“Experience Gemma 2B and 7B models”) through a Perplexity Labs YouTube playlist and promotes them on Twitter, stressing their impressive performance.

  • Navigating API Issues and Gemma Integration Speculation: Users report trouble with API credit purchases and a successful workaround for a 400 error. Curiosity arises around integrating Google’s Gemma with the Perplexity API.

  • Search Insights and Potential Collaborations: Users utilize Perplexity AI search to explore topics like the identity of pline0, risk analysis, and the Xiaomi 14 series, alongside discussing a potential Perplexity AI and ElevenLabs collaboration. Links directly to Perplexity AI search results are shared in conversations.


OpenAccess AI Collective (axolotl) Discord Summary

  • Axolotl-Dev Insights: CUDA Confusion and Gemma Optimizations: @casper_ai shared advancements in optimizing the Mixtral model but struggled with crafting a compatible backward pass without CUDA expertise. They suggested precomputing token and expert ids for efficient grouped computations to enhance Mixtral’s efficiency. Meanwhile, @curiositix recommended the Gemma Inference Engine to overcome @casper_ai’s backward pass implementation hurdles.

  • Discussions Over Cloud and Server Costs: In the #general channel, @yamashi sparked a debate on the economic trade-offs between cloud services and owning servers for long-term AI projects, considering the costs associated with ongoing cloud rentals versus one-time server purchases.

  • Inference Woes and Contribution Pleas in General Help: @nani1149 and @nanobitz discussed the alpaca inference format in the #general-help channel, where @nanobitz provided a Stanford Alpaca GitHub link for reference. @nanobitz and @yamashi pondered the necessity of improved documentation to aid community members, hinting at the use of resources like Gitbooks.

  • Community Showcase of Advanced AI Storytelling: In the #community-showcase, @dreamgen announced the release of new AI models for narrative creation featured on Hugging Face and shared the Opus V1 guide. Addressing concerns, they confirmed an oversight in updating tokenizer chat templates and promised further investigation into alleged prompt leakage. Additionally, @finetuningllms spotlighted their tuning of the Phi-2 model, available at axra/phi-2-x-0.1.

  • Finding the Elusive RunPod Image: With confusion in the #runpod-help channel over a missing RunPod image, @nanobitz directed users to Docker Hub for retrieval; however, @stoicbatman noted discrepancies between Docker Hub and the now-misdirecting GitHub readme.


HuggingFace Discord Summary

  • Aya Dataset Visualization Shared: A user shared a visualization of the Aya dataset intended to improve comprehension.

  • Innovations in Protein Research and Language Technology: The ProteinBERT model and its related paper, as well as the Fluently diffusion model demo at this space, offer advancements in protein understanding and natural language processing.

  • Stable Diffusion XL Optimization Guide Released: A new article by @felixsanz details methods for enabling image generation on less powerful GPUs, even as the community welcomes Stable Diffusion 3.

  • Ethical Concerns Raised Over Unofficial API: Users express concerns over the ethical and practical implications of an unofficial ChatGPT API using Selenium, highlighting potential violation of OpenAI’s terms and risk of bans. Link to GitHub Repo.

  • Debate Over Fine-Tuning vs. Large Model Approaches: The community discusses whether to fine-tune a larger LLM like Mistral 7B for text classification or use an optimized BERT variant; encoder models are suggested as a more efficient choice for classification tasks than larger generative models.

  • Challenges with Expanding Models and Translation Systems: Users discuss extending the BART MNLI model beyond 10 classes and the creation of an Interlingua-based translator for a university project, reflecting a broader interest in model adaptation and multilingual translation systems.


Latent Space Discord Summary

  • GPT-4 Falls Short of Expectations: @henriqueln7 tested GPT-4’s ability to rewrite prompts but found it functioned like a new assistant. Extensive testing in the playground was planned to explore its capabilities further.

  • Stable Diffusion 3 Makes a Splash: Stability AI announced an early preview of Stable Diffusion 3 with improved performance on multi-subject prompts and spelling abilities. Detailed model information was shared by @rubenartus through various links.

  • Google’s Gemini Pro 1.5 Revealed: Featuring a massive 1,000,000 token context size and video input capabilities, Gemini Pro 1.5 was discussed by @nuvic_, with insights sourced from Google AI Studio.

  • Debating Reddit’s Lucrative Data Deal: The community, including @guardiang and @pennepitstop, debated the implications of Google’s $60 million/year data agreement with Reddit and its impact ahead of Reddit’s IPO.

  • Google’s Gemini Image Generation Goes Awry: After issues with Gemini’s image generation feature, Google paused its function as announced in a blog post linked by @swyxio.

  • LLM Paper Club Takes Deep Dive into T5: The LLM Paper Club, hosted by @ivanleomk and @bryanblackbee, provided in-depth discussions on the T5 paper, with a central repository for notes and insights shared among participants.

  • AI Model Merging Emerges: The technique of model merging, a cost-effective way of combining LLMs, is highlighted, with @swyxio sharing Hugging Face’s blog post on the subject and referencing the mergekit library.

  • Civit.ai Gallery Sparks Debate: The Civit.ai model gallery’s content, particularly images of young women, was a point of debate, emphasizing the importance of content moderation and implications for AI-generated content in @kbal11’s discussion.


Eleuther Discord Summary

  • Debate on Simulating Human Experience: Skepticism arose around GPT-4’s ability to emulate human experiences, with discussions focused on enhancing model memory layers for more realistic behavior. The discourse extended to a company, Superfocus, claiming near-perfect factual accuracy for LLMs.

  • Validity of LLM Benchmarks Questioned: A YouTube video criticising the effectiveness of current LLM benchmarks spurred conversations about the benchmarks’ adequacy.

  • Exploring LLM Unlearning and Chinese Contextualization: A study titled Survey and formalization of LLM unlearning was shared, and training of a Chinese lens for a 13b model was reported, with an investigation into the model’s uniform output behavior and tokenizer issues.

  • Concerns over Misleading Model Naming Conventions: A debate ensued regarding naming conventions for models, with “gemma-7b” actually comprising 8.5b parameters, leading to confusion and calls for consistency.

  • Optimizing Pre-training and Finetuning Techniques for GPT-NeoX: Published work highlighting the effects of sequence composition was shared. Discussions included the appropriateness of using LoRA finetuning within the gpt-neox codebase, with movement away from PyTorch native FSDP for NeoX 20B finetuning under consideration.

  • Mitigating False Negatives in Multimodal Models: Thoughts were exchanged on the significance of exact false negatives in large datasets like datacomp or metaclip. Generating unimodal embeddings or computing similarity during training might reduce the incidence of hard negatives.


LlamaIndex Discord Summary

  • Full-Stack RAG Made Easy: @wenqi_glantz provided a tutorial converting a RAG notebook to a full-stack app, including an ingestion service, detailed in her guide. A new LlamaIndex release introduces a LlamaPack for advanced RAG that makes web app implementation as simple as two lines of code, as announced here.

  • ColBERT Accelerates Document Re-ranking: @lateinteraction introduced ColBERT, a tool for fast document re-ranking that’s 100 times speedier than BERT-based models. ColBERT’s improvements were confirmed by @Haotianzh and can be explored in this tweet.

  • Navigating LlamaIndex’s Documentation for RAG Setup: @lapexer queried about setting up a simple RAG in QueryPipeline, with @cheesyfishes offering the documentation link for guidance.

  • Trouble in IngestionPipeline Town: Issues like ValidationError popped up while deploying the IngestionPipeline, but were eventually resolved through community support. It was also noted that inconsistent module imports could require a reinstallation of LlamaIndex.

  • Eager for Code Invocation Models: @gooooooofy seeks models adept at generating accurate code invocations and feels Gorilla LLM might be on the right track, despite its API call specialization.


CUDA MODE Discord Summary

  • Bargain GPU Power-up: An engineer snagged three RTX 3090s for 1.7k euros to upgrade a mining rig for LLM fine-tuning and serving, highlighting the cost efficiency. They detailed the conversion process in a two-part blog series.

  • CUDA Criticism from a Silicon Veteran: Jim Keller criticized Nvidia’s CUDA, describing it as a complex and inelegant solution, analogous to the x86 architecture’s evolution. The criticism was featured in a Tom’s Hardware article.

  • Kernel Crafting and Quantization Intricacies: There was an emphasis on the nuance of quantized model computations, as well as CUDA kernel development for deep learning. One engineer shared their torch-bnb-fp4 repository for a faster alternative to bitsandbytes, and provided a benchmark script to test performance improvements.

  • Exploring PyTorch Through Random Kernels: Discussion revolved around the optimization of random kernels in PyTorch, showcasing the relevance of collaborative work on libraries such as Triton and their educational value as highlighted in a conversation in the Triton channel.

  • Job Opportunity for NLP and ML Enthusiasts: A new opportunity has arisen for an ML Engineer at SIXT in Munich, leaning towards candidates with NLP and Generative AI expertise. Prospective applicants can explore the SIXT job listing.

  • Challenges and Developments in Hardware-Accelerated Training: Members discussed the compatibility issues of AMD GPUs with FA2 training, with a particular focus on the missing backward function/kernel for the 7900xtx. Possible solutions and ongoing work like the flash-attention GitHub repository for better AMD GPU support were mentioned.

  • Ring Attention Draws Community Focus: There was a flurry of activity around ring attention mechanisms, with multiple repository links provided for implementations and benchmarking. Engineers are collaborating to improve these libraries, such as lucidrains/ring-attention-pytorch, and focusing on enhancements for usability and optimization.


LangChain AI Discord Summary

  • Innovation Through Democratic Feedback: A research tool survey has been circulated asking for community insights to improve functionalities like finding research papers and understanding complex studies.

  • LLM Enhancement Discussions: Technical talk has revolved around optimizing LangChain agents, particularly through using RunnableParallel and RunnablePassthrough for improved parallel chain operations (see the sketch after this list), and the integration of local models for streaming.

  • Seeking Langchain Expertise: A community member is in search of a Langchain and OpenAI’s tool agent consultant, offering compensation for guidance and expertise.

  • Debugging Tools Showcased: The debugging and visualization capabilities of LangSmith were recommended for ensuring correct behavior in complex LangChain processes.

  • Explorations in Parallelism: Parallel function calls in LLMs are now possible as revealed in a recent LinkedIn post, expanding the technical toolkit for AI engineering applications.

  • Sharing AI-Enhanced Workflows: Techniques for building custom chatbots with history capabilities, as well as using AI for stock portfolio summarization have been shared, powerfully demonstrating how LLMs can augment various business and development tasks.
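
A minimal LCEL sketch of the parallel-chain pattern mentioned above, assuming an OpenAI-backed chain; the model choice and prompts are illustrative, not from the discussion. RunnableParallel fans one input out to several chains at once, while RunnablePassthrough forwards the raw input alongside their outputs.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)   # illustrative model choice

summary_chain = (
    ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
    | llm
    | StrOutputParser()
)
keywords_chain = (
    ChatPromptTemplate.from_template("List three keywords for: {text}")
    | llm
    | StrOutputParser()
)

# RunnableParallel fans the same input out to both chains concurrently;
# RunnablePassthrough carries the untouched input alongside their outputs.
pipeline = RunnableParallel(
    original=RunnablePassthrough(),
    summary=summary_chain,
    keywords=keywords_chain,
)

result = pipeline.invoke({"text": "LangChain composes LLM calls into declarative pipelines."})
print(result["summary"])
print(result["keywords"])
```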


Datasette - LLM (@SimonW) Discord Summary

  • Codespaces Template Boosts LLM Play: @derekpwillis provided a template repository ideal for running orca-mini-3b in codespaces, though there might be challenges with larger models. The template garnered positive feedback for its simplicity, though it has a noted long startup time due to on-the-fly compilation.
  • A Quirk in Codespaces Resolved: @simonw detailed a workaround for an initial unavailability bug of llm-gpt4all in codespaces, recommending the command llm chat -m orca-mini-3b-gguf2-q4_0 to preload the model for quicker subsequent usage.
  • Praising Prompt Craftsmanship: @tariqali highlighted the nuanced benefits of traditional prompt crafting in LLMs compared to the straightforward queries now common with methods like RLHF. Traditional prompts may still hold value for specific goals like resuming chatbot conversations.
  • Large World Model’s GPU Requirements: @simonw showed interest in experimenting with the Large World Model’s LWM-Text-1M-Chat and discussed the necessity of a GPU instance for optimal performance due to the model’s training on a substantial dataset.

LLM Perf Enthusiasts AI Discord Summary

  • AI Hallucination Breakthrough Teased by Richard Socher: Significant progress might have been made in addressing AI hallucination, as reflected in a tweet by Richard Socher that showed error-free up-to-date references; the exact mechanism, speculated to involve state-of-the-art embeddings and a validator, was not detailed.
  • Globe Explorer’s Innovations in Information Discovery: Globe Explorer, a tool described as a personalized Wikipedia powered by GPT-4, has been highlighted across discussions as symbolizing a new era in information retrieval. It was first introduced in a tweet and further discussed in the community, where it garnered viral attention even before promotional efforts.
  • Finetuning Strategies for GPT-4-Turbo Discussed: A user with successful 1-shot data extraction from whole documents using gpt-4-turbo is weighing whether to include entire documents or just relevant sections in the finetuning dataset for more complex tasks.
  • Spatial Logic Prompting with LLMs Explored: Discussion covered the challenge of writing prompts for organizing non-overlapping components in a grid, questioning the effectiveness of LLMs in spatial tasks without providing a conclusive strategy or results.

Alignment Lab AI Discord Summary

  • GLAN - Next Big Thing?: @.benxh surfaced a recent paper on GLAN (Generalized Instruction Tuning), igniting a spark of interest among the community. The paper in question was linked for those curious about this emerging technique.

PART 2: Detailed by-Channel summaries and links

TheBloke ▷ #general (1038 messages🔥🔥🔥):

  • Gemini Image Generator Bias Debate: A significant discussion revolved around Google’s Gemini 1.5 AI image generation model which was criticized for an inability to accurately depict white individuals or historic events, leading to its shutdown (@coffeevampir3, @netrve). Users debated whether this was due to internal biases or rushed implementation by Google (@shanman6991, @netrve), with references to a video explaining the controversy and several articles discussing the model (@potatooff).

  • AI-Assisted Creativity in Game Development: Several users expressed interest in using various AI tools to generate or enhance game assets, a conversation that included methods like text to 3D (@itsme9316) and the potential for smaller game developers to use AI for artistic direction (@alphaatlas1).

  • Search Engine Market Share Puzzle: The conversation shifted briefly to discuss why Google maintains a dominant search engine market share with suggestions for alternatives like Qwant (@maldevide) and critiques of Google’s corporate ethos and direction (@shanman6991, @selea8026).

  • Control Vectors in AI: @rtyax introduced the concept of control vectors for AI models, which was further expounded on with links to articles and research (@selea8026, @rtyax).

  • Summarization Models in AI Chat: @netrve queried about good model options for summarizing chat messages within an AI platform, and discussed challenges with the current Transformers-based summarization pipeline in SillyTavern (ST). @itsme9316 suggested possibly using the same LLM used in ST or training a custom model.

Links mentioned:


TheBloke ▷ #characters-roleplay-stories (438 messages🔥🔥🔥):

  • Exploring Model Preferences and Performance: Users are discussing their experiences and preferences among various models, including Rogue-Rose-103b, miqumaid, and Miquella. @johnrobertsmith indicated a preference for miqu and @splice0001 for Rogue-Rose-103b, citing the writing style as a deciding factor.

  • Troubleshooting Model Behavior: @euchale encountered issues with EstopianMaid acting out of character and received suggestions to check settings or character cards. After further discussion, it was determined the problem might be user-specific or related to the sequence of prompts.

  • Temperature Settings Influence on AI Models: Users like @splice0001 and @dreamgen are exchanging their experiences with temperature settings in AI models. @dreamgen suggested starting with a temperature below 1 and recommended a setup with vLLM (see the sketch after this list).

  • Character Card Complexity in Roleplay: @superking__ shares an interesting observation that giving a character the goal “survive at any cost” made it play its role more effectively in a roleplay scenario using Mixtral.

  • Opus V1 Model Guidance: There’s been a focus on the newly published Opus V1 models for AI story-writing and role-playing, with @dreamgen publishing a guide and offering a Colab script for proper prompt formatting. @splice0001 expressed positive feedback when using the model.
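
A minimal sketch of the kind of vLLM setup suggested above, with temperature kept below 1; the model name and sampling values are illustrative assumptions, not @dreamgen’s actual settings.

```python
from vllm import LLM, SamplingParams

# Illustrative model; substitute whichever roleplay/story model you actually serve.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

params = SamplingParams(
    temperature=0.8,   # "below 1": keeps prose coherent while still varied
    top_p=0.95,
    max_tokens=256,
)

outputs = llm.generate(["Continue the story: The door creaked open and"], params)
print(outputs[0].outputs[0].text)
```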

Links mentioned:


TheBloke ▷ #training-and-fine-tuning (4 messages):

  • Seeking a DPO Implementation Guide: @cogbuji is searching for a practical reference implementation of DPO to apply to MLX after finding the Hugging Face alignment handbook unsatisfactory, due to a lack of implementation details beyond configuration files.
  • DPO Attempt Shared by Community Member: Responding to @cogbuji, @dirtytigerx shared an unfinished attempt at implementing DPO, referring to the DPOTrainer in the trl library, which can be found at huggingface/trl on GitHub.
  • Extra Implementation Bits in the Mix: @dirtytigerx mentions that the referenced DPOTrainer code includes not just DPO but also KTO (Kahneman-Tversky Optimization) segments, which might not be directly relevant to @cogbuji’s needs.
  • cogbuji opts for TRL: After the community input, @cogbuji decided to work with the trl module as a basis for implementing DPO.
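
For readers following the same path, here is a minimal sketch of how trl’s DPOTrainer is typically wired up; the base model, toy dataset, and hyperparameters are illustrative placeholders, not a reference implementation from the thread.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"   # illustrative; use whatever base you are aligning
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)   # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# DPO trains on preference pairs: a prompt plus a preferred ("chosen") and a "rejected" completion.
train_dataset = Dataset.from_dict({
    "prompt":   ["Explain DPO in one sentence."],
    "chosen":   ["DPO fine-tunes directly on preference pairs, with no separate reward model."],
    "rejected": ["DPO is a kind of database."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.1,                        # strength of the pull back toward the reference model
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=512,
    max_prompt_length=128,
    args=TrainingArguments(
        output_dir="dpo-out",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        learning_rate=5e-6,
        remove_unused_columns=False,  # keep prompt/chosen/rejected columns for the DPO collator
    ),
)
trainer.train()
```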

Links mentioned:

trl/trl/trainer/dpo_trainer.py at main · huggingface/trl: Train transformer language models with reinforcement learning. - huggingface/trl


TheBloke ▷ #model-merging (25 messages🔥):

  • In Quest for Model Hybridization: @jsarnecki is considering a “frankenmerge” of Orca-2-13b with Nous-Hermes-2-DPO-7b, using Orca as the base and merging layer by layer to a 17B parameter model using mergekit. However, @maldevide clarifies that such models are non-homologous and therefore not directly mergeable.
  • Mix-and-Match Model Merging Madness: @maldevide suggests that, while direct merging is impossible, using datasets fine-tuned on Hugging Face could be beneficial and references the complex merging techniques used in creating SOLAR-10.7B-v1.0. They mention “SFT to clean things up” after a layered merge.
  • Homomorphic Hassles and Merging Methods: @alphaatlas1 and @maldevide discuss that for non-homologous merges like @jsarnecki’s project, serious issues arise with no established techniques for such merges, and recommend a homomorphic projection matrix with intensive training.
  • Curiosity Sparked by PEFT and Merge Approaches: @alphaatlas1 points to a blog post revealing PEFT’s findings on model merges and notes DARE-TIES merging’s adverse results on diffusion models, while it seems more suitable for LLMs according to tests in the meh repository on GitHub.
  • Diffusion Model Merge Dilemmas: The conversation shifts to the peculiar behavior of diffusion models with merging techniques, with @jsarnecki and @alphaatlas1 noting the potential impact due to the models’ density and alignment, while linear merges work well for models like SD (Stable Diffusion).

Links mentioned:


TheBloke ▷ #coding (8 messages🔥):

  • Exploring Communities for MLX Enthusiasts: @fred.bliss inquired about communities with a focus on machine learning and tinkering, aside from GitHub and Twitter. They expressed difficulty in finding such groups outside of those platforms.
  • Preference for Independence over Community: @dirtytigerx mentioned that they do not generally seek out communities, suggesting a preference for working independently or perhaps for more established, less community-oriented platforms.
  • Dotpeek Shines for Spottyluck: @spottyluck shared their use of JetBrains’ Dotpeek, a .NET decompiler, mainly for vulnerability research rather than general programming tasks. They also added a humorous note about the abundance of poorly written system tray apps.
  • Curiosity about Dotpeek’s Capabilities: @al_lansley asked whether Dotpeek is limited to C# or if it has broader applications. Their message illustrates the importance of asking clarifying questions in a technical community, regardless of expertise level.

LM Studio ▷ #💬-general (462 messages🔥🔥🔥):

  • Gemma Model Gossip: Users report disappointing performance with the Gemma models. @heyitsyorkie clarifies that lower than Q8 quantizations of Gemma are broken in llama.cpp, which LM Studio uses.
  • No Picture Upload with LLava: @tvb1199 inquires about uploading images with LLava models. They are informed that vision capabilities require a model and a vision adapter (mmproj-model).
  • Larger Models Present Challenges: @wyrath experiments with a 70b model, finding it slow on CPUs and encountering difficulties with partial GPU offloading.
  • OLED Monitors Steal the Spotlight: Various users praise the vivid display quality of OLED monitors, sharing their experiences and preferential shift away from traditional displays.
  • Phind-70B Curiosity: @pierrunoyt asks about acquiring the Phind-70B model; @heyitsyorkie indicates it is exclusive to the Phind platform and is not available for local use.

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (76 messages🔥🔥):

  • Seeking AI-Assisted Teaching Tools: @therapienachdemtod is designing an assistant to aid in teaching, looking for a model that prepares educational content and interacts with students by correcting grammar and engaging in dialogue. In response, @thebest6337 expressed skepticism about the effectiveness of current models for such tasks, mentioning possible shortcomings and no experience with the model “gemma.”
  • Gemma Model Quirks Revealed: @thorax7835 discussed the limitations of “mixtral” when asking for fitness tips, as it tends to censor itself, and @nullt3r confirmed experiencing odd behavior from “LMStudio gemma 2b model.”
  • No Internet for Local Models: @heyitsyorkie clarified that local models in LM Studio, like “Gemma,” do not have internet access, in response to queries by @thorax7835 about model improvements and internet capabilities.
  • Stable Diffusion Web UI Recommended: In a discussion about image generation capabilities, @heyitsyorkie and @drawingthesun recommended using Automatic1111’s Stable Diffusion web UI for those tasks, as LM Studio does not support them.
  • Error Troubleshooting in LM Studio: @macaulj sought help with an error they were encountering with LM Studio and received advice from @heyitsyorkie hinting at a potential graphics card driver issue related to CUDA.

Links mentioned:


LM Studio ▷ #announcements (1 messages):

  • Urgent Update to LM Studio v0.2.16: @yagilb announces that LM Studio v0.2.16 is now available and urges users to update from v0.2.15. This update includes all features of v0.2.15 plus important bug fixes for erratic regenerations and erratic scrolls in chats during downloads.

LM Studio ▷ #🧠-feedback (26 messages🔥):

  • Gemma 7B Download Confusion Cleared: User @heyitsyorkie explained to @adtigerning that the Gemma 7B file from Hugging Face must be downloaded manually and placed in the My Models folder for compatibility. The issue was related to access on LM Studio and Hugging Face repositories.

  • LM Studio Update to v0.2.16 Released: @yagilb informed users, including @drawingthesun and @heyitsyorkie, that the scrolling bug they experienced has been fixed in the new update, version 0.2.16, which users were encouraged to download from LM Studio or through the app’s update feature.

  • Community Feedback on LM Studio v0.2.16: @bananatechindustries expressed enthusiasm for the new user interface in update v0.2.16, particularly appreciating the ability to see model readmes in the search. Meanwhile, @heyitsyorkie confirmed that previous bugs appear to be resolved with this update.

  • Mixed Reactions to UI and Compatibility: User @clickclack777 critiqued the use of Comic Sans and complex UI in LM Studio v0.2.16, suggesting it added unnecessary complexity. @woteva raised issues with UI scalability and model folder compatibility, citing problems with screen size and incorrect RAM requirements messages.

  • New Update Receives Praise: @macfly shared their positive impression of the LM Studio update’s look and feel, emphasizing it with an animated fire emoji.

Links mentioned:

👾 LM Studio - Discover and run local LLMs: Find, download, and experiment with local LLMs


LM Studio ▷ #🎛-hardware-discussion (46 messages🔥):

  • Bargain on Vintage GPUs: @freethepublicdebt mentioned having spare TESLA K40 cards for sale, indicating they have excellent VRAM/$ but limited to CUDA 3.5. There was a mention of interest in adapting llama.cpp for these cards for cheap datacenter card usage, but skepticism remains due to their age.

  • More GPUs, More Speed?: @apnea2014 asked about the benefits of adding a second GPU for inference with LM Studio, to which @heyitsyorkie indicated that more VRAM equals more speed, and combining two cards of the same generation can yield better results.

  • Future Competition in High VRAM GPUs: @nink1 shared optimism about AMD potentially challenging Nvidia with their latest earnings report surge and potential for high VRAM GPUs. @christianazinn and @ptable debated about the consumer market focus of AMD, noting the popularity of Nvidia’s 4090 cards for AI applications.

  • AMD’s Enterprise Push: Contributions from @exio4 emphasized that while consumer Nvidia GPUs still exceed AMD’s matrix throughput, AMD’s latest chips like the MI300X might disrupt Nvidia’s enterprise AI dominance with superior memory and bandwidth specs, as discussed in a TechWireAsia article. @nink1 posited AMD’s potential growth in embedded AI markets, despite current CUDA compatibility issues.

  • Consumer GPU Discussions for LLMs: Participants such as @barduk, @wolfspyre, and @heyitsyorkie discussed whether AMD cards like the Radeon RX 7800 XT Core Edition are suitable for running LLM models compared to Nvidia’s offerings. The consensus appears to be that while AMD cards can be used, Nvidia cards are recommended for their ease of setup and broader compatibility with AI frameworks.

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (34 messages🔥):

  • Model Performance Report: @drawless111 mentions testing Gemma 2B IT and 7B IT (non-supersized versions) on LM Studio version 0.2.15, indicating they perform impressively.
  • Specs Question Answered: @heyitsyorkie confirms that even a system with an i5 11th-gen CPU and 8 GB RAM can run Q4_K_M on LM Studio v0.2.15.
  • Gemma Model Struggles: Users like @ascrowflies are reporting quality issues with Lonestriker’s 7B IT quant, while @heyitsyorkie acknowledges it’s the best available until llama.cpp is fixed.
  • Gemma Model Compatibility: @yagilb recommends a Gemma 2B model on Hugging Face which resolves some issues users (@issaminu and @rumpelstilforeskin) are experiencing with the model.
  • Excitement for IQ Series Models: @drawless111 celebrates the successful implementation of IQ1, IQ2, and IQ3 on LM Studio, with specific stats on performance provided for IQ1.

Links mentioned:

lmstudio-ai/gemma-2b-it-GGUF · Hugging Face: no description found


LM Studio ▷ #autogen (8 messages🔥):

  • AutoGen Issues Resolved: User @thebest6337 encountered a weird problem with AutoGen but later fixed the issue by uninstalling and reinstalling all AutoGen Python packages.
  • Sharing the Solution Encouraged: @heyitsyorkie suggested that sharing the fix could help others with similar issues.
  • The Classic IT Fix: @heyitsyorkie humorously linked to a Tenor GIF portraying the quintessential IT advice: “Have you tried turning it off and on again?”

Links mentioned:

It Problem Phone Call GIF - It Problem Phone Call Have You Tried Turning It Off And On Again - Discover & Share GIFs: Click to view the GIF


LM Studio ▷ #langchain (1 messages):

  • Chunk Size Matters: User @simas93 discussed how the preprocessing of text for embeddings is influenced by the model’s embeddings, specifically indicating that chunk_size should depend on the model in use. They shared a good read on AI Stack Exchange detailing a rule of thumb for determining embedding size and proposed a specific formula for when num_categories <= 1000, suggesting to set num_embeddings to min(500, num_categories/2).
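
A tiny sketch of the rule of thumb as summarized above (min(500, num_categories/2) for num_categories <= 1000); the behaviour beyond 1000 categories is an assumption here, since the discussion did not specify it.

```python
def embedding_size(num_categories: int) -> int:
    """Rule of thumb from the linked answer, as summarized above:
    for num_categories <= 1000, use min(500, num_categories / 2)."""
    if num_categories <= 1000:
        return max(1, min(500, num_categories // 2))
    # The discussion did not specify behaviour above 1000 categories;
    # capping at 500 here is purely an assumption.
    return 500

for n in (10, 100, 1000):
    print(n, "->", embedding_size(n))   # 10 -> 5, 100 -> 50, 1000 -> 500
```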

Links mentioned:

How to determine the embedding size?: When we are training a neural network, we are going to determine the embedding size to convert the categorical (in NLP, for instance) or continuous (in computer vision or voice) information to hidden


OpenAI ▷ #ai-discussions (69 messages🔥🔥):

  • ChatGPT Training with HTML and CSS: User @ls_chicha asked if it is possible to train ChatGPT with HTML and CSS files, looking for insights on incorporating coding languages into AI education.
  • GPTs Reading PDF Issues: @arani1977 encountered problems with GPTs that initially could read PDFs but then claimed they lost the ability, seeking an understanding of this inconsistency despite unaltered configuration settings.
  • Seeking Chat Client Recommendations for OpenAI API: User @oleksandrshr inquired about chat client suggestions for the OpenAI API and further expressed concerns about the slow performance of models such as Ollama, Mistral, Phi, and Gemma:2b on Ollama.
  • Understanding “Quantized Version” in AI: In response to @oleksandrshr’s question about quantized versions, @darthgustav. explained that such versions speed up a model by rounding the weights, which simplifies calculations but reduces precision and performance (see the sketch after this list).
  • Concerns Over GPT-4’s Potency Rumors: User @zaatuloa brought up rumors that GPT-4 may have been powered down since its release, which was quickly debunked by user @lugui, who asserted that these claims are false.
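
A toy NumPy sketch of the idea @darthgustav. described: rounding weights to a low-precision format (here per-tensor int8) shrinks memory and speeds arithmetic at the cost of a small rounding error. This is illustrative only, not how any particular runtime implements quantization.

```python
import numpy as np

# Toy per-tensor, symmetric int8 quantization: round weights to 256 levels and keep one scale.
weights = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights).max() / 127.0
q_weights = np.round(weights / scale).astype(np.int8)   # the "rounding" step
dequantized = q_weights.astype(np.float32) * scale      # what the runtime computes with

print("max rounding error:", np.abs(weights - dequantized).max())
print("bytes: fp32 =", weights.nbytes, " int8 =", q_weights.nbytes)
```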

Links mentioned:


OpenAI ▷ #gpt-4-discussions (67 messages🔥🔥):

  • Qdrant and OpenAI Embeddings Query Confusion: @thirawat_z shared frustrations about discrepancies in search results when using OpenAI embeddings with Qdrant compared to a tutorial, with their results being unrelated to their “modern art in Europe” query. They provided code snippets and results from both the tutorial and their own attempt for comparison.

  • Training ChatGPT with HTML and CSS: Users @ls_chicha, _jonpo, and @thedreamakeem discussed the possibility of training ChatGPT with HTML and CSS files. @thedreamakeem mentioned that a .json database format might be required.

  • Creating AI Conversations: @link12313 proposed an app for GPT-4 to converse with Google’s Gemini Ultra1.5, with @toror commenting that a good starting point is required for engaging dialogue.

  • GPT-4 Input Prompt Inflation Issue: @cetacean_xx reported an issue where input prompts with GPT-4 ballooned to over 30,000 tokens, with @darthgustav. suggesting it’s due to context history accumulation and recommending to remove if unnecessary.

  • ChatGPT-4 Performance and Context Limitations: @orbart expressed dissatisfaction with ChatGPT-4 due to perceived nerfs affecting usage and memory capabilities, prompting a discussion on context length and token limits with @paccer. @blckreaper contributed observations that the model’s available context from files may have been reduced.


OpenAI ▷ #prompt-engineering (202 messages🔥🔥):

  • Looping Logic with ReAct Prompting: @tawsif2781 described an issue with their chatbot agent getting stuck in a loop when using React prompting, outputting the same thought repeatedly. @darthgustav. suggested that this might be due to contextual inconsistencies or too much content causing retrieval issues from the model’s middle context.

  • Improvisation at Zero Temperature: In a discussion about generating independent thoughts at zero temperature, @darthgustav. clarified that even at zero temperature, models can follow an instruction like “improvise” and produce varied results if timestamps or slight context differences are included (see the sketch after this list).

  • Avoid Negative Instructions for LLMs: Prompt crafting advice shared by @darthgustav. emphasized avoiding negative instructions, as they might translate into affirmative actions due to logic gaps in transformer AI. There was also a suggestion to use redundancy by reframing instructions in the prompt for better compliance from the model.

  • Resources for Prompt Engineering: Various users shared advice and resources for learning prompt engineering; @darthgustav. recommended Arxiv and Hugging Face, while @bambooshoots provided a direct link to OpenAI’s prompt engineering guide, and @openheroes mentioned the usefulness of custom instructions features.

  • Custom Instructions (CI) Concerns and Usage: Users @jimmysapp and @eskcanta discussed issues and solutions related to the usage and content policy compliance of custom instructions. @eskcanta provided detailed advice on effectively using CIs for roleplay and summarized conversations by incorporating consistent summaries within the conversations.
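
A small sketch of the timestamp trick described above, using the OpenAI Python client; the model name and prompts are illustrative assumptions, not the exact wording from the channel.

```python
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def improvise(topic: str) -> str:
    # A fresh timestamp changes the context slightly on every call, so even at
    # temperature=0 the completion need not repeat itself verbatim.
    now = datetime.now(timezone.utc).isoformat()
    response = client.chat.completions.create(
        model="gpt-4",          # illustrative model choice
        temperature=0,
        messages=[
            {"role": "system", "content": f"Current time: {now}. Improvise freely."},
            {"role": "user", "content": f"Give me one fresh, one-line idea about {topic}."},
        ],
    )
    return response.choices[0].message.content

print(improvise("prompt engineering"))
```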

Links mentioned:


OpenAI ▷ #api-discussions (202 messages🔥🔥):

  • Breaking the ReAct Loop: User @tawsif2781 reported an issue with a ReAct prompting loop, receiving continuous identical outputs. Various techniques such as avoiding middle-context and redundant prompting, and managing temperature settings were discussed by @darthgustav. to troubleshoot this repetitive behavior.

  • Looping and Improvisation at Zero Temps: @darthgustav. clarified that even at zero temperature, the model can improvise based on the provided context. Model behavior was explored, highlighting how factors like timestamps can influence variances in output even with a consistent prompt.

  • Graphs of Thoughts and Character Consistency: Users engaged in a discussion about how “graph of thoughts” functions and whether it perpetuates bias. @eskcanta and others shared insights into maintaining character consistency and role-plays using Custom Instructions (CI) on ChatGPT.

  • Sustained AI Interactions with Roleplay Scenarios: Through a conversation with @cqoker, @eskcanta showcased how the model can be instructed for complex interactions such as roleplay, providing examples and strategies to save and switch between different character descriptions or scenarios.

  • Concerns and Ethical Implications of AI: @cqoker and @eskcanta reflected on the ethical concerns regarding AI-generated content and its realistic portrayal, discussing the importance of using the technology responsibly and adhering to OpenAI’s usage policies.

Links mentioned:


LAION ▷ #general (398 messages🔥🔥):

  • Worker Orchestration Discussion: @top_walk_town was curious about the framework used for worker orchestration. @pseudoterminalx revealed that they personally created the orchestration system, describing it as “crappy scripts and a database.”

  • Stable Diffusion 3 Anticipation: @thejonasbrothers provided insights into the upcoming Stable Diffusion 3, hypothesizing that it might utilize a similar approach to what they’ve been working on for months: a base for medium resolution and a flow matching upscaler. There’s skepticism about the potential lack of diversity in image generation, with @pseudoterminalx indicating that the images already seem to lack diversity.

  • Stable AI’s Employee Hiring Trends: @thejonasbrothers and @pseudoterminalx discussed the hiring practices at Stability AI, suggesting a preference for hiring systems administrators who are transitioning into machine learning roles due to affordability. There’s also mention of a trend of hiring individuals with YouTube followings.

  • Concerns Over Closed Model Developments: The LAION community expressed concerns regarding the trend of companies like Stability AI moving model development further behind closed doors, away from the end-user’s reach. @thejonasbrothers reminisced about how earlier models like LDM/SD1 had more publicly involved code and compute use.

  • Future of Fine-tuning and Open-source Models: The discussion touched upon the profitability of open-source models and the advantages of finetuning them. @helium__ shared a link about LoRA Land, an initiative that fine-tunes Mistral-7b models to potentially outperform GPT-4, with specialized versions for various tasks.

Links mentioned:


LAION ▷ #research (73 messages🔥🔥):

  • The Retirement Debate for LAION 5B: @top_walk_town pondered if LAION 5B should be retired due to issues like link rot and data poisoning, suggesting a community effort to create new datasets with high-quality images and annotations.
  • Community Effort for Captioning: A “mob captioning” effort using cogvlm was flagged by @twoabove, suggesting ongoing initiatives in the community to improve datasets and annotation quality.
  • Model Training in Mixed Precision: In a discussion on training with mixed precision, @yoavhacohen confirmed the effectiveness of using autocast with bfloat16 on TPUs, while @top_walk_town pointed out the use of autocast and gradient scaling to address the underflow in gradients.
  • TinyLLaVA Framework Shared: @twoabove shared a link to a research paper detailing the TinyLLaVA framework, which discusses data quality, training recipes, and how smaller multimodal models compare to larger ones.
  • LoRA Receives a Humorous Examination: @thejonasbrothers shared a link to a paper dubbed Generative Models: What do they know? Do they know things? Let’s find out!, which uses INTRINSIC LoRA to highlight the hidden capabilities of generative models without additional layers.

Links mentioned:


Nous Research AI ▷ #ctx-length-research (3 messages):

  • Bubbles Galore: @harrisonv posted a series of bubble emojis with no further context provided.
  • Enigmatic Mention: @harrisonv tagged a user with the ID <@644428303293349888> but did not follow up with any additional text or context.
  • Rwkv Commentary: @vatsadev responded to @harrisonv’s tagging of the user with a cryptic comment stating, Rwkv goes brrr here.

Nous Research AI ▷ #off-topic (13 messages🔥):

  • Spotlight on Open Source SOTA Model - Gemma: @pradeep1148 shared a YouTube video titled “Gemma Google’s open source SOTA model.” The video introduces Gemma, a lightweight, state-of-the-art family of open models derived from the research behind the Gemini models.
  • Seeking AI Marketing Experts: @danieltkilleen inquired about knowing any key opinion leaders (KOLs) in the AI marketing space, looking for recommendations.
  • Ski Bi Di Recognition: @teknium gave a shoutout to <@687315767208706059>, acknowledging their expertise in skibidis and related knowledge.
  • Discussing a Zoomer-Driven LLM: @n8programs pondered over the idea of training a Zoomer language model, triggering a light-hearted debate on generational work ethics with comments like “…we are the generation born of the grind… and aderall.”
  • Zoomers’ Love for Work: In a brief exchange, @everyoneisgross contested the notion that work is valuable, to which @hexani responded supporting @n8programs’ perception with a one-word agreement: “Factuals.”

Links mentioned:

Gemma Google’s open source SOTA model: Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Developed by Goo…


  • OpenOrca Confirmed: Users @sherlockzoozoo and @teknium discussed what oo/oo2 refers to, with @.benxh confirming that it is indeed Open Orca.
  • JARVIS Connects LLMs to ML Community: @leonidasch2 shared two GitHub links to repositories under Microsoft’s JARVIS project, which aims to connect Large Language Models with the machine learning community, and suggested checking them out for function calling applications.
  • New Diffusion Transformer Revealed: User @0xevil linked to a tweet from @EMostaque, discussing a new diffusion transformer similar to Sora that includes flow matching and other improvements. Details on multimodal inputs and transformer improvements were promised to be shared soon.
  • Challenging the Adequacy of Human Feedback: @pramod8481 shared an Arxiv link highlighting critical analysis on the use of human feedback for training and evaluating Large Language Models, emphasizing that preference scores may under-represent crucial aspects such as factuality.
  • Investigating Value Bias in LLMs: A study highlighted by @pramod8481 suggests that LLMs favor high-value options due to an implicit value function within, based on research from an Arxiv paper. The study raises concerns about value bias in LLM responses.

Links mentioned:


Nous Research AI ▷ #general (345 messages🔥🔥):

  • Gemma 7B Under the Microscope: Multiple users, including @interstellarninja, @teknium, and @gryphepadar, shared their experiences with finetuning the Gemma 7B model. They discussed issues with loss initially starting high and ways to mitigate this, such as not adding tokens during finetuning and the end results still being less effective than desired.

  • Fine-Tuned Tinyllama Showcases Capability: User @alvion427 praised @n8programs’s fine-tuned Tinyllama model for its ability to conduct multi-turn conversations. @n8programs discussed using the model to produce content more efficiently.

  • OpenCodeInterpreter Sparks Interest: Shared by @weyaxi, OpenCodeInterpreter integrates code generation with execution and refinement, trained on a large multi-turn interaction dataset. @.benxh and @teknium engaged in the discussion, touching on related datasets and their availability.

  • Using LLMs for Scoring and Classifications: Users, including @night_w0lf and @leontello, examined the use of numerical scales and classification labels in giving LLMs scoring tasks. They concurred that defining scores and using classification labels yields better results.

  • LLM Fine-Tuning for Constrained Outputs: @cf0913 and @mihai4256 discussed strategies for fine-tuning large language models (LLMs) for more constrained and reliable outputs such as JSON. @teknium and @.interstellarninja mentioned their ongoing work which includes structured finetuning to achieve a more predictable result.
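As an illustration of the constrained-output goal (our own sketch, not the structured finetuning @teknium and @.interstellarninja described), a common baseline is to validate the model's output against a schema and re-prompt on failure; `call_llm` below is a hypothetical stand-in for whatever model or API is used:

```python
# Hedged sketch: schema validation plus retry as a baseline for reliable JSON outputs.
# `call_llm` is a hypothetical helper wrapping your model; it is not defined here.
import json
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    customer: str
    total: float
    currency: str

def generate_invoice(prompt: str, max_retries: int = 3) -> Invoice:
    for _ in range(max_retries):
        raw = call_llm(prompt + "\nRespond with JSON only, matching the Invoice schema.")
        try:
            return Invoice(**json.loads(raw))
        except (json.JSONDecodeError, ValidationError):
            continue  # re-prompt; structured finetuning aims to make these retries rare
    raise RuntimeError("model did not produce valid JSON")
```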

Links mentioned:


Nous Research AI ▷ #ask-about-llms (18 messages🔥):

  • Huggingface Error Corrected: @qtnx acknowledged that a typo in the Nous-Hermes-2-Mistral-7B-DPO model name on Huggingface (mixtral -> mistral) has been corrected. Model functionality remains the same.

  • Gemma 7B Finetuning Findings Shared: @stoicbatman shared results from finetuning the Gemma 7B model, indicating that a learning rate of 5e-5 yielded the best results for their experiments but did not see significant accuracy improvements.
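For context, plugging that learning rate into a Hugging Face Trainer run looks roughly like the sketch below (a generic illustration with placeholder settings, not @stoicbatman's actual configuration):

```python
# Generic sketch of a 5e-5 learning-rate finetune with the Hugging Face Trainer.
# `train_dataset` is assumed to be your already-tokenized dataset.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_name = "google/gemma-7b"  # gated on the Hub; accept the license first
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

args = TrainingArguments(
    output_dir="gemma-7b-finetune",
    learning_rate=5e-5,                 # the rate reported to work best in these experiments
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```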

  • Voracious VRAM Usage Noted: @gryphepadar added their observation, noting that finetuning models consumes a significant amount of VRAM compared to Mistral models, which could be a factor for computational resource planning.

  • Call for Large-Scale Experiment Scripts: @stoicbatman inquired about shell scripts for conducting large-scale model finetuning and evaluation experiments. @teknium responded by providing a link to a related GitHub project and mentioned that the initial project did not succeed, but the repository may still offer valuable insights.

  • Adjustments for Fine-Tuning and Evaluation: In a follow-up, @teknium suggested that the provided GitHub script would require significant updates to meet @stoicbatman’s experiment requirements, as it was designed to save, upload, and evaluate models for each epoch.
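The per-epoch save/upload/evaluate loop described above can be approximated in a few lines; this is a rough sketch with placeholder repo names and a hypothetical run_eval.py, not the script from the linked repository:

```python
# Rough sketch: iterate over trainer checkpoints, push each to the Hub, then evaluate it.
# HUB_REPO and run_eval.py are placeholders, not real artifacts from the discussion.
import subprocess
from pathlib import Path
from huggingface_hub import HfApi

CHECKPOINT_ROOT = Path("outputs")           # where the trainer wrote checkpoint-*/ directories
HUB_REPO = "your-org/gemma-7b-experiment"   # placeholder repo id
api = HfApi()

for ckpt in sorted(CHECKPOINT_ROOT.glob("checkpoint-*")):
    api.upload_folder(folder_path=str(ckpt), repo_id=HUB_REPO, path_in_repo=f"checkpoints/{ckpt.name}")
    subprocess.run(["python", "run_eval.py", "--model_path", str(ckpt)], check=True)
```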

Links mentioned:


Mistral ▷ #general (311 messages🔥🔥):

  • Model Speculations and Benchmarks: Community members like @shivakiran_, @sublimatorniq, and others shared thoughts on the potential size and performance of Mistral-next, with some users suggesting it’s a larger model than Mixtral based on its lower serving speed. Users like @egalitaristen and @mrdragonfox mentioned testing Mistral-next on lmsys, praising its capabilities in areas like mathematics, even though the specific model size remains unknown.

  • Gemma’s Potential and Mistral Improvements: @i_am_dom suggests that Gemma could be an open-source base for tiny models and hints that Mistral could improve their 7b model by rebasing from Llama2 to Gemma. Further discussions included assumptions about data recency and knowledge cutoffs.

  • Next Model Analysis: Users such as @gunterson and _._pandora_._ speculated whether Mistral-next could be an improvement or a final version of MiQu, while others like @ethux discussed the current limitations of Apple hardware running Mixtral due to FP16 issues. There’s a general interest in the capabilities and internal details of Mistral-next, but exact details like the number of parameters are not disclosed.

  • Usage Directions and Model Access: Inquiries about using Mistral models locally without software like Ollama or LM studio were addressed by @egalitaristen, who explained that running the code is possible with guidance from model card examples on Hugging Face. @ethux also discussed hardware specifics and the availability of models like Mistral-next, which is currently only available at https://chat.lmsys.org.

  • Open Source Concerns and Ambition: Discussions highlighted a community concern that Mistral might stop open-sourcing their models, although it’s mentioned that there’s no clear indication of this move. Users like @casper_ai and @egalitaristen shared a belief that Mistral’s commitment to open-source remains due to a stated philosophy resembling Linux’s development and how it benefits safety and model improvements.

Links mentioned:


Mistral ▷ #models (15 messages🔥):

  • Mistral-next’s existence confirmed: @ethux confirmed that Mistral-next is a real development, seeming to outperform Mistral-Medium.
  • No API Access for Mistral-next Yet: @ethux mentioned that API access for Mistral-next is not currently available but suggests that details about access will be released soon.
  • Mistral versus OpenAI: @paul16307 humorously notes that Mistral might be a better version of OpenAI, jokingly adding “but French” which prompted _._pandora_._ to comment on Mistral being “thrice as good.”
  • Attractive Pricing Draws Interest: @mrdragonfox pointed out that Mistral’s pricing makes it very attractive and emphasized that Mistral is pushing the boundaries of what’s available outside of OpenAI.
  • Feedback Collection for Mistral Next: @egalitaristen inquired about creating a feedback thread for Mistral Next to post extensive thoughts and screenshots, which @mrdragonfox supported, opening a thread for such discussions.

Links mentioned:

Chat with Open Large Language Models


Mistral ▷ #deployment (28 messages🔥):

  • GPU Selection for vLLM Backend: @buttercookie6265 inquired about a guide for selecting a GPU for hosting vLLM. @mrdragonfox advised that the model typically occupies 90% of the GPU and recommended doubling the VRAM that the model requires for adequate headroom.

  • Understanding vLLM GPU Consumption: @mrdragonfox clarified that due to the quadratic scaling of the key-value store (kv), and the accumulation of context (ctx) in batching, more VRAM is necessary than the model size alone might indicate.

  • Batch Calls to vLLM Server: @louis2567 asked for the best method to call a vLLM server for batch requests. @mrdragonfox suggested using async, as vLLM does dynamic batching which can handle parallel requests, and implementation would depend on how the user chooses to handle threading/async in their code.
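To make the async suggestion concrete, here is a minimal sketch (assuming the server was started with vLLM's OpenAI-compatible api_server entrypoint and the openai v1 Python client; the model name and URL are placeholders):

```python
# Fire requests concurrently so vLLM's dynamic batching can group them on the GPU.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder: whatever the server loaded
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.choices[0].message.content

async def main():
    prompts = ["Summarize KV-cache growth.", "What is dynamic batching?", "Explain tensor parallelism."]
    for answer in await asyncio.gather(*(ask(p) for p in prompts)):
        print(answer)

asyncio.run(main())
```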

  • Enquiry about Maximum Tokens Per Second: .soulstealth queried about the maximum tokens per second achieved with vLLM and Mistral 8x7b on 2 x H100 GPUs. No specific performance data was given.

  • Deployment Speed for Mistral 7b in fp16: @kiraa8415 sought advice on the fastest deployment option for Mistral 7b in fp16, and @akshay_1 responded with an unclear “fastest matlab?”, which did not seem to directly address the question.

  • Response Times for Support Inquiries: @fangh reached out concerning a lack of response to their email inquiry. @mrdragonfox indicated that as Mistral has a small team, @707162732578734181 or @803073039716974593 should be contacted, but responses could be delayed.


Mistral ▷ #ref-implem (5 messages):

  • Inquiry About Mistral Data Normalization: @severinodadalt from Barcelona Supercomputing Center asked whether Mistral data has been normalized, and if so, which normalization was used and its implementation method. However, they couldn’t find any relevant information, leading them to believe maybe no normalization has been applied.
  • Lack of Basemodel Data Normalization Info: In response to @severinodadalt, @mrdragonfox stated that no basemodel will provide information regarding data normalization.
  • Questioning Inference Speed on Different VRAMs: @bdambrosio questioned whether upgrading their VRAM to run Mistral 8x7B locally in full fp16 might affect inference speed compared to the current 8-bit exl2 settings.
  • Perceived Differences Beyond Measured Metrics: @mrdragonfox acknowledged that differences are noticeable because turbo (presumably turboderp's ExLlama tooling) primarily measures perplexity (ppl) and does not account for every possible aspect of performance.
  • Quantization Effects on Context Accuracy: @mrdragonfox pointed out that context accuracy might degrade a bit with quantization, an important factor to consider when seeking to improve performance by adjusting bit depth.

Mistral ▷ #finetuning (21 messages🔥):

  • New to Fine-Tuning: User @4vis expressed being new to fine-tuning and jokingly asked about fine-tuning Mistral with YouTube transcripts. @_._pandora_._ advised starting with Unsloth as it is beginner-friendly.
  • Data Doubts for Fine-Tuning: @pteromaple wondered about the amount of data required for fine-tuning, asking if 4000 instances are sufficient. @egalitaristen suggested that sufficiency depends on the narrowness of the tuning task.
  • File Format Frenzy: @pteromaple inquired about the correct data format when fine-tuning "Mistral-7B-Instruct-v0.2" with Unsloth, mentioning their current format, Alpaca (an example record is sketched after this list). @_._pandora_._ suggested fine-tuning the base model instead and advised understanding the prompt formatting section of Unsloth's notebook.
  • Instruct vs. Base Model Debate: @pteromaple sought to maintain instruction-following abilities while altering output formats, expressing curiosity over whether starting with an Instruct model simplifies things. @_._pandora_._ recommended using the base model for greater freedom and shared experiences about biases and language barriers in fine-tuning.
  • Hardware Hurdle for Hefty Models: @kodeurkubik questioned the feasibility of fine-tuning Mistral 7B on a Mac with 16GB of RAM, considering swapping files as a solution. @mrdragonfox mentioned that significantly fewer parameters need to be modified for style transfer and clarified that 7B should fit in 16GB VRAM using fp16 and a batch size of one.
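For readers unfamiliar with the Alpaca format mentioned above, a single record and the usual prompt rendering look like this (an illustrative example; the field names follow the common Alpaca convention rather than any specific Unsloth template):

```python
# One Alpaca-style record plus the standard template used to turn it into a training prompt.
record = {
    "instruction": "Summarize the following video transcript in two sentences.",
    "input": "...transcript text...",
    "output": "...reference summary...",
}

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"
)

print(ALPACA_TEMPLATE.format(**record))
```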

Mistral ▷ #la-plateforme (8 messages🔥):

  • Mistral API Privacy Clarification: @akshay_1 assured @exa634 that data passing through the Mistral API is not used to train the model, reinforcing Mistral’s robust privacy policy.
  • Models Hosted in Sweden: Both @tom_lrd and @ethux confirmed to @exa634 that Mistral hosts its platform and data in Sweden, which is mentioned in their privacy policy.
  • Privacy Policy Details: @ethux posted an excerpt from the Mistral AI Privacy Policy, detailing the roles of Data Controller and Data Processor, and highlighted that Azure hosts the platform and associated data.
  • Comprehensive List of Providers: In a more detailed posting, @ethux listed Mistral’s main service providers, including Azure, Cloudflare, Kong, Lago, Mailjet, Ory, and Stripe, along with their roles and geographic details.

Links mentioned:

Privacy Policy: Frontier AI in your hands


Perplexity AI ▷ #announcements (1 messages):

  • Perplexity Partners with ElevenLabs: @ok.alex announced a new partnership with ElevenLabs, providing AI-powered voices for the Discover Daily podcast, which features episodes from Perplexity’s Discover feed. The podcast is designed to fit easily into listeners’ daily routines and is available on favorite podcast platforms.

  • Discover Daily Podcast Launched: The Discover Daily podcast offers daily dives into tech, science, and culture, using content from Perplexity’s Discover feed and narration by ElevenLabs’ voices. It promises to be a fitting companion for various moments of the day, enhancing listeners’ curiosity journey.

Links mentioned:

  • Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
  • Discover Daily by Perplexity: We want to bring the world’s stories to your ears, offering a daily blend of tech, science, and culture. Curated from our Discover feed, each episode is designed to enrich your day with insights and c…

Perplexity AI ▷ #general (290 messages🔥🔥):

  • Perplexity Pro Subscriptions: Sharing and Discounts: @irismava inquired about adding team members to a Perplexity Pro plan, while @rayinqo asked about savings when subscribed to both ChatGPT and Perplexity Pro. @tree.ai confirmed that team members can be added under the advanced plan, and @v01338 stated there are no discounts for holding multiple subscriptions. The official billing and subscription FAQ posted by @mares1317 clarifies that each employee requires an individual Pro account.

  • Experimental GPT Models: The Perplexity Labs YouTube playlist and a tweet by @perplexity_ai, shared by @mares1317, highlighted the newly available "Experience Gemma 2B and 7B models," which are notable for their performance despite being lightweight.

  • Problems with Perplexity as Default Search Engine: @redhare18 experienced issues using Perplexity as the default search engine with Arc, which were resolved after @ok.alex provided assistance. Other users like @shizlets also faced difficulties with the Arc Search iOS app.

  • Multiple AI Models Discussed: Users @jaicraft and @rhysd21 discussed the performance and availability of various models on Perplexity Pro, including “Experimental” and “Gemini Advanced”. The conversation touched on the functionality of models like “Gemini,” “Claude 2.1,” and “GPT-4 Turbo,” with @mares1317 and @brknclock1215 confirming that GPT-4 Turbo is supported.

  • Image Generation Feature on Perplexity Pro: There was confusion about generating images on Perplexity Pro, which @trite8q1 sought clarity on. @jaicraft and @ok.alex explained that Pro members can create images by starting a new thread and using the generate image button; the process is detailed in a blog post and an official thread.

Links mentioned:


Perplexity AI ▷ #sharing (4 messages):

  • Searching for Line0’s Identity: @edzordzinam.ali shared a Perplexity AI search link related to identifying what pline0 is.
  • Delving into Risk Factors: @moonshot85 provided a Perplexity AI search link concerning the analysis of various risks.
  • Xiaomi 14 Series Insights: @icelavaman posted a link to Perplexity AI’s search results about the Xiaomi 14 series.
  • Perplexity AI and ElevenLabs Partnership Exploration: @icelavaman also shared a Perplexity AI search result discussing a potential collaboration between Perplexity AI and ElevenLabs.

Perplexity AI ▷ #pplx-api (11 messages🔥):

  • Trouble with API Credit Purchase: @jenish_79522 is facing issues completing a transaction for API credits and seeks assistance, specifically tagging <@752478851103326241> for help.
  • Inquiry about Integrating Gemma with API: @karan01993 asked if there’s any plan to integrate Google’s Gemma with Perplexity API, looking for confirmation on future support.
  • Getting Started with Perplexity API: @brextonpham inquired about accessing the Perplexity API as a newcomer, and @icelavaman directed them to the getting started documentation and provided a contact for higher rate limits ([email protected]).
  • Payment Issue Escalation: In response to @jenish_79522’s pending transaction issue, @icelavaman advised contacting [email protected] for assistance.
  • 400 Error with ‘Assistant’ Field Resolved: @dogemeat_ reported an issue with a 400 error when using the ‘assistant’ field, and @brknclock1215 suggested a workaround involving the message order that appeared to resolve the problem.
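The workaround boils down to message ordering; below is a sketch of a request shape that chat-completions APIs such as pplx-api generally accept (an optional system message, strictly alternating user/assistant turns, ending on a user turn; the model name is a placeholder, so check the current docs):

```python
# Hedged sketch of a well-ordered chat-completions request to the Perplexity API.
import requests

payload = {
    "model": "sonar-medium-chat",  # placeholder; check the docs for currently supported models
    "messages": [
        {"role": "system", "content": "Be precise and concise."},
        {"role": "user", "content": "What is ring attention?"},
        {"role": "assistant", "content": "It shards attention computation across devices..."},
        {"role": "user", "content": "How does it differ from striped attention?"},  # end on a user turn
    ],
}
resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```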

Links mentioned:



OpenAccess AI Collective (axolotl) ▷ #general (79 messages🔥🔥):

  • In Search of Transformers Code: User @qwerty_qwer seeks Transformers code citing its simplicity and ease of setup; @nanobitz hints at considering vLLM.
  • Checkpoint Concerns: @stoicbatman reports checkpoint issues with directory visible, yet faces errors possibly during merging or evaluation.
  • Cloud Costs vs. Server Ownership: @yamashi questions the cost-effectiveness of cloud computing services after comparing the long-term rental costs with the one-time purchase of servers.
  • Hugging Face Issue Insights: @nanobitz and @stoicbatman discuss a GitHub issue regarding errors when saving with EarlyStoppingCallback, noting a loss of $60 due to this issue.
  • Model Storage Cleanup: @c.gato seeks to free up space from downloaded models and is directed by @mihai4256 to Hugging Face's CLI command huggingface-cli delete-cache, which reportedly can be run even while another job is in progress.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (6 messages):

  • Mixtral Optimization Ideas in the Works: @casper_ai believes they have determined effective ways to optimize the Mixtral model but lacks the skills to write a compatible backward pass, due to not being a CUDA engineer.
  • Enhancing Mixtral with Grouped Computations: @casper_ai proposes a method to optimize Mixtral by concatenating and stacking experts, then precomputing token and expert ids for efficient grouped computations across all experts.
  • Significant Acceleration Achieved in AutoAWQ: @casper_ai has achieved an impressive 8x increase in speed on Mixtral for prefilling and decoding when working with AutoAWQ.
  • Backward Pass Implementation Challenges: @casper_ai discusses the potential need to import megablocks from another implementation as they have the backward passes for various operations.
  • Resource Suggestion - Gemma Inference Engine: @curiositix suggests looking at Gemma - a lightweight, standalone C++ inference engine for implementing the backward pass that could potentially assist with @casper_ai’s optimization challenges.

Links mentioned:

GitHub - google/gemma.cpp: lightweight, standalone C++ inference engine for Google’s Gemma models.: lightweight, standalone C++ inference engine for Google’s Gemma models. - google/gemma.cpp


OpenAccess AI Collective (axolotl) ▷ #general-help (165 messages🔥🔥):

  • Understanding inference format for codellama: @nani1149 inquired about the format needed for inference after training a model with alpaca format, to which @nanobitz confirmed that the alpaca format is also used for inference, providing a link to the stanford_alpaca GitHub repo for reference.

  • Discussing documentation and community contributions: Users @yamashi and @nanobitz discussed the need for better documentation to avoid repeated questions, mentioning the potential use of gitbooks and citing the help of a large community in maintaining resources like gitbook for different projects.

  • Troubleshooting Learning Rate issues for Gemma 2B: @kearm expressed difficulty in finding the right learning rate for Gemma 2B with various attempts listed, and @stoicbatman responded by suggesting to share loss charts and discussing their own experiences.

  • Merging mixtral performance concerns: @dreamgen experienced slow merge times and GPU not being used while merging mixtral, leading to discussions with @nanobitz about potential solutions and whether running out of VRAM or operating on RAM was the issue.

  • Troubleshooting checkpoint saving error during model training: @kearm struggled with a checkpoint saving issue during model training, which was not resolved despite trying a downgrade of deepspeed as suggested by @stoicbatman. The conversation involved back and forth suggestions and references to related GitHub issues.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #community-showcase (9 messages🔥):

  • DreamGen’s New AI Models Released: @dreamgen announced the launch of new AI models for story-writing and role-playing, trainable with Axolotl and Unsloth, and detailed on Hugging Face with a collection at dreamgen/opus-v1-story-writing-and-role-playing-models. They feature ~100M tokens of human-generated data and are based on an extended version of ChatML, with further instructions available in the Opus V1 guide.
  • Prompt Template Oversight Corrected: @nanobitz noticed that @dreamgen seemed to have forgotten to update the tokenizer’s chat template for the new models; @dreamgen acknowledged the issue, confirming that version 7b did not update as intended.
  • Possible Prompt Leakage in Opus V1.2-7b: @nanobitz reported an issue when testing @dreamgen’s new models, suggesting that the prompts might be leaking user and assistant roles at conversation starts in chat mode. @dreamgen responded with a link to the prompt formatting code to clarify the setup.
  • Further Review Needed on Formatting Issue: @dreamgen is looking into the earlier mentioned “leak” reported by @nanobitz, who indicated the need to investigate more after noticing user/assistant content in the final assistant message.
  • Phi-2 Model Fine-tuned With Axolotl: @finetuningllms shared a link to their fine-tuning of the Phi-2 model, noting high performance and promising to soon add a model card including an image, available at axra/phi-2-x-0.1.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #runpod-help (6 messages):

  • RunPod Image Availability Concerns: @stoicbatman inquired if the RunPod image was deleted as they were unable to find it.
  • Helpful Direction to Docker Hub: In response, @nanobitz shared a direct link to Docker Hub where the RunPod image tags can be found.
  • Confusion Over GitHub Readme: @stoicbatman followed up to mention that the GitHub readme is no longer redirecting to the actual RunPod image.
  • Seeking the Latest Link: @nanobitz asked @stoicbatman if they have the latest link, attempting to address the redirection issue mentioned.
  • Reliance on Docker Hub Over GitHub: @stoicbatman confirmed using the image from Docker Hub but expressed confusion as the GitHub readme previously redirected to the RunPod image, which is no longer the case.

Links mentioned:

Docker


HuggingFace ▷ #announcements (1 messages):

  • Visualizing Aya Dataset: User @416019758492680203 provides a visualization of the Aya dataset for better insights and understanding.
  • Image Generation Upgrade: With the new release of Proteus V0.4, @1093866142608670772 enhances the capabilities of image generation, available at Proteus V0.4 space.
  • Interactive Text-to-Image RAG Prompts: User @942079288952381461 created an interactive demo to play with over 1.4 million text2image prompts using RAG, accessible here.
  • Serverless Hosted API for Inference: @319141699605626881 shares a serverless inference solution hosted on a free Colab environment, with details on GitHub.
  • Innovating with ProteinBERT and Fluently Models: Links to ProteinBERT model weights by @403280164433297409 and the accompanying paper were shared, along with the Fluently diffusion model demo by @1056663454519406652, available at Fluently space.

HuggingFace ▷ #general (149 messages🔥🔥):

  • Seeking Performance Clarity: User @0ldgranpa inquires about optimal model types and performance fixes for their hardware specifications. There are no responses to guide them yet.
  • GPU Memory Workarounds: @alifthi asks for solutions to run large models like Mistral with limited GPU memory, and @typoilu suggests using llama.cpp or accelerate for CPU offloading (see the sketch after this list).
  • Hardware Curiosity: @zorian_93363 compares ASIC mining machines' capabilities to potential uses for running models, and @vipitis explains the difference between the computational tasks and discusses current hardware such as Google's TPU and Graphcore's IPU.
  • Exploring GPT Alternatives: @amirgame197 asks why GPT 3.5 is unlimited and free on chat.openai.com but paid on api.openai.com, suggesting they are seeking free alternatives for API usage, without receiving a direct answer.
  • Accidental Template Confusion: In a coding issue, @levisco initially struggles with using the create_sample feature from the transformers QuestionAnsweringPipeline, but discovers it was only a typo in their code.
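As a concrete illustration of the offloading route (our own sketch with an example model, not @typoilu's exact recommendation), accelerate's device_map="auto" spills layers to CPU RAM and disk when VRAM runs out:

```python
# Minimal offloading sketch: layers that don't fit in VRAM are placed on CPU (and optionally disk).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",         # requires `accelerate`; places layers across GPU/CPU/disk
    torch_dtype=torch.float16,
    offload_folder="offload",  # spill-over directory if CPU RAM also runs out
)

inputs = tok("Explain CPU offloading in one sentence.", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=60)[0], skip_special_tokens=True))
```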

Links mentioned:


HuggingFace ▷ #today-im-learning (4 messages):

  • Flutter Game Inquiry: User @.konoh inquired about a flutter game, but provided no further context or details.
  • Hugging Face Open Sources “DoReMi”: User @neuralink shared a link to an open-sourced Hugging Face project on GitHub named DoReMi, part of the nanotron repository.
  • User Feels Overwhelmed by Complexity: @cursorop expressed feeling overwhelmed by the complexity of the project shared by @neuralink, using the :blobsweat: emoji to convey their sentiment.
  • Seeking Advice on Imitation Learning for Robotics: @alefram asked the community for tips or resources for learning about imitation learning as it applies to robotics, but no responses were provided within the given messages.

Links mentioned:

nanotron/examples/doremi at main · huggingface/nanotron: Minimalistic large language model 3D-parallelism training - huggingface/nanotron


HuggingFace ▷ #cool-finds (5 messages):

  • Benchmarking AI Models: User @ryzxl announced a comprehensive benchmarking initiative for AI models, comparing platforms like gpt-3.5-turbo-instruct and Mistral. The initiative covered key datasets including ASDiv, BBQ, BigBench, and more. Full details and the leaderboard can be found in their LinkedIn post.

  • Reminding About Posting Etiquette: User @cakiki reminded @ryzxl to avoid cross-posting the same message multiple times to prevent spam.

  • Deep Unsupervised Learning Course Announcement: User @omrylcn. shared information about the Spring 2024 offering of Berkeley’s Deep Unsupervised Learning course, covering Deep Generative Models and Self-Supervised Learning, similar to previous offerings.

  • Large Action Models (LAMs): User @fernando_cejas shared a blog post discussing Large Action Models (LAMs), which are AI systems that perform human-like tasks within digital environments through neural networks and symbolic reasoning.

  • Warp Dev Referral: User @gjyotin305 posted a referral link to Warp Dev, but provided no additional context or information about the link.

Links mentioned:


HuggingFace ▷ #i-made-this (26 messages🔥):

  • Unofficial ChatGPT API via Selenium Raises Concerns: @.infinityhawk shared a link to an unofficial ChatGPT API created with Selenium (Github Repo). Both @myg5702 and @cakiki raised potential ethical and practical concerns, such as contravening OpenAI’s terms of service and risking IP or RP bans.

  • Optimization Techniques for Stable Diffusion XL: @felixsanz published a comprehensive article detailing optimization methods for Stable Diffusion XL, enabling image generation on GPUs with just 6 GB of memory (Read the Article). Although the release coincided with the announcement of Stable Diffusion 3, @paccer commended its educational value and the effort involved.

  • Cheaper Access to OpenAI GPT-4 Models via New API: @exrew introduced an API offering affordable access to OpenAI GPT-4 models, with a free plan for trial and a flexible credit system for various models (Find the API here).

  • Real-time Text Streaming Chat Interface with Gemma: @not_lain created a text streaming chat interface utilizing the new Gemma AI model, promising fast performance (Experience it here).

  • Browser-Based Speaker Embeddings with WavLMForXVector: @davidre95 has contributed to transformers.js by submitting a pull request to support WavLMForXVector, enabling running speaker embeddings models directly in the browser (PR on GitHub; Model on HuggingFace).

Links mentioned:


HuggingFace ▷ #reading-group (5 messages):

  • Neural Circuit Diagrams Presentation Scheduled: @chad_in_the_house confirmed there will be a recording of the neural circuit diagrams presentation.
  • Time Confirmation for a Live Event: @chad_in_the_house mentioned the presentation will take place at 7pm EST today.
  • Consideration for Time Zones: @gschwepp_84093 noted that the presentation time translates to 00:00 UTC, expressing potential difficulty in attending due to the late hour.

HuggingFace ▷ #diffusion-discussions (5 messages):

  • Inquiring about Interlingua-Based Translators: User @hobojesus6250a expressed interest in finding or creating an Interlingua-based translator on Hugging Face and discussed the potential need to extend an existing model due to time constraints.
  • Looking for Ways to Expand Class Limit: @agusschmidt asked how to run the facebook/bart-large-mnli model with more than 10 classes, referencing a previous discussion that suggested it was possible when running the model locally (see the sketch after this list).
  • Friendly Caution from HuggingMod: The automated moderation bot @HuggingMod reminded users <@345587852052267018> and <@745207885201539072> to slow down their posting, indicating they might be sending too many messages in a short span of time.
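A small sketch of the local route (our illustration; the label list is arbitrary): the zero-shot classification pipeline accepts an arbitrarily long candidate_labels list when run locally.

```python
# Zero-shot classification with more than 10 candidate labels, run locally.
from transformers import pipeline

clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

labels = [
    "politics", "sports", "finance", "technology", "health", "science",
    "education", "travel", "food", "entertainment", "climate", "law",
]
result = clf("The central bank raised interest rates by 50 basis points.", candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 3))
```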

HuggingFace ▷ #computer-vision (3 messages):

  • Multi-label Image Classification Tutorial Drops: User @nielsr_ shared a tutorial notebook for multi-label image classification using SigLIP, a strong vision backbone, although any vision model from the Transformers library can be substituted.

  • Too Much Zeal from HuggingMod: @745207885201539072 received a gentle warning from HuggingMod to slow down their message posting speed on the server.

  • Forge Ahead with Emotion Recognition: @rodricota_ began a discussion on building an emotion recognition model and expressed a desire to troubleshoot some issues.


HuggingFace ▷ #NLP (49 messages🔥):

  • Peft’s Persistent Problem: @grimsqueaker mentioned a significant bug where peft does not save the right heads for non-auto-configured architectures. The workaround involved trial-and-error parameter adjustments until a working config was found, though compromises had to be made.

  • Reformer Research Ruminations: @devbravo shared their current research focus on developing smaller, more memory-efficient models with Reformer architecture to run on edge devices. A reminder to keep it slow appeared from @HuggingMod, prompting @devbravo to slow down their rapid posting.

  • GPT Length Logistics: @vipitis corrected @nrs9044 by stating that Transformers are not recurrent but fully parallel, and confirmed that the size of self-attention matrices in GPT indeed scales quadratically with sequence length.

  • Generating Positive and Negative Sentiments: @jimmyfromanalytics inquired about fine-tuning Flan T5 for creating synthetic data for sentiment analysis. Discussions revolved around prompt engineering and potentially exploring decoder-only models for better performance.

  • Fine-Tuning vs. Large Model Dilemma: @arkalonman sought insights regarding whether to fine-tune a larger LLM like Mistral 7B for text classification, versus sticking with a BERT variant. The conversation with @lavi_39761 led to a consensus that efficient encoder models might be a better focus than more substantial models for classification purposes.

Links mentioned:


HuggingFace ▷ #diffusion-discussions (5 messages):

  • Exploring Interlingua-Based Translators: hobojesus6250a raised a question about whether anyone has experimented with creating or tweaking an Interlingua-based translator on Hugging Face. They expressed interest in extending an existing model for a university project due to time constraints.
  • Expanding Classes for BART MNLI Model: agusschmidt inquired about how to run the BART-large-mnli model with more than 10 classes, suggesting they are aware of the possibility when running it locally and seeking guidance on how to implement this.
  • Friendly Bot Reminders to Avoid Spam: HuggingMod, the Hugging Face moderator bot, issued reminders to @345587852052267018 and @745207885201539072 to slow down their message posting as they were sending messages too rapidly.

Latent Space ▷ #ai-general-chat (52 messages🔥):

  • Prompt Engineering with GPT-4: `@henriqueln7` expressed disappointment as GPT-4 failed to effectively rewrite prompts as per their request, instead generating responses akin to a new assistant's. They plan to test further in the playground.
  • Announcement of Stable Diffusion 3: `@rubenartus` shared [an announcement](https://stability.ai/news/stable-diffusion-3) about Stable Diffusion 3's early preview with enhanced multi-subject prompt performance and spelling abilities. They also provided a [link](https://twitter.com/EMostaque/status/1760660709308846135) to more model details.
  • Google's New Model Gemini Pro 1.5: `@nuvic_` discussed the capabilities of Gemini Pro 1.5 highlighting its 1,000,000 token context size and ability to use video as input, as explored through Google AI Studio.
  • Assessing Reddit's Data Deal with Google: Users like `@guardiang` and `@pennepitstop` provided perspectives on the financial and strategic implications of [Google’s reported $60 million/year data deal](https://news.ycombinator.com/item?id=39471964) with Reddit ahead of its IPO.
  • Gemini Image Generation Paused: `@swyxio` posted a [link](https://blog.google/products/gemini/gemini-image-generation-issue/) to a Google blog where the SVP took responsibility for issues with Gemini's image generation feature, which resulted in a temporary pause of the function.

Links mentioned:


Latent Space ▷ #ai-announcements (6 messages):

  • LLM Paper Club T5 Discussion: @ivanleomk announced a session of the LLM Paper Club discussing the T5 paper with @bryanblackbee. The event was scheduled to happen in 5 minutes with a link to join the discussion: Join LLM Paper Club.
  • Regretting Missing the Paper Club: @swyxio expressed regret for missing the LLM Paper Club on T5 led by @bryanblackbee, hinting at the need for a recording of the session.
  • AI in Action Event: @kbal11 promoted an AI in Action event with @yikesawjeez focusing on local models. A link to the session was provided: Learn About Local Models.
  • Praise for AI Event Management: @swyxio complimented @kbal11 on the successful management of the AI in Action session led by @yikesawjeez.

Links mentioned:


Latent Space ▷ #llm-paper-club-west (16 messages🔥):

  • LLM Paper Club Asia Edition Kicks Off: @ivanleomk invites participants to join the discussion, offering a platform for anyone to ask questions or discuss topics by joining the stage as a speaker or chatting if they’re more comfortable with that.
  • Central Repository for Notes and Insights: @bryanblackbee provides a link to notes, which is a central repository for the discussions taking place in the LLM Paper Club.
  • Inquiries to the Community About Model Vocabularies and Constraints: @mattoshimasu is curious about whether new models they’re discussing have a smaller set of vocabulary and also asks about the text length and verb count constraints.
  • Fine-Tuning Mechanisms in NLP Explained: In response to @healthymonkey’s question, the community discussed how fine-tuning works for NLP models like T5 on tasks such as sentiment classification, touching on whether the head/linear layer is replaced as it often is in computer vision (see the sketch after this list).
  • Technical Comparison of Encoder-Decoder vs. Decoder-Only Architectures: @hanzo4958 sparks a discussion about the differences between encoder-decoder and decoder-only architectures in traditional NLP tasks, noting the rising popularity of decoder-only models.
  • Parting Gratitude and Positive Feedback on the Session: Several participants, including @healthymonkey, @hanzo4958, @thehippoguy, @edwin_75513_08956, and @lord_idiot, express their thanks and appreciation for the detailed session and notes before leaving the discussion.
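On the head-replacement question, T5 is usually fine-tuned without swapping the head: classification is cast as text-to-text, as in this minimal sketch (hypothetical example data):

```python
# T5 keeps its original LM head; sentiment classification becomes a text-to-text task.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tok(["sst2 sentence: the film was a delight"], return_tensors="pt")
labels = tok(["positive"], return_tensors="pt").input_ids  # the target is literally the word "positive"

loss = model(**inputs, labels=labels).loss  # same seq2seq LM objective as pre-training
loss.backward()
```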

Links mentioned:

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It’s the all-in-one workspace for you and your team


Latent Space ▷ #ai-in-action-club (136 messages🔥🔥):

  • Local Models and LoRA Discussion: Users discussed their experiences with local AI models and the LoRA (Low-Rank Adaptation) technique. @markredito clarified that LoRA is an adapter placed on top of a generative model to influence its output and is commonplace in platforms like Stable Diffusion.
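In code, the adapter idea looks roughly like this peft sketch for a language model (the same concept applies to Stable Diffusion LoRAs; target module names vary by architecture):

```python
# Minimal LoRA sketch with peft: small low-rank matrices are trained on top of a frozen base model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # example base model

config = LoraConfig(
    r=16,                                  # rank of the low-rank update
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections are a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()          # typically well under 1% of the base parameters
```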

  • Latent Space Final Frontiers Event: @kbal11 shared details about the Latent Space Final Frontiers event, which focuses on pushing AI boundaries and features a research/startup competition with notable judges from companies like GitHub, Replit, and LlamaIndex.

  • ComfyUI for Stable Diffusion: @markredito provided a GitHub link to ComfyUI, which is described as a powerful and modular GUI, API, and backend for Stable Diffusion with a graph/nodes interface.

  • AI Model Merging Trend: @swyxio shared a Hugging Face blog post discussing the emerging technique of model merging that allows combination of multiple LLMs to create state-of-the-art models for cheap, highlighting the use of the mergekit library.
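For the curious, a mergekit run is driven by a small YAML config; the sketch below is our illustration only (the schema shown and the model ids are placeholders; see the mergekit README for the exact options of each merge method). It writes a simple linear-merge config and invokes the CLI:

```python
# Rough sketch of a linear merge with mergekit; the config schema shown is illustrative only.
import pathlib
import subprocess
import textwrap

config = textwrap.dedent("""\
    merge_method: linear
    dtype: float16
    models:
      - model: org/model-a        # placeholder repo ids
        parameters:
          weight: 0.5
      - model: org/model-b
        parameters:
          weight: 0.5
""")

pathlib.Path("merge.yml").write_text(config)
subprocess.run(["mergekit-yaml", "merge.yml", "./merged-model"], check=True)
```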

  • Civit.ai Model Gallery Concerns: @kbal11 pointed out the prevalence of stylized and sexualized images of young women in the Civit.ai model gallery, sparking a lighthearted but poignant discussion about the content generated by AI and shared within the community.

Links mentioned:


Eleuther ▷ #general (107 messages🔥🔥):

  • Skeptical Takes on Simulating Personhood: @sparetime. expressed skepticism about the claim that GPT-4 and a scratchpad can simulate a human, questioning the model’s ability to faithfully generate realistic experiences. @rallio. responded with a detailed explanation that the simulation would include creating a set of fake memories and layers to emulate human behavior and perspective, even noting recent improvements in memory consistency.

  • Discord Member Shares Benchmark Critique Video: @cahya.wirawan shared a YouTube video link titled “Everything WRONG with LLM Benchmarks (ft. MMLU)!!!” which criticizes benchmarks for large language models, sparking a conversation about the validity and effectiveness of current LLM benchmarks.

  • Eleuther Community Discusses Improving LLM Consistency: In a technical discussion, @rallio. suggested that the issues related to consistency in simulating memories for Large Language Models (LLMs) have been potentially mitigated according to recently published research such as Google’s TrueTeacher and Propsegment.

  • The Hallucination Debate: @rallio. mentioned a company called Superfocus which claims to have achieved near 100% factual accuracy for LLMs, implying a solution to the hallucination problem. This sparked a debate with @fern.bear over the veracity of these claims and the nature of solving the hallucination issue with LLMs.

  • Creating Lifelike NPCs in Virtual Worlds: @rallio. discussed their ambition to create persistent NPCs that could interact with humans in virtual worlds without revealing their artificial nature. They explained this would utilize the formulated approach for consistency and memory simulation in conjunction with fine-tuning and context.

  • Community Shout-Out for Collaboration: @hawk1399 prompted the community to consider a project based on a paper outlining the use of diffusion models to generate high-performing neural network parameters, inviting others to contribute to continued research in the field.

Links mentioned:


Eleuther ▷ #research (70 messages🔥🔥):

  • Model Naming Ethics Debated: @thooton_ expressed frustration over misleading model naming practices, suggesting that a model named “7b” should not exceed 7.99b parameters. They highlighted the inconsistency with “gemma-7b” actually having 8.5b parameters, while “gemma-2b” is closer to its stated size with 2.5b parameters.
  • Clarifications on Embedding Sizes: In a discussion with @catboy_slim_, it was clarified that “gemma-7b” includes 8.5 billion parameters when embedding parameters are counted, but the leading digit matches the advertised size once embeddings are excluded.
  • New Paper on Minimizing Data Loss: @jckwind shared excitement for a new paper on data efficiency and minimizing information loss during layer transmissions, advocating its novelty and potential usefulness.
  • Searchformer Beats Traditional Planners: @jckwind highlighted “Searchformer”, a Transformer that outperforms traditional symbolic planners by solving Sokoban puzzles while utilizing fewer search steps than A* search.
  • Simplicity in AI Alignment with Reinforce: Discussion around a paper suggested that simpler REINFORCE-style optimization could be more effective for RLHF (Reinforcement Learning from Human Feedback) compared to the canonical PPO method, which @canadagoose1 mentioned discussing extensively.

Links mentioned:


Eleuther ▷ #interpretability-general (17 messages🔥):

  • Training of Chinese Lens Underway: @mrgonao mentioned that the training of the Chinese lens is in progress and will finish in a few hours. Additionally, there is an issue with the 13b model showing uniform output which will be checked alongside the Chinese lens comparison.
  • Unlearning in Language Models: @millander highlighted a recent academic publication titled Survey and formalization of LLM unlearning. It can be accessed here for detailed insights on unlearning processes in Large Language Models (LLMs).
  • Identical Tokenizer Across Models?: This issue prompted @mrgonao to ask whether the tokenizer used for the 13b model is the same as the 7b model’s, which may be connected to the model’s odd behavior of “thinking in Chinese” when the Chinese lens is applied.
  • Lens Training Could Lead to Intra-Translation: @butanium hypothesized that training the tuned lens exclusively on Chinese content might push the lens to translate English activations into Chinese, so Chinese tokens would be expected even for English inputs.
  • Troubleshooting Dataset Anomalies: @mrgonao is experiencing unexpected dataset behaviors in translation tasks and is seeking to rectify potential issues, mentioning that the wrong languages are paired with words. The related GitHub repository can be found here.

Links mentioned:


Eleuther ▷ #multimodal-general (4 messages):

  • False Negatives in Scaled Datasets not a Concern: @_.hrafn._ opined that exact false negatives are unlikely at the scale of current datasets like datacomp or metaclip, particularly with balanced datasets. They suggested generating unimodal embeddings or computing similarity scores on the fly to mitigate concerns.
  • Creating Own Model to Exclude Hard Negatives: @_.hrafn._ further proposed the idea of using one’s own model during training to compute similarity scores in order to exclude particularly hard negatives.
  • Irrelevance of Solution for Non-Image-Text Projects: @tz6352 responded that the discussed issue and solutions are not applicable for them as they are not working on Image-Text projects.
  • Loss Masking as a Viable Solution: @.solux discussed the possibility of masking the loss for samples that are too close, suggesting it as a potential solution when there’s no good way to identify false negatives in the training process.
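A sketch of that masking idea in a CLIP-style contrastive loss (our illustration; the threshold and temperature are arbitrary): off-diagonal pairs whose similarity is already very high are treated as likely false negatives and dropped from the loss.

```python
# Mask suspected false negatives out of an InfoNCE-style contrastive loss.
import torch
import torch.nn.functional as F

def masked_contrastive_loss(img_emb, txt_emb, temperature=0.07, fn_threshold=0.9):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sims = img_emb @ txt_emb.t()                      # (N, N) cosine similarities
    logits = sims / temperature

    n = img_emb.shape[0]
    labels = torch.arange(n, device=img_emb.device)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=img_emb.device)
    suspect = (sims > fn_threshold) & off_diag        # too similar to be a true negative
    logits = logits.masked_fill(suspect, float("-inf"))

    return F.cross_entropy(logits, labels)

loss = masked_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```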

Eleuther ▷ #gpt-neox-dev (8 messages🔥):

  • Exploration of Pre-training Techniques: @pminervini shared an arXiv paper which discusses the impact of sequence composition and causal masking during pre-training of language models. The findings suggest that intra-document attention can significantly improve performance on various tasks.
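The intra-document idea boils down to intersecting the usual causal mask with a "same document" mask when multiple documents are packed into one training sequence; a small sketch:

```python
# Build a causal mask that also blocks attention across packed-document boundaries.
import torch

def intra_document_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    """doc_ids: (seq_len,) tensor mapping each token position to its source document."""
    seq_len = doc_ids.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc  # True = attention allowed

# Three packed documents of lengths 3, 2, and 3:
print(intra_document_causal_mask(torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])).int())
```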

  • PR Gratitude and Prompting Protocol: @tastybucketofrice thanked @441658587404697600 for their pull requests and encouraged future pings for faster merges.

  • LoRA Finetuning Inquiry: @norabelrose inquired about the feasibility of using LoRA finetuning with the gpt-neox codebase, signaling their current use of Hugging Face and PyTorch Lightning for similar tasks.

  • Potential Shift to NeoX Codebase for Finetuning: Faced with issues using PyTorch native FSDP, @norabelrose contemplated using the gpt-neox repository for a NeoX 20B finetune.

  • Resolution Acknowledgment: @80melon acknowledged the resolution of a previously unstated issue brought up by @norabelrose.

Links mentioned:

Analysing The Impact of Sequence Composition on Language Model Pre-Training: Most language model pre-training frameworks concatenate multiple documents into fixed-length sequences and use causal masking to compute the likelihood of each token given its context; this strategy i…


LlamaIndex ▷ #blog (3 messages):

  • Turn RAG into Full-Stack: @wenqi_glantz shared a tutorial on converting a RAG notebook into a full-stack app with ingestion and inference services, explaining the setup of an ingestion service and more in her guide.
  • Rapid Re-ranking with ColBERT: ColBERT, highlighted by @lateinteraction, provides a faster alternative for document re-ranking, at 100x the speed of BERT-based models; @Haotianzh is credited for work on its efficiency and improved performance over dense retrieval.
  • Advanced RAG becomes Easily Accessible: The create-llama release now includes LlamaPack for advanced RAG concepts, allowing full-stack web app implementation in just two lines of code; community-contributed modules simplify the integration of advanced RAG features, as highlighted in the announcement.

LlamaIndex ▷ #general (150 messages🔥🔥):

  • Querying RAG in QueryPipeline: @lapexer asked about implementing a simple RAG in QueryPipeline with modules prompt, retriever, and llm. @cheesyfishes directed them to the documentation for guidance and an overview (How to write a simple RAG).
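A rough sketch of such a pipeline, loosely following the pattern in the LlamaIndex docs (llama-index 0.10-style imports assumed; the data directory and model name are placeholders, and exact module/link names may differ from the documented example):

```python
# Hedged sketch: retriever + synthesizer wired together with QueryPipeline.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_pipeline import InputComponent, QueryPipeline
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.llms.openai import OpenAI

docs = SimpleDirectoryReader("./data").load_data()
retriever = VectorStoreIndex.from_documents(docs).as_retriever(similarity_top_k=3)
llm = OpenAI(model="gpt-3.5-turbo")

p = QueryPipeline(verbose=True)
p.add_modules({
    "input": InputComponent(),
    "retriever": retriever,
    "synthesizer": TreeSummarize(llm=llm),   # combines retrieved nodes and the query into an answer
})
p.add_link("input", "retriever")
p.add_link("input", "synthesizer", dest_key="query_str")
p.add_link("retriever", "synthesizer", dest_key="nodes")

print(p.run(input="What does the report say about Q3 revenue?"))
```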

  • IngestionPipeline Integration Challenges: @emmepra faced ValidationError issues while deploying the IngestionPipeline using ChromaVectorStore and TextEmbeddingsInference in Docker services. After multiple iterations and community support, especially from @whitefang_jr and @cheesyfishes, they resolved the issue by using consistent import paths between core and legacy modules.

  • LlamaIndex Import Inconsistencies: Users like @pymangekyo and @oopskapootz reported inconsistencies and errors relating to module imports in LlamaIndex’s new version. It was suggested by @whitefang_jr and @cheesyfishes to reinstall LlamaIndex and create a new environment if prior versions were installed to resolve import issues (e.g., pip uninstall llama-index and pip install llama-index).

  • LlamaParse Enterprise Deployment Possibilities: @self.1 inquired about the possibility of LlamaParse being open-source or self-hostable considering privacy concerns. @cheesyfishes pointed out that enterprise deployments are being considered but are not yet available.

  • Strategies for RAG Response Consistency: @a3lita sought advice to improve the reliability of responses in RAG, specifically questioning the settings around LLM temperature. @kapa.ai explained several techniques such as prompt optimization, evaluation and benchmarking, context augmentation, and multi-modal evaluation to address this issue.

Links mentioned:


LlamaIndex ▷ #ai-discussion (3 messages):

  • In Search of Code Invocation Models: @gooooooofy inquired about models or finetunes capable of generating code invocations like python scripts or shell commands with the correct arguments.
  • Gorilla LLM Almost Fits the Bill: @gooooooofy mentioned that Gorilla LLM is similar to what they need but noted that it specializes in API calls and appears to be a smaller model.

CUDA MODE ▷ #general (16 messages🔥):

  • A Steal on GPUs for Deep Learning: @andreaskoepf scored three RTX 3090s for 1.7k euros, aiming to convert a mining rig for deep learning tasks, especially fine-tuning and serving large language models (LLMs). They outlined the specifications and considered it a significant deal given prevailing prices. Part 1 and Part 2 of their blog detail the conversion into a deep learning rig.

  • Jim Keller’s Critique of CUDA: User @itali4no shared a link where Jim Keller criticized Nvidia’s CUDA, calling it “a swamp” like x86, for being cumbersome and not beautifully constructed, evolving through the addition of multiple functionalities.

  • The Preferable GPU for Deep Learning: A discussion about choosing GPUs for deep learning had @iron_bound pointing out the advantages of used 3090s over the new 4060 ti, mainly due to better memory bandwidth and PCIe support. Meanwhile, @cropinky. mentioned that the 4060 ti’s 16GB VRAM is usually insufficient for LLM tasks.

  • Quantized Model Computations Explained: @andreaskoepf explained that quantized models perform matrix multiplications at a higher internal resolution, and provided GitHub links illustrating the dequantization process.
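To illustrate what that means in practice (a toy sketch, not the actual bitsandbytes kernels): the low-bit weights are stored with a scale and are dequantized to the activation dtype right before the matmul.

```python
# Toy weight-only quantization: int8 storage with a per-output-channel scale, dequantized at matmul time.
import torch

def quantize_per_channel(w: torch.Tensor):
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequant_matmul(x: torch.Tensor, q: torch.Tensor, scale: torch.Tensor):
    w = q.to(x.dtype) * scale.to(x.dtype)   # dequantize to the activation dtype
    return x @ w.t()

w = torch.randn(256, 128)
q, s = quantize_per_channel(w)
x = torch.randn(4, 128)
print(dequant_matmul(x, q, s).shape)        # torch.Size([4, 256])
```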

  • Guidance on Buying Second-Hand GPUs: In response to queries on purchasing second-hand GPUs for deep learning, @cropinky. advised that it’s a risk but suggested stress testing the GPU, checking for fan wear, and replacing thermal components as necessary for maintaining performance.

Links mentioned:


CUDA MODE ▷ #triton (2 messages):

  • Triton’s Role in Education and Deployment: @_hazler inquired whether integrating with Triton offers any advantages in speed or deployment platforms. @srush1301 answered that it was primarily an educational undertaking, although it also enables Jax support via Pallas and offers a simplified version for researchers to modify.

CUDA MODE ▷ #cuda (17 messages🔥):

  • Catching Up with CUDA Profiling: @dvruette battled through installation errors and is now exploring ncu to delve into low-level CUDA profiling.

  • Open Repository for CUDA-Accelerated BnB: @zippika announced their new GitHub repository torch-bnb-fp4, which hosts their faster alternative to bitsandbytes with minor differences in output and requires cuda compute >= 8.0.

  • Touting Token Speed Triumphs: @zippika highlighted a significant speed boost achieved by their library, showcasing a performance jump from 24 tokens/s to a max of 29 tokens/s.

  • Test Script to Benchmark BnB Performance: @zippika shared a detailed Python script for comparing the performance of the default bitsandbytes and their own torch-bnb-fp4 library; to execute the test, users need to toggle USE_LINEAR_HIJACK and have at least 12.8GB of VRAM available.

  • Code Improvements and Community Engagement: @zippika referenced modifications made to CUDA ‘gemv’ kernels for optimization and expressed a commitment to enrich the repository with more examples and thorough documentation; meanwhile, @_t_v_i_ expressed enthusiasm for the work.

Links mentioned:


CUDA MODE ▷ #torch (2 messages):

  • Exploring the World of Random Kernels: @hdcharles_74684 discussed the challenges of making random kernels accessible, particularly the clunky release of int_mm through out_dtype. They referenced the pytorch/_higher_order_ops/out_dtype.py and their work on a 4-bit triton kernel in torch/_inductor/fx_passes/post_grad.py.

  • Torch Compile’s Kernel Integration Limits: @hdcharles_74684 pointed out that torch.compile struggles with operations needing custom kernels that differ from existing ones, particularly for GPUs. They mentioned an intention to improve kernel access, such as adding weight-only int8 quantization for batch sizes larger than one.

Links mentioned:


CUDA MODE ▷ #suggestions (1 messages):

  • Gemini 1.5 Discussion Session Announced: @shashank.f1 is hosting a discussion on Gemini 1.5, welcoming everyone to join live. The link to a past session titled “A-JEPA AI model: Unlock semantic knowledge from .wav / .mp3 file or audio spectrograms” is provided with a YouTube video.

Links mentioned:

A-JEPA AI model: Unlock semantic knowledge from .wav / .mp3 file or audio spectrograms: 🌟 Unlock the Power of AI Learning from Audio ! 🔊 Watch a deep dive discussion on the A-JEPA approach with Oliver, Nevil, Ojasvita, Shashank, Srikanth and N…


CUDA MODE ▷ #jobs (1 messages):

  • SIXT is hiring ML Engineer in Munich: @ppeter0480 posted a job opening for an ML Engineer at SIXT in Munich. The role requires knowledge and skills in NLP and Generative AI, and solid engineering abilities. Interested parties can apply through the provided SIXT job listing.

Links mentioned:

Apply now: Senior Machine Learning Engineer (m/f/d) | Munich: The job of your dreams in Munich: Senior Machine Learning Engineer (m/f/d). Join the SIXT team! We are looking forward to your application!


CUDA MODE ▷ #beginner (3 messages):

  • Politeness Reigns Supreme: Users @0ut0f0rder and @dpearson exchanged pleasantries, appreciating each other’s helpfulness and agreeing on the importance of learning.
  • Seeking Help with OpenCV in Google Colab: @dpearson is utilizing Google Colab’s GPUs to run C/C++ code with ‘nvcc4jupyter’ but is facing issues with not being able to include <opencv2/opencv.hpp>. They are looking for a solution or an alternative to test their colorToGrayscaleConverter function on an image.

CUDA MODE ▷ #youtube-recordings (1 messages):

marksaroufim: Lecture 6 on youtube https://www.youtube.com/watch?v=hIop0mWKPHc


CUDA MODE ▷ #jax (11 messages🔥):

  • AMD GPUs Lack Support for FA2: @mrrational and @iron_bound both reported issues running FA2 training on their AMD GPUs (specifically the 7900xtx), with @iron_bound hitting the problem even on the Triton version. The backward function/kernel appears to be missing, causing failures.

  • Potential Solution for Backward Function: @_t_vi_ suggested using Triton-autodiff on GitHub to help @iron_bound get the backward kernel for FA2 training on an AMD GPU; however, @srush1301 clarified it would still require adjustments as it mainly differentiates mathematical functions.

  • Limited AMD PyTorch Support for FAv2: @drisspg informed the channel that AMD has added some limited FAv2 support in PyTorch’s nightly builds, but @iron_bound’s subsequent error message indicates that the 7900xtx GPU isn’t supported yet, as it’s expecting gpu architecture gfx90a and not gfx11.

  • Further Clarification on GPU Architecture: @iron_bound explained the architectural differences between the AMD GPUs, noting that the 7900 series targets “wave32” while data-center cards support “wave64.” He also mentioned that AMD developers are currently focused on their mi300 product, signaling that lower-priority support issues may not be addressed promptly.

  • Exploring Code for Wave Matrix Multiplication (WMMA): @iron_bound shared a goal to potentially create a kernel targeting WMMA by referencing code from the flash-attention GitHub repository, as the RDNA architecture supports WMMA in contrast to data-center cards that use XDL.

Links mentioned:


CUDA MODE ▷ #ring-attention (46 messages🔥):

  • Exploration of Facebook’s Xformers: @jeremyhoward provided a link to Xformers’ FMHA initializations on GitHub with a particular focus on line 417, spotlighting their repository as a subject of interest.

  • PyTorch Forum Discussion on Equivalence of JAX Lax Scan: @andreaskoepf shared a PyTorch forum discussion asking about a PyTorch equivalent of JAX’s lax.scan.

  • Introducing Ring Attention PyTorch Implementations: @ericauld and @iron_bound introduced GitHub repositories for ring attention, lucidrains/ring-attention-pytorch and exists-forall/striped_attention, which explore the Ring Attention concept from Berkeley AI and striped attention codes respectively.

  • Benchmarks and Implementation Discussions: @iron_bound presented their own ring-flash-attention benchmarks, including performance figures for different settings, while @zhuzilin96, the author of one of the discussed repos, joined the conversation, offering insights and mentioning the need for testing and enhancements such as support for returning fp32 outputs and arbitrary mask handling.

  • Collaboration Offers and Ongoing Improvements: @andreaskoepf and others offered to team up with @zhuzilin96 to further develop and optimize the ring attention implementations, with specific focus on testing, striped attention, and handling issues such as arbitrary masking for better flexibility of the models. All the while, @zhuzilin96 has been pushing commits for improvements like zigzag_ring_flash_attn_varlen_qkvpacked_func.

Links mentioned:


LangChain AI ▷ #general (70 messages🔥🔥):

  • Feedback Request for Research Tool: @d97tum shared a survey link to gather feedback for a product he is developing that addresses common research problems, such as finding relevant research papers and comprehending complex studies. He hopes the community’s insights will shape the product’s features.
  • Need for Langchain Consultant: @cybersmiths is looking for a consultant skilled in Langchain and OpenAI’s tool agent to assist with their efforts, and is willing to offer compensation for the help. This opportunity is directed to the LangChain AI Discord community.
  • Technical Discussions on Optimizing Chains: @b0otable initiated a deep dive into optimizing chains in LangChain, focusing on RunnableParallel and RunnablePassthrough to keep the input query while running multiple chains in parallel, with all outputs retained at the root level of a dict-like result (see the sketch after this list).
  • API Calls and Streaming in LangChain: @critical3645, @saita_ma_, and @edartru. brought up questions about implementing streaming in agent_supervisor, calling local models like OpenHermes, and the applicability of certain tools with streams, highlighting the technical nuances of working with LangChain tools and integrations.
  • LangSmith Debugging and Visualization Tool: @b0otable shares his experience with LangSmith for debugging complex LangChain processes, recommending it as a way to ensure chains behave as expected and offering a brief guide on setting it up for new users.
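A minimal sketch of the pattern @b0otable described (our illustration; the sub-chains and model are placeholders): RunnableParallel fans the same input out to several chains while RunnablePassthrough keeps the original query in the output dict.

```python
# Run two chains in parallel over the same query and keep the query at the root of the result.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")

summary_chain = (
    {"question": RunnablePassthrough()}
    | ChatPromptTemplate.from_template("Summarize this question in one line: {question}")
    | llm
    | StrOutputParser()
)
keywords_chain = (
    {"question": RunnablePassthrough()}
    | ChatPromptTemplate.from_template("List 5 keywords for: {question}")
    | llm
    | StrOutputParser()
)

parallel = RunnableParallel(
    question=RunnablePassthrough(),   # keep the original input at the root level
    summary=summary_chain,
    keywords=keywords_chain,
)

result = parallel.invoke("How do I deploy a vLLM server behind an async API?")
print(result["question"], result["summary"], result["keywords"], sep="\n")
```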

Links mentioned:


LangChain AI ▷ #share-your-work (3 messages):

  • Parallel Function Calls Now Available: @gokusan8896 announced a method to enable parallel function calls in any Large Language Model (LLM). The details were shared in a LinkedIn post.
  • Seeking Feedback on Aggregate Query Platform: @rogesmith is developing a platform/library for aggregate document data queries and is considering making it public, soliciting community feedback on its usefulness.
  • Guide to Building Custom Chatbots: @deadmanabir released a comprehensive guide on how to create custom chatbots incorporating chat history using OpenAI, Qdrant DB, and Langchain JS/TS SDK. For more information and feedback opportunities, check out their Twitter post.

LangChain AI ▷ #tutorials (3 messages):

  • Introducing Chat UI with ChainLit and Friends: A YouTube video demonstrating how to create a ChatGPT-like UI locally using ChainLit, LangChain, Ollama & Gemma was shared. The video can be watched here; viewers can clone the repository and follow along to set up their own chat interface (a minimal sketch of the core wiring appears after this list).

  • Stock Analysis via LLMs: @rito3281 published an article discussing how Large Language Models (LLMs) can assist in understanding a company’s quarterly reports to predict future growth, risk, and market opportunities. The detailed post and a demonstration of a Stock Portfolio Summarizer app can be found here.

  • Google Colab and Ollama Meet: @schimazing announced an adaptation that uses Ollama’s new embeddings, hosted entirely on Google Colab with no API keys required. More information is available in the linked Twitter post (the embeddings call itself is sketched below).
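
As a rough idea of the wiring such a tutorial covers (this sketch is not taken from the video; the model name and the single-shot, non-streaming call are assumptions), a Chainlit app can forward each chat message to a local Gemma model served by Ollama through LangChain:

```python
# app.py -- start with: chainlit run app.py
import chainlit as cl
from langchain_community.llms import Ollama

llm = Ollama(model="gemma:7b")  # assumes `ollama pull gemma:7b` has been run locally

@cl.on_message
async def on_message(message: cl.Message):
    reply = llm.invoke(message.content)     # one completion per user message
    await cl.Message(content=reply).send()  # return the model's answer to the chat UI
```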

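On the Ollama embeddings item above, the call itself is small; a minimal sketch (the embedding model name is an assumption) using the LangChain wrapper against a locally running Ollama server:

```python
from langchain_community.embeddings import OllamaEmbeddings

# Talks to a local Ollama server, so no hosted API key is needed.
emb = OllamaEmbeddings(model="nomic-embed-text")  # model name is an assumption

query_vec = emb.embed_query("ring attention for long context")
doc_vecs = emb.embed_documents(["first passage", "second passage"])
print(len(query_vec), len(doc_vecs))
```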

Datasette - LLM (@SimonW) ▷ #llm (19 messages🔥):

  • Codespaces for LLM Playground: @derekpwillis shared a template repository to run LLM in codespaces, finding it effective for orca-mini-3b while expressing concern about the support for larger models.
  • Positive Feedback on Codespaces Configuration: @simonw praised the barebones .devcontainer configuration in the codespace template and found it to be highly useful as an example. The same user also noted a long startup time, which seemed to involve compiling many components from scratch.
  • Untangling a Codespaces Quirk: @simonw encountered a bug where llm-gpt4all was not recognized as available at first but worked after running llm models. He suggested using llm chat -m orca-mini-3b-gguf2-q4_0 to keep the model in memory for faster follow-up messages (a Python equivalent of that keep-it-loaded pattern is sketched after this list).
  • Prompt Crafting Versus Direct Query: @tariqali compared old-school prompt crafting, which gives the user more control, with the more direct queries typical of modern RLHF-tuned models, noting that the former remains useful in specific circumstances, such as resuming conversations with new chatbot instances.
  • Exploring Larger World Model Integration: @simonw expressed interest in running the Large World Model’s LWM-Text-1M-Chat, which may require a GPU instance to run the PyTorch models, given that it is trained on a large dataset.
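
The keep-the-model-loaded pattern behind llm chat can also be expressed with the llm library’s Python API, loading the model once and reusing it for follow-up prompts; a minimal sketch (assumes the llm-gpt4all plugin is installed):

```python
import llm

# Load the local model once; later prompts reuse the in-memory weights instead
# of paying the model-load cost on every call.
model = llm.get_model("orca-mini-3b-gguf2-q4_0")
conversation = model.conversation()

print(conversation.prompt("Name three uses of long-context models.").text())
print(conversation.prompt("Summarize that in one sentence.").text())
```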


LLM Perf Enthusiasts AI ▷ #general (5 messages):

  • Richard Socher hints at a solution to AI hallucination: @res6969 shared a tweet by @RichardSocher suggesting that significant progress might have been made in addressing the issue of AI hallucination, showing up-to-date references with no errors.

  • Wondering about the Wizardry Behind Non-hallucinatory AI: @res6969 speculated that to prevent hallucinations, the AI might be utilizing some state-of-the-art embeddings along with an instructional validator.

  • Globe Explorer: Your Personalized Wikipedia: @sincethestudy shared a tweet about a new platform called Globe Explorer, which acts as an on-demand custom Wikipedia page powered by GPT-4, marking an evolution in how we discover information. Visit the tool at explorer.globe.engineer.

  • GPT-4 Powers New Discovery Engine: @sincethestudy announced the launch of Globe Explorer, a discovery engine that uses GPT-4 as its backend, paving the way for enhanced information discovery experiences.


LLM Perf Enthusiasts AI ▷ #finetuning (1 messages):

  • Finetuning Dilemma with GPT-4-Turbo: @pantsforbirds has had success embedding entire documents for 1-shot data extraction with gpt-4-turbo and is contemplating finetuning for more complex tasks. They asked whether the finetuning dataset should include entire example documents or just the relevant sections.
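
One concrete way to frame that choice is at the level of individual training records: the sketch below (all field contents are invented placeholders) builds the chat-format JSONL that OpenAI fine-tuning expects, with the user turn carrying either the whole document or only the relevant excerpt:

```python
import json

def make_record(document_text: str, extracted_json: str) -> dict:
    """Build one chat-format fine-tuning example (OpenAI JSONL schema)."""
    return {
        "messages": [
            {"role": "system", "content": "Extract the requested fields as JSON."},
            {"role": "user", "content": document_text},        # whole doc or excerpt
            {"role": "assistant", "content": extracted_json},  # target extraction
        ]
    }

# Variant A: the full document as context; Variant B: only the relevant section.
full_doc = "placeholder: entire contract text"
relevant = "placeholder: the single clause covering payment terms"
target = json.dumps({"payment_terms": "net 30"})

with open("finetune.jsonl", "w") as f:
    f.write(json.dumps(make_record(full_doc, target)) + "\n")
    f.write(json.dumps(make_record(relevant, target)) + "\n")
```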

LLM Perf Enthusiasts AI ▷ #offtopic (4 messages):

  • Globe Explorer Sparks Information Discovery Excitement: @joshcho_ shared a link to Globe Explorer, highlighting it as akin to a custom Wikipedia page. They remarked on entering a new age of information discovery.
  • Discovery Spreads Beyond Original Post: @nosa_ followed up by pointing to a previous Discord conversation where @sincethestudy had already introduced Globe Explorer.
  • Viral Before Official Spread Attempt: @joshcho_ humorously noted that Globe Explorer had already gone viral before the call to spread the word was even seen.

Links mentioned:

  • Tweet from brian-machado-finetuned-7b (e/snack) (@sincethestudy): Globe Explorer is kinda like a custom wikipedia page on anything you want. We are entering a new age of information discovery. go try it: http://explorer.globe.engineer/


LLM Perf Enthusiasts AI ▷ #prompting (2 messages):

  • Setting Up Max Token Limit: @ayushsharma mentioned the need to set the max_token_limit in the constructor, but provided no further details or context around this request.
  • Prompting LLMs for Non-Overlapping Grid Components: @firefox8975 asked how to write prompts that arrange different-sized components on an X-by-Y grid without any overlap, and questioned how effective LLMs are at such spatial tasks.
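
Whatever the prompt looks like, one common complement is to validate the model’s proposed layout programmatically and re-prompt on failure; a minimal sketch (the (x, y, width, height) representation is an assumption) of such a check:

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in grid cells

def overlaps(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Two axis-aligned rectangles overlap unless one lies entirely to the side
    # of, or above/below, the other.
    return not (ax + aw <= bx or bx + bw <= ax or ay + ah <= by or by + bh <= ay)

def layout_is_valid(components: List[Rect], grid_w: int, grid_h: int) -> bool:
    for i, c in enumerate(components):
        x, y, w, h = c
        if x < 0 or y < 0 or x + w > grid_w or y + h > grid_h:
            return False  # component falls outside the X-by-Y grid
        if any(overlaps(c, other) for other in components[i + 1:]):
            return False  # two components collide
    return True

print(layout_is_valid([(0, 0, 2, 2), (2, 0, 3, 1)], grid_w=5, grid_h=4))  # True
print(layout_is_valid([(0, 0, 2, 2), (1, 1, 2, 2)], grid_w=5, grid_h=4))  # False
```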

Alignment Lab AI ▷ #oo (2 messages):

  • GLAN Paper Spotted: User @.benxh shared a link to a paper on GLAN (Generative Latent Nearest Neighbors) and asked if anyone is working with it.
  • Interest in GLAN Expressed: @entropi responded with interest to the mention of GLAN, indicating that they found the shared paper on the Generative Latent Nearest Neighbors algorithm intriguing.

Skunkworks AI ▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=953U3FxHF-Q