> AI Discords for 2/3-4/2024. We checked **20** guilds, **308** channels, and **10449** messages for you. Estimated reading time saved (at 200wpm): **780 minutes**.

We’ve anecdotally gotten examples of refusals to follow instructions that approximate laziness:

[image]

but it is hard to tell whether any given case is bad luck of the draw or shameless self-promotion.

This is why it’s rare to get official confirmation from the top:

[image]

Still, “laziness” isn’t a well-defined technical term. It is frustrating to know that OpenAI has identified and fixed a problem, but is not sharing what exactly it was.


Table of Contents

[TOC]

PART 1: High level Discord summaries

TheBloke Discord Summary

  • Polymind Plugin Puzzle: @doctorshotgun is enhancing Polymind with a plugin that integrates PubMed’s API to bolster the article search capabilities. Development complexities arise with the sorting of search results.

  • AI Model Roleplay Rig: Users recount their engagements with AI models for roleplaying, noting HamSter v0.2 from PotatoOff as a choice for detailed, unrestricted roleplay. Meanwhile, significant VRAM usage during the training of models like qlora dpo is a common challenge, with the use_reentrant flag in Axolotl set to False being a key VRAM consumption factor.

  • Tailoring FLAN-T5 Training Tips: In the quest for training a code generation model, @Naruto08 is guided to consider models like FLAN-T5, with resources like Phil Schmid’s fine-tuning guide available for reference. Meanwhile, @rolandtannous provides the DialogSum Dataset as a viable resource for fine-tuning endeavors on a p3.2xlarge AWS EC2 instance.

  • Merging Model Mastery: @maldevide introduces a partitioned layer model merging strategy in which layers are split into buckets and merged individually, with kvq merged at full weight but a 92% drop rate, while each partitioned layer bucket is merged at a 68% drop rate. The methodology and configuration are openly shared on GitHub Gist.

  • Local Chatbot Configuration Conundrum: @aletheion is on the hunt for a method to integrate a local model database lookup for an offline chatbot, and @wildcat_aurora recommends considering h2ogpt as a solution. Furthermore, @vishnu_86081 is exploring ChromaDB for character-specific long-term memory in a chatbot app.


Nous Research AI Discord Summary

  • Tackling GPT-4’s Lyricism Limitations: Members discussed GPT-4’s issues with generating accurate lyrics, noting that using perplexity with search leads to better outcomes than GPT-4’s penchant for fabricating lyrics.

  • Quantization Roadmap for LLMs: Topics included strategies for quantizing models, such as using llama.cpp for quantization processes and discussing the knowledge requirements for efficient VRAM usage in models like Mixtral, which can require up to 40GB in 4bit precision.

  • Innovative Model Merging Solutions: The community highlighted frankenmerging techniques with the unveiling of models like miqu-1-120b-GGUF and MergeMonster, and touched upon emulated fine-tuning (EFT), which uses an RL-based framework to disentangle what a language model learns during pre-training versus fine-tuning.

  • Anticipation and Speculation Around Emerging Models: Conversations buzzed about the forthcoming Qwen2 model, predicting significant benchmarking prowess. Preference tuning discussions mentioned KTO, IPO, and DPO methods, citing a Hugging Face blog post, which posits IPO as on par with DPO and more efficacious than KTO.

  • Tools and Frameworks Enhancing AI Interaction and Testing: Mentioned solutions included text-generation-webui for model experimentation, ExLlamaV2 for OpenAI API compatible servers, and Lone-Arena for self-hosted LLM chatbot testing. Additionally, the community took note of a GitHub discussion regarding potential Apple Neural Engine support within llama.cpp.


LM Studio Discord Summary

  • Ghost in the Machine! LM Studio Not Shutting Down: Users reported that LM Studio continues running in task manager after the UI is closed. The suggested workaround is to forcibly terminate the process and report the bug.

  • CPU Trouble with AVX Instruction Sets: Some users encountered errors due to their processors lacking AVX2 support. The community pointed out that LM Studio requires AVX2, but a beta version might accommodate CPUs with only AVX.

  • AMD GPU Compute Adventures on Windows 11: For those wishing to use AMD GPUs on Windows 11 with LM Studio, a special ROCm-supported beta version of LM Studio is essential. Success was reported with an AMD Radeon RX 7900 XTX after disabling internal graphics.

  • Whisper Models and Llama Combinations Spark Curiosity: Integrating Whisper and Llama models with LM Studio was a topic of interest, with users referred to certain models on Hugging Face and other resources like Continue.dev for coding with LLMs.

  • Persistent Processes and Erratic Metrics in LM Studio: Users experienced problems with LM Studio’s Windows beta build, including inaccurate CPU usage data and processes that persist post-closure. Calls for improved GPU control in LM Studio ensued within community discussions.


Mistral Discord Summary

  • LLama3 and Mistral Integration Insights: Community members speculated on the architecture and training data differences between LLama3 and other models, while Mixtral’s effectiveness with special characters and long texts was a hot topic. Performance comparisons between OpenHermes 2.5 and Mistral, particularly the “lost in the middle” issue with long contexts, were also discussed. Details on handling markdown in prompts and troubleshooting with tools like GuardrailsAI and Instructor were exchanged.

  • Model Hosting and Development Dilemmas: AI hosting on services like Hugging Face and Perplexity Labs was considered for its reliability and cost-effectiveness. A discussion on CPU inference for LLMs raised points about the suitability of different model sizes and quantization methods, with Mistral’s quantization featuring prominently. A new user was guided towards tools like Gradio and Hugging Face’s hosted models for model deployment without powerful hardware.

  • Fine-tuning Focuses and Financial Realities: Questions on fine-tuning for specific domains like energy market analysis were addressed, highlighting its feasibility but also the existing constraints due to Mistral’s limited resources. The community explored current limitations in Mistral’s API development, citing the high costs of inference and team size as critical factors.

  • Showcasing AI in Creative Arenas: Users showcased applications such as novel writing with AI assistance and critiqued AI-generated narratives. Tools for improving AI writing sessions like adopting Claude for longer context capacity were suggested. Additionally, ExLlamaV2 was featured in a YouTube video for its fast inference capabilities on local GPUs.

  • From Random Remarks to Platform Peculiarities: A Y Combinator founder called out to the community for insights on challenges when building in the space of LLMs. On a lighter note, playful messages like flag emojis popped up unexpectedly. Meanwhile, in la-plateforme, streaming issues with mistral-medium not matching the behavior of mistral-small were discussed, drawing ad-hoc solutions like response length-based discarding.


LAION Discord Summary

  • Collaboration Woes with ControlNet Authors: @pseudoterminalx voiced frustrations about challenges collaborating with the creators of ControlNet, citing a focus on promoting AUTOMATIC1111 at the expense of supporting community integration efforts. This reflects a wider sentiment of difficulty in implementation among other engineers.

  • Ethical Debates on Dataset Practices: Ethical concerns were raised surrounding the actions of Stanford researchers with respect to the LAION datasets, insinuating a shift towards business priorities following their funding achievements, potentially impacting public development and resource access.

  • Comparing AI Trailblazers: A discussion emerged comparing the strategies of Stability AI with those of a major player like NVIDIA. The conversation questioned the innovative capacities of smaller entities when adopting similar approaches to industry leaders.

  • Hardware Discussions on NVIDIA Graphics Cards: The engineering community engaged in an active exchange on the suitability of various NVIDIA graphics cards for AI model training, specifically the 4060 ti and the 3090, taking into account VRAM needs and budget considerations.

  • Speculations on Stability AI’s Next Moves: Anticipation was building with regard to Stability AI’s forthcoming model, prompting @thejonasbrothers to express concerns about the competitiveness and viability of long-term projects in light of such advancements.


HuggingFace Discord Summary

  • Demo Difficulties with Falcon-180B: Users like @nekoli. reported issues with the Falcon-180B demo on Hugging Face, observing either site-wide issues or specific outages in the demos. Despite sharing links and suggestions, resolutions seemed inconsistent.

  • LM Deployment and Use Queries: Queries emerged regarding deployment of LLMs such as Mistral 7B using AWS Inferentia2 and SageMaker, and how to access LLMs through an API with free credits on HuggingFace, although no subsequent instructional resources were linked.

  • Spaces Stuck and Infrastructure Woes: There were reports of a Space in a perpetual building state and potential wider infrastructure issues at Hugging Face affecting services like Gradio. Some users offered troubleshooting advice.

  • AI’s Role in Security Debated: Concerns were voiced over the misuse of deepfake technology, such as a scam involving a fake CFO. This highlights the importance of ethical considerations in the development and deployment of AI systems.

  • Synthesizing Community Insights Across Disciplines: The discussions covered a range of topics including admiration for the foundational “Attention Is All You Need” paper, advancements in Whisper for speaker diarization in speech recognition, the creation of an internal tool for summarizing audio recordings with a privacy-centric approach, and user engagement in a variety of Hugging Face community activities like blog writing, events, and technical assistance.

  • Hugging Face Community Innovates: The Hugging Face community shared a host of creations, from a proposed ethical framework for language model bots to projects like Autocrew for CrewAI, a hacker-assistant chatbot, predictive emoji spaces based on tweets, and the publication of the Hercules-v2.0 dataset for powering specialized domain models.

  • Explorations in Vision and NLP: Zeal was high for finding resources and collaborating on projects such as video summarization with timestamps, ethical frameworks for LLMs, spell check and grammar models, and the pursuit of Nordic language model merging with resources like a planning document, tutorial, and Colab notebook.

  • Scam Alerts and Technical Challenges in Diffusion Discussions: A scam message was flagged for removal, a GitHub issue with AutoModelForCausalLM was detailed, and the Stable Video Diffusion model license agreement was shared to discuss weight access, all reflecting the community’s efforts to maintain integrity and solve complex AI issues.

  • Engagement in the World of Computer Vision: Questions popped up about using Synthdog for fake data generation, finding current models for zero-shot vision tasks, and creating a sliding puzzle dataset for vision LLM training, suggesting an active search for novel approaches in AI.


OpenAI Discord Summary

  • Local LLMs Spark Interest Amidst GPT-4 Critiques: Engineers discuss potential alternatives to GPT-4, highlighting Local LLMs such as LM Studio and perplexity labs as viable options. Users express concerns about GPT-4’s errors and explore the performance of other models like codellama-70b-instruct.

  • GPT-4 Glitches Got Engineers Guessing: Reports have surfaced around @ mention issues and erratic GPT behavior, including memory lapses, indicating possible GPT-4 system inconsistencies. The user base is also grappling with missing features like the thumbs up option and sluggish prompt response times.

  • Prompt Engineering Puzzles Professionals: AI Engineers share frustration over ChatGPT’s overuse of ethical guidelines in storytelling and suggestions for steering clear of AI-language patterns to maintain humanlike interactions in AI communications. Recommendations to use more stable GPT versions for instruction consistency are also favored.

  • Hardware Hurdles in Hosting LLMs: Deep dives into hardware setups for running Local LLMs reveal engineers dealing with system requirements, notably the debate over RAM vs. VRAM. The community also voices skepticism about the credibility of claims on AI performance across different hardware setups.

  • AI Assistance Customization Conundrums: Detailed discussions ensue over refining GPT’s communication for users with specific needs, such as generating human-like speech for autistic users, and strategies to avoid name misspellings. Additionally, some users encountered unanticipated content policy violation messages and speculated on internal issues.


OpenAccess AI Collective (axolotl) Discord Summary

  • GPU Troubleshooting for Engineers: To address GPU errors on RunPod, @dangfutures recommended running sudo apt-get install libopenmpi-dev followed by pip install mpi4py. Additionally, @nruaif stated 80GB of VRAM is necessary for LoRA or QLoRA on Llama 70B, with MoE layer freezing enabling Mixtral FFT on 8 A6000 GPUs.

  • Scaling Ambitions Spark Skepticism and Optimism: A new Notion doc described a 2B-parameter model by OpenBMB claiming performance comparable to Mistral 7B, generating both skepticism and excitement among engineers.

  • Finetuning Woes and Code Config Tweaks: @cf0913 found the EOS token acting as a pad token after finetuning, which was resolved by editing the tokenizer config as suggested by @nanobitz (a sketch follows after this list). Also, @duke001. sought advice on determining training steps per epoch, with sequence-length packing as a potential strategy.

  • Adapting to New Architectures: An issue was raised about running the axolotl package on an M1 MacBook Air, with a response from @yamashi about submitting a PR to use MPS instead of CUDA. Discussions also revolved around implementing advanced algorithms on new hardware like the M3 Mac.

  • Memory Troubles with DPO: @fred_fups struggled with out-of-memory issues when using direct preference optimization (DPO) with QLoRA, and @noobmaster29 confirmed DPO’s substantial memory consumption, allowing only a micro-batch size of 1 with 24GB of VRAM.

  • RunPod Initialization Error and Configuration Concerns: @nruaif shared logs from RunPod indicating deprecated configurations and errors, including a missing _jupyter_server_extension_points function and incorrect ServerApp.preferred_dir settings. @dangfutures suggested exploring community versions for more reliable performance.
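On the EOS-as-pad fix flagged above, here is a minimal, hedged sketch of the kind of tokenizer edit involved (the model name is illustrative, and the exact fix depends on the tokenizer in question):

```python
# Give the tokenizer a pad token distinct from EOS so padding is not
# conflated with end-of-sequence during finetuning.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
if tok.pad_token is None or tok.pad_token == tok.eos_token:
    # Llama-family vocabularies often reuse UNK as PAD; alternatively, add a
    # dedicated pad token and resize the model's embeddings.
    tok.pad_token = tok.unk_token
print(tok.pad_token, tok.eos_token)
```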


CUDA MODE (Mark Saroufim) Discord Summary

  • CUDA Curiosity Peaks: CUDA’s dominance over OpenCL is attributed to its widespread popularity and Nvidia’s support; Python continues to be a viable option for GPU computing, offering a balance between high-level programming ease and the nitty-gritty of kernel writing, as detailed in the CUDA MODE GitHub repository. Members also discussed the impact of compiler optimizations on CUDA performance, emphasizing the significance of even minute details in code, while advocating for robust CUDA learning through shared resources like tiny-cuda-nn.

  • PyTorch Parsers Perspire: Tips were shared on how to efficiently use the torch.compile API by specifying compiled layers, as seen in the gpt-fast repository. There’s a bona fide interest in controlling the Torch compiler’s behavior more finely, with the PyTorch documentation offering guidance. Amidst PyTorch preferences, TensorFlow also got a nod, mainly for Google’s hardware and pricing.

  • Lecture Hype: Anticipation grows as CUDA MODE’s fourth lecture on compute and memory architecture is heralded, with materials found in a repository jokingly criticized for its “increasingly inaccurately named” title, the lecture2 repo. The lecture promises to delve into the nitty-gritty of blocks, warps, and memory hierarchies.

  • Job Market Buzzes: Aleph Alpha and Mistral AI are on the hunt for CUDA gurus, with roles integrating language model research into practical applications. Positions with a focus on GPU optimization and custom CUDA kernel development are up for grabs, detailed in the Aleph Alpha job listing and Mistral AI’s opportunity.

  • CUDA Beginners Unite: Rust gained some spotlight in lower-level graphics programming, and the discussion tilted towards its viability in CUDA programming, garnering interest for CUDA GPU projects in Rust, like rust-gpu for shaders. The Rust neural network scene is warming up, with projects like Kyanite and burn to ignite the coding fire.


Eleuther Discord Summary

  • TimesFM Training Clarified: A corrected sequence for TimesFM model training was shared to emphasize non-overlapping output paths based on the model’s description. Meanwhile, the conversation about handling large contexts in LLMs spotlighted the YaRN paper, while a method for autoencoding called “liturgical refinement” was proposed.

  • MoE-Mamba Delivers Impressive Results: According to a recent paper, the “MoE-Mamba” SSM model surpasses other models with fewer training steps. Strategies for improving AI efficiency were discussed, such as adding a router loss to balance experts in MoE models (a minimal sketch follows after this list) and stabilizing gradients via techniques from the Encodec paper.

  • Interpretability Terms Defined: In the realm of interpretability, a distinction was noted between a “direction” as a vector encoding monosemantic meaning and a “feature” as the activation of a single neuron.

  • Organizing Thunderous Collaborations: A meeting schedule for Tuesday 6th at 5pm (UK time) was confirmed concerning topics like testing at scale, where Slurm was mentioned as a tool for queuing numerous jobs.

  • Multimodal MoE Models Explored: Discussions veered toward merging MoEs with VLMs and diffusion models for multimodal systems, aiming for deeper semantic and generative integration, and investigating alternatives like RNNs, CLIP, fast DINO, or fast SAM.

  • GPT-NeoX “gas” Parameter Deprecated: An update on GPT-NeoX involves the deprecation of the "gas" parameter as it was found non-functional and a duplicate of "gradient_accumulation_steps", with the warning that past configurations may have used smaller batch sizes unintentionally. A review of the related pull request is underway.
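On the router-balancing idea mentioned in the MoE-Mamba item above, here is a minimal sketch of a Switch-Transformer-style load-balancing loss; this is a standard formulation, not necessarily the exact loss discussed:

```python
# Auxiliary loss that is minimized when tokens are routed uniformly.
import torch

def load_balance_loss(router_probs: torch.Tensor,
                      expert_assignment: torch.Tensor) -> torch.Tensor:
    """router_probs: [tokens, experts] softmax outputs;
    expert_assignment: [tokens] chosen expert index per token."""
    num_experts = router_probs.shape[-1]
    # f_i: fraction of tokens actually routed to each expert.
    f = torch.bincount(expert_assignment, minlength=num_experts).float()
    f = f / expert_assignment.numel()
    # P_i: mean router probability assigned to each expert.
    p = router_probs.mean(dim=0)
    return num_experts * torch.sum(f * p)

probs = torch.softmax(torch.randn(16, 4), dim=-1)
print(load_balance_loss(probs, probs.argmax(dim=-1)))
```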


Perplexity AI Discord Summary

  • Polyglot Perplexity: Users demonstrated interest in Perplexity AI’s multilingual capabilities, with discussions about its proficiency in Chinese and Persian. Conflicting experiences were shared regarding Copilot’s role in model performance, but consensus on its exact benefits remains unclear.

  • Criticizing Customer Care: User @aqbalsingh faced difficulties with the email modification process and the iPhone app’s functionality, leading to their premium account cancellation. They and @otchuda shared dissatisfaction with the level of support provided by Perplexity AI.

  • Excitement and Analysis via YouTube: YouTube videos by @arunprakash_, @boles.ai, and @ok.alex provide analysis and reviews on why users might prefer Perplexity AI over other AI solutions, with titles like “I Ditched BARD & ChatGPT & CLAUDE for PERPLEXITY 3.0!”

  • Sharing Search Success: Users exchanged Perplexity AI search results that impacted their decisions such as upgrading to Pro subscriptions or assisting with complex problems, highlighting the utility and actionable insights provided by Perplexity’s search capabilities.

  • Mixtral’s Monetization Muddle: Within the #pplx-api channel, there’s ongoing curiosity about Mixtral’s pricing, with current rates at $0.14 per 1M input tokens and $0.56 per 1M output tokens. The community showed interest in a pplx-web version of the API, prompting discussion about business opportunities for Perplexity AI, although no official plans were disclosed.
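For concreteness, a quick worked example at the quoted rates (the token counts are illustrative):

```python
# Cost of one call at $0.14 per 1M input tokens and $0.56 per 1M output tokens.
in_tok, out_tok = 20_000, 5_000
cost = 0.14 * in_tok / 1e6 + 0.56 * out_tok / 1e6
print(f"${cost:.4f}")  # -> $0.0056
```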


LangChain AI Discord Summary

  • Seeking Solutions for Arabic AI Conversations: Members discussed technology options for interacting with Arabic content, where an Arabic Language Model (LLM) and embeddings were suggested as most technologies are language-agnostic. Specific alternatives like aravec and word2vec were mentioned for languages not supported by embedding-ada, such as Arabic.

  • Tips for Cost-Effective Agent Hosting: For a research agent with a cost structure of 5 cents per call, recommendations included hosting a local LLM for controlled costs, as well as deploying services like ollama on servers from companies like DigitalOcean.

  • Books and Learning Resources for LLM Enthusiasts: A new book titled “LangChain in your Pocket: Beginner’s Guide to Building Generative AI Applications using LLMs” was announced, providing a hands-on guide covering LangChain use cases and deployment, available at Amazon. Additionally, an extensive LangChain YouTube playlist was shared for tutorials.

  • Interactive Podcasts Leap Forward with CastMate: CastMate was introduced, enabling listeners to interact with podcast episodes using LLMs and TTS technology. A Loom demonstration was shared, and an iPhone beta is available for testing through TestFlight Link.

  • Navigating Early Hurdles with LangChain: Users reported encountering errors and outdated information while following LangChain tutorials, indicating potential avenues for improving the documentation and support materials. Errors ranged from direct following of YouTube tutorial steps to issues with the Ollama model in the LangChain quickstart guide.


LlamaIndex Discord Summary

  • RAG Pain Points Tackled: @wenqi_glantz, in collaboration with @llama_index, remedied 12 challenges in production RAG development, with full solutions presented on a cheatsheet, which can be found in their Twitter post.

  • Hackathon Fueled by DataStax: @llama_index acknowledged @DataStax for hosting and catering a hackathon event, sharing updates on Twitter.

  • Local Multimodal Development on Mac: LlamaIndex’s integration with Ollama now enables local multimodal app development for tasks like structured image extraction and image captioning, detailed in a day 1 integration tweet.

  • Diving Deep with Recursive Retrieval in RAG: @chiajy explored recursive retrieval in RAG systems and shared three techniques—Page-Based, Information-Centric, and Concept-Centric—in their Medium article, Advanced RAG and the 3 types of Recursive Retrieval.

  • Hybrid Retrieval Lauded for Dynamic Adjustments and Contributions: @cheesyfishes confirmed the Hybrid Retriever’s alpha parameter can be dynamically altered, and @alphaatlas1 advised a hybrid retrieval plus re-ranking pipeline, spotlighted the BGE-M3 model, and called for contributions on sparse retrieval methods detailed at BGE-M3 on Hugging Face.
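As a rough illustration of how an alpha parameter blends retrieval signals (a generic sketch, not LlamaIndex’s actual implementation):

```python
# Blend dense (vector) and sparse (keyword) relevance scores.
def hybrid_score(dense: float, sparse: float, alpha: float) -> float:
    return alpha * dense + (1.0 - alpha) * sparse

# Dynamically lean dense for semantic questions, sparse for exact keyword hits.
print(hybrid_score(dense=0.82, sparse=0.40, alpha=0.75))  # 0.715
```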


Latent Space Discord Summary

  • Request for GPT API Federation: @tiagoefreitas expressed interest in GPT stores with APIs, wishing for @LangChainAI to implement federation in OpenGPTs for using GPTs across different servers via API.
  • Embracing Open Models Over Conventional Writing: Open models’ dynamic output, such as that of mlewd mixtral, was lauded over traditional writing for enhancing enjoyment and productivity in content creation.
  • Rise of Specialized Technical Q&As: @kaycebasques highlighted Sentry’s initiative as part of a growing trend towards creating specialized technical Q&A resources for developers, enhancing information accessibility.
  • Performance Praise for Ollama Llava: @ashpreetbedi shared a positive experience with Ollama Llava’s impressive inference speed when run locally, suggesting robust performance on consumer-grade hardware.
  • Career Choices in Tech Under Scrutiny: With the tech industry presenting multiple paths, @mr.osophy’s career dilemma encapsulates the juggle between personal interest in ML Engineering and immediate job opportunities.

Relevant Links:

  • For insights into the concept of model merging in AI, reference: Arcee and mergekit unite.
  • To understand the role of specialized technical Q&A platforms like Sentry, visit: Sentry Overflow.

Datasette - LLM (@SimonW) Discord Summary

  • Game Alchemy Unveils Hash Secrets: There’s a theory that the unexpected delay in generating new combinations in a game comes down to hashing mechanics: results are served from a pool of pre-generated combinations, and a new element is only generated on a hash miss (sketched below).
  • Visualizing the Genealogy of Game Words: Participants are interested in creating a visual representation of the genealogy for word combinations in a game to gain deeper insights, potentially using embeddings to chart crafting paths.
  • Take Control with a Bookmarklet: A JavaScript bookmarklet is available that leverages the game’s localStorage to export and auto-save crafted items, enabling players to keep track of all ingredients they’ve crafted directly within the gaming experience.
  • Llama 2 AI Engine Revealed: The AI powering the inventive element combinations in the game is llama 2, as disclosed by the creator in a posted tweet, with inference provided by TogetherAI.
  • Element Order Affects Crafting Success: The sequence in which elements are combined in the game has been found to impact the crafting result, with some combinations only successful if items are layered in a specific order, and the server remembers the sequence attempted to prevent reversal on subsequent tries.
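A toy sketch of the hash-miss theory, folding in the order sensitivity noted in the last bullet (call_llm and the cache contents are hypothetical):

```python
# Server-side cache keyed by the ordered pair; a miss triggers the slow
# LLM path, which would explain the delay on first-time combinations.
def call_llm(a: str, b: str) -> str:
    return f"{a}+{b}"  # stand-in for the real llama 2 call (hypothetical)

cache: dict[tuple[str, str], str] = {("water", "fire"): "steam"}

def combine(a: str, b: str) -> str:
    key = (a, b)  # order-sensitive, so ("fire", "water") is a separate entry
    if key not in cache:
        cache[key] = call_llm(a, b)
    return cache[key]

print(combine("water", "fire"))  # cache hit: instant
print(combine("fire", "water"))  # miss: would hit the LLM
```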

DiscoResearch Discord Summary

  • German Language Models Boosted: @johannhartmann reported improvements in mt-bench-de scores by utilizing German dpo and laserRMT, and has been merging German 7B-models using dare_ties. Despite sharing links to the resources, the cause of specific performance changes, including a decrease in math ability, remains unclear.
  • Research Quest for LLM Context Handling: @nsk7153 sought research materials on large language models (LLMs) capable of managing long-context prompts, sharing a Semantic Scholar search with current findings.
  • Introducing GermanRAG for Fine-Tuning: @rasdani announced the release of the GermanRAG dataset, designed for fine-tuning Retrieval Augmented Generation models, and provided the GitHub repository for access and contribution.
  • Scandinavian Benchmark Enthusiasm Projected onto German Models: @johannhartmann expressed interest in developing a benchmark similar to ScandEval for evaluating German language model performance.
  • Upcoming German Hosting Service: In the #discolm_german channel, flozi00 mentioned they are currently working on provisioning a German hosting service.

Alignment Lab AI Discord Summary

  • Diving into Training Data for Mistral-7B Open-Orca: @njb6961 sought details on replicating Mistral-7B Open-Orca with its curated filtered subset of most of our GPT-4 augmented data. The dataset identified, SlimOrca, comprises around 500,000 GPT-4 completions and is designed for efficient training.

  • Dataset Discovery and Confirmation: The SlimOrca dataset was confirmed by @ufghfigchv as the training data used for Mistral-7B Open-Orca. The model’s training configuration should be accessible in the config subdirectory of the model’s repository.

  • Commercial Contact Conundrum: @tramojx’s request for marketing contact details for a listing and marketing proposal went unanswered in the message history provided.


Skunkworks AI Discord Summary

  • Skewed Perspectives in AI Discussions: The conversation touches on contrasting approaches to embedding, weighing whole-document text embeddings against vision-based embedding techniques. The discussion is framed around the potential reimplementation of an encoder/decoder model, with curiosity about what such a task would involve.

LLM Perf Enthusiasts AI Discord Summary

  • BentoML Eases Model Deployment: @robotums reported a smooth experience in deploying models with BentoML, specifically using a VLLM backend on AWS, describing the process as “pretty easy, you just run the bento.”

  • DSPy Framework Elevates Language Model Programming: @sourya4 highlighted the launch of DSPy, a Stanford initiative aimed at transforming the way foundation models are programmed. A supplemental YouTube video provides further insight into DSPy’s capabilities for creating self-improving LM pipelines.
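A minimal sketch of DSPy’s declarative style (assumes the dspy-ai package as of early 2024; the LM wiring is illustrative):

```python
# Declare the task as a signature and let DSPy handle the prompting.
import dspy

lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

qa = dspy.Predict("question -> answer")
print(qa(question="What does DSPy optimize?").answer)
```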


AI Engineer Foundation Discord Summary

  • AIEF Bulgaria Chapter Makes Waves: The AIEF Bulgaria Chapter held its second monthly meet-up with 90 participants, featuring ‘Lightning Talks’ on a wide range of topics and fostering networking opportunities.
  • Diverse Lightning Talks Spark Interest: Presentations on QR Code Art, Weaving The Past, LLMs (Large Language Models), Zayo, and strategies for building a defensible business in the age of AI were a highlight, with full recordings promised for the chapter’s YouTube channel soon.
  • Spotlight on ChatGPT Implementation Strategy: A session on “ChatGPT Adoption Methodology” by Iliya Valchanov offered insights into integrating ChatGPT into business processes, with shared resources linked through a Google Slides document.
  • Sharing Success on Social Media: The AIEF Bulgaria lead, @yavor_belakov, took to LinkedIn to share highlights from the meet-up, reflecting the vibrancy and advancements of the AI engineering community involved with AIEF.
  • Presentations Capturing Technical Innovation: The slides from the meet-up presentations, including those on QR Code Art, historical knitting, an LLM command-line tool, reimagined employee management with Zayo, and robust business models in AI, underscore the technical diversity and innovation within the AIEF community.

PART 2: Detailed by-Channel summaries and links

TheBloke ▷ #general (1738 messages🔥🔥🔥):

  • Plugin Development Adventures: User @doctorshotgun is working on coding a plugin for Polymind, aiming to improve its article search functionality with PubMed’s API. They are currently incorporating pymed to construct and parse search queries, encountering challenges with the sorting and relevance of search results.

  • Exploring Miqu: Several users, including @nextdimension and @netrve, discuss the usefulness of the local LLM model miqu-1-70b. While some find it useful, others report it produces unsatisfactory results, which may be attributed to its generation parameters.

  • Interest in Mixtral Instruct: Discussions regarding the efficiency and quality of responses are ongoing, with users like @doctorshotgun highlighting slower response times when processing large RAG contexts on the 70B model.

  • BagelMIsteryTour Emerges: The BagelMIsteryTour-v2-8x7B-GGUF model receives praise as @ycros attributes its success to merging the Bagel model with Mixtral Instruct. The model is good for tasks like roleplay (RP) and general Q&A, according to user testing.

  • Oobabooga vs Silly Tavern: User @parogar expresses frustration over Oobabooga (likely a local LLM runner) API changes that hinder Silly Tavern’s connection. They are seeking ways to revert to a previous version that was more compatible.

Links mentioned:


TheBloke ▷ #characters-roleplay-stories (678 messages🔥🔥🔥):

  • Discussions on Model Performance and Preferences: Users shared experiences with various AI models for roleplaying, with mentions of goliath 120b, mixtral models, and variations like limaRP and sensual nous instruct. @potatooff suggested the HamSter v0.2 model for uncensored roleplay with a detailed character card, using Llama2 prompt template with chat-instruct.

  • Technical Deep Dive into DPO and Model Training: There was a technical conversation about the large VRAM usage of DPO (direct preference optimization) and its impact on training AI models, with various users discussing their struggles with fitting models like QLoRA DPO on GPUs. As @doctorshotgun explained, this stems from the gradient_checkpointing_kwargs setting use_reentrant being set to False by default in Axolotl, which they suggest changing for lower VRAM usage (see the sketch after this list).

  • Seeking Advice for Optimizing Character Cards: @johnrobertsmith sought advice on optimizing character cards for AI roleplay, with suggestions to keep character descriptions around 200 tokens and use lorebooks for complex details like world spells. @mrdragonfox shared an example character card and endorsed using lorebooks for better character definition.

  • Exploring Various Models’ VRAM Consumption: Users including @c.gato, @giftedgummybee, and @kalomaze discussed the resource-intensive nature of certain AI models, specifically when using DPO, and shared their experiences with large consumption due to duplications needed for DPO’s caching requirements.

  • Miscellaneous Conversations and Jokes: Amongst the technical and performance-focused discussions, there were lighter moments with users joking about winning arguments with AI (@mr.devolver) and random jabs at found objects being “smelly” (@kaltcit and @stoop poops).
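A minimal sketch of the checkpointing knob discussed above, via the Hugging Face transformers API that Axolotl’s gradient_checkpointing_kwargs maps onto (assumes transformers >= 4.35; the model name is illustrative):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
# use_reentrant=True trades extra recomputation for lower VRAM; False (the
# reported Axolotl default) keeps VRAM usage higher.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": True}
)
```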

Links mentioned:


TheBloke ▷ #training-and-fine-tuning (18 messages🔥):

  • Choosing the Right Model for Code Generation: @Naruto08 is seeking advice on which model to train for code generation with a custom dataset in [INST] {prompt} [/INST] format. They have 24GB of GPU memory available and want to ensure proper model and training approach selection.

  • Inquiry on Specific Model Fine-Tuning: User @709986_ asked if the model em_german_mistral_v01.Q5_0.gguf can undergo fine-tuning, but did not provide details on the desired outcome or specifics of the fine-tuning process.

  • Finetuning Flan-T5 on Limited Resources: @tom_lrd queried about the dataset size and hardware requirements for fine-tuning a flan-t5 model, while @rolandtannous responded with experience of performing LoRA fine-tuning on flan-t5-base using AWS instances and shared a relevant dataset located at DialogSum Dataset on Huggingface.

  • Accessible Fine-Tuning of FLAN-T5: @rolandtannous shared details about the ease of fine-tuning FLAN-T5 base models, given their size (approximately 900MB-1GB), and pointed to the use of a p3.2xlarge AWS EC2 Instance with a NVIDIA V100 by Phil Schmid in related experiments. They also provided a comprehensive guide on fine-tuning FLAN-T5 for dialogue summarization using the SAMSUM dataset.

  • Clarifying “Uncensored” Models on Huggingface: @thisisloading inquired about “uncensored” models on Huggingface, leading to a discussion about the process of removing alignment from such models, as detailed by Eric Hartford in his blog post: “Uncensored Models”. The procedure is akin to “surgically” removing alignment components from a base model, enabling further customization through fine-tuning.

Links mentioned:


TheBloke ▷ #model-merging (5 messages):

  • Innovative Model Merging Technique: @maldevide detailed a novel approach to model merging, where layers are partitioned into buckets and merged individually, with a unique treatment for kvq that involves a 100% merge weight but with a high drop rate of 92%.
  • Partitioned Layer Merging Results: Following the new approach, @maldevide mentioned that each partition, if there are four, would be merged at a 68% drop rate, suggesting this specific drop rate has been impactful.
  • Interest in the New Approach: @alphaatlas1 showed interest in @maldevide’s merging method, asking to see the configuration or the custom code.
  • Access to New Model Merging Code: @maldevide responded to the request by providing a link to their configuration in the form of a GitHub Gist, allowing others to view and potentially use the described technique.
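Assuming the drop rates refer to DARE-style delta dropping (an assumption; see the Gist for the actual method), a toy sketch of the idea:

```python
# Drop each fine-tuned delta with probability drop_rate, rescale survivors so
# the expected update is preserved, then apply at the given merge weight.
import torch

def drop_merge(base: torch.Tensor, tuned: torch.Tensor,
               drop_rate: float, merge_weight: float = 1.0) -> torch.Tensor:
    delta = tuned - base
    keep = (torch.rand_like(delta) >= drop_rate).float()
    delta = delta * keep / (1.0 - drop_rate)
    return base + merge_weight * delta

base = torch.randn(4, 4)
tuned = base + 0.1 * torch.randn(4, 4)
# kvq-style setting from above: 100% merge weight, 92% drop rate.
merged = drop_merge(base, tuned, drop_rate=0.92, merge_weight=1.0)
print(merged.shape)
```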

Links mentioned:

tinyllama-merge.ipynb: GitHub Gist: instantly share code, notes, and snippets.


TheBloke ▷ #coding (6 messages):

  • Local Model Lookup Quest: @aletheion is seeking help on how to implement a feature where a chatbot can perform a lookup action in a local/vector database to provide answers while keeping everything offline. They expressed openness to using existing frameworks or solutions.

  • h2ogpt Suggested for Local Bot Implementation: @wildcat_aurora shared a GitHub repository for h2ogpt, which offers private Q&A and summarization with local GPT, supporting 100% privacy, and touted compatibility with various models, which could be a solution for @aletheion’s query.

  • API Confusion Unraveled: @sunija expressed frustration over Ooba’s API requiring a “messages” field despite documentation suggesting it wasn’t necessary, then realized the mistake was their own, admitting a dislike for making web requests.

  • Model Evaluation Success: @london reported that models, Code-13B and Code-33, succeeded in evaluations on EvalPlus and other platforms after being asked for submission by another user.

  • Chatbot App Aims for Character-Specific Long-Term Memory: @vishnu_86081 is looking for guidance on setting up ChromaDB for their chatbot app that allows users to chat with multiple characters, aiming to store and retrieve character-specific messages using a vector DB for long-term memory purposes.
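For @vishnu_86081’s use case, one common pattern is a single collection filtered by character metadata; a hedged sketch (collection name, tags, and texts are illustrative):

```python
# Per-character long-term memory with ChromaDB.
import chromadb

client = chromadb.PersistentClient(path="./memory_db")
memories = client.get_or_create_collection("character_memories")

# Store a message tagged with the character it belongs to.
memories.add(
    ids=["msg-001"],
    documents=["The user told Alice about their trip to Kyoto."],
    metadatas=[{"character": "alice"}],
)

# Later, retrieve only Alice's memories relevant to the new message.
results = memories.query(
    query_texts=["What do you remember about my travels?"],
    n_results=3,
    where={"character": "alice"},
)
print(results["documents"])
```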

Links mentioned:

GitHub - h2oai/h2ogpt: Private Q&A and summarization of documents+images or chat with local GPT, 100% private, Apache 2.0. Supports Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/



Nous Research AI ▷ #off-topic (56 messages🔥🔥):

  • GPT-4’s Lyric Quirks: @cccntu discussed the limitations of GPT-4 in generating lyrics accurately, mentioning that using perplexity with search yields better results than the AI, which tends to fabricate content.
  • Greentext Generation Challenges: @euclaise suggested that 4chan’s greentext format may be difficult for AI to learn due to lack of training data, while @teknium shared a snippet showcasing an AI’s attempt to mimic a greentext narrative involving Gaia’s Protector, highlighting the challenges in capturing the specific storytelling style.
  • Call for Indian Language AI Innovators: @stoicbatman invited developers and scientists working on AI for Indian languages to apply for GPU computing resources and infrastructure support provided by IIT for advancing regional language research.
  • Llama2 Pretrained on 4chan Data?: @stefangliga claimed that 4chan content is in fact part of llama2’s pretraining set, countering the assumption that it might be deliberately excluded.
  • Apple Accused of Creating Barriers for AR/VR Development: @nonameusr criticized Apple’s approach to its technology ecosystem, arguing that the company’s restrictive practices like charging an annual fee just to list apps and the lack of immersive VR games for Vision Pro are hindrances for AR/VR advancement.

Links mentioned:


  • Embracing the EFT Revolution: @euclaise shared a paper introducing emulated fine-tuning (EFT), a novel technique to independently analyze the knowledge gained from pre-training and fine-tuning stages of language models, using an RL-based framework. The paper challenges the understanding of pre-trained and fine-tuned models’ knowledge and skills interplay, proposing to potentially combine them in new ways (Read the paper).
  • Frankenmerge Hits the Ground: @nonameusr introduced miqu-1-120b-GGUF, a frankenmerged language model built from miqu-1-70b and inspired by other large models like Venus-120b-v1.2, MegaDolphin-120b, and goliath-120b, highlighting the CopilotKit support (Explore on Hugging Face).
  • FP6 Quantization on GPU: @jiha discussed a new six-bit quantization method for large language models called TC-FPx, and queried its implementation and comparative performance, with @.ben.com noting the optimal precision for the majority of tasks and its practical benefits in specific use-cases (Check the abstract).
  • Mercedes-Benz of Models: @gabriel_syme surmised the potential sizes of new models being discussed, with users speculating about the upcoming Qwen2 model and its performance compared to predecessors like Qwen-72B. Chatter in this topic included expectations of model sizes and benchmark performance.
  • The New Merge on the Block: @nonameusr presented MergeMonster, an unsupervised algorithm for merging Transformer-based language models, that features experimental merge methods and performs evaluations before and after merging each layer (Discover on GitHub).

Links mentioned:


Nous Research AI ▷ #general (550 messages🔥🔥🔥):

  • Quantizing Emojis: Members @agcobra1 and @n8programs were engaged in a teaching session on how to quantize models using llama.cpp. The process involves cloning the model repo, pulling large files with git lfs pull, and then using the convert.py script for conversion and ./quantize for quantization (sketched after this list).

  • Qwen2 Release Anticipation: The Qwen model team was hinting at the release of Qwen2, expected to be a strong contender in benchmarks, potentially even surpassing the performance of Mistral medium. @bratao shared a GitHub link hinting at Qwen2’s upcoming reveal.

  • Discussions on Future Digital Interfaces: @nonameusr and @n8programs delved into a speculative conversation about the potential future of brain-computer interfaces, imagining scenarios where thoughts could directly interact with digital systems without the need for traditional input methods.

  • Text Generation UI and API Ergonomics: @light4bear recommended text-generation-webui for easily experimenting with models, whereas @.ben.com offered an OpenAI API compatible server experiment with ExLlamaV2 for testing downstream clients.

  • Experiments and Comparisons in Preference Tuning: @dreamgen inquired about the practical comparison between KTO, IPO, and DPO methods for aligning language models. A subsequent Hugging Face blog post was referenced that discusses corrected IPO implementation results, showing IPO on par with DPO and better than KTO in preference settings.
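On the preference-tuning comparison just above, a small pure-PyTorch sketch of the DPO and IPO objectives as commonly formulated (no trainer plumbing; inputs are per-sequence summed log-probs):

```python
import torch
import torch.nn.functional as F

def preference_losses(policy_chosen, policy_rejected,
                      ref_chosen, ref_rejected, beta=0.1):
    # Log-ratio margin between chosen and rejected completions.
    logits = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    dpo = -F.logsigmoid(beta * logits)          # DPO: sigmoid loss
    ipo = (logits - 1.0 / (2.0 * beta)) ** 2    # IPO: squared-margin loss
    return dpo.mean(), ipo.mean()

d, i = preference_losses(torch.tensor([-10.0]), torch.tensor([-12.0]),
                         torch.tensor([-11.0]), torch.tensor([-11.5]))
print(f"DPO: {d.item():.4f}, IPO: {i.item():.4f}")
```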
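And for the quantization walkthrough at the top of this list, a hedged sketch of the same steps driven from Python (the repo URL, paths, and quant type are illustrative; assumes a built llama.cpp checkout as the working directory):

```python
# Mirrors the steps described: clone, git lfs pull, convert to GGUF, quantize.
import subprocess

repo = "MyModel"  # hypothetical Hugging Face repo clone target
subprocess.run(["git", "clone", "https://huggingface.co/org/MyModel", repo], check=True)
subprocess.run(["git", "lfs", "pull"], cwd=repo, check=True)
subprocess.run(["python", "convert.py", repo, "--outfile", "mymodel-f16.gguf"], check=True)
subprocess.run(["./quantize", "mymodel-f16.gguf", "mymodel-q4_K_M.gguf", "q4_K_M"], check=True)
```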

Links mentioned:


Nous Research AI ▷ #ask-about-llms (90 messages🔥🔥):

  • Hermes Model Confusion Cleared: @teknium clarified the difference between Nous Hermes 2 Mixtral and Open Hermes 2 and 2.5, which are 7B Mistrals, with Open Hermes 2.5 having added 100,000 code instructions.
  • Mixtral’s Memory Antics: @teknium and @intervitens discussed that Mixtral models require about 8x the VRAM of a 7B model, roughly 40GB in 4-bit precision. @intervitens later mentioned that with an 8-bit cache and optimized settings, 3.5 bpw with full context could fit.
  • Prompt Probing: @tempus_fugit05 received corrections from @teknium and .ben.com on the prompt format they’ve been using with the Nous SOLAR model, pointing to usage of incorrect prompt templates.
  • Expert Confusion in MoEs Explained: .ben.com explained how, in MoEs, experts are blended proportionally to the router’s instructions, emphasizing that while experts are chosen per-layer, their outputs must add up correctly in the final mix (sketched below).
  • Lone-Arena For LLM Chatbot Testing: .ben.com shared Lone-Arena, a self-hosted chatbot arena code repository on GitHub for personal testing of LLMs.
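A toy sketch of the proportional blending .ben.com described (dimensions and the top-k choice are illustrative):

```python
# Top-k experts are selected per token, their router weights renormalized,
# and the expert outputs summed so the mixture "adds up correctly".
import torch
import torch.nn as nn

experts = nn.ModuleList([nn.Linear(8, 8) for _ in range(4)])
router = nn.Linear(8, 4)

def moe_mix(x: torch.Tensor, k: int = 2) -> torch.Tensor:
    weights = torch.softmax(router(x), dim=-1)
    w, idx = weights.topk(k)
    w = w / w.sum()  # renormalize over the chosen experts
    return sum(wi * experts[i](x) for wi, i in zip(w, idx.tolist()))

print(moe_mix(torch.randn(8)).shape)  # torch.Size([8])
```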

Links mentioned:


LM Studio ▷ #💬-general (225 messages🔥🔥):

  • Persistent Phantom: User @nikofus reported that even after closing LM Studio UI, it continued to show in the task manager and use CPU resources. To address this, @heyitsyorkie suggested force killing the process and creating a bug report in a specific channel.

  • LM Studio’s Ghostly Grip: @vett93 questioned why LM Studio remains active in Task Manager after the window is closed. @heyitsyorkie explained it’s a known bug and the current solution is to end the process manually.

  • AVX Instruction Frustration: Users @rachid_rachidi and @sica.rios faced errors due to their processors not supporting AVX2 instructions. @heyitsyorkie clarified that LM Studio requires AVX2 support, but a beta version is available for CPUs with only AVX.

  • Roaming for ROCm: @neolithic5452 inquired about getting LM Studio to use GPU compute on an AMD 7900XTX GPU instead of just CPU for a Windows 11 setup. @quickdive. advised using a special beta version of LM Studio that supports ROCm for AMD GPU compute capability, available in the channel pinned messages.

  • Whispers of Integration: @lebonchasseur showed interest in experiences combining Whisper and Llama models with LM Studio, whilst @muradb inquired about suitable vision models. Users were pointed towards Llava and explicitly to one on the Hugging Face model page.

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (149 messages🔥🔥):

  • Model Recommendation for Specific PC Specs: @mesiax. inquired about the best performing model for a PC with 32GB RAM and 12GB VRAM that fully utilizes the GPU. @wolfspyre offered some advice, ultimately recommending that they start testing and learn through experience, as no one-size-fits-all solution exists.

  • Model Updates and Notifications: User @josemanu72 asked whether they need to manually update a model when a new version is published. @heyitsyorkie clarified that updating is a manual process, as a new LLM release is a whole new model rather than an update to an existing one.

  • VS Code Versus IntelliJ Plugins: @tokman expressed a preference for IntelliJ over VS Code and inquired about the availability of a similar plugin for IntelliJ after discovering a useful extension for VS Code. @heyitsyorkie mentioned a possible workaround with the IntelliJ plugin supporting local models through server mode.

  • Continue Integration and Usage: @wolfspyre discussed the benefits of Continue.dev, which facilitates coding with any LLM in an IDE, and @dagbs pointed to a channel that could be a general discussion space for integrations.

  • Query on Image Generation Models: @kecso_65737 sought recommendations for image generation models. @fabguy suggested Stable Diffusion (SDXL) but noted it’s not available on LM Studio, and @heyitsyorkie emphasized the same while mentioning Automatic1111 for ease of use outside LM Studio.

Links mentioned:


LM Studio ▷ #🧠-feedback (11 messages🔥):

  • Model Download Mystery: Stochmal faced issues with downloading a model, encountering a ‘fail’ message without an option to retry or resume the download process.
  • Apple Silicon VRAM Puzzle: @musenik reported that even with 90GB of VRAM allocated, the model Miquella 120B q5_k_m.gguf fails to load on LM Studio on Apple Silicon, whereas it successfully loads on Faraday.
  • LM Studio vs. Faraday: @yagilb shared a hypothesis that LM Studio might try to load the whole model into VRAM on macOS, which could cause issues, hinting at a future update to address this.
  • In Search of Hidden Overheads: @musenik suggested looking into potential unnecessary overhead in LM Studio when loading models, as Faraday loads the same model with a switch for VRAM and functions correctly.
  • Download Resumability Requested: @petter5299 inquired about the future addition of a resume download feature in LM Studio, expressing frustration over downloads restarting after network interruptions.

LM Studio ▷ #🎛-hardware-discussion (217 messages🔥🔥):

  • Seeking General-Purpose Model Advice: User @mesiax. inquired about the best performance model to run locally on a PC with 32GB of RAM and 12GB of VRAM, wishing to utilize the GPU for all processing. Fellow users didn’t respond with specific model recommendations; instead, conversations shifted towards detailed hardware discussions on GPUs, RAM speeds, and PCIe bandwidth for running large language models.
  • RAM Speed vs. GPU VRAM Debate: Users, including @goldensun3ds, discussed the influence of RAM speed on running large models, considering an upgrade from DDR4 3000MHz to 4000MHz or faster. Conversations revolved around system trade-offs, such as RAM upgrades versus adding GPUs, and touched upon hardware compatibility and performance expectations.
  • P40 GPU Discussions Spark Curiosity and Concern: Members like @goldensun3ds and @heyitsyorkie debated the suitability of Nvidia Tesla P40 GPUs for running large models, such as the 120B Goliath. Issues raised included driver compatibility, potential bottlenecks when pairing with newer GPUs, and P40’s lack of support for future model updates.
  • Ryzen CPUs and DDR5 RAM Get a Mention: Discussion by @666siegfried666 and .ben.com briefly pointed out the advantages of certain Ryzen CPUs and DDR5 RAM for local model inference, although the X3D cache’s effectiveness and Navi integrated NPUs were debated.
  • Viable High-VRAM Configurations Explored: Users like @quickdive. and @heyitsyorkie examined the potential of different GPU setups, including P40s, 3090s, and 4090s for deep learning tasks. The consensus leaned towards using higher VRAM GPUs to avoid bottlenecks and improve performance.

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (42 messages🔥):

  • Image Analysis Capability in Question: @syslot confirmed that Llava-v1.6-34b operates well, while @palpapeen expressed difficulties making it analyze images, despite the vision adapter being installed and an ability to send images in chat. For @palpapeen, the configuration worked for Llava1.5 7B but not Llava1.6 34B.

  • Discussions on Model and Processor Compatibility: @vic49. mentioned an issue discussed on GitHub: separating the model and processor using GGUF formatting prevents the GGUF from utilizing the higher resolution of version 1.6.

  • The ROCm Path Struggle on Windows 11 with AMD: @sierrawhiskeyhotel experienced a “Model error” with AMD hardware on Windows 11 but eventually resolved it by turning off internal graphics and using GPU Preference settings, confirming successful use of an AMD Radeon RX 7900 XTX.

  • Desire for More GPU Control Expressed: Following a discussion on troubleshooting ROCm configuration and GPU utilization, @fabguy, @heyitsyorkie, and @yagilb concurred that more control over which GPU is used would be beneficial, an issue addressed within the community.

  • New Windows Beta Build and Reported Issues: @yagilb shared a link to a new Windows beta build, featuring an improvement to how LM Studio shows RAM and CPU counts. @fabguy reported inconsistent CPU usage metrics and lingering processes after closing the app, while @heyitsyorkie suggested the process bug was not easily reproducible.

Links mentioned:


LM Studio ▷ #autogen (4 messages):

  • Error in POST Request: @merpdragon shared a pastebin link containing an error they encountered when making a POST request to /v1/chat/completions. The shared log indicates an issue while processing the prompt about children driving a car.
  • LM Studio Setup with Autogen Issues: @j_rdie mentioned having LM Studio set up with autogen, confirming the token and model verification, but facing an issue where the model won’t output directly, only during autogen testing.
  • Starting with Autogen Guide: @samanofficial inquired about how to start with autogen, and @dagbs provided a link to a channel within Discord for further guidance. However, the specific content or instructions from the link cannot be discerned from the message.

Links mentioned:


LM Studio ▷ #langchain (2 messages):

  • Local LLM Setup Needs ENV Var or Code Alteration: User @tok8888 posted a code snippet illustrating that for a local setup, one must either set an environment variable for openai_api_key or modify the code to include the API key directly. They showed an example with the API key set to "foobar" and an altered ChatOpenAI initialization (a sketch follows below).

  • Inquiry About LM Studio for Appstore Optimization: User @disvita asked the group how they can utilize LM Studio for App Store optimization, but provided no further context or details in their query.
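A minimal sketch of the setup @tok8888 described (assumes the langchain-openai package and LM Studio’s default local server at http://localhost:1234/v1; the key is a placeholder):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_key="foobar",  # placeholder -- the local server ignores it
    openai_api_base="http://localhost:1234/v1",
    model="local-model",      # model name is illustrative
)
print(llm.invoke("Say hello in one sentence.").content)
```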


Mistral ▷ #general (278 messages🔥🔥):

  • LLama3 Speculation and Mixtral Tips Wanted: User @frosty04212 wondered if Llama3 would have the same architecture but different training data, while @sheldada sought tips for prompting Mixtral effectively, mentioning odd results. @ethux inquired about the use case of Mixtral, whether through API, self-hosted, or other methods.

  • Character Conundrums with Mistral: @cognitivetech brought up issues with Mistral’s handling of special characters, noting problems with certain characters like the pipe (|) and others when processing academic text. They discussed challenges with input over 10,000 characters and the variability of results with different characters and model variations.

  • Model Performance Discussions: @cognitivetech and @mrdragonfox exchanged observations on model inference times with OpenHermes 2.5 versus Mistral, noting differences when using different tooling. They also touched on the phenomenon known as “lost in the middle,” where models struggle to use relevant information located in the middle of long contexts.

  • Aspiring Image Model Developers Connect: User @qwerty_qwer offered 600 million high-quality images to anyone developing an image generative model, sparking discussions with @i_am_dom and @mrdragonfox on the feasibility and computational challenges of training a model from scratch.

  • Function Calling Feature Request and Office Hours Critique: @jujuderp lamented the absence of function calling and JSON response mode in Mistral API, referencing a community post, while @i_am_dom offered a critique of the office hours sessions, comparing them to Google’s approach on Bard discord and noting a lack of informative responses from Mistral AI.

Links mentioned:


Mistral ▷ #models (45 messages🔥):

  • Exploring AI Hosting Options: User @i_am_dom suggested Hugging Face as a free and reliable hosting service for AI models; later, they also mentioned Perplexity Labs as another hosting option. @ashu2024 appreciated the information.

  • Best Models for CPU Inference Explored: @porti100 solicited advice on running smaller LLMs coupled with RAG on CPUs; @mrdragonfox recommended the 7b model but warned that it would be slow on CPUs. There was a brief discussion revolving around performance differences on lower-end systems and the efficiencies of various 7b quantized models.

  • Mistral’s Superior Quantization Highlighted: @cognitivetech shared their experience that Mistral’s quantization outperformed other models, especially since version 0.2. They emphasized the need to test full models under ideal conditions for an accurate assessment.

  • Execution Language Impacts AI Performance: @cognitivetech reported significant differences in performance when using Go and C++ instead of Python, while @mrdragonfox argued that since the underlying operations are in C++, the interfacing language shouldn’t heavily impact the outcomes.

  • Getting Started with Mistral AI: Newcomer @xternon inquired about using Mistral AI without laptop components powerful enough to run the models, leading to suggestions to use Gradio for a demo web interface or Hugging Face’s hosted models for an easy browser-based experience. @adriata3 pointed out options for local CPU usage and recommended their GitHub repository with Mistral code samples, along with Kaggle as a potential free resource.

Links mentioned:


Mistral ▷ #deployment (17 messages🔥):

  • Mistral mishap with markdown: @drprimeg1 struggled with Mistral Instruct AWQ not outputting content inside a JSON format when given a prompt with Markdown formatting. Their current approach to classification can be found here, but the model responds with placeholders instead of actual content.

  • Markdown mayhem in models: @ethux suggested that @drprimeg1’s problem could be due to the Markdown formatting, noting that the model tries to output JSON but ends up displaying markdown syntax instead.

  • GuardrailsAI to guide prompt effectiveness: @ethux offered a solution by recommending GuardrailsAI as a tool for ensuring correct output formats and mentioned its capability to force outputs and retry upon failure. They also included a reference to the tool at GuardrailsAI.

  • Teacher forcing talk: @ethux mentioned that GuardrailsAI implements a form of teacher forcing by providing examples of what went wrong and how to correct it, while also being predefined.

  • Instructor Introduction: As another recommendation for structured output generation, @ethux shared a link to Instructor, a tool powered by OpenAI’s function calling API and Pydantic for data validation, described as simple and transparent. Additional insights and a community around the tool can be accessed at Instructor’s website.
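A minimal sketch of the Instructor pattern @ethux pointed to (assumes a recent instructor release; the schema and model name are illustrative):

```python
# Pydantic-validated structured output; Instructor retries on validation failure.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Classification(BaseModel):
    label: str
    confidence: float

client = instructor.patch(OpenAI())
result = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Classification,
    messages=[{"role": "user", "content": "Classify the sentiment: 'great product!'"}],
)
print(result.model_dump())
```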

Links mentioned:


Mistral ▷ #finetuning (17 messages🔥):

  • Guidance on Fine-tuning for Energy Markets: @tny8395 inquired about training a model for automated energy market analysis and was informed by @mrdragonfox that it’s possible to fine-tune for such a specific purpose.
  • Channel Clarification and Warning Against Spam: @mrdragonfox guided @tny8395 to keep the discussion on fine-tuning in the current channel and reminded them that spamming will not elicit additional responses.
  • Mistral and Fine-tuning API Development: @a2rette asked if Mistral plans to work on fine-tuning APIs. @mrdragonfox responded, highlighting the current limitations due to the cost of inference and small team size, concluding that for now it is a “not yet.”
  • Resource Realities at Mistral: @mrdragonfox provided context on Mistral’s operational scale, explaining that despite funding, the industry’s high costs and a small team of around 20 people make certain developments challenging.
  • Seeking Fine-tuning Info for Mistral with Together AI: @andysingal inquired about resources for fine-tuning Mistral in combination with Together AI but did not receive a direct response.

Mistral ▷ #showcase (7 messages):

  • ExLlamaV2 Featured on YouTube: @pradeep1148 shared a YouTube video titled “ExLlamaV2: The Fastest Library to Run LLMs,” highlighting a fast inference library for running LLMs locally on GPUs. They also provided a GitHub link for the project and a Google Colab tutorial.
  • Novel Writing with AI Assistance: @caitlyntje described their process of using AI to write a novel, involving generating an outline, chapter summaries, and then iterating over each chapter to ensure consistency, style, and detail. The process was carried out in sessions due to limitations in token handling on their MacBook.
  • Careful Monitoring During AI-Assisted Writing: In a follow-up, @caitlyntje mentioned the necessity of careful oversight to maintain the logical flow and timeline when using AI for writing.
  • Model Capacity Recommendation: Reacting to limitations mentioned by @caitlyntje, @amagicalbook recommended trying Claude, which allegedly can handle up to 200k token contexts.
  ‱ Critique of AI-Generated Discworld Narrative: @swyxio, a fan of Terry Pratchett, critiqued an AI-generated Discworld narrative for not capturing the essence of iconic characters like the witches, leading to a halt in reading.

Links mentioned:

ExLlamaV2: The Fastest Library to Run LLMs: A fast inference library for running LLMs locally on modern consumer-class GPUs. Links: https://github.com/turboderp/exllamav2 and https://colab.research.google.com/github



Mistral ▷ #random (2 messages):

  • YC Founder Seeks LLM Challenges: User @znrp, a founder in Y Combinator, is reaching out for insights into the challenges community members face with building in the space of LLMs. They’re open to direct messages for a quick chat.
  • Flags Fly High in Random: User @gafty expressed their excitement or playfulness with a simple emoji message containing the Romanian flag and a crazy face.

Mistral ▷ #la-plateforme (4 messages):

  • Stream Gate Closed: @jakobdylanc experienced a problem where mistral-medium was not sending the final empty chunk in streamed responses, differentiating it from expected behavior observed in mistral-small.
  • Quick Fix, Not a Full Solution: In response to the streaming issue, @drones_flier suggested discarding responses below a certain length as a temporary workaround, although they noted it might not be ideal for all use cases.
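
A defensive consumer loop makes the workaround concrete: accumulate deltas and treat iterator exhaustion as end-of-stream instead of waiting for a final empty chunk. A minimal sketch, assuming the early mistralai Python client API:

```python
# Sketch of a defensive streaming loop: collect deltas and treat iterator
# exhaustion as end-of-stream, so a missing final empty chunk does not matter.
# The client API below follows the early `mistralai` package and is an assumption.
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key="YOUR_API_KEY")

parts = []
for chunk in client.chat_stream(
    model="mistral-medium",
    messages=[ChatMessage(role="user", content="Say hello")],
):
    delta = chunk.choices[0].delta.content
    if delta:  # skip None/empty sentinel chunks rather than relying on them
        parts.append(delta)

print("".join(parts))  # complete response once the iterator is exhausted
```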

LAION ▷ #general (361 messagesđŸ”„đŸ”„):

  ‱ The Frustration with Fooocus: @pseudoterminalx expressed dissatisfaction with the authors of ControlNet and with adapting their models to other platforms, describing difficulties in collaboration and a focus on promoting AUTOMATIC1111. Concerns about the willingness of these developers to consider community needs were discussed.
  • Concerns Over Controlnet Adoption and Stanford Researchers: Several users like @astropulse and @pseudoterminalx shared their struggles with implementing ControlNet into their projects, noting a lack of information and support. Discussions arose around the ethics and actions of Stanford researchers related to LAION datasets, implying a business-first mentality and a lack of public development following their funding success.
  • Debate on Tech Giants and AI Model Training: Users like @pseudoterminalx, @thejonasbrothers, and @drhead discussed the alignment of Stability AI’s practices with NVIDIA’s, touching on the strategy of following the tech giant’s footsteps and questioning the independent innovative capacities of smaller entities.
  • Graphics Card Discussion for AI Models: In a series of exchanges, users like @ninyago and @vrus0188 discussed the adequacy of various NVIDIA graphics card models for running AI models, such as the 4060 ti and the 3090, with consideration for VRAM and budget.
  • Stability AI’s New Model Release Speculation: Several users, including @thejonasbrothers and @vrus0188, conversed about Stability AI’s upcoming new model, with @thejonasbrothers lamenting their own six-month project given the new model’s capabilities and expressing disappointment in having to compete with substantial resources like that of Stability AI.

LAION ▷ #research (3 messages):

  • Emoji Speaks Louder Than Words: User @nodja posted an interesting pair of 👀 without any accompanying text, leaving us all in suspense.
  • Qwen-VL-Max on Hugging Face Gains Attention: @nodja shared a link to the Hugging Face space for Qwen-VL-Max, but it was noted to be a duplicate of Qwen/Qwen-VL-Plus with the same accompanying images of the model’s avatar.
  • Clarification and Retraction: Shortly after sharing the link, @nodja followed up with a simple “nevermind,” indicating the previous message may have been posted in error.

Links mentioned:

Qwen-VL-Max - a Hugging Face Space by Qwen


HuggingFace ▷ #general (289 messagesđŸ”„đŸ”„):

  • Discussions about Falcon-180B and Demos Not Working: Users like @nekoli. reported issues with accessing HuggingFace demo pages, like Falcon-180B, indicating either site-wide issues or specific demo outages. Links to Falcon-180B and suggestions were shared, but success seemed varied.
  • Questions on LLM Deployment and Usage: @rishit_kapoor inquired about tutorials for deploying Mistral 7B via AWS Inferentia2 and SageMaker, while @_sky_2002_ sought information on using LLMs via an API with free credits on HuggingFace.
  • Technical Assistance for Spaces: @dongd. sought help for a Space stuck in a building state, with @not_lain offering troubleshooting advice. The conversation touched on dependency issues and the functionality of 'factory rebuild'.
  • Hugging Face Infrastructure Issues: @lolskt and @wubs_ highlighted possible infrastructure issues at Hugging Face, possibly affecting Gradio and other services, while users shared methods like hardware switching to resolve issues.
  • Impact of AI on Security: @aifartist reflected on the implications of deepfake technology in light of a news story about a deepfake CFO involved in a scam, expressing concern over the technology's potentially damaging uses.

HuggingFace ▷ #today-im-learning (5 messages):

  • The “Attention” Paper Blows Minds: @sardarkhan_ expressed amazement after reading the groundbreaking “Attention Is All You Need” paper, signaling a significant impact on their understanding of the subject.
  ‱ Speech Recognition Evolves: @myke420247 successfully experimented with converting wav files to text on his company’s call recordings using Whisper and pyannote for speaker diarization, achieving better results than Google’s paid service from 2018; a minimal sketch of the pipeline appears after this list.
  • Tool Crafting for Audio Summaries: @n278jm is in the process of creating an internal tool to summarize audio recordings from consultations, sharing code attempts and highlighting a commitment to privacy with local processing only.
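
For reference, the wav-to-text plus diarization pipeline described above can be sketched in a few lines, assuming the openai-whisper and pyannote.audio packages and a Hugging Face token with access to the pyannote pipeline; the file path and model size are illustrative:

```python
# Sketch of the wav-to-text + diarization pipeline described above. Assumes the
# `openai-whisper` and `pyannote.audio` packages and a Hugging Face token with
# access to the pyannote pipeline; the file path and model size are illustrative.
import whisper
from pyannote.audio import Pipeline

audio_path = "call_recording.wav"  # hypothetical input file

# 1) Speech-to-text with Whisper.
asr = whisper.load_model("base")
transcript = asr.transcribe(audio_path)

# 2) Who-spoke-when with pyannote speaker diarization.
diarizer = Pipeline.from_pretrained(
    "pyannote/speaker-diarization", use_auth_token="hf_..."
)
diarization = diarizer(audio_path)

# 3) Print speaker turns; aligning them with transcript text is left out here.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
print(transcript["text"])
```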

HuggingFace ▷ #cool-finds (11 messagesđŸ”„):

  • Sentiment Analysis Deep Dive: @andysingal highlighted a detailed tutorial on sentiment analysis combining Hugging Face and Deepgram. It showcases how to create charts to understand sentiment shifts over time and includes visuals of the sentiment analysis charts. Sentiment Analysis with Hugging Face and Deepgram

  • Blog Posting on Hugging Face Hub: @lunarflu encouraged @imcoza1915 to draft a community blog post on Hugging Face Hub to increase visibility of their work, linking to the Hugging Face Blog Explorers community.

  • Publication of Agent-Helper Langchain: @4gentbur3k shared a link to a Hugging Face blog post discussing the integration of Hugging Face’s transformers with Langchain for advanced NLP applications. The post demonstrates how combining these tools improves language understanding and generation. Agent Helper Langchain Blog Post

  • Art Forge Labs AI Art Generation: @wubs_ expressed amazement at a post from Art Forge Labs detailing significant enhancements in both speed and quality for AI-driven art generation, but did not provide an accessible URL for the content mentioned.

  • Fine-tuning Models Resource List: @andysingal shared a progress update on creating a resource list for fine-tuning models, offering a link to the GitHub repository where the list is being compiled. Access the fine-tuning list at the llm-course GitHub repository.

HuggingFace ▷ #i-made-this (9 messagesđŸ”„):

  • Ethical LLM Framework Proposition: @lunarflu suggested the idea for a general ethical framework for language model bots that could be accepted globally, triggering a thought-provoking discussion.
  • CrewAI Gets an AutoCrew: @_yannie_ shared their GitHub project Autocrew, a tool to automatically create a crew and tasks for CrewAI, complete with an inviting repository image and description.
  • Newcomer Ready to Contribute: User @__codenerd__ introduced themselves to the chat, expressing eagerness to showcase their best work in the community.
  • Hacker’s Digital Assistant Unveiled: @n278jm introduced a hacker-oriented chat assistant on HuggingFace - intriguingly titled Your personal hacker helper, aiming to assist with analyzing hacking needs and tool outputs.
  • Emoji Predictions from Tweets: @pendrokar adapted @748130998935617676’s TorchMoji to craft a HuggingFace space that predicts possible follow-up emojis based on English text, using a dataset of 1 billion tweets from 2017, available at DeepMoji on HuggingFace.
  • Hercules-v2.0 Dataset Launch: @locutusque announced the release of Hercules-v2.0, a comprehensive dataset to power specialized domain models, and shared the dataset’s performance on the Open LLM Leaderboard, including a warning about sensitive content.
  ‱ Artforge Labs’ AI Image Generator: @wubs_ unveiled Artforge Labs, an AI image generation service that offers unlimited image creation with a risk-free trial and monthly subscription. It aspires to rival MidJourney, is based on SDXL Turbo models, and can be explored at artforgelabs.com.

HuggingFace ▷ #reading-group (9 messagesđŸ”„):

  • Organizing HuggingFace Events: @lunarflu created a placeholder for an upcoming event (time TBD) and shared the link Join the Event. A new channel was also created specifically for asking questions in a structured manner.
  • Recording Sharing Protocol Established: @chad_in_the_house sent the Google Drive link of the recorded session to @811235357663297546, indicating plans for it to be posted in the channel and potentially on YouTube.
  • Drive Link to Session Recording: @lunarflu shares the Google Drive recording of the previous session with the group. It could form the basis of a potential YouTube channel for wider sharing.

HuggingFace ▷ #diffusion-discussions (12 messagesđŸ”„):

  ‱ Slow Down, Speedy!: HuggingMod reminded @548866140697264129 to ease up on their message frequency: a gentle nudge for some serenity in the chat.
  • Scam Alert Raised: @meatfucker flagged a classic Discord scam circulating and suggested its removal, alerting the HuggingFace moderation team.
  • Technical Troubles with CogVLM: User @furkangozukara sought assistance for an issue detailed on GitHub, specifically with AutoModelForCausalLM and a Half-Char dtype error. The posted GitHub issue offers more insight into their predicament.
  • Navigating License Agreements for AI Models: @pseudoterminalx shared a link to the Stable Video Diffusion model license agreement and asked if Diffusers weights could be accessed, highlighting license compliance in the field of AI research.
  • Epochs and Training Dilemmas: @bitpattern questioned the long training times indicated in their script logs, and @pseudoterminalx advised to reduce the number of epochs, hinting at potential overfitting or inefficiency in the training process.

HuggingFace ▷ #computer-vision (7 messages):

  • Exploring Synthdog for Fake Data Generation: @swetha98 is looking for guidance on using Synthdog in the Donut model for creating fake document images but can’t find scripts or the images needed for the process.
  ‱ In Search of the Latest Zero-shot Models: @ursium inquires about advanced alternatives to CIDAS/Clipseg_Rd64_refined for zero-shot vision models with fewer artifacts, noting the one-year-old model might be outdated.
  • Rapid Messaging Alert: @HuggingMod gently reminds a user to slow down after rapid message posting on the channel.
  • Sliding Puzzle Dataset for Vision LLM: @harsh_xx_tec_87517 announces the publication of a sliding puzzle dataset designed for training vision LLMs and shares the dataset on Hugging Face and the source code on GitHub for generating such datasets, seeking feedback from the community.
  • Models Struggle with Puzzle Dataset: In response to @gugaime’s query, @harsh_xx_tec_87517 mentions they have only implemented the dataset generator so far, with models like ChatGPT-4 and LLaMA failing to solve the puzzle, prompting further work on fine-tuning LLaMA.

HuggingFace ▷ #NLP (9 messagesđŸ”„):

  • Seeking Knowledge for Video Summarization: @karam15. is looking for research papers, models, or GitHub repositories related to summarization of videos with timestamps. Suggestions and references to pertinent resources are requested.
  ‱ Mistral 7B Deployment on AWS Inquiry: @rishit_kapoor is seeking tutorials or materials on deploying Mistral 7B using AWS Inferentia2 and SageMaker. This query was posted twice, indicating strong interest in the topic.
  • Exploring Models for Spell Check and Grammar: @.bexboy is in search of finetune-able models suitable for spell checking and grammar improvement. Guidance on effective models or tools is solicited.
  • Call for Collaborators in Model Merging: @birger6875 invites the community to join experiments with model merging, particularly with a focus on Nordic languages. They provided a planning document, a tutorial, a Colab notebook, and a mention of a Discord channel specifically for model merging discussions.
  • In Search of Contribution Opportunities: @NicksđŸ€™đŸŸ expresses an interest in contributing to a project and is considering the first steps to take on this journey.

Links mentioned:

merge-crew (Merge Crew)


OpenAI ▷ #ai-discussions (77 messagesđŸ”„đŸ”„):

  • Exploring Local LLMs for Improved Performance: @mistermatty discussed the drawbacks of GPT-4, including “Conversation key not found” errors and subpar performance. They expressed interest in local LLMs as an alternative and received suggestions like LM Studio and perplexity labs for free LLM use from @7877.

  ‱ Local LLMs Versus GPT-4 Benchmarking: @kotykd opined that no open-source LLM compares to GPT-4; even models like Mixtral 8x7b, which require significant RAM, underperform GPT-3.5 in most areas.

  • Performance Spotlight on codellama-70b-instruct: @mistermatty highlighted their positive experience with codellama-70b-instruct, hosted on the perplexity labs playground. Their interaction prompted them to consider a setup for local usage of comparable LLMs.

  • Hardware Conundrum for Running LLMs: Several participants, including @mistermatty, @kotykd, @johnnyslanteyes, and @michael_6138_97508, engaged in a detailed discussion on the hardware requirements for running large LLMs locally, touching on the significance of RAM vs. VRAM, system recommendations, and possible heat issues with laptops.

  • Credibility of AI & LLM Information: The dialogue from @johnnyslanteyes, @michael_6138_97508, and @aipythonista indicated skepticism towards the reliability of information regarding AI performance on various hardware, highlighting the importance of firsthand experience and critical evaluation of sources like YouTube.

Links mentioned:

Beyond Consciousness in Large Language Models: An Investigation into the Existence of a “Soul” in 
: Author: David CĂŽrtes Cavalcante. Publication date: February 3, 2024. License: CC BY (Creative Commons Attribution).


OpenAI ▷ #gpt-4-discussions (59 messagesđŸ”„đŸ”„):

  ‱ Alert on @mention Issues: @_odaenathus reported problems using the @ system with their custom GPTs, observing that even GPTs that used to work together were no longer cooperating, and that the issue was inconsistent.
  ‱ GPT Amnesia or Just Broken?: @blckreaper mentioned difficulties with their GPTs, such as forgetting files and abruptly ending stories, while being frustrated about wasting messages trying to debug the issues.
  ‱ The Thumbs Up Mystery: It was noted that the thumbs up feature is missing, and @johnnyslanteyes clarified that it appears only when a message has to be regenerated, to inform the system if a new response is better for curation.
  ‱ Trouble with Prompts and Logging Out: @rahulhere expressed difficulties with a GPT that wouldn’t log out after OAuth authentication, and queried why “Starting Action” takes a long time.
  ‱ Search and Ranking for GPTs Needed: Users like @astron8272 and @killlemon.eth are seeking ways to rank GPTs for efficiency in specific tasks like language learning, and are inquiring about GPT agents for marketing research as well as easier search functionality for GPT Agents.

OpenAI ▷ #prompt-engineering (51 messagesđŸ”„):

  • Custom GPT Instructions Concern: @gameboy2936 seeks advice on setting custom GPT instructions to ensure the bot communicates with humanlike nuances without reverting to AI writing style, such as using overly ornate language. Their bot should avoid AI-style phrases and maintain a consistent human speech pattern.

  • Stealth2077 Encounters Over-Moderation: @stealth2077 complains about GPT-4’s inclination to incorporate ethical considerations into responses, even when it seems inappropriate for the user’s storytelling purposes. Efforts to instruct the AI to omit such themes have not resolved the issue.

  • Need for Assistant Model Stability Highlighted by Madame_Architect: @madame_architect recommends using the Assistant Model on a stable GPT version rather than the unpredictable Preview Model to maintain instruction consistency. She indicates that over-moderation is impacting output quality.

  • Prompt Specificity Tactics to Avoid Misspelled Names: Users @titaniumsporks and @snovov discuss issues with ChatGPT misspelling names leading to derailment during conversations, with suggestions for being specific and using appropriate platforms.

  • Users Report Sudden Content Policy Violation Messages: @papa_jhon. expresses frustration over receiving unexpected policy violation messages for seemingly innocent prompts, joined by @lugui who suggests it might be an internal problem that could resolve later.


OpenAI ▷ #api-discussions (51 messagesđŸ”„):

  • Exploring Human Speech for Autistic User: User @gameboy2936 seeks assistance with customizing GPT’s communication style to create more human-like interactions, sharing detailed instruction examples.

  • Assistant vs Custom GPT for Stability: @madame_architect recommends switching from Custom GPT to Assistant models due to the instability of preview models and the frustration of prompt engineering on a “moving target.”

  • Creative Writing Struggles with Over-Moderation: Users @stealth2077 and @madame_architect discuss challenges in maintaining creative control over writing as ChatGPT attempts to adhere to ethical considerations, which @stealth2077 finds restrictive for their specific use case.

  • Editing Tips for Consistency in Storytelling: @johnnyslanteyes offers a tip to @stealth2077 on highlighting specific story sections to guide ChatGPT in editing, while also tackling the issue of forced values in narrative content.

  • Issues with Policy Violation Responses and AIs: Users @papa_jhon. and @lugui converse about unexpected policy violation responses to innocent prompts, suggesting that it may be an internal problem that could be resolved later.


OpenAccess AI Collective (axolotl) ▷ #general (119 messagesđŸ”„đŸ”„):

  ‱ Troubleshooting for “cuda argument error”: @dangfutures advised running `sudo apt-get install libopenmpi-dev` followed by `pip install mpi4py` to address issues with GPUs on RunPod.
  ‱ Memory and Strategy for Finetuning: Concerning memory requirements, @nruaif stated 80GB of VRAM would be needed for LoRA or QLoRA on Llama 70B. They also noted freezing MoE layers can allow 8 A6000 GPUs to manage Mixtral FFT, and finetuning using LoRA is half the speed but more sample efficient.
  • RunPod GPU Leasing Explorations: @yamashi considered investing in a machine to rent on RunPod when not in use, while @casper_ai pointed out a large number of GPUs are needed to rent to RunPod. @le_mess suggested using vast.ai for single machines.
  • Component Preference: @yamashi concluded LoRA is preferred over QLoRA, and @nanobitz mentioned choosing to target the router and attention for layer updates when finetuning with LoRA.
  • Emerging Quantization and Model Training Methods: @dangfutures and @casper_ai discussed the potential of training with AWQ and Marlin Quant, acknowledging it’s possible with recent advancements, and @casper_ai plans to benchmark the speeds of various quantization methods.

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (88 messagesđŸ”„đŸ”„):

  • Scaling to New Performance Heights: @casper_ai and others discussed a new 2B parameter model by OpenBMB potentially matching the performance of Mistral 7B, highlighting both skepticism and astonishment at the claimed benchmarks. The detailed exploration, shared in a Notion page, emphasizes the importance of optimizing model training.

  ‱ Implementation of Advanced Algorithms: @yamashi praised the potential of implementing the WSD (warmup-stable-decay) learning-rate schedule from the MiniCPM work, noting that it seems straightforward and possibly more effective than current methods; an illustrative sketch follows this list.

  ‱ Mac Compatibility Queries for Axolotl: @yamashi discussed running Axolotl on the new M3 Mac, encountering issues like the model defaulting to CPU instead of GPU, waiting on torch and transformers for half-precision support on Mac, and submitting a pull request to help others interested in running on Mac.

  • Fine-Tuning Techniques Debated: @casper_ai and @c.gato engaged in a detailed conversation about applying training strategies of large models, specifically supervised finetuning (SFT) and different training phases mentioned in the MiniCPM discovery, noting that these methods might not directly apply to finetuning.

  • Maximizing Data Utilization Explored: @dreamgen and others expressed interest in the extensive data utilized in the new training strategy from OpenBMB, particularly the implications for large models and the potential need for similar experimental setups for fine-tuning approaches.
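
For illustration, a warmup-stable-decay schedule can be written as a plain function of the training step; the warmup, plateau, and decay fractions below are assumptions for illustration, not MiniCPM’s published values:

```python
# Illustrative warmup-stable-decay (WSD) learning-rate schedule: linear warmup,
# a long flat plateau at the peak rate, then a short final decay. The fractions
# below are assumptions for illustration, not MiniCPM's published values.
def wsd_lr(step, total_steps, peak_lr=1e-3, min_lr=1e-5,
           warmup_frac=0.01, decay_frac=0.1):
    warmup_steps = max(int(total_steps * warmup_frac), 1)
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup_steps:          # warmup: ramp linearly up to the peak
        return peak_lr * step / warmup_steps
    if step < decay_start:           # stable: hold the peak learning rate
        return peak_lr
    # decay: drop from peak to min over the final fraction of training
    progress = (step - decay_start) / max(total_steps - decay_start, 1)
    return peak_lr + (min_lr - peak_lr) * progress

print([round(wsd_lr(s, 1000), 6) for s in (5, 500, 950, 999)])
```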

OpenAccess AI Collective (axolotl) ▷ #other-llms (1 messages):

cf0913: https://huggingface.co/chatdb/natural-sql-7b


OpenAccess AI Collective (axolotl) ▷ #general-help (18 messagesđŸ”„):

  • Batch Size Calculations Conundrum: @duke001. sought advice on how to determine the number of steps for each epoch during training and was puzzled by the discrepancy between theoretical calculation and actual observations on wandb. @nanobitz suggested looking into sequence length packing and also mentioned the save_per_epoch option to help with proportionate checkpoint saving.

  ‱ EOS Token Identity Crisis in Finetuning: @cf0913 experienced an issue where, after finetuning, the EOS token appeared to act as the pad token for deepseek-coder-instruct, leading to manual adjustments. @nanobitz proposed editing the tokenizer config to swap the tokens, which @cf0913 confirmed worked without issues; a minimal sketch of that fix appears after this list.

  • Support Request for Axolotl on MacBook Air M1: @mini_09075 faced errors trying to install the axolotl package, due to the lack of CUDA support on an M1 Apple chip. @yamashi responded with a mention of their barebones PR that might substitute MPS for CUDA, but warned it’s not recommended for use as it stands.

  • Obsolete Branch Baffles User: In a quest to perform Medusa training on a local machine, @mini_09075 used an outdated branch but soon realized that it might not be possible, which was implied by @yamashi asking why the outdated branch was in use.
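
A minimal sketch of the pad/EOS fix discussed above, assuming the transformers package; the model ID and replacement pad token are illustrative, and the same change can be made by editing tokenizer_config.json directly:

```python
# Sketch of the pad/EOS fix discussed above, assuming the `transformers` package.
# The model ID and replacement pad token are illustrative; the same change can be
# made by editing tokenizer_config.json by hand.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct")
print("eos:", tok.eos_token, "| pad:", tok.pad_token)

# If padding and EOS collide, point padding at a distinct token so the model
# does not learn to treat EOS as ignorable padding.
if tok.pad_token == tok.eos_token:
    tok.pad_token = "<pad>"  # assumes such a token exists in the vocabulary
tok.save_pretrained("./fixed-tokenizer")
```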

Links mentioned:

GitHub - ctlllll/axolotl: Go ahead and axolotl questions


OpenAccess AI Collective (axolotl) ▷ #rlhf (7 messages):

  ‱ Troubles with DPO: @fred_fups reported difficulties getting Direct Preference Optimization (DPO) to work, facing out-of-memory (OOM) issues particularly when using it with qlora compared to regular Mistral qlora.
  • DPO’s High Memory Demand: @noobmaster29 confirmed that DPO indeed consumes significantly more memory, noting that 24GB RAM only supports a microbatch size of 1 with a context size of 2048.
  • Recommendation for Alternative: In response to issues with DPO, @dreamgen suggested trying out unsloth, as sample packing does not work with DPO.
  • Inquiry about Various Optimization Techniques: @dreamgen inquired if anyone has experience with other optimization methods, naming KTO, IPO, and others for potential experiments.

OpenAccess AI Collective (axolotl) ▷ #runpod-help (5 messages):

  • Initialization Issues with OpenBSD SSH on Runpod: @nruaif shared a log snippet indicating the successful start of the OpenBSD Secure Shell server but encountered a deprecation warning regarding a _jupyter_server_extension_points function missing in notebook_shim.
  • Deprecated Config Warning: The logs provided by @nruaif also included a FutureWarning regarding the deprecation of ServerApp.preferred_dir config in jupyter-server 2.0, advising to use FileContentsManager.preferred_dir instead.
  • Runpod Docker Configuration Error: The same log mentions a critical error where /workspace is found to be outside of the root contents directory, causing a bad config encounter during initialization.
  • Intermittent Secure Cloud Issues: @dreamgen noted that similar issues happen often in the secure cloud environment and suggested that it might not always be related to the use of network volumes.
  ‱ Frustration with Runpod Issues: @dangfutures expressed dissatisfaction with Runpod, suggesting that alternatives like the community cloud tend to yield better results.

CUDA MODE (Mark Saroufim) ▷ #general (25 messagesđŸ”„):

  • CUDA vs OpenCL Discussion: User @Voudrais questioned the preference for CUDA over OpenCL. @andreaskoepf responded that CUDA’s advantages include popularity and strong support from Nvidia, welcoming everyone to the group regardless of their parallel programming preferences.
  • Python Over CUDA or OpenCL?: Python as a language for GPU computing sparked a conversation led by @vim410. @andreaskoepf shared the resource list from the CUDA MODE GitHub repository, acknowledging the push towards high-level programming while noting the continued relevance of direct kernel writing.
  • Lecture Repository Renaming and Organization: There was a reorganization of CUDA MODE lecture content by @andreaskoepf, merging lectures into one repository now named “lectures” on GitHub. Discussions with @jeremyhoward involved considerations for redirecting from old links and updating video descriptions to accommodate the new repository structure.
  • Machine Learning Visualization Shared: @latentzoo shared a visualization related to tiny-cuda-nn’s fully fused MLP, with a tweet link associated. @andreaskoepf suggested the image might be from a YouTube video on Tensor Cores, further adding only related video content should be shared.
  • Upgrading to a New Dev Machine: @andreaskoepf initiated a discussion about upgrading to a new development machine, considering slow-building part by part due to the high cost of pre-built machines like Lambda workstations. The post opened up the possibility of community interest in system building.

CUDA MODE (Mark Saroufim) ▷ #cuda (99 messagesđŸ”„đŸ”„):

  ‱ Discoveries in Grayscale Conversion Speed and Accuracy: @artste experimented with various approaches for converting RGB to grayscale, finding that integer math proved fast yet imprecise. An optimal balance between speed and precision is achieved with a float lookup table, reaching near-identical results to the benchmark about 2.8 times faster (Notebook detailing the experiments); a NumPy rendering of the lookup-table idea appears after this list.

  • Batch Processing Enhances Performance: When stacking images horizontally to imitate a batch, @artste found that the performed optimizations resulted in a grayscale conversion process that is up to 3.98 times faster than the unoptimized case for a batch size of 16 images.

  ‱ Compiler Intricacies Meaningfully Impact CUDA Performance: Discussions by @andreaskoepf and others reveal that seemingly minor changes, like adding ‘f’ to denote 32-bit float literals, can greatly affect operation time on GPUs (an unsuffixed literal such as 1.0 is a 64-bit double, silently forcing slower double-precision math), emphasizing the intricacies of GPU optimizations.

  • CUDA Tools and Repositories Shared Among Members: Members shared various resources and tools such as Godbolt, a CUDA web compiler/explorer, and multiple GitHub repositories—including tiny-cuda-nn for a fast neural network framework, and lectures on CUDA from cuda-mode/lectures—which facilitate CUDA learning and experimentation.

  • Learning and Debugging CUDA with PyTorch: @edd0302 sought advice on managing CUDA and PyTorch projects, while @jeremyhoward and others discussed the idiosyncrasies of compiling CUDA code with PyTorch, highlighting challenges such as forced recompilation and potential improvements to be considered by PyTorch development (@marksaroufim indicated an openness to feedback for improvement).
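
Although the original experiments were CUDA kernels, the float lookup-table idea translates directly to NumPy. A minimal sketch, assuming the common BT.601 grayscale weights (the exact coefficients in @artste’s notebook may differ):

```python
# NumPy rendering of the float lookup-table idea from the grayscale discussion:
# precompute each channel's contribution for all 256 byte values, so conversion
# becomes three table lookups and two adds per pixel. The BT.601 weights are an
# assumption; the original notebook's exact coefficients may differ.
import numpy as np

vals = np.arange(256, dtype=np.float32)
r_lut, g_lut, b_lut = 0.299 * vals, 0.587 * vals, 0.114 * vals

def to_grayscale(img: np.ndarray) -> np.ndarray:
    """img: (H, W, 3) uint8 array -> (H, W) uint8 grayscale."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    gray = r_lut[r] + g_lut[g] + b_lut[b]  # float math via precomputed tables
    return gray.astype(np.uint8)

demo = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(to_grayscale(demo).shape)  # (64, 64)
```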

CUDA MODE (Mark Saroufim) ▷ #torch (5 messages):

  • Fast & Furious PyTorch Code Tip: @tantara shared a link to a PyTorch code section from gpt-fast repo, suggesting that it might be helpful to specify the compiled layers when using the torch.compile API.

  ‱ Torch Compiler Fine-Grain Control Unveiled: @marksaroufim provided additional insights, mentioning the use of torch.compiler.disable() and recommending the PyTorch documentation on finer-grained APIs to control torch.compile; a minimal sketch appears after this list.

  • TensorFlow: The Alternative Debate: @Voudrais humorously suggested using TensorFlow instead of PyTorch, which led @andreaskoepf to create a dedicated channel for TensorFlow discussions. He acknowledged the benefits of Google’s accelerator resources and competitive pricing, but cautioned against platform lock-in issues.
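
A minimal sketch of the fine-grained control mentioned above, assuming PyTorch 2.x; the toy module and the function excluded from compilation are illustrative:

```python
# Sketch of the fine-grained control mentioned above, assuming PyTorch 2.x:
# compile the model, but keep one helper out of the compiled graph with
# torch.compiler.disable(). The toy module is illustrative.
import torch
import torch.nn as nn

@torch.compiler.disable  # this function always runs eagerly
def debug_log(x: torch.Tensor) -> torch.Tensor:
    print("batch mean:", x.mean().item())  # side effects stay outside the graph
    return x

class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(debug_log(x))

model = torch.compile(TinyMLP())  # compiled; debug_log is skipped by the compiler
print(model(torch.randn(4, 16)).shape)  # torch.Size([4, 1])
```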

CUDA MODE (Mark Saroufim) ▷ #announcements (2 messages):

  • New CUDA Lecture Incoming: @andreaskoepf announced CUDA MODE - Lecture 4: Intro to Compute and Memory Architecture will start soon, focusing on Ch 4 & 5 of the PMPP book, covering blocks, warps, and memory hierarchy.

  • Lecture Notes Available: @tvi_ mentioned that the notes for the upcoming lecture which includes Ch 4 & 5 discussions can be found in the repository, humorously referred to as the “increasingly inaccurately named lecture2 repo”.


CUDA MODE (Mark Saroufim) ▷ #jobs (2 messages):

  • Aleph Alpha is in Search of CUDA Stallions: @piotr.mazurek shared a job posting where Aleph Alpha is looking to hire savvy professionals for their product team. Specifically, the role involves translating research on language models into practical applications, impacting Fortune 2000 companies and governments, and can be found here.

  • Mistral AI Wants YOU for GPU Magic: @megaserg. highlighted an opportunity at Mistral AI, seeking experts in serving and training large language models on GPUs. The job entails writing custom CUDA kernels and maximizing the potential of high-end GPUs like the H100, within a role posted here.

CUDA MODE (Mark Saroufim) ▷ #beginner (10 messagesđŸ”„):

  • C++ vs C for CUDA: @evil_malloc asked whether C++ is a prerequisite for CUDA/Triton, and @_tvi_ responded that proficiency in C++ is not necessary, but some familiarity is beneficial, especially when using CUDA with PyTorch.

  • Seeking C++ Mastery: @umaiskhan sought advice on effectively learning C++, and @stefangliga recommended LearnCpp.com, a free resource with extensive tutorials and examples.

  • Rust’s Status in CUDA Programming: @greystark. inquired about how Rust is supported for CUDA programming today, and @andreaskoepf highlighted the lack of active projects but shared the rust-gpu repository for GPU shaders in Rust.

  • Exploring Rust with CUDA: @andreaskoepf further suggested using Rust-CUDA to @greystark. and others interested, which provides a guide for writing GPU crates with CUDA support in Rust.

  • Rust Neural Network Development: @andreaskoepf followed up with more active Rust repositories related to neural network development leveraging CUDA, mentioning Kyanite and burn as projects to explore.

CUDA MODE (Mark Saroufim) ▷ #pmpp-book (2 messages):

  ‱ Comparing Matrix Multiplication Timings: @antoniooooooooooooo shared their timing results for 1024x1024 matrix multiplication: CPU at 5,308,033ÎŒs, GPU Original at 131,237ÎŒs, GPU Row at 43,896ÎŒs, and GPU Columns at 32,179ÎŒs. They asked whether the relationship between those timings makes sense (the column-oriented kernel is roughly 165x faster than the CPU and about 4x faster than the original GPU kernel).

  • Seeking Theoretical and Coding Answers: @antoniooooooooooooo inquired about a resource for more theoretical answers as well as a repository with coding solutions related to the exercises in the PMPP book.


Eleuther ▷ #general (88 messagesđŸ”„đŸ”„):

  • Model Training Queries and Misinterpretations: A conversation around TimesFM model training led to clarifications by users like @Hawk and @mrgonao. Hawk originally questioned the efficiency of a training process before offering a corrected sequence (input:1-32 -> output 33-160 -> input 1-160 -> output 161-288), concluding that there should be no overlap in output patches according to the model’s description.

  • Seeking Insights on Large Context LLMs: User @nsk7153 inquired about research on handling large contexts in LLMs, to which @stellaathena responded by sharing a YaRN paper, a compute-efficient method to extend context window length.

  • A Novel Training Method Proposal: User @worthlesshobo brought up an intricate discussion on autoencoding and proposed a method referred to as “liturgical refinement.” They suggested a technique involving alternately freezing and unfreezing components of encoder-decoder models to potentially achieve more effective representations.

  • Ideas on Model Fusion and Constraints: User @win100 speculated about fusing model tensors from separate models (A and B) for improved pre-training, aligning with the concept of the FuseLLM project. @!BeastBlaze provided insights on a related approach taken in the LeMDA paper which focuses on augmenting feature embeddings.

  ‱ Development of a Web UI for LLMs: User @318yang_ announced the development and deployment of a web UI for large language models (LLMs), simple-ai.io, an open-source project that the community can use in their own projects. They mentioned plans to integrate Ollama for local runs with this new UI.

Eleuther ▷ #research (32 messagesđŸ”„):

  • MoE-Mamba Outshines its Peers: In a discussion about a recent paper, @afcruzs shares an Arxiv link to research presenting an SSM model “MoE-Mamba,” which outperforms various state-of-the-art models with fewer training steps.
  ‱ Striving for Equilibrium in Experts’ Models: @catboy_slim_ hints at possible efficiency losses in Mixture of Experts (MoE) models due to imbalanced expert assignments and suggests considering an extra router loss to restore balance; a sketch of such a load-balancing loss appears after this list.
  • Computing Concepts go Origami: @digthatdata posts a Quantamagazine article relating computation to origami, sparking a brief exchange on its potential link to in-context learning in AI.
  • Accessing Mamba Checkpoints: In the wake of discussing potential issues with SSMs, @woog expresses interest in obtaining checkpoints for the Mamba model, which @random_string_of_character points towards being available upon request and also on Hugging Face’s Model Hub.
  • Gradient Stabilization and Encodec: A conversation unfolds around gradient stabilization with @nostalgiahurts referencing the Encodec paper’s approach to handling multiple loss types, which introduces a normalizing mechanism for balancing gradients during training.
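
The extra router loss idea is commonly realized as a Switch-Transformer-style load-balancing auxiliary loss. A minimal sketch of that standard formulation (which may differ from what @catboy_slim_ had in mind):

```python
# Sketch of a Switch-Transformer-style load-balancing auxiliary loss, the usual
# concrete form of an "extra router loss"; this standard formulation may differ
# from what was proposed in the discussion.
import torch

def load_balancing_loss(router_logits: torch.Tensor) -> torch.Tensor:
    """router_logits: (num_tokens, num_experts) pre-softmax router scores."""
    num_experts = router_logits.shape[-1]
    probs = router_logits.softmax(dim=-1)    # soft assignment per token
    assigned = probs.argmax(dim=-1)          # hard top-1 routing decision
    # f: fraction of tokens sent to each expert; p: mean router probability.
    f = torch.bincount(assigned, minlength=num_experts).float() / assigned.numel()
    p = probs.mean(dim=0)
    return num_experts * torch.sum(f * p)    # equals 1.0 for a perfectly uniform router

print(load_balancing_loss(torch.randn(128, 8)))
```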

Eleuther ▷ #interpretability-general (1 messages):

  • Clarifying “Direction” vs “Feature” in Interpretability: @pinconefish posed a question about terminology in interpretability, asking if a “direction” refers to a vector in the embedding space that encodes monosemantic meaning. They noted that “direction” might help distinguish between the activation of a single neuron (also referred to as a “feature”) and vectors in embedding space, which could be useful in discussions about semantic meanings at different levels in a model.

Eleuther ▷ #lm-thunderdome (5 messages):

  • Schedule Locked In: @asuglia confirmed with @981242445696221224 and @1072629185346019358 that Tuesday 6th at 5pm (UK time) would be a suitable meeting slot.
  • Invitation Preparation: @hailey_schoelkopf agreed to the proposed meeting time and requested DMs for email addresses to send an invite to @asuglia and @1072629185346019358.
  • Large Scale Testing Tactics Discussed: @mistobaan inquired about approaches for testing at scale, mentioning options like queues with workers or long single machine runs.
  • Slurm for Scaling Tests: In response to @mistobaan, @.johnnysands mentioned they utilize Slurm by queuing a large number of jobs to manage scale testing.
  • Exploration of Prompt Previews: @Goyim sought insight into the possibility of previewing prompts for specific tasks and the formatting of multiple_choice prompts submitted to models.

Eleuther ▷ #multimodal-general (9 messagesđŸ”„):

  • Exploring MoE for Multimodal Approaches: @martianulcrizat showed interest in guidance on Mixture of Experts (MoE) models for creating multimodal systems, hinting at integrating a transformer diffusion model with a VLM (visual language model) like LLaMA.
  • Seeking Deeper Semantic and Generative Integration: @martianulcrizat discussed the potential for a tighter integration between semantic understanding and generative capabilities within a VLM by employing MoE frameworks.
  • Search for VLM and Diffusion Model Combination Techniques: @martianulcrizat inquired about approaches for combining VLMs with diffusion models beyond the conventional methods involving QFormer, Adaptor layers, and cross-attention with continuous token representations.
  • Acceptance of Shared Papers on Integration Methods: !BeastBlaze acknowledged the relevance of papers shared by @martianulcrizat which potentially could assist in VLM and diffusion model integration.
  • Alternative Simplifications to Combining VLMs with Diffusion Models: !BeastBlaze mentioned new literature, albeit not readily available, which suggests the feasibility of using simple RNNs and CBOW to achieve similar outcomes to that of large models like CLIP, thereby enabling leaner methods like fast DINO or fast SAM.

Eleuther ▷ #gpt-neox-dev (2 messages):

  ‱ Clarification on “gas” Parameter Functionality: @catboy_slim_ noted a pull request to remove the "gas" parameter, stating it is non-functional and redundant with "gradient_accumulation_steps". They caution that historical runs using "gas" with values other than 1 may have had an effectively smaller batch size than intended; a worked example appears after this list.
  • Review Incoming for “gas” Parameter: @tastybucketofrice responded that they will review the issue regarding the "gas" parameter today.
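
A worked example of that caution, with illustrative values; the reading that a non-1 "gas" value was silently ignored while "gradient_accumulation_steps" kept its default is an assumption based on the pull request description:

```python
# Worked example of the caution above, with illustrative values. The reading that
# a non-1 "gas" was silently ignored while gradient_accumulation_steps kept its
# default is an assumption based on the pull request description.
micro_batch_size = 4
data_parallel_ranks = 16
gas = 8                          # user intended 8 accumulation steps via "gas"
gradient_accumulation_steps = 1  # the setting actually honored (default)

intended = micro_batch_size * gas * data_parallel_ranks                        # 512
actual = micro_batch_size * gradient_accumulation_steps * data_parallel_ranks  # 64
print(intended, actual)  # the run trained with an 8x smaller effective batch
```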

Perplexity AI ▷ #general (96 messagesđŸ”„đŸ”„):

  ‱ Language Barrier? No problem!: @bondesign inquired if typing in Chinese is possible, and the interest was echoed by a subsequent message in Chinese. @cookie_74700 similarly asked if Persian could be used; linked responses from @mares1317 underscored Perplexity’s multilingual capabilities.

  • Confusion around Copilot’s role: @oudstand shared observations on how using Copilot seemed to improve model performance. Meanwhile, others like @dlysltrading faced issues with Copilot, which were resolved by refreshing the webpage, and @stocktown sparked a debate on the rationale of using write mode with Copilot active.

  • Navigating Perplexity’s Customer Service: @aqbalsingh expressed frustration with changing the email on their account and the absence of an upload button on the iPhone app, leading to the cancellation of their premium account. Despite responses from Perplexity, the user remained disappointed with the support responsiveness.

  ‱ Perplexity AI Discord Integration Woes: @otchuda lamented the missing Discord integration for quick Perplexity responses, prompting discussion and links from @icelavaman and @ok.alex, who confirmed that there are no current plans to bring the textual Perplexity bots back to Discord.

  • Exploring API keys and Use Cases: @elanutta queried about generating API keys for Open AI with a Perplexity account, while @glisteningsunlight reported and self-resolved a delay issue when attempting to get Perplexity to summarize a PDF. Further, discussions around usage quotas and product comparisons between ChatGPT Plus and Perplexity Pro were undertaken by @felirami, @general3d, and @maverix..

Perplexity AI ▷ #sharing (14 messagesđŸ”„):

  ‱ Videos Highlighting Perplexity AI: @arunprakash_, @boles.ai, and @ok.alex all shared YouTube videos showcasing Perplexity AI, discussing its benefits and features, and explaining why users might choose it over other AI options. The videos are titled “Do we really need Perplexity AI Pro Subscription?”, “Perplexity and Play.HT Don’t Play! Plus, a Hindenburg Review!”, and “I Ditched BARD & ChatGPT & CLAUDE for PERPLEXITY 3.0!”, each linked on YouTube.

  ‱ Sharing Revelatory Perplexity Searches: Users @rocktownarky, @bwatkins, @maverix., @gamezonebull, @epic9713, and @darkspider1987 shared direct links to their Perplexity AI search results, which provided valuable insights leading to a Pro subscription decision and helped with complex decision-making, on topics ranging from a Prop 1 decision to AI image motion.

  • Public Search Sharing Tips: @me.lk advised @maverix. and @gamezonebull to ensure their search threads are public by clicking on the share button in the top right corner, while @noremac258 noted that @darkspider1987’s search result wasn’t viewable, indicating the importance of making searches public for community sharing.

  • Redirect Link with Null Description: @ok.alex posted a Discord redirect with no description provided for the content it leads to.

  • Productive Weekend with Perplexity: @johnweisenfeld shared a LinkedIn post about a productive weekend thanks to Perplexity and a mention about difficulties with other AI services, praising OpenAI for helping to get a code project off the ground.

Perplexity AI ▷ #pplx-api (7 messages):

  ‱ Mixtral Pricing Inquiry: @paul16307 asked about pricing for Mixtral since Perplexity removed 13b from API pricing. @icelavaman responded with current rates: $0.14 per 1M input tokens and $0.56 per 1M output tokens; a quick cost calculation appears after this list.
  • Potential Rate Limit Increase: @aiistheonlyway queried about getting a quick rate limit increase but did not receive a response in the provided messages.
  • Curiosity About Mixtral’s Future Pricing: Following the pricing details, @paul16307 inquired if the future pricing for Mixtral would be lower, but no response was provided.
  • Request for pplx-web API Version: @makadoro_95229 suggested a business opportunity for Perplexity to offer a pplx-web version of the API, which would give results akin to the website, aiding in the creation of chat assistants for other websites. @defektivex supported this idea, mentioning that many have requested this feature and expressed hope for a similar API in the future.
  • Integration of Perplexity AI with Siri: @out_a_time6794 asked about setting up Perplexity AI with Siri to function as a shortcut for queries, with no follow-up response provided in the conversation.
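
For reference, the quoted Mixtral rates translate into a one-line cost formula; the token counts below are illustrative:

```python
# One-line cost formula for the quoted Mixtral rates
# ($0.14 per 1M input tokens, $0.56 per 1M output tokens); counts are illustrative.
def mixtral_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 0.14 + output_tokens / 1e6 * 0.56

print(f"${mixtral_cost(3_000_000, 1_000_000):.2f}")  # 0.42 + 0.56 = $0.98
```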

LangChain AI ▷ #general (34 messagesđŸ”„):

  • Seeking the Right Tech for Arabic Content: @mukhtor inquired about technology for chatting with Arabic content. @lhc1921 suggested an Arabic Language Model (LLM) and embeddings, mentioning that most technologies are language-agnostic, while @hro_ffs_why_cant_i_use_my_name mentioned that embedding-ada mainly supports “French, English, German, Spanish, and Portuguese” but listed aravec and word2vec as potential alternatives.

  • Cost-Effective Hosting for Autonomous Agent: @charly8323 sought advice on hosting a cost-effective autonomous research agent while keeping costs below the revenue from a 5 cents per call price structure. @engineered.mind recommended a local LLM for controllable costs, and @truethinker suggested deploying ollama on a server like DigitalOcean.

  • Upcoming Book Alert: @mehulgupta7991 announced the release of their new book titled “LangChain in your Pocket: Beginner’s Guide to Building Generative AI Applications using LLMs,” detailing how to use LangChain for various applications, available on Amazon.

  ‱ Efficiency Tips for Long Document Translation with LangChain: @o3omoomin asked for efficient methods to translate lengthy documents with LangChain while avoiding token limitations. The user was exploring segmentation into smaller chunks and sought example code for more streamlined handling; a minimal chunked-translation sketch appears after this list.

  • Hosting and Fine-Tuning Challenges Discussed: Various members, including @lhc1921, @nrs, and @sullynaj, discussed hosting and fine-tuning models in the cloud. Suggestions included using local models, Google Colab, and Cohere embeddings, with potential strategies for training an Arabic model using a relevant dataset.
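
One minimal way to approach @o3omoomin’s chunked-translation question, assuming the langchain package; the chunk sizes and the translation callable are illustrative:

```python
# Minimal chunked-translation sketch, assuming the `langchain` package. The text
# is split into overlapping chunks that each fit the model's context window, and
# the translation callable is an illustrative stand-in for any LLM call.
from langchain.text_splitter import RecursiveCharacterTextSplitter

def translate_document(text: str, translate_chunk) -> str:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,    # stay well under the model's token limit
        chunk_overlap=200,  # overlap preserves context across chunk boundaries
    )
    chunks = splitter.split_text(text)
    return "\n".join(translate_chunk(chunk) for chunk in chunks)

# `translate_chunk` would wrap an LLM call with a prompt such as
# "Translate the following text into Korean: {chunk}".
print(translate_document("Hello there. " * 500, lambda c: c)[:40])
```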

LangChain AI ▷ #share-your-work (7 messages):

  • Introducing CastMate for Interactive Podcast Experiences: @darrelladjei shared the launch of CastMate, a platform allowing users to listen and interact with their favorite podcast episodes, featuring fakes using LLMs and human-quality TTS. They provided a Loom demonstration and invited feedback, while also offering an iPhone beta: TestFlight Link.

  • GUI Discussions for Artificial Agents: User @clickclack777 inquired about which GUI is being used, which led to @robot3yes mentioning their work on Agent IX, a side project meant to interface with bots.

  • New Book Alert: Guide to Generative AI Applications: @mehulgupta7991 announced their debut book, “LangChain in your Pocket: Beginner’s Guide to Building Generative AI Applications using LLMs”, which covers a range from basic to advanced use cases involving LangChain. They shared a broken Amazon link, which instead displayed a CAPTCHA verification page.

  • Meet the Author and Data Scientist: In a follow-up message, @mehulgupta7991 introduced themselves as a data scientist with experience at DBS Bank and shared their “Data Science in your Pocket” Medium and YouTube channels. They specifically pointed to a LangChain YouTube playlist for tutorials.

  • Seeking Tips for a Goal-Setting Assistant: @mark_c_ requested architectural advice for creating a goal-setting assistant that manages long-term and short-term goals and helps with weekly scheduling, mentioning their background as an ex-coder. They were interested in starting with prompt engineering but anticipated the need for a more complex workflow.

  • A Tool for AI-enhanced Due Diligence in Investment: @solo78 introduced a project tool utilizing Langchain to perform deep due diligence for investing in platforms and companies. They shared their Medium blog post detailing the project’s journey and sought thoughts from the community: Medium Article.

LangChain AI ▷ #tutorials (5 messages):

  ‱ Next.js and LangChain for SMART Portfolios: User @flo_walther shared a YouTube video (https://www.youtube.com/watch?v=1LZltsK5nKI) on building a SMART portfolio website using Next.js 14, Langchain, Vercel AI SDK, and more, highlighting an AI chatbot that can be trained on your data.
  ‱ Tutorial Troubles: @stuartjatkinson expressed frustration that steps in YouTube tutorials for LangChain have changed or yield errors when followed directly.
  ‱ LangChain Starting Stumbles: @treym1112 encountered errors while following the LangChain quick-start tutorial on the LangChain website, specifically with the Ollama model, resulting in an AttributeError about a missing 'verbose' attribute.
  ‱ LangChain Guide Launched: @mehulgupta7991 announced the release of their book “LangChain in your Pocket: Beginner’s Guide to Building Generative AI Applications using LLMs” on Amazon (https://amzn.eu/d/dqQJzV1), describing it as a hands-on guide covering a range of use cases and LangServe deployment.
  ‱ Meet the Data Scientist and Content Creator: @mehulgupta7991 shared their professional background as a data scientist at DBS Bank and mentioned their Medium+YouTube channel “Data Science in your Pocket”, which features around 600 tutorials, including a LangChain playlist (https://youtube.com/playlist?list=PLnH2pfPCPZsKJnAIPimrZaKwStQrLSNIQ).

LlamaIndex ▷ #blog (5 messages):

  • RAG Development Challenges Solved: @wenqi_glantz detailed 12 pain points when building production RAG and, together with @llama_index, provided a full solution list for each, available on a newly released cheatsheet. The announcement and solutions can be found in their Twitter post.
  • Hackathon Hustle at DataStax: The hackathon kicked off at 9 am with @llama_index appreciating @DataStax for providing the venue and food. Insights into the event are shared on their Twitter update.
  • Multimodal Models on MacBooks: @llama_index announced their new integration with Ollama, enabling the development of local multimodal applications such as structured image extraction and image captioning. More details are available in their day 1 integration tweet.
  • Multilingual Embedding Optimization Techniques: A new article by Iulia Brezeanu on @TDataScience discusses selecting the right embedding model for multilingual RAG, addressing language biases in benchmarks. The article helps navigate models not optimized for English and is shared via LlamaIndex’s tweet.
  • Discord Welcomes LlamaIndex’s Slack Bot: @llama_index has released a Discord version of their popular Slack bot. Interested users can access it through the link shared in their tweet announcement.

LlamaIndex ▷ #general (19 messagesđŸ”„):

  • Seeking Clarity on AI Interpretation: @meowmeow008 is exploring how the AI can interpret SQL queries and subsequent requests, like calculations of percentages, and wonders about potential misunderstandings concerning the AI’s capabilities.
  • Azure’s AI Gives Mixed Results: @aldrinjoseph experienced issues when switching from Azure OpenAI 3.5 Turbo to Azure OpenAI 3.5 Turbo 16K, with the latter generating answers outside the given context.
  ‱ LlamaIndex More Reliable than LangChain: @7leven expressed a preference for LlamaIndex over LangChain, criticizing the latter for frequently breaking its documentation and noting that LlamaIndex causes less trouble when integrated.
  • Tweaking Hybrid Retriever Without Reinstantiation: In a discussion about the Hybrid Retriever, @cheesyfishes confirmed to @7leven that the alpha parameter can be adjusted dynamically in the Python code without needing reinstantiation.
  • RAG App Development and Integrating Chat History: @jameshume is seeking guidance on incorporating chat history into an app that leverages multiple components including a custom VectorDBRetriever and CondenseQuestionChatEngine; @dirtikiti explained a simple method of tracking and including chat history in new prompts.
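
A minimal version of the history-tracking approach @dirtikiti described, with an illustrative stand-in for the chat engine:

```python
# Minimal version of the "track history and fold it into the next prompt"
# approach described above; the chat engine is an illustrative stand-in.
from collections import deque

history = deque(maxlen=10)  # keep only the most recent exchanges

def build_prompt(question: str) -> str:
    past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    prefix = past + "\n" if past else ""
    return f"{prefix}User: {question}\nAssistant:"

def ask(question: str, chat_engine) -> str:
    answer = chat_engine(build_prompt(question))
    history.append((question, answer))  # remembered for the next turn
    return answer

print(ask("What is RAG?", lambda prompt: "Retrieval-augmented generation."))
```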

Links mentioned:

Usage Pattern - LlamaIndex 🩙 0.9.44


LlamaIndex ▷ #ai-discussion (7 messages):

  ‱ Exploration of Recursive Retrieval: @chiajy provided insights on recursive or iterative retrieval in developing self-learning RAG systems that can deeply delve into unstructured data. They shared their Medium article detailing three recursive retrieval techniques: Page-Based, Information-Centric, and Concept-Centric, available at “Advanced RAG and the 3 types of Recursiv
”.

  • Appreciation for Recursive Retrieval Techniques Article: User @jerryjliu0 expressed appreciation for @chiajy’s article on recursive retrieval in RAG systems, calling it a “nice article!”

  • Showcasing LlamaIndex Comparisons: @andysingal shared an article that compares embedding techniques from Jina AI, Nomic AI, and FlagEmbedding, discussing their integration with Llamaindex. The article, titled “Unveiling the Power of Llamaindex”, explores the synergy of these technologies in AI: Unveiling the Power of Llamaindex.

  • Introduction of BGE-M3 Embedding Model: @alphaatlas1 introduced the BGE-M3 embedding model, highlighting its multi-functionality, multi-linguality, and multi-granularity features. The model can perform dense retrieval, multi-vector retrieval, and sparse retrieval, supporting over 100 languages and processing various input granularities up to 8192 tokens, explained on Hugging Face: BGE-M3 on Hugging Face.

  • Recommendation for Retrieval Pipeline in RAG: @alphaatlas1 recommended a hybrid retrieval plus re-ranking pipeline for RAG retrieval, to leverage the benefits of various methods for higher accuracy. They mentioned that the BGE-M3 model simplifies embedding retrieval by not requiring additional instructions for queries and invited community contributions for sparse retrieval methods.

Latent Space ▷ #ai-general-chat (29 messagesđŸ”„):

  • Exploring GPT Stores and APIs: @tiagoefreitas inquired about GPT stores with APIs similar to OpenRouter and OpenGPTs under a public server, expressing a wish that @LangChainAI would implement federation in OpenGPTs. They clarified that federation would allow using GPTs from other servers through API while managing their own server.

  • Open Models Over Traditional Writing: @slono criticized the traditional writing approach, highlighting the enjoyment and productivity of working with stochastic models like mlewd mixtral. The discussion suggested a preference for open models’ dynamic output over the standard writing methodologies.

  • Sentry Dives into Q&A: @kaycebasques pointed out a trend in Q&A solutions, exemplifying this by sharing Sentry’s approach which has created a vast Q&A resource for over 20 programming languages and frameworks. This indicates a broader movement towards specialized technical Q&A platforms.

  • Llava Inference Speed Impresses: @ashpreetbedi shared a positive experience with inference speed while running Ollama Llava locally on their MacBook, adding to the community’s understanding of the tool’s performance (a minimal local-call sketch follows this list).

  • Career Crossroads in Tech: @mr.osophy conveyed hesitation about accepting a job unrelated to their interest in ML Engineering, weighing the benefits of becoming a better candidate for desired roles against immediate but unrelated job opportunities. The dilemma underscored the challenges tech professionals face when aligning career moves with personal aspirations and financial constraints.
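
For anyone wanting to reproduce the local Llava setup mentioned above, here is a minimal sketch using the `ollama` Python client (`pip install ollama`); it assumes the model was already pulled with `ollama pull llava`, and the image path is illustrative:

```python
# Minimal local multimodal call against a running Ollama server.
import ollama

resp = ollama.chat(
    model="llava",
    messages=[{
        "role": "user",
        "content": "Describe this image in one sentence.",
        "images": ["photo.jpg"],  # illustrative path
    }],
)
print(resp["message"]["content"])
```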

Links mentioned:

  • Arcee and mergekit unite: Several months ago, I stumbled upon an innovative technique in the world of language model training known as Model Merging. This SOTA approach involves the fusion of two or more LLMs into a singular, 


  • Sentry Overflow: Just now, as I was searching for the Bash script syntax on how to check if a directory exists (for the 63rd time because I have the memory span of an agitated Chihuahua) I noticed something interestin



Latent Space ▷ #llm-paper-club-east (1 messages):

swyxio: check out the summary


Datasette - LLM (@SimonW) ▷ #ai (28 messagesđŸ”„):

  • Combos Pulled from a Pregenerated Pool: @dbreunig theorizes that the game’s crazier combos pulse in with a delay because new ones are generated only when there is a hash miss, implying that past combinations are stored and reused (see the sketch after this list).
  • Intriguing Game Mechanics Spark Curiosity: @chrisamico and @cameron_y express a desire to visualize the genealogy of the game’s word combinations, speculating about what insight embeddings could provide into crafting paths.
  • Harnessing the Crafting Code: @madacol provides a JavaScript bookmarklet that interacts with the game’s localStorage to export and auto-save crafted items and discoveries, adding a new dimension to the gameplay experience by retrieving all ingredients crafted.
  • Game Powered by Llama 2: @madacol clarifies the AI behind the game’s clever combinations, posting that the creator is using Llama 2, as disclosed in a tweet by @nealagarwal, served via TogetherAI.
  • Crafting Direction Matters: @madacol discovered that the order of combining elements affects the outcome, with some results appearing only when certain items are placed on top of others; they also note that the server remembers attempted combinations, so reversing the order after a try no longer changes the result.
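
A toy Python sketch of the mechanic theorized in the first bullet: results are served from a store of past combinations, the model is only called on a miss, and the key is order-sensitive. The `llm_generate_combo` helper is a hypothetical stand-in for the game's Llama 2 backend:

```python
def llm_generate_combo(a: str, b: str) -> str:
    # Hypothetical stand-in for the game's Llama 2 call.
    return f"{a} + {b} = ?"

combos: dict[tuple[str, str], str] = {}

def combine(a: str, b: str) -> str:
    key = (a, b)  # order-sensitive, matching @madacol's observation
    if key not in combos:  # "hash miss": only now is the model called
        combos[key] = llm_generate_combo(a, b)
    return combos[key]

print(combine("water", "fire"))  # generated on miss
print(combine("water", "fire"))  # served from the store
```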

DiscoResearch ▷ #general (8 messagesđŸ”„):

  • Boosting German Model Performance: @johannhartmann discussed improvements in mt-bench-de scores after using German dpo and laserRMT, with a focus on merging German 7B-models using dare_ties.
  • Curiosity About Mysterious Improvement Methods: Upon request for details by @philipmay, @johannhartmann provided links to German dpo and laserRMT, but admitted to not fully understanding why performance changes occurred, especially a drop in math ability.
  • Seeking Research on LLMs for Large Context: @nsk7153 inquired about research on handling large language models (LLMs) for large context and shared a Semantic Scholar search link to the materials they’ve reviewed.
  • Introduction of GermanRAG Dataset: @rasdani proudly released the GermanRAG dataset for fine-tuning Retrieval Augmented Generation. They shared the GitHub link to the dataset and encouraged customization and enhancement.
  • Scandinavian Language Model Envy: @johannhartmann discovered the ScandEval benchmark for Scandinavian natural language generation and expressed a desire for something similar for the German language.


DiscoResearch ▷ #discolm_german (1 messages):

flozi00: I am working on it at the moment to provide a German hosting service


Alignment Lab AI ▷ #general-chat (5 messages):

  • Inquiry about mistral-7B open-orca training data: @njb6961 showed interest in replicating mistral-7B open-orca and asked whether the “curated filtered subset of most of our GPT-4 augmented data” used for training would be released.
  • Search for the specific dataset: @njb6961 speculated that the dataset in question might be SlimOrca, which includes ~500k GPT-4 completions and is curated to improve performance with less compute.
  • Confirmation of dataset: @ufghfigchv confirmed that the SlimOrca dataset is indeed the subset used and mentioned that the training configuration for the model should be in the config subdirectory of the model’s repository (a loading sketch follows this list).
  • Request for marketing contacts: @tramojx reached out to the admin seeking contact for a listing and marketing proposal, but no response was provided in the available message history.
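
For reference, the confirmed subset is public and loads in a couple of lines with Hugging Face `datasets`; the field name follows the dataset card, and the exact row count may differ slightly from the ~500k figure quoted above:

```python
# Load the confirmed SlimOrca subset from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("Open-Orca/SlimOrca", split="train")
print(len(ds))                 # roughly 500k GPT-4 completions
print(ds[0]["conversations"])  # ShareGPT-style list of messages
```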

Links mentioned:

Open-Orca/SlimOrca · Datasets at Hugging Face: no description found


Skunkworks AI ▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=N5lDUZRI8sc


Skunkworks AI ▷ #bakklava-1 (1 messages):

  • Query on Document Embeddings versus Vision Embeddings: @epinnock reached out to @far_el asking about building a version of llava that consumes whole-document text embeddings instead of vision embeddings. They wonder whether the task amounts to a partial reimplementation of an encoder/decoder model and seek clarification on what it would entail beyond that.

LLM Perf Enthusiasts AI ▷ #reliability (1 messages):

  • Easy Deployment with BentoML: User @robotums mentioned successfully deploying open-source models with BentoML using a vLLM backend on AWS. They described the process as quite straightforward: “it’s pretty easy, you just run the bento.” A minimal sketch of the pattern follows.
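
A hedged sketch of that pattern, assuming BentoML's 1.2-style service API and an illustrative model id; this is one plausible wiring, not necessarily @robotums' exact setup:

```python
# service.py — a vLLM-backed text generation service for BentoML.
import bentoml
from vllm import LLM, SamplingParams

@bentoml.service(resources={"gpu": 1})
class Generator:
    def __init__(self) -> None:
        # Model id is illustrative; any vLLM-supported HF model works.
        self.llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

    @bentoml.api
    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        params = SamplingParams(max_tokens=max_tokens)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text
```

Serving is then roughly `bentoml serve service:Generator`, i.e. you “just run the bento.”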

LLM Perf Enthusiasts AI ▷ #prompting (1 messages):

  • Introducing DSPy, a Language Model Programming Framework: User @sourya4 introduced [DSPy](https://github.com/stanfordnlp/dspy), a Stanford project for programming, rather than just prompting, foundation models. They also shared a YouTube video titled “SBTB23: Omar Khattab, DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines,” describing advances in prompting language models and integrating them into pipelines. A small taste of the style follows.
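
A minimal, hedged taste of DSPy's declare-then-compile style, using the signature and LM-client APIs from the project's README at the time; the model name and configuration are illustrative:

```python
# Declare what you want (a signature); DSPy handles the prompting.
import dspy

lm = dspy.OpenAI(model="gpt-3.5-turbo")  # illustrative LM client
dspy.settings.configure(lm=lm)

qa = dspy.ChainOfThought("question -> answer")
pred = qa(question="What does DSPy compile?")
print(pred.answer)
```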


AI Engineer Foundation ▷ #general (1 messages):

  • AIEF Bulgaria Chapter Buzzing with Activity: @yavor_belakov, the lead of AIEF’s first chapter in Bulgaria, highlighted the second monthly AIEF BG meet-up with 90 attendees and the introduction of ‘Lightning Talks’. Networking, pizza, and knowledge exchange are at the core of the event.
  • Insights into Lightning Talks Now Available: Presentations from the recent event were shared, offering sneak peeks into topics like QR Code Art, Weaving The Past, LMMs, Zayo, and building a defensible business in the age of AI; full recordings are to be posted on the chapter’s YouTube channel.
  • Exploring ChatGPT Adoption: Included in the talks was a presentation on “ChatGPT Adoption Methodology” by Iliya Valchanov, details are available in the shared Google Slides document.
  • LinkedIn as a Window to AIEF Bulgaria’s Progress: @yavor_belakov also posted highlights from the event on LinkedIn, showcasing the strength and excitement of the AIEF foundation community.

Links to the Google Slides presentations and the LinkedIn post are not fully provided, so they cannot be accurately shared.
