> AI Discords for 1/25/2024. We checked **20** guilds, **297** channels, and **5898** messages for you. Estimated reading time saved (at 200wpm): **557 minutes**.

OpenAI released a new GPT-4 Turbo version yesterday (our notes here). We’re using this opportunity to conduct a natural experiment in summarization: this issue was generated with the ā€œoldā€ GPT-4T from Nov 2023 (Dev Day). Stay tuned for the next email, which will use the Jan 25, 2024 version, for comparison and commentary.

—

Table of Contents

[TOC]

PART 1: High level Discord summaries

TheBloke Discord Summary

  • Troubleshooting Model Loading Error: User @sco7116 experienced an error when attempting to load OpenHermes-2.5-Mistral-7B-4.0bpw from Huggingface and was advised by @itsme9316 to ensure they’re using exllamav2 for correct version compatibility. The error message was ā€œCould not locate pytorch_model-00001-of-00002.bin.ā€

  • RHEL Debate in ML Contexts: There was an active discussion on the pros and cons of using Red Hat Enterprise Linux (RHEL) in machine learning settings. User @.dontriskit shared perspectives on infrastructure preferences and challenges encountered with RHEL.

  • Highlighting Dataset Generation for GPT Flaw Understanding: @kaltcit proposes generating a dataset aimed at fingerprinting common GPT shortcomings like looping and topic drift, positing that it could be pivotal in understanding and addressing these issues systematically.

  • LLMs Running on Consoles: The guild was intrigued by demonstrations of large language models like Llama and Mistral running on unconventional hardware such as the Nintendo Switch, showing the novelty and potential wide applicability of these models in various platforms.

  • LangChain Fine-Tuning Stumbles: New user @nandavikas is seeking assistance to apply LangChain for fine-tuning the Llama2 model to extract information from PDFs, having previously achieved the task using PyTorch, and is lacking relevant guidance from LangChain documentation.


OpenAI Discord Summary

  • GPT-4’s Need for Speed: Users reported GPT-4 having speed inconsistencies, with the API being slower compared to web usage, especially during peak times. Dedicated capacity was mentioned as a potential solution for enterprise-scale consumers, yet problems persisted when API loads were high.

  • Api-discussions & Prompt-engineering Channel Crossover: The challenge of prompt engineering with GPT-3.5 involved efforts to chunk large text for grammar correction, highlighting token limits and memory constraints. The use of OpenAPI, Python scripting, and Custom Actions was advised, and GPT-4 Turbo was suggested as a superior alternative for processing large documents.

  • DALL-E’s Typo Trouble: Users noted that DALL-E tends to include misspellings in its text generation within images, with community discussions suggesting shorter text inputs to possibly mitigate the issue. A community link provided further insights into this ongoing problem.

  • NLP Tools Touted as Text Transformers: With GPT models struggling with large document processing, users like @doublefelix looked towards external NLP tools, such as semantic-text-splitter on PyPI, for a potential remedy. The conversation underscored the importance of maintaining historical context across API calls and the possibility of leveraging ChatGPT Plus for cost-effective solutions.

  • Conceptualizing Collaboration and Compliance: Queries about deploying GPT-4 Vision on Azure and team account collaboration were met with concerns about account issues linked to unusual activity, possibly due to VPN/proxy usage or flagged prompts. There was also mention of problems saving/updating GPT bots, specifically related to policies on mimicking living artists.


Nous Research AI Discord Summary

  • Extended Context Capabilities in Focus: The best current method to extend context capabilities is being discussed, with finetuning suggested as a viable solution. Mistral instruct v0.2 disabled sliding windows to scale rope_theta, and configurations for MistralLite and main branch models are aligning in context window management. An impressive feat by LLaMA-2-7B-Chat extended its context window to 16,384 with minimal samples and training steps, while SelfExtend offers a non-finetuning alternative for context extension.

  • AI’s Societal Impact and Technological Puzzles: Technology’s role in societal polarization is being considered, and observed fluctuations in Twitter activity prompted theories of a deliberate AI slowdown. A notebook for quantizing LLMs to GGUF format was also shared, aiding model conversion.

  • Everyone Coder and Hermes Mixtral Benchmarks Sought: A HumanEval benchmark has been requested for the quantized Everyone Coder 33B in GGUF format, available via TheBloke on Hugging Face and supported by Massed Compute. There’s interest in seeing benchmarks for a combined Hermes and Mistral model, humorously referred to as Hermes Mixtral.

  • Launch of OpenAI Embedding Models & Synthetic Data Innovations: OpenAI has released a new generation of embedding models, notable for data privacy and cost reductions, detailed in their announcement post. Additionally, the Genie method for generating high-quality content-grounded data suggests potential advancements in Long-Form Question-Answering and summarization, as documented in the arXiv paper.

  • Enhancing AI Operations with GPU Acceleration: Machine learning computations are being executed via WebGL for GPU acceleration, with discussions of systems for real-time word embedding adjustments. Multi-GPU training setups call for high-end motherboards, Mixtral instruct is outperforming several fine-tuned models, and prototypes of new GPU-accelerated models are making headway.

  • LLMs in the Limelight for Code and Cybersecurity: For fine-tuning CodeLlama 34B, 4xA6000 GPUs might only suffice for a QLoRA, not a full fine-tune. T5 faces fine-tuning challenges with stability issues, while LLMs like WhiteRabbitNeo-13B and WhiteRabbitNeo-33B-v1 are recommended for offensive cyber tasks. The EvalPlus Leaderboard is a resource for evaluating AI coders, and there’s dialogue on hyperparameter essentials for fine-tuning with Llama Factory.

  • Project Obsidian Scripts and Refactors: A Python script for the 3b Obsidian model is sought for remote execution, and efforts are underway to refactor code for compatibility with the latest llava repo.


OpenAccess AI Collective (axolotl) Discord Summary

  • Base Over Instruct for Norwegian Fine-tuning: @le_mess advised using the base model instead of the Instruct variant for fine-tuning in languages with scarce training data, informed by @henriklied’s experience with a Norwegian dataset.

  • Epoch Caution to Avoid Overfitting: To prevent overfitting during fine-tuning, @le_mess recommended stopping at 4 epochs, especially in light of @henriklied’s observation of a constant eval loss after 10 epochs.

  • Block Characters’ Potential in Training: @suikamelon and @c.gato debated unique block characters like ā–€ and ā–„ as alternatives to ChatML tags when training models like Mistral, weighing their efficacy and tokenization behavior on only a small dataset.

  • Quantization and Config Optimizer Discussions: Amidst quantization talks, @dangfutures preferred AWQ, while @bjoernp and @noobmaster29 discussed the importance of using the config optimizer over the default deepspeed optimizer, with reference to deepspeed config PR #1208.

  • Grant Celebration and Dataset Introduction: @gahdnah retracted a previous concern upon observing active development, and there was a celebratory note for a grant received. Meanwhile, @dangfutures highlighted a new dataset on Hugging Face for the Snorkel model, based on Mistral improvements.


LM Studio Discord Summary

  • LM Studio in Educational Settings: @josemanu72 looked for ways to run LM Studio as a server for connecting from students’ desktops, with a solution involving a frontend UI suggested by @fabguy.

  • Hardware Headaches and Help: Discussions about utilizing GPUs ranged from @xmiruso solving GPU utilization issues by reloading models to @gitanos seeking advice on GPU value, with a recommendation to choose a used Nvidia 3090 over the 16GB 4060 Ti. Meanwhile, @mudf00t reported VRAM detection problems on Nobara with an RTX 3090, with no immediate solution.

  • Open Source Model Musings: @vbwyrde discussed the release of ā€œInternā€ (InternLM) and its supposed 200k context window and function calling capabilities. There were also strategic discussions around Meta releasing open-source models such as Llama2 and using models like Solar-10b for function calling.

  • Troubleshooting and Tips: Users reported issues ranging from a bug when switching MoE models to errors in recent LM Studio updates that left models failing to load; @pdg, for example, needed to downgrade to version 0.2.10. A broken link for empty strings in Autogen was reported by @sunglasses.emoji, and suggested solutions for better model performance were shared.

  • Software Snafus: Conversations highlighted struggles with frameworks for open models and eccentric behaviors exhibited by models such as Mistral hallucinating nonexistent directories. There was an amusing note about OpenAI’s models not recalling the current API, affecting context during training.


Mistral Discord Summary

  • GPU Rental to Mistral Integration: GPU rental options including runpod, vast, and lambda were discussed, with Kaggle also mentioned as offering free access up to 30 hours weekly. Mistral 7B use cases and integration challenges were shared, seeking insights for effective implementations, referencing Hugging Face’s Mistral 7B model.

  • Memory Matters in Model Finetuning: Discourse around Mixtral’s large memory appetite for inference highlighted that 26GB is required across four T4 GPUs, with actual usage potentially higher than expected. Efficiency debates compared exllamav2 and bnb 4-bit for quantization (a 4-bit loading sketch follows this summary), with a nod to exllamav2 GitHub for running LLMs efficiently.

  • Evaluating LLMs Beyond Traditional Metrics: Emphasis was placed on the inadequacy of BLEU and ROUGE metrics for LLMs, suggesting elo rankings (arena.lmsys.org) and benchmarks like MMLU and Alpaca eval for better performance measurements. A normalized variant of Alpaca eval was mentioned without further details.

  • Creative Showcases and Random RAG Tips: A tool named SoContextual.com that integrates AI for browser searches including DOM references was showcased, working with MistralAI and spotted on Hacker News. Meanwhile, prompt optimization for RAG applications was touched upon, recommending DSPy and sharing a prompting guide.

  • Platform Puzzles and API Anomalies: A billing page bug causing the monthly limit to reset to €150 was reported, while API bugs concerning the ā€˜max_tokens’ parameter and early stopping issues were discussed, including a posted GitHub issue. Hosting queries affirmed Mistral’s API is hosted on Azure in Sweden.
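
For the bnb 4-bit route compared above, here is a minimal transformers sketch; the model ID, NF4 settings, and device map are illustrative assumptions rather than the exact setup discussed:

```python
# Hedged sketch: load Mixtral with bitsandbytes 4-bit (NF4) quantization.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the common QLoRA default
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 despite 4-bit storage
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization_config=bnb_config,
    device_map="auto",                      # shard across available GPUs, e.g. 4x T4
)
```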


Eleuther Discord Summary

  • Quest for SAM Model Fine-tuning Code: @the_alt_man sought a codebase for fine-tuning Meta’s SAM model, only to learn that it wasn’t included in the original release. AutoGluon was discussed as a tool integrating with Lightning but limitations on GPU usage were noted.

  • Exploration of Federated Learning: A collaborative effort was made to discuss the practicality of federated learning, especially multinode training without infiniband and specifics of model merging. A reference to the DiLoCo study was provided, which can be found on arXiv.

  • Deep Dive into Proxy Tuning LLMs: @digthatdata introduced a method for tuning Large Language Models (LLMs) via proxies, potentially streamlining the tuning process. This alternative approach is detailed in a paper available on arXiv (a logit-arithmetic sketch follows this summary).

  • GPT-NeoX QLoRA Tuning Troubles: @kenakafrosty asked for assistance with tuning GPT-NeoX 20B using QLoRA, facing a non-descending loss issue. It was clarified that NeoX does not currently support QLoRA, and help was redirected to GitHub for solutions with trl, transformers, and peft.

  • Testing Woes and Collaboration in GPT-NeoX Development: Updates to Python, PyTorch, and CUDA brought about issues with running tests, setting off a discussion on the necessity of a functional testing suite for the project. Efforts to fix testing processes, track forked test failures, and provide compute access to project collaborators are active, as exemplified by this GitHub issue.
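
The proxy-tuning method above boils down to logit arithmetic at decode time: steer a frozen large model with the logit shift a small tuned model learned relative to its untuned twin. A minimal sketch, assuming all three models share a vocabulary (the `alpha` knob is an illustrative addition; the paper simply adds the difference):

```python
import torch

def proxy_tuned_logits(base: torch.Tensor, expert: torch.Tensor,
                       antiexpert: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Shift the frozen base model's logits by the tuned-minus-untuned difference."""
    return base + alpha * (expert - antiexpert)

vocab = 32000
base = torch.randn(1, vocab)        # frozen large model's next-token logits
expert = torch.randn(1, vocab)      # small tuned model
antiexpert = torch.randn(1, vocab)  # same small model, untuned
next_token = proxy_tuned_logits(base, expert, antiexpert).argmax(dim=-1)
```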


LAION Discord Summary

  • Dataset Disappearance Disrupts Developers: The LAION dataset, specific to laion2b-en aesthetics scores, is currently inaccessible as the dataset author has requested a temporary disablement. Engineers are advised to stay tuned to official announcements for updates on dataset access.

  • Voice Chat on the Verge: A new voice chat interface that integrates Whisper and WhisperSpeech with an LLM has been demoed, promising reduced latency and more natural conversation. Collaborators are being sought to further improve the system; details can be found on the Hacker News announcement.

  • Image Is Everything: Methods for image captioning using AI have sparked discussion around the importance of clear prompts to minimize hallucinations, centering on accurate descriptions of visible content only.

  • Competition Calls for Creative Coders: The AI Tech Sprint invites developers to contribute to a project focused on clinical encounter notes with a chance to win prize money. Interested parties should register their interest.

  • AI’s Big Expense Report: Discussions have taken place around the high costs involved in training AI models like Stable Diffusion, while acknowledging expected cost reductions due to technological advancements over time.

  • Google’s Missed SaaS Boat?: Google’s heavy reliance on advertising revenue was critiqued, and an alternative path focusing on SaaS models, like the one OpenAI pursues, was suggested as potentially more financially impactful.

  • Byte-Level Transformers Nearly There: Interest is rising around byte-level transformers with an expectation of significant progress soon, demonstrated by recent related arXiv research.

  • Restoration and Recognition Reimagined: Technological strides were highlighted through text-to-image diffusion tech and ID-preserving generation systems, promising new capabilities for AI-generated imagery.

  • Scaling the Summit with SUPIR: A paper on SUPIR, an image restoration method leveraging generative priors and model scaling, gained attention due to its innovative approach and mention among the top papers on Hacker News, detailed in the arXiv submission.


Perplexity AI Discord Summary

  • Perplexity Pro’s Power Unveiled: Enthusiasts discussed Perplexity Pro features, such as unlimited Copilot queries and access to AI models like GPT-4 and Claude 2.1, shedding light on enhanced capabilities beyond the standard offering.

  • Privacy Policies Under Scrutiny: Concerns about Perplexity’s data retention policies led to clarifications that deleted threads are purged after 30 days; however, ambiguities in privacy policies about account and search data provoked calls for clearer communication for user reassurance.

  • API Queries and Billing Grievances: Technical discussions unveiled discrepancies between Perplexity AI’s website and API, with the latter producing inferior code outputs, and users, including @aiagileguy, confronted billing issues, such as double charges, without quick resolutions.

  • Perplexity in Tutorials and Education: Users shared success stories and practical uses of Perplexity AI, like smoothing out the transition from Excel to Smartsheet and aiding in explaining complex astronomy concepts in educational settings.

  • A Vote for Perplexity over Giants: A YouTube video titled ā€œI use Perplexity MORE than Google and ChatGPTā€ depicted Perplexity AI as a superior choice over mainstream options such as Google and ChatGPT for tasks like content creation.


HuggingFace Discord Summary

  • HuggingFace Introduces Social Post Explorers: @lunarflu invited the HuggingFace community to join for early access to the ā€œPostsā€ feature which aims to provide a focused space for AI & ML discussions away from the noise of platforms like Twitter or LinkedIn.

  • Pretraining Predicaments and Cost-Effective Tactics: GPU-intensive model pretraining, as with the Llama-2-7B model, has the community contemplating less resource-heavy alternatives like fine-tuning or LoRA/QLoRA adapters (a LoRA sketch follows this summary).

  • Desperate for Data Set Evaluation Standards: @rosebei3ngan3g highlighted the lack of frameworks to evaluate data sets for large language models, which contrasts sharply with numerous model evaluation frameworks.

  • Insightful Innovations in Dataset Analysis and Demo Displays: A GitHub project on dataset cards analysis and a multi-language text-to-speech HuggingFace demo, WhisperSpeech, showcase the dynamic range of work within the HuggingFace ecosystem.

  • Recognition for Revolutionary Models and Metrics: Google’s Lumiere model, featuring a Space-Time UNet for fluid video generation, is turning heads in the community, alongside interest in the new gradio 4.16 release, which includes support for Polars Dataframe and a new Gallery component, detailed in the changelog.
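
As a rough sketch of the adapter route mentioned above, this is what attaching a LoRA with peft looks like; the rank, alpha, and target modules are illustrative defaults rather than a tuned recipe:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Gated model used purely as an example; any causal LM works here.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections are a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()        # typically well under 1% of base weights
```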


LlamaIndex Discord Summary

  • LlamaIndex Enthralls with LLM Webinar: A webinar on the LLMCompiler featuring a discussion on parallel function calls in agents was announced by @jerryjliu0, with resources such as LlamaPack and a Notebook being available. The related paper can be found here, and the webinar details here.

  • Stack of Innovations Unveiled: The Slack Bot Tutorial by @seldo instructs on integrating organizational learning into bots; Zilliz Cloud Pipeline is newly linked with LlamaIndex, as covered in a guest blog post. Version 0.9.38 of LlamaIndex now supports OpenAI’s latest embedding models, with further details in the release notes, while TypeScript users get LlamaIndex.TS version 0.1.0 with the same support.

  • Discourse in #general Heats up Around Retrieval and Customization: LlamaIndex lacks its own LLM for TextGenerationInference yet is compatible with Langchain’s. Complex retrieval scenarios and incorporation of OpenAI’s updated embedding models were also debated. In response to a query about extracting answers without sufficient context, a link to modifying default prompts was shared: Usage Pattern.

  • Zep’s Multifaceted Chat Features Scrutinized: Zep’s ability to remember conversations and perform entity extraction sparked interest with @yoursaviorjesus sharing Zep Documentation. Clarifying LlamaIndex’s functionality, @cheesyfishes described it as akin to Amazon Kendra with adaptability across any vector store or language model.

  • Innovations in Knowledge Graphs Shared: @chiajy demonstrated a self-learning knowledge graph with recursive retrieval and multi-hop reasoning, through a Harry Potter book demo. For a detailed exploration of this work, consult Harry Potter and the Self-Learning Knowledge Graph.


Latent Space Discord Summary

  • LLM Paper Club maintains no-recording policy: The Latent Space’s LLM Paper Club sessions will not be recorded to encourage open sharing. No option for replays will be provided.
  • Dreams Meet AI in Morpheus-1: A tweet announced Morpheus-1, a multi-modal generative ultrasonic transformer designed for lucid dream induction, set for beta release in Spring 2024. The excitement revolves around its novel approach.
  • GPT-4 Turbo and New Embeddings Roll Out: OpenAI has released an updated GPT-4 Turbo model and new embedding models. Detailed notes and announcements were shared, highlighting improvements and potential impacts on AI applications.
  • Martian’s LLM Benchmarks Go Live: Martian has debuted a Model Router at Martian’s LLM Leaderboard to evaluate different LLM inference products, backed by open-source documentation and tools.
  • Asia Joins the Fold in LLM Paper Club: The LLM Paper Club has expanded to Asia, offering discussions on seminal papers like ā€œAttention Is All You Needā€. The club is soliciting suggestions for future papers and feedback to enhance the beta experience.

DiscoResearch Discord Summary

  • Mixtral and Merging Models Under the Lens: mergekit’s author provided insights in a GitHub comment, influencing DiscoResearch mixtral training approaches. Importance is placed on the correct application of auxiliary loss for Mixture of Experts (MoE) training.

  • Rethinking Data Filtering and Model Training Approaches: A new paper challenges the efficacy of quality filtering for pretraining data, pointing towards data selection aligned with model performance on target tasks, mentioned here. Conversations revolve around adopting new training methods like Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO), with insights on using the DPO Trainer detailed in Hugging Face’s TRL documentation.

  • Advancements in Embedding Development and Usage: The German Jina embeddings model is soon to be released, promising enhancements for ranking applications. OpenAI’s new embedding models, featuring improved multilinguality, signify a leap forward, as detailed here.

  • Translation and Language Model Fine-tuning Feats: DiscoLM German 7B v1 has been successfully finetuned with a custom dataset intended to translate Middle High German to modern German; versions based on Mixtral-Instruct are eagerly anticipated.

  • Impending Efficiencies in Embedding Technologies: Upcoming embeddings are set to outrun OpenAI’s on the MIRACL benchmark, delivering a 12x saving on vector database costs with only 256 dimensions, as teased by @Nils_Reimers in this tweet.


LLM Perf Enthusiasts AI Discord Summary

  • OpenAI Unleashes Embedding Innovation: OpenAI’s latest announcement details the launch of GPT-4 Turbo and new embedding models, tools for better API control, and imminent lower pricing for GPT-3.5 Turbo. Engineers are spotlighting shortened embeddings as a leap in efficiency, looking forward to integrating these into their systems.

  • Updated OpenAI APIs Ease Developer Journey: OpenAI’s commitment to enhancing the API experience includes new moderation models and API usage management tools aimed at refining developers’ oversight. Developers now have an updated documentation guide to navigate the latest models and features.

  • Ease Over Open Source: The debate between OpenAI and open-source models unfolds, with professionals like @res6969 pointing out the speed of feature implementation with OpenAI, while others advocate for the customizability of open-source alternatives.

  • Convenience Can Trump Customization: Despite the availability of open-source models for personal fine-tuning, members like @potrock stress the straightforward, out-of-the-box convenience offered by OpenAI’s embedding models.

  • Striking a Cost-Effectiveness Balance: The economic conversation shifts to cost benefits of OpenAI’s new larger embedding models, as discussed by @shacrw and @michelcarroll, presenting a balancing act between storage savings and API costs in the wake of these updates.
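
For reference, the ā€œshortened embeddingsā€ in this discussion correspond to the new `dimensions` parameter on the text-embedding-3 models; a minimal sketch, assuming an openai client version that supports the parameter (model choice and truncation size are just examples):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="How much storage do shortened embeddings save?",
    dimensions=256,  # truncate from the native 3072 dims to cut vector DB costs
)
print(len(resp.data[0].embedding))  # 256
```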


LangChain AI Discord Summary

  • Welcome Aboard to LangChain Universe: @quarknova, currently interning at INRIA, introduced themselves with an interest in applying LangChain to their projects and asked about the advantages of the GitHub version over its commercial counterpart.

  • Custom AI Personalities Now Tailored: @jstansbe examined the possibility of creating customized AI entities like an ā€œElon Musk AI,ā€ and @ksolo__ contributed a resource, introducing the concept of finetuning and sharing a deep learning course link.

  • LangChain Applause for Chatbot Creation: @johnnucleus recognized the LangChain community for its effective assistance in rapidly developing a chatbot integrated with web search functionalities using LangChain and Streamlit.

  • LLMs Turn Data Synthesizers: Discussion involved using Large Language Models (LLMs) for synthetic data generation to feed traditional ML models, with special mentions of employing LLMs for RAG generation to create SQL queries based on context and schema.

  • Manipulating PARQUET in LangChain: @benjaminbascary and @johnny2x2 exchanged insights on handling PARQUET files within LangChain, with code examples via pandas and the DataFrameLoader feature (a loading sketch follows this summary).

  • Dive into LangServe’s Capabilities: @veryboldbagel shared examples and resources on creating custom tools and agents using LangServe and LCEL, stressing the utility of LangGraph to construct agents with enhanced expressive power.

  • The Mysterious Case of the Missing Stream Response: @hiranga.g faced challenges with stream responses while experimenting with LangServe’s agent_with_history, highlighting a potential bug when incorporating Agents via LangServe with chain.streamLog().

  • SQL Chain’s Battle with Database Giants: @johnny2x2 recounted experiences of SQL Chain’s difficulty in handling large databases and found that crafting curated views with descriptive names within the databases amplified performance.

  • Improved SQL Query Management Through Refinement: @johnny2x2 described the shift from utilizing a local AI to relying on OpenAI for SQL query processing in order to maintain data privacy, leading to a more efficient querying process within LangChain.

  • Task Processing Elevated with Chain Workflow: Introducing a new methodology, @johnny2x2 describes the transition to using each SQL query as an individual tool in their task processing chain, leading to significant improvements in workflow management.
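
A minimal sketch of the pandas + DataFrameLoader pattern referenced above; the file path and column name are placeholders, and the import path varies across LangChain versions:

```python
import pandas as pd
from langchain_community.document_loaders import DataFrameLoader

df = pd.read_parquet("data/records.parquet")              # placeholder path
loader = DataFrameLoader(df, page_content_column="text")  # one Document per row
docs = loader.load()
print(docs[0].page_content, docs[0].metadata)             # other columns land in metadata
```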



Datasette - LLM (@SimonW) Discord Summary

  • Heads-Up for an LLM Library Update: @SimonW announced an upcoming update to the openai library within the LLM project, with detailed information for testers in a GitHub comment.

  • LLM Strides Towards 0.13: The up-and-coming 0.13 Milestone of the LLM release, aimed at enhancing command-line accessibility to large language models, is documented on the GitHub milestone page.

  • Call for Coders to Tackle Readline Bug: There’s an open call for assistance regarding a readline issue in LLM where arrow keys display ANSI codes instead of navigating the cursor, as detailed in this GitHub issue.

PART 2: Detailed by-Channel summaries and links

TheBloke ā–· #general (1212 messagesšŸ”„šŸ”„šŸ”„):

  • Trying Out Hermes 2.5: User @sco7116 asks for help testing a model named OpenHermes-2.5-Mistral-7B-4.0bpw-h6-exl2 from Huggingface, but struggles with the error ā€œCould not locate pytorch_model-00001-of-00002.bin.ā€ @itsme9316 suggests they are not loading it with exllamav2 and should switch to the correct version.
  • Pros and Cons of Using RHEL for ML: @kquant reports successfully running Linux and moves to Ubuntu for synthetic dataset generation. The discussion includes various opinions on using Red Hat Enterprise Linux for machine learning, with @.dontriskit sharing insights on preferred infrastructure and challenges with RHEL in an ML development context.
  • DPO Scripts and Techniques Shared: @kaltcit offers information on their approach to Direct Preference Optimization (DPO) for models, focusing on generating a dataset that captures common GPT failure patterns such as looping and topic drifting. They suggest such a dataset could be the most comprehensive collection of GPT flaws.
  • Fascination with LLM Predictions: Users discuss various finetunes and merges with models like Llama and ChatGPT. @itsme9316 finds that a 7B model finetuned on 20 million tokens of Discord messages surpasses several larger merges in quality, and suggests they might attempt a 500-million-token finetune.
  • AI Running on Nintendo Switch: Users shared videos and comments on running LLMs on surprising hardware. @kquant expresses amazement at the possibility of running models like Llama and Mistral on a Nintendo Switch, while @kalomaze contributes to the theme with media showing LLMs in action on unconventional platforms.


TheBloke ā–· #characters-roleplay-stories (74 messagesšŸ”„šŸ”„):

  • ExLLamaV2 Loader Support Question: @ks_c initially questioned whether the exllamav2 loader in oobabooga supported min_p, despite not being an hf loader, but then confirmed that it was merged into exllama.
  • CPU Mode Confusion Cleared: @neriss informed @keyboardking that exl2 cannot run on CPU after @keyboardking asked about its utility compared to gguf in CPU-only mode.
  • Model Configuration Comparison: @dreamgen inquired about differences in rope_theta and sliding_window configurations among various models, sharing links to the bagel, Mistral instruct, and dolphin config files. @jondurbin replied, explaining inheritance from the base model and the potential for future changes.
  • Role-Play Models Discussion: @shadowplague requested recommendations for models adept at generating content for disrespectful, abusive, racist, and dirty-talking scenarios for role-play, with @c.gato and @kalomaze pointing out that existing models can be prompted for such content and suggesting Kunoichi DPO v2 or Fett-uccine for role-play purposes.
  • 7B Parameter Models for RP: Discussing the best role-play models with 7B parameters, members provided various suggestions such as HamSter-0.2, Kunoichi DPO v2, and Fett-uccine, while touching on both 6 and 8-bit quantization depending on VRAM capacity.


TheBloke ā–· #training-and-fine-tuning (20 messagesšŸ”„):

  • Chatbots Learning Names and Styles: @lordofthegoons is looking to train a chatbot to have a consistent conversation style and remember its own name, much like the Samantha model. @amogus2432 suggests using 10 to 100 examples for style transfer, but @dirtytigerx recommends more, citing that the Samantha model used around 6,000 multiturn conversations.
  • The Quest for Financial Advisor Chatbot: @VR is seeking advice on creating a financial investment advisor chatbot that can run on a 24GB GPU and utilize RAG for up-to-date stock price information, trends, and expert analysis. They are considering prompt tuning versus fine-tuning on financial documents.
  • Building a Unique Chatbot Persona: @lordofthegoons aims to create a chatbot with a specific persona by producing a custom dataset. They note difficulties in achieving variation when using ChatGPT to generate examples and are considering manually creating the dataset due to challenges with rate limiting.
  • Financial Constraints Influence Dataset Building: @dirtytigerx points out the high cost associated with using the GPT-4 API for dataset generation and the inefficiency of waiting out ChatGPT’s rate limits. They suggest experimenting with local large language models (LLMs) as a more cost-effective option.
  • Rate Limits Prompt Creative Solutions: @lordofthegoons expresses intent to manually build a chatbot dataset using ChatGPT while dealing with rate limitations. @dirtytigerx further advises that utilizing services like runpod to run large LLMs locally might be cheaper and more efficient than facing rate limits with OpenAI’s API.

TheBloke ā–· #model-merging (19 messagesšŸ”„):

  • Optimizing Weight in Model Merging: @sanjiwatsuki proposed a hypothesis that setting the model weight slightly above 1.0 could be optimal due to the TIES resolution process potentially causing some effective weight to drop out.
  • Exploring Negative Weights in Script: @kquant inquired whether negative numbers could break the merging script. @sanjiwatsuki expressed uncertainty but speculated that the code might handle negative weights without issues.
  • Selective Model Assimilation: @kquant discussed the possibility of selectively merging models to assimilate desired characteristics, mentioning methods like DARE and SLERP that could potentially combine two models with high evaluation scores on different benchmarks.
  • SLERP and Overfit Models’ Performance: @kquant noted an unexpected result where two overfit models were merged using SLERP and managed to maintain their test positions, which raised questions about the impacts of overfitting in model merging contexts.
  • Merging Methods Clarification Needed: @kquant mentioned a need to better understand how DARE and SLERP differ in the context of model merging, expressing a desire to conduct more research and testing.
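
To make the DARE/SLERP contrast above concrete: DARE randomly drops a fraction of each model’s weight deltas and rescales the survivors, while SLERP interpolates along the arc between two weight sets rather than the straight line. A minimal numpy sketch of SLERP over flattened weights:

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical interpolation between two flattened weight tensors."""
    u0, u1 = v0 / np.linalg.norm(v0), v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))  # angle between the two
    if omega < eps:                       # near-parallel: plain lerp is fine
        return (1 - t) * v0 + t * v1
    return (np.sin((1 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

merged = slerp(0.5, np.random.randn(100), np.random.randn(100))
```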

TheBloke ā–· #coding (1 messages):

  • New User Seeking LangChain Guidance: User @nandavikas expressed difficulty in replicating a fine-tuning process with LangChain that they previously accomplished using PyTorch. They are seeking assistance to fine-tune Llama2 for extracting specific information from PDFs and couldn’t find relevant documentation in LangChain docs.

OpenAI ā–· #ai-discussions (35 messagesšŸ”„):

  • Speed Concerns with GPT-4 Document Analysis: @romansh2302 highlighted a significant speed discrepancy between GPT-4 on the web and the gpt-4-1106-preview model via the API, with the latter being slower. @lugui responded that varying API speeds can occur during peak usage and mentioned the possibility of requesting dedicated capacity, which tends to cater to company-scale use.

  • Looking for GPT-4 Performance Solutions: While seeking a speed remedy, @romansh2302 was informed by @lugui that the speed differential is tied to peak times and API loads, a situation not easily fixed. @og_tort suggested considering GPT-3.5 as an alternative; however, @romansh2302 found it less effective for document analysis.

  • Users Faced with Account Issues: @hellomyfriend1576 reported receiving a warning of unusual activity from their system when using GPT-4. Answers from community members like @lugui and @og_tort entertained the possibility of VPN or proxy usage and potentially flagged prompts as reasons for the issue.

  • Typos in DALL-E’s Text Generation Prompted a Query: @alwayspercipient noted DALL-E frequently includes misspellings in its image creation. This issue has been discussed in the community, as pointed out by @muyfashionista, who also offered a community link on the subject and mentioned that using shorter text might reduce errors.

  • Confusion Over AI Services and Team Accounts: Users like @paras4887 and @leithwhitley posed questions about specific use cases, such as deploying GPT-4 Vision on Azure and issues about collaboration using a shared paid team GPT account. Solutions or clear guidance were not offered in the provided message chain.

Links mentioned:

TuringsSolutions/PFAF750 Ā· Datasets at Hugging Face


OpenAI ā–· #gpt-4-discussions (105 messagesšŸ”„šŸ”„):

  • Troubleshooting ā€˜Always expand code output’ Feature: @darthgustav. clarified for @angry_coder that ā€œAlways expand code outputā€ means wrapped code blocks will always be expanded for easy reading, after testing the feature themselves.
  • Unpacking Libraries in GPT: @bambooshoots suggested uploading a library as a zip file with a .py file that unzips it and adds the /mnt/data/ folder to the system path, a method supported in the past (a sketch follows this list). @darthgustav. expressed concern about potential security issues and halted testing of this environment expansion.
  • CustomGPT Edits Require New Conversations: For @elegante94, @darthgustav. confirmed that to see the effects of a CustomGPT edit, a new conversation is required as ongoing conversations won’t update with the new changes.
  • Image Attachments in GPT Prompt Instructions: @elegante94 queried about the effectiveness of attaching images to prompt instructions, and @darthgustav. responded that using concise language is better than attaching images because DALL-E will improvise creative elements.
  • Saving/Updating GPT Bots Issues: @rodney.leonardo encountered errors when saving/updating a GPT bot and sought assistance. @darthgustav. suggested removing the knowledge, saving as private, then reattaching files one by one, noting a possible block due to mimicking a living artist which is not permitted.
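
A sketch of the zip-upload trick described above, as it might run inside the sandbox; the paths follow the /mnt/data convention, and the archive and package names are hypothetical:

```python
import sys
import zipfile

# Unpack an uploaded archive into the sandbox's writable data folder.
zipfile.ZipFile("/mnt/data/mylib.zip").extractall("/mnt/data/mylib")
sys.path.insert(0, "/mnt/data/mylib")  # make the unpacked package importable

import mylib  # hypothetical package shipped inside the archive
```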

OpenAI ā–· #prompt-engineering (558 messagesšŸ”„šŸ”„šŸ”„):

  • Engaging in Prompt Engineering: @darthgustav. advised @novumclassicum on creating API configurations for custom workflows, including the use of OpenAPI for Python scripting and Custom Actions in GPT. They also discussed strategies for ensuring standardized output despite potential deviation.
  • Document Chunking Dilemma: @doublefelix sought a way to chunk a large text into paragraphs for grammar correction via AI, while @eskcanta recommended a method using Python for smaller sections. Avi and Felix debated the best practice for this task, with Avi suggesting the use of semantic text splitting via AI.
  • AI-Powered Workflow Challenges: @doublefelix tested various approaches to directing GPT-3.5 to add paragraph markings and address grammatical issues in a transcribed sermon, encountering issues with compliance and hallucinations due to token limits and memory holes in the AI context.
  • Exploring GPT-4 Turbo as a Solution: @darthgustav. proposed utilizing GPT-4 Turbo’s Python Tool to semantically analyze text and generate paragraphs, bypassing the chunking limitations @doublefelix faced with GPT-3.5. Darthgustav also highlighted issues with excessive token usage and lost context in large document processing.
  • Considering NLP Tools for Text Splitting: Frustrated with the complexity of managing GPT-3.5’s limitations, @doublefelix decided to explore third-party NLP tools like semantic-text-splitter for potentially automating the text chunking process for large documents while acknowledging the higher capabilities of GPT-4 Turbo for such tasks.
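
A hand-rolled sketch of the chunking step under discussion; the semantic-text-splitter package does a smarter, token-aware version of this, and the 4-chars-per-token budget below is a crude assumption:

```python
def chunk_paragraphs(text: str, max_tokens: int = 2000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks under a rough token budget."""
    budget = max_tokens * 4                 # ~4 characters per token, very roughly
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > budget:
            chunks.append(current)
            current = ""
        current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks  # send each chunk to the model with the same correction prompt
```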

Links mentioned:

How do you maintain historical context in repeat API calls?: Each time I make a call to the API it starts off with no prior context, unlike the chat.openai.com scenario. Is there a way to maintain state of the model during a session? response = openai.Completi…


OpenAI ā–· #api-discussions (558 messagesšŸ”„šŸ”„šŸ”„):

  • Prompt Engineering Proves Challenging: User @doublefelix has been working to prompt GPT-3.5 to process large amounts of text and add paragraph breaks. Despite various strategies, including breaking the task into smaller chunks, the AI has difficulty managing context and often ignores parts of the input text or hallucinates.

  • Finding the Right Approach: The conversation highlighted the complexity of managing AI context and the notion of ā€˜document retrieval curves’. @darthgustav. suggested that GPT-4, particularly with ChatGPT Plus, might be able to better manage the task due to its larger context window and ability to process attached files using retrieval-augmented generation (RAG).

  • API vs. ChatGPT for Cost-Effective Solutions: @doublefelix is exploring options to keep costs minimal while automating the processing of transcribed sermons. @darthgustav. pointed out that using the ChatGPT interface with custom instructions could avoid token costs associated with the API method.

  • The Potential of Custom Prompts: @darthgustav. noted the importance of structuring prompts with explicit instructions and ā€œopen variablesā€ that encode instructions, which might allow for more nuanced control over the AI’s output and help with the task of breaking text into sections.

  • Exploring Alternatives and Considering Next Steps: The conversation indicated a fallback plan involving Custom GPTs and traditional NLP methods with a Python tool as alternatives. @doublefelix plans to research NLP packages, such as a semantic-text-splitter found on PyPI, to find a workable solution.

Links mentioned:

How do you maintain historical context in repeat API calls?: Each time I make a call to the API it starts off with no prior context, unlike the chat.openai.com scenario. Is there a way to maintain state of the model during a session? response = openai.Completi…
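
The short answer to the linked question is that the chat API is stateless: prior turns must be re-sent on every call. A minimal sketch with the v1 openai client (model choice is illustrative):

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You are a careful copy editor."}]

def ask(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})  # carry context forward
    return reply
```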


Nous Research AI ā–· #ctx-length-research (7 messages):

  • Seeking the Best Context Extension Solution: @cryptossssun inquired about the best current method to extend context capabilities. @_cherrry responded positively to a paper, suggesting finetuning is a viable solution.

  • Mistral Instruct Moves Away from Sliding Window: @dreamgen discussed the implications of Mistral instruct v0.2 disabling sliding windows in favor of scaling rope_theta, questioning the effectiveness of the sliding window approach. They shared a configuration file showing these changes.

  • MistralLite Mimics Main Branch Configurations: @dreamgen also noted that amazon/MistralLite follows the same configuration strategy as its main branch counterpart concerning context window management.

  • Remarkable Efficiency in LLaMA-2-7B-Chat Context Extension: @stellaathena highlighted an impressive feat where the LLaMA-2-7B-Chat model’s context window was extended to 16,384 with only 100 samples and 6 training steps.

  • SelfExtend as a Non-Finetuning Alternative: In the discussion on context extension, @leontello mentioned SelfExtend as an intriguing option for those who prefer not to fine-tune their models.

Links mentioned:

config.json Ā· mistralai/Mistral-7B-Instruct-v0.2 at main
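
A quick way to confirm the settings in the linked config, assuming a transformers version with Mistral support and network access to the Hub:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
print(cfg.sliding_window)  # None: sliding-window attention disabled in v0.2
print(cfg.rope_theta)      # scaled well above the v0.1 value
```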


Nous Research AI ā–· #off-topic (5 messages):

  • Technological Impact on Society’s Extremes: @ldj contemplates that technology might lead to further polarization, where the least capable individuals become more entrenched in degeneracy, while the most capable are propelled towards greater self-improvement.
  • Twitter’s Fluctuating Activity Puzzle: @fullstack6209 observes a radical change in the frequency of new Twitter posts, questioning if anyone else noticed this shift from 2-3 posts every 10 minutes to about 70 posts in a minute.
  • Twitter’s AI Slow Down Theory: @fullstack6209 brings up a suggestion that Twitter might have deliberately slowed down the AI to explain the changes in the frequency of posts observed.
  • Quantizing LLMs Made Easy: @pradeep1148 shares a YouTube video titled ā€œAutoGGUF Quantize LLMs in GGUF format in one click,ā€ providing a resource for converting large language models to GGUF format using a shared notebook.


Nous Research AI ā–· #benchmarks-log (2 messages):

  • Benchmark Request for Everyone Coder 33B: @benxh requested a HumanEval benchmark for a quantized version of Everyone Coder 33B, which uses the new GGUF format introduced by llama.cpp. The model was made available by TheBloke on Hugging Face, and the quantisation was supported by Massed Compute.
  • Call for Hermes Mixtral Evaluation: User @teknium expressed a desire to see a benchmark on a combined Hermes and Mistral model, referring to it as Hermes Mixtral, and requested it with a hopeful šŸ™ emoji.

Links mentioned:

TheBloke/Everyone-Coder-33B-Base-GGUF Ā· Hugging Face


  • New OpenAI Embedding Models Launch: @tsunemoto shared OpenAI’s announcement unveiling a new generation of embedding models, GPT-4 Turbo, updated moderation models, and cost reductions for GPT-3.5 Turbo. Data privacy by default and improved API management tools were highlighted, as well as lower pricing for new embedding models.

  • Genie: A Method for High-Quality Synthetic Data: @metaldragon01 introduced a paper about Genie, a novel method for creating high-quality content-grounded data, detailed in the published work on arXiv. Genie is claimed to produce data so refined that in human evaluations, it was found to be natural and high-quality, with implications for improving Long-Form Question-Answering and summarization models.


Nous Research AI ā–· #general (362 messagesšŸ”„šŸ”„):

  • Introducing GPU-Accelerated AI: User @n8programs discussed performing machine learning computations using WebGL, typically used for texture processing in video games, by packing data into textures because WebGL doesn’t support raw buffers (as @everyoneisgross and @n8programs exchanged ideas). This method enables GPU acceleration, as practiced by TensorFlow.js, despite current limitations such as a maximum vector size of 16,000 elements.
  • Wise Words on Model Training: @intervitens advises that a server or high-end desktop (HEDT) motherboard is necessary for effective multi-GPU setups due to the requirement of ample PCI-e lanes, suggesting second-hand Gen2 EPYC for a balance of performance and economy. The discussion includes options like mining-style chassis, two-slot spacing motherboards for 4x 2-slot graphics cards, and bespoke water cooling setups.
  • Real-Time Word Embedding Adjustments: User @everyoneisgross described a system for quick on-the-fly adjustments to word embeddings in a word2vec model, allowing real-time feedback by increasing or decreasing weights based on user input (a toy sketch follows this list). This process is functionally quick on small corpuses, making it practical for expansion or refinement based on fresh data fetched by an LLM, which in this case was Mistral instruct.
  • Mixtral Instruct Surprisingly Strong: @gabriel_syme asked why Mixtral instruct performs better than many fine-tuned models. @intervitens responded that Mixtral instruct is particularly adept at following instructions and there might be unresolved issues with MoE model fine-tuning.
  • Exploring GPU Enhancements for Models: @carsonpoole shared progress on adapting a variant of the phi2 model with modified fine-tuning, planning to publish weights on Hugging Face and potentially develop a LLaMA-based model variant. The model has also been fine-tuned with chatml data and tokens, integrating them into the model’s knowledge.
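
A toy sketch of the feedback loop described above: nudge a word’s vector toward (or away from) an anchor based on user input. The two-sentence corpus and learning rate are placeholders, not the system’s actual settings:

```python
from gensim.models import Word2Vec

model = Word2Vec([["hello", "world"], ["hello", "there"]], vector_size=16, min_count=1)

def nudge(word: str, anchor: str, lr: float = 0.1, toward: bool = True) -> None:
    """Move `word`'s embedding along the direction to `anchor`'s embedding."""
    direction = model.wv[anchor] - model.wv[word]
    idx = model.wv.key_to_index[word]
    model.wv.vectors[idx] += (lr if toward else -lr) * direction

nudge("hello", "world", toward=True)  # positive feedback pulls the words closer
```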


Nous Research AI ā–· #ask-about-llms (35 messagesšŸ”„):

  • Seeking Specs for CodeLlama Fine-Tuning: @ganymede123 inquired about workstation specifications for fine-tuning CodeLlama 34B and considered 4xA6000 GPUs. @teknium responded that this would only suffice for a QLoRA and that a full fine-tune would require nearly a full DGX setup.

  • T5 Fine-Tuning Difficulties: @maxpappa is facing challenges with aligning a fine-tuned version of T5, noticing deterministic output and reward-accuracies steady at 0.5. Despite tweaking optimizers and schedulers, suggestions such as avoiding paged 8bit Adam by @locutusque, and clamping infs particularly in the encoder from @carsonpoole were offered to handle the apparent numerical instability.

  • LLMs for Offensive Cyber and CTFs: @useewhynot sought recommendations for LLMs suitable for offensive cyber or CTF challenges. @kenakafrosty and @georgejrjrjr recommended WhiteRabbitNeo-13B and WhiteRabbitNeo-33B-v1, with options available on HuggingFace.

  • Evaluating AI Coders: @findmyke inquired about the best coding LLM currently available. @.ben.com linked to the EvalPlus Leaderboard, which evaluates AI coders with rigorous tests, as a resource for making an informed decision.

  • Fine-Tuning with Llama Factory: @moconna expressed an intent to fine-tune Mixtral using Llama Factory and asked for advice on necessary hyperparameters. No specific hyperparameters or templates were provided in the discussion that followed.


Nous Research AI ā–· #project-obsidian (3 messages):

  • Seeking Python Script for 3B Obsidian: @vic49. is in search of a simple Python script that utilizes the transformers library to work with the 3b Obsidian model. They specify that the code should allow remote code execution (trust_remote_code=True).
  • Code Refactor in Progress: @qnguyen3 confirms they are working on refactoring code to be compatible with the latest llava repo for enhanced functionality with 3b Obsidian.

OpenAccess AI Collective (axolotl) ā–· #general (219 messagesšŸ”„šŸ”„):

  • Base Model Preferred for Fine-tuning in Specific Cases: @le_mess recommended using the base model over the Instruct variant when fine-tuning on languages with limited foundational training data. The suggestion came after @henriklied shared his fine-tuning approach with a dataset of 100k Norwegian articles.

  • Training Length and Overfitting Concerns: @le_mess advised to stop training at 4 epochs to prevent overfitting, responding to @henriklied, who observed flatlining eval loss during his 10-epoch fine-tuning. @henriklied also shared a link to the output from a debug flag (prepare_dataset) to diagnose the training setup.

  • Effective Chat Format for Training Models: @suikamelon and @c.gato discussed the effectiveness of different chat formats for training language models, with @suikamelon introducing ā€œBlockMLā€ using unique block characters to potentially improve token efficiency. There was also a mention of the challenges related to integrating ChatML tokens in training due to their rare occurrence.

  • Discussions on Model Training with Uncommon Tokens: @suikamelon reported tentative success in using unique block characters in place of ChatML tags, noting ā–€ and ā–„ offered reliable tokenization with Mistral, despite a limited dataset of about 100 examples (a tokenization-check sketch follows this list).

  • Regarding Model Quantization and Optimizer Settings: @dangfutures expressed preference for quantization using AWQ and @bjoernp sought clarification on whether setting the optimizer in their config would override the default deepspeed optimizer, leading to @noobmaster29 confirming that the config optimizer should be used due to removal from default deepspeed config. A relevant deepspeed config PR by @winglian was also mentioned (PR #1208).
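
A quick sketch of the tokenization check behind the block-character idea discussed above, assuming Hub access to the base Mistral tokenizer:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
for marker in ["ā–€", "ā–„", "<|im_start|>"]:
    print(repr(marker), "->", tok.tokenize(marker))
# Block characters typically map to a short, stable token sequence, while ChatML
# tags shatter into many pieces unless they are added as special tokens.
```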


OpenAccess AI Collective (axolotl) ā–· #axolotl-dev (1 messages):

  • Quick Retraction from gahdnah: @gahdnah retracted a message after noticing the active development in the latest commits, indicating a well-monitored and rapidly evolving project area.
  • Community Celebrates a Grant: @gahdnah expressed excitement and congratulations for a grant received, celebrating the positive news with emoji flair. šŸŽ‰šŸŽ‰šŸŽ‰

OpenAccess AI Collective (axolotl) ā–· #general-help (47 messagesšŸ”„):

  • Loading Models without BitsandBytes: @matanvetzler encountered a ValueError when attempting to load a QLoRA-trained model into vLLM due to it not supporting bitsandbytes quantization. They were advised that vLLM can load the model in fp16, or to use AutoAWQ for quantization to fit within VRAM constraints.
  • Merging QLoRA-trained Models: @stefangliga provided a link to a merge script on GitHub for combining a QLoRA-trained model back into the base model, a process separate from the model serving mechanism (a minimal merge sketch follows this list). They further suggested that Tim Dettmers recommends merging a quantized model, linking to a twitter post for elaboration.
  • SQL Dataset Confusion: @sadaisystems expressed concern about an extraordinarily low loss after a few training steps on a SQL dataset, wondering if this indicates a lack of diversity in the dataset or the model’s proficiency. @caseus_ reasoned that SQL’s deterministic nature might explain the low loss and suggested halting training unless there are complex cases in the data.
  • Benchmark Evaluations During Training with Axolotl: @caseus_ informed @sadaisystems about the do_bench_eval: true option in axolotl to run mini evaluations during training, pointing out that they use datasets from dharma-1 as benchmarks, which are useful for relative improvement checks.
  • Continuous Pretraining Inquiry: @nickbro0355 asked for assistance on how to conduct continuous pretraining on Mistral, with @dangfutures indicating they have been attempting that, and @caseus_ inquiring about the specific dataset to provide further help.
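
A minimal peft sketch of what such an adapter merge does; the model and adapter paths are placeholders, and it assumes the adapter was saved with peft:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "path/to/qlora-adapter").merge_and_unload()
merged.save_pretrained("merged-model")  # now loadable by servers without bitsandbytes
```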


OpenAccess AI Collective (axolotl) ā–· #datasets (7 messages):

  • New Dataset Dropped: User @dangfutures highlighted a new dataset on Hugging Face which was used to train the Snorkel model. The dataset leverages only the prompts from UltraFeedback, with no external LLM responses.

  • Snorkel-Mistral Training Methodology Explained: The dataset’s creation involved generating 5 response variations from Mistral-7B-Instruct-v0.2 for each prompt, reranking them with PairRM, and then applying Direct Preference Optimization (DPO) for LLM updates across three iterations (a toy sketch follows this list).

  • Mistral Gets an Upgrade: @dangfutures noted that Mistral 7B has been finetuned, likely referring to the improvements described in the dataset methodology.

  • ALPACA’s Evaluation Metric Mentioned: The user @dangfutures mentioned a number associated with ALPACA, although the specific context or meaning of ā€œ34 percentā€ was not clarified.

  • Impressive Performance Noted: A follow-up by @dangfutures indicated that despite the initially perceived low percentage, the performance was noted to be better than an older version of GPT-4.

  • A Playful Response: User _dampf shared a gif from Tenor in what could be interpreted as a reaction to the preceding messages, though the context of the gif’s use was not elucidated in the conversation.
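
A toy sketch of the generate-rank-pair loop described above; `generate` and `pairrm_rank` are hypothetical stand-ins for Mistral-7B-Instruct sampling and the PairRM reranker, stubbed here so the snippet runs:

```python
import random

def generate(prompt: str) -> str:                 # hypothetical: sample one response
    return f"response to {prompt!r} #{random.random():.3f}"

def pairrm_rank(prompt: str, cands: list[str]) -> list[str]:  # hypothetical: PairRM
    return sorted(cands)                          # placeholder ordering, best first

def build_dpo_pairs(prompts: list[str], n: int = 5) -> list[dict]:
    pairs = []
    for prompt in prompts:
        ranked = pairrm_rank(prompt, [generate(prompt) for _ in range(n)])
        pairs.append({"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]})
    return pairs

print(build_dpo_pairs(["What is DPO?"]))
```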


OpenAccess AI Collective (axolotl) ā–· #rlhf (2 messages):

  • Seeking Wisdom on DPO Training Plots: @noobmaster29 asked if there are any resources available to better understand the dpo training plots. However, there were no responses or resources provided in the chat history.
  • Dataset Dilemma for DPO Training: @noobmaster29 inquired about the necessary components for a dpo dataset, mentioning they have included prompt/input and chosen rejected pair columns but are experiencing issues with the dataset processing. There was no further clarification or troubleshooting advice provided in the chat history.
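
For reference, the column layout a DPO dataset conventionally needs, per TRL’s DPOTrainer convention, is one prompt plus a chosen/rejected completion pair per row; a minimal sketch:

```python
from datasets import Dataset

dpo_ds = Dataset.from_dict({
    "prompt":   ["What is 2 + 2?"],
    "chosen":   ["2 + 2 = 4."],      # preferred completion
    "rejected": ["2 + 2 = 5."],      # dispreferred completion
})
print(dpo_ds[0])
```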

OpenAccess AI Collective (axolotl) ā–· #community-showcase (1 messages):

pradeep1148: https://www.youtube.com/watch?v=wlPxEq_Mtkc


LM Studio ā–· #šŸ’¬-general (115 messagesšŸ”„šŸ”„):

  • Seeking Classroom Connection: @josemanu72 inquired about running LM Studio as a server in a classroom and connecting from students’ desktops. @fabguy suggested a frontend and a reverse proxy setup (a client sketch follows this list), and later mentioned solving the issue in another channel.
  • GPU Woes on Ubuntu: @xmiruso faced difficulties with LM Studio not utilizing the GPU on their Ubuntu setup with a Geforce Nvidia 3090. After discussing with @fabguy and others, the problem was resolved by ejecting and reloading the model, leading to increased processing speed.
  • Proxy Issues Hinder Model Search: User @laooopooo_02864 faced challenges using the model search function due to proxy issues, and @heyitsyorkie deduced they might be in a country where huggingface is blocked.
  • Local Model Access from Mobile: @cloakedman sought a way to access their LLM model from a phone, and @wildcat_aurora provided a solution with a GitHub link to LM_Chat_TTS_FrontEnd (front-end) that permits interaction with LM Studio models.
  • Discussing Best Practices and Error Resolution: Different users discussed the best coding AI in LM Studio, parallel model running, contacting support for errors, as well as recounting issues and fixes they’ve experienced, including GPU driver warnings shared by @fate4real regarding AMD graphics cards.
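
For the classroom-server question above, LM Studio exposes an OpenAI-compatible local server; a sketch of a student-side client, where the host address, default port 1234, and placeholder API key are assumptions about the default server settings:

```python
from openai import OpenAI

client = OpenAI(base_url="http://192.168.1.50:1234/v1",  # teacher machine's LAN address
                api_key="lm-studio")                      # ignored locally, but required
resp = client.chat.completions.create(
    model="local-model",  # LM Studio serves whichever model is currently loaded
    messages=[{"role": "user", "content": "Hello from a student desktop!"}],
)
print(resp.choices[0].message.content)
```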


LM Studio ā–· #šŸ¤–-models-discussion-chat (54 messagesšŸ”„):

  • C++ Redist Resolution to Model Loading Error: @rparada encountered a model loading error with several models including Stable Code, Deepseek, Codellama, but was able to resolve the issue by updating the C++ redistributables on the suggestion of @heyitsyorkie.

  • Assessment of Model Capabilities: @heyitsyorkie commented that the Magicoder DS 6.7b and GPT4 are closely matched in performance, while also elaborating that there isn’t a single local multimodal open-source model available to rival GPT4.

  • Azure Cloud Usage for GPT4: @mickael6102 shared that their company is running GPT4 locally on Azure cloud. This sparked a conversation with @vbwyrde about data privacy concerns, costs, and the relation between Microsoft and OpenAI regarding usage and control of proprietary data.

  • Open Source Model Options: @vbwyrde discussed a new model called ā€œInternā€ (InternLM) and provided a link to it with claims about its exceptional abilities such as a 200k context window and function calling. @mickael6102 responded with interest and mentioned using Solar-10b for function calling.

  • Debating the Strategy of Meta’s Llama2: In response to @vbwyrde, @.gumdro and @ptable speculated on Meta’s rationale for providing open-source models like Llama2, suggesting reasons such as setting a standard to benefit from downstream product development and taking market space from competing services like OpenAI.


LM Studio ā–· #🧠-feedback (4 messages):

  • Bug Alert in Switching MoE Models: @msz_mgs reported a bug where changing from a 4X MoE model to a 2X MoE model caused an error that wouldn’t allow changes, necessitating an app restart.
  • Thanks and Request for Model Details: @yagilb acknowledged @msz_mgs’s bug report and asked to share details about the 2x moe and 4x moe models being used for further investigation.
  • Insight on MoE Model Configuration: @dagbs offered a tip regarding setting the num_experts_used config, suggesting that for a 4x model, the correct setting should be 2 experts.
  • Performance Issues with Latest Update: @golangorgohome expressed concern about version 0.2.11 performing poorly on Windows 11 with 32GB RAM, citing slow search icon response and long search times despite having a fast internet connection.

LM Studio ā–· #šŸŽ›-hardware-discussion (11 messagesšŸ”„):

  • GPU Preferences for Value: @gitanos inquired about the value of the 4060 Ti with 16GB of VRAM, prompting a response from @heyitsyorkie, who recommends investing in a used 3090 for better value, costing only slightly more in the UK.
  • Compatibility Concerns with 0.2.11: User @madan.pandit reported issues with models ceasing to function, specifically when using version 0.2.11. Another user, @heyitsyorkie, indicated no issues on their end but noted that llama.cpp has deprecated GGML models in favor of GGUF.
  • Memory Error with gguf Models: @madan.pandit mentioned receiving an error about insufficient memory when attempting to utilize gguf models.
  • M2 Mac Studio Endorsed for LLMs: @heyitsyorkie advised that buying a maxed-out M2 Mac Studio is the perfect choice for running Large Language Models, noting its small form factor and aesthetic appeal.
  • Mixed Opinions on Older GPUs: A conversation between @docorange88, @wildcat_aurora, and @rugg0064 covered the viability of using P40 or M40 GPUs for machine learning. The consensus appears to favor P40 GPUs while M40 GPUs are generally not recommended.

LM Studio ā–· #🧪-beta-releases-chat (46 messagesšŸ”„):

  • VRAM Detection Issues on Nobara: @mudf00t reported that LM Studio wasn’t recognizing the VRAM on an RTX 3090 on Nobara. @yagilb provided a workaround for Nvidia setups, which did not apply to @pdg’s separate issue on a Mac M2.

  • Models Fail to Load in Final Version: @pdg encountered issues with all models after upgrading to version 0.2.11, which had previously worked in an older version. Downgrading to version 0.2.10 via this link provided by @yagilb led to a new set of errors, prompting a request for a link to the even older version 0.2.9.

  • App Download Stalling Issue: @khalifa007 faced a problem where the app download would get stuck. @yagilb suggested that the issue might be related to the user’s internet connection or firewall, and considering the use of a VPN might help.

  • Unusual RAM Error and Interim Fix: @pdg reported an error indicating insufficient RAM, despite having 16 GB available. They discovered that starting with a short sentence and allowing the model to respond first avoids errors when submitting longer text.

  • Insight on Context Length Settings: @mattjcly_55150 suggested that the error experienced by @pdg could be due to the initial context length setting and recommended adjusting it or the context overflow policy to avoid errors with longer input texts.


LM Studio ā–· #autogen (1 message):

  • Broken Link for Empty Strings in Autogen: @sunglasses.emoji reported that the pinned link regarding empty strings is broken and is seeking assistance on creating a custom agent class in autogen studio. No further details or a resolution were provided.

LM Studio ā–· #open-interpreter (10 messagesšŸ”„):

  • Struggles with Frameworks for Open Models: @pefortin expressed frustration that open models like memGPT, crewai, and Open Interpreter are failing to properly use tools and elements they have access to, despite running medium-sized models such as mixtral8x7B and deepseek coder 33B.
  • mudf00t’s Model Exploration: @mudf00t is testing various models and highlighted that having an RTX 3090 allows loading significantly larger models than some others might be able to.
  • API Amnesia: @mudf00t humorously pointed out that OpenAI’s models, including those accessed via the API, don’t seem to know the current API itself, likely a training-cutoff limitation that causes context issues.
  • Fine-Tuning Focus: @222gate mentioned discontinuing integration with memGPT and is looking into fine-tuning a mistral model for specific function calls, similar to efforts seen with memgpt datasets.
  • Hallucinating Directories with Mistral: @mudf00t shared an amusing instance where Mistral created an imaginary directory structure complete with a node app and showed code for a non-existent file.

Mistral ā–· #general (163 messagesšŸ”„šŸ”„):

  • GPU Rental Options Discussed: @mrdragonfox recommended services like runpod, vast, and lambda for renting GPUs by the hour, and later mentioned that Kaggle offers free access to GPUs for up to 30 hours per week.
  • Mistral Spending Limits and Support: @glorfsf raised an issue with changing the spending limit in the subscription options, which @mrdragonfox clarified defaults to €150. @mrdragonfox also suggested contacting [email protected] for assistance with changing spending limits.
  • BART Model Limitations and LLM Suggestions: @miraimech expressed dissatisfaction with the BART model from Hugging Face for production use, to which @mrdragonfox responded by suggesting the use of open-source models with higher context windows.
  • Model Discussion and API Issues: @ethux and @i_am_dom discussed the application of Mistral models and the intricacies behind model versions used in GitHub Copilot, with @mrdragonfox clarifying its current backend and use of GPT-3.5.
  • Mistral 7B’s Integration and Use Cases Inquiry: @sophiamyang asked for interesting use cases of Mistral models, while @ethux and @f127467 shared their experiences and challenges with model integration, seeking community insights into effective implementations.

Mistral ā–· #ref-implem (9 messagesšŸ”„):

  • Finetuning vs. Inference Memory Requirements: User @l0gr1thm1k clarified for @ethux that the memory capacity in question is for finetuning rather than training, emphasizing that the concern is about loading the model into memory.
  • Mixtral’s Memory Appetite: In response to @ethux, @l0gr1thm1k confirmed having adequate memory across four T4 GPUs to handle Mixtral’s requirement of at least 26GB for 4-bit inference (a back-of-the-envelope check follows this list).
  • On-the-Ground Memory Usage Reports: @l0gr1thm1k reported that GPU memory usage exceeds expectations just for loading the model, suggesting that actual usage runs higher than the anticipated figures shared.
  • Quantization Efficiency Debate: @mrdragonfox recommends using exllamav2 for quantization over bnb 4 bit, questioning the use of accelerate in the context of memory efficiency.
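
As referenced above, a back-of-the-envelope check of the 26GB figure, assuming Mixtral 8x7B’s roughly 46.7B total parameters; the gap between this estimate and observed usage is the KV cache, activations, and framework overhead:

```python
# Weights-only memory estimate for 4-bit Mixtral inference.
total_params = 46.7e9        # approximate total parameters in Mixtral 8x7B
bytes_per_param = 0.5        # 4-bit quantization = half a byte per weight
weights_gb = total_params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.1f} GB")  # ~23.4 GB

# A few extra GB of KV cache and runtime overhead bring this to the ~26GB
# minimum, and explain why usage exceeds the weights-only figure on load.
```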

Mistral ā–· #finetuning (3 messages):

  • Traditional metrics fall short for LLMs: @adrienbufort emphasized that BLEU and ROUGE are not useful for evaluating Large Language Models (LLMs) or instruction-tuned LLMs, as these metrics were designed for assessing translation performance.

  • ā€œEloā€ for human-like LLM evaluation: @adrienbufort highlighted ā€œeloā€, a system mimicking chess rankings as being very close to human preference for LLM evaluation, available at arena.lmsys.org, although it requires human involvement.

  • Structured evaluations via MMLU and Alpaca: @adrienbufort pointed to multiple-choice questions, like the Massive Multitask Language Understanding (MMLU) benchmark (MMLU paper) for clear LLM performance measurement, and Alpaca eval (Alpaca GitHub) for using another LLM to evaluate the responses.

  • Normalized Alpaca Eval Announcement: @akshay_1 announced that a normalized version of Alpaca eval is now available.


Mistral ā–· #showcase (1 message):

  • AI browser queries with a twist: User @sublimatorniq showcased SoContextual.com, a tool for AI browser queries that include DOM node references. This works with MistralAI and was also featured on Hacker News.


Mistral ā–· #random (8 messagesšŸ”„):

  • Newcomer seeking RAG guidance: User @xrdg joined the chat asking for advice on how to structure prompts for RAG applications. Not much detail was provided about their specific use case.
  • DSPy for prompt optimization: @akshay_1 recommended using DSPy to optimize prompt structures, sparking a brief interaction with @xrdg.
  • Shoutout from Guatemala: In a follow-up message, @xrdg sends cheers from šŸ‡¬šŸ‡¹, but doesn’t provide any further discussion points.
  • Mistral prompt examples explored: @xrdg shared that they have been using langchain, chroma, and Mistral 7B and referred to a prompting guide. They provided a link that includes an overview and various resources related to Mistral 7B.
  • Optimizing RAG Stacks: @akshay_1 suggested that @xrdg’s current RAG stack can be further optimized and inquired whether the project was a hobby or in production, but no additional context was provided by @xrdg.

Links mentioned:

Prompt Engineering Guide: A Comprehensive Overview of Prompt Engineering


Mistral ā–· #la-plateforme (35 messagesšŸ”„):

  • Early Stopping Conundrum Continues: User @digitalphotographer is still facing issues with early stopping in their prompts, despite not using control tokens or special characters. They had previously provided notebooks with reproducible examples to Mistral’s team but have not received a response.

  • Monthly Usage Limit Bug Reported: Users @ewanhc, @ethux, and @fersingb reported a bug where the monthly usage limit on the billing page resets to 150 euros after an attempt to change it, even if the intention is to lower the limit. They have reported this issue to Mistral support via email.

  • API Hosting Inquiry Cleared Up: @loicboutet inquired about the hosting location of Mistral’s API and learned that it’s hosted on Azure in Sweden, information found on the privacy page.

  • API’s ā€œmax_tokensā€ Bug Surfaced: @mrxavierx discovered and reported a bug where setting ā€œmax_tokensā€ to 1 causes a 500 internal server error instead of returning a single token response or a proper validation error. The issue was documented on Mistral’s GitHub repository (Issue #122).

Eleuther ā–· #general (29 messagesšŸ”„):

  • Searching for SAM Finetuning: @the_alt_man inquired about a codebase to fine-tune Meta’s SAM model, discovering it’s not included in the original code and mentioning the AutoGluon toolbox, which employs Lightning but is limited to GPU usage.

  • Federated Learning Feasibility Discussed: @elyxlz wondered about the feasibility of multinode training without infiniband and model merging steps. @stellaathena indicated experiments on island-like device training while @smerkyg pointed to a potential recent study, which @_r0n12 identified as the DiLoCo paper from an arXiv.org link.

  • Accessing the Pile Dataset: @sk5544 sought information on how to access the Pile dataset, getting directions from @stellaathena and a direct message offer from @elyxlz.

  • Finance Lawyers Described Analogously: @catboy_slim_ offered an analogy likening the role of lawyers in finance to combat medics, conveying their reactive position in fast-paced financial events.

  • Project Contributions and Dataset Creation Appeal: @pinconefish offered ML expertise, notably in CV, to contribute to existing projects, while @stellaathena and @wonkothesensible sparked an idea for a dataset focused on analog clocks displaying 10:10 to study out-of-domain generalization, flagging potential model collapses and active learning cases.

Links mentioned:

DiLoCo: Distributed Low-Communication Training of Language Models: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected acc…


Eleuther ā–· #research (125 messagesšŸ”„šŸ”„):

  • ByT5’s Efficiency Questioned: @main.ai mentioned that ByT5 shows byte-level transformers to be less efficient, sparking debate with @the_random_lurker around the fairness of comparing token and byte sequence lengths.
  • Trouble in Paper Acceptance Land: @stellaathena expressed confusion over the mysterious rejection of a paper with seemingly high review scores, implicating a meta-reviewer’s mistake. The discussion highlights the difficulty and lack of transparency in the paper appeal process within academic conferences.
  • Proxy Tuning Large Language Models: @digthatdata shared a link to a paper on proxy-tuning LLMs, an efficient alternative to traditional tuning that uses the predictions of smaller tuned models to steer larger base models, demonstrating significant performance gains (see the sketch after this list).
  • Self-Rewarding LM Paper Critique: @thatspysaspy critiqued a paper on self-rewarding LMs for using stronger models like Claude 2 and Llama-2-chat during training, suggesting it diminishes the paper’s claims and could lead to misguided future research efforts.
  • Chess.com’s Fiction of a Chess AI Rival: @clockrelativity2003 shared a Chess.com article predicting the future of AI in chess in 2024. However, @alexanderrgriffing suggested the article was written by GPT, casting doubt on its seriousness.
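
As flagged in the proxy-tuning bullet, a rough sketch of the idea: at decode time, add the logit offset between a small tuned ā€œexpertā€ and its untuned counterpart to a larger base model. The model names are illustrative; all three models must share a tokenizer and vocabulary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
large_base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")
small_tuned = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
small_base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tok("Write a haiku about rain:", return_tensors="pt")
with torch.no_grad():
    # Steer the large base model by the small models' tuning delta.
    proxy_logits = (
        large_base(**inputs).logits
        + small_tuned(**inputs).logits
        - small_base(**inputs).logits
    )
next_token = proxy_logits[0, -1].argmax()
print(tok.decode(next_token))
```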

Eleuther ā–· #lm-thunderdome (2 messages):

  • Fix Integration Issue Acknowledged: @hailey_schoelkopf expressed readiness to merge a fix if necessary for an unspecified integration issue, highlighting surprise at the behavior and a desire to test it personally.
  • Adding Weights and Biases Support: @hailey_schoelkopf shared a GitHub pull request by @ayulockin which adds support for Weights and Biases to the lm-evaluation-harness. They are considering the optimal placement for the newly created wandb.py file within the project structure.

Links mentioned:

feat: Add Weights and Biases support by ayulockin Ā· Pull Request #1339 Ā· EleutherAI/lm-evaluation-harness: In #359 @parambharat proposed adding support for W&B logging. However it was done before the big refactor that got in. As a user of both lm-evaluation-harness and wandb, I have opened this PR …


Eleuther ā–· #gpt-neox-dev (16 messagesšŸ”„):

  • Seeking Guidance on QLoRA Tuning: @kenakafrosty inquired about resources or info related to tuning GPT-NeoX 20B with QLoRA and mentioned having issues with loss not decreasing during training. @stellaathena clarified that the NeoX library doesn’t support QLoRA and suggested reaching out on GitHub for help with trl, transformers, and peft which @kenakafrosty is using.
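
For readers following along, a minimal QLoRA setup sketch with the transformers/peft stack mentioned above; the hyperparameters are illustrative, not a diagnosis of the non-decreasing loss:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the 20B base model quantized to 4 bits.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", quantization_config=bnb_config, device_map="auto"
)

# Attach LoRA adapters to GPT-NeoX's fused attention projection.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapters should be trainable
```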

  • pytest Issues with GPT-NeoX: @catboy_slim_ mentioned removing --forked from pytest and highlighted the need for a separate effort to get pytest running cleanly again for the project.

  • Tests Failing When Forked: @catboy_slim_ reported major updates to Python, PyTorch, and CUDA; while able to run a basic model, they expressed concern that every possible branch cannot be validated manually, stressed that the tests need to work, and created an issue on GitHub.

  • Testing Framework Discussions for Torch: @catboy_slim_ expressed doubts about existing testing frameworks adequately handling PyTorch code due to infrequent testing of such code by developers.

  • Project Collaborators on Validation and Compute Access: @tastybucketofrice is arranging compute access for collaborators, including @337128969059172353, to further test their changes to the project and extended an offer to @catboy_slim_ for compute access to assist in testing.

Links mentioned:

Tests fail when run with pytest --forked Ā· Issue #1132 Ā· EleutherAI/gpt-neox: Describe the bug When tests are run with pytest --forked per the instructions in /test/README.md, a large number of tests fail with the error: RuntimeError: Cannot re-initialize CUDA in forked subp…


LAION ā–· #general (59 messagesšŸ”„šŸ”„):

  • LAION dataset seeking: User @ppwwyyxx inquired about the laion2b-en aesthetics scores, as the initial dataset link provided was disabled. A response stated that the dataset access has been temporarily disabled on the dataset author’s request, with a recommendation to check the announcements for updates.
  • Voice Chat Interface Demo Unveiled: @jpcl_ announced a new demo of a complete voice chat interface, combining Whisper and WhisperSpeech with an open-source LLM, touting reduced latency for more natural conversations, and inviting collaboration to improve the system. They shared a link to the Hacker News announcement.
  • Image Captioning Strategies Discussed: Users @pseudoterminalx, @thejonasbrothers, and @limiteinductive shared approaches for image captioning with AI, with an emphasis on giving clear prompts to avoid hallucination and focusing on describing visible content only.
  • AI Tech Sprint Recruitment: @ninjaa2377 is looking for developers to join a team for the VA’s AI Tech Sprint to work on an ambitious project involving clinical encounter notes, offering potential prize money and prestige. Developers interested were directed to reach out via DM and visit the official challenge website.
  • Pirated US Channels Operate Freely?: @pseudoterminalx mentioned that local cable companies use pirated channels from the US without repercussions, claiming the government in their unspecified country isn’t influenced by foreign companies or bribery.

LAION ā–· #research (71 messagesšŸ”„šŸ”„):

  • Google’s Monetization Trouble: User @max_voltage discussed Google’s struggle to find financially significant business models beyond advertising over the past 15 years, pointing out that perhaps they ought to have taken a SaaS approach similar to OpenAI instead of fully integrating DeepMind into Google.
  • Cautious Optimism for Byte-Level Transformers: @marianbasti shared cautious optimism about byte-level transformers, referencing an arXiv paper, and @thejonasbrothers humorously noted that progress always seems ā€œone month away.ā€
  • Text-to-Image Diffusion and Identity Preservation Advances: @vrus0188 shared two GitHub links describing the latest advances: RPG-DiffusionMaster for text-to-image diffusion and InstantID for ID-preserving generation, and @chad_in_the_house confirmed its coolness.
  • Scaling-Up Image Restoration (SUPIR): @thejonasbrothers linked to an arXiv submission that introduces SUPIR and its capabilities for image restoration guided by textual prompts, highlighting the paper’s presence among top papers on Hacker News.
  • High Costs of AI Model Training Discussed: Users @vrus0188, @chad_in_the_house, @thejonasbrothers, and @limiteinductive engaged in a conversation about the substantial costs of training AI models like Stable Diffusion, though acknowledging that costs are expected to decrease with time and technological advancements.

Perplexity AI ā–· #general (87 messagesšŸ”„šŸ”„):

  • Perplexity Pro Features Clarified: In response to @winnie.zhao’s inquiry, @icelavaman and @mares1317 provided a link detailing Perplexity Pro’s features like unlimited Copilot queries, the ability to upload files for content exploration, and access to powerful AI models including GPT-4 and Claude 2.1.
  • Data Retention Concerns Voiced: Several users, led by @emisaurus_hex and @firesonwires, expressed confusion and concern over Perplexity’s data retention policy. Clarifications by @icelavaman, a presumed Perplexity expert, indicated that deleted threads are removed from servers after 30 days.
  • Questions on Using Perplexity’s Models: Users like @divyanshu0500, @odobostudio, @lukas8a, and others asked technical questions regarding JSON output from models, file upload limits, and the efficiency of models like Claude and GPT-4 for summarizing PDFs and academic work.
  • Understanding Account and Search Data Policies: The discussions about Perplexity’s privacy policy highlighted some ambiguity, prompting suggestions for clearer policy wording to avoid misinterpretation and to confirm if search data is indeed retained for the lifetime of an account.
  • Community Interaction and Technical Support: Members like @danielagmz888 and @icelavaman offered assistance on issues ranging from applying credit codes to addressing concerns. There was also a lighthearted exchange between @reflext and @sedierta about pro subscription costs and the performance of various models.

Links mentioned:

  • What data does Perplexity collect about me?: Explore Perplexity’s blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
  • What is Perplexity Pro?: Explore Perplexity’s blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
  • Perplexity - AI Companion: Ask anything while you browse
  • Perplexity Blog: Explore Perplexity’s blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.

Perplexity AI ā–· #sharing (4 messages):

  • Intersection of Search and AI: @jsudaniel highlighted the CEO’s connection to Google Search and OpenAI, noting that Perplexity AI serves as an intersection of these technologies. They shared a YouTube video titled ā€œI use Perplexity MORE than Google and ChatGPTā€ discussing the benefits of using Perplexity AI.

  • Perplexity Eases Smartsheet Learning Curve: @nicknalbach found that Perplexity AI provided efficient answers to problems encountered while transitioning from Excel to Smartsheet. Perplexity helped him overcome the steep learning curve where other resources provided scattered solutions.

  • Conceptual Aid for Astronomy Education: @coloradocomplex mentioned using Perplexity to help explain concepts in their astronomy class, showing the usefulness of Perplexity AI in education.

  • No additional info: A link was shared by @coloradocomplex, but no context or additional information regarding the content or purpose of the link was provided.

Links mentioned:

I use Perplexity MORE than Google and ChatGPT: Main Takaways From this Video: ā€œI use Perplexity more than ChatGPT, BARD, and Microsoft Copilots for five main reasons, including its use in content creation…


Perplexity AI ā–· #pplx-api (5 messages):

  • Web vs API Discrepancy in Code Output: @benhirap expressed that the website version of Perplexity AI produces much better code than the API version.
  • Seeking API and Labs Parity: @stijntratsaert_01927 inquired about the default parameters used by Perplexity AI labs as they are experiencing difficulty replicating lab results via the API.
  • Billing Issues Need Resolution: @aiagileguy reported an issue with being double charged and reached out to [email protected] for a refund of credits but has not received a resolution after more than 1-2 business days.
  • Assistance Requested for Support Concern: Following the billing issue, @aiagileguy is seeking help or pointers to expedite the refund process from Perplexity AI.

HuggingFace ā–· #announcements (3 messages):

  • Community Highlights - Social Exploration: @lunarflu praises the HuggingFace community for its focus on ML content and invites members to join the organization for early access to the ā€œPostsā€ feature. It’s highlighted as a less noisy alternative compared to Twitter or LinkedIn for people interested in AI & ML.

Links mentioned:

  • ā€œI launched my first competition! Goal: Use AI toā€¦ā€ (https://huggingface.co/posts/Tonic/783827682062088): no description found
  • ā€œWell, yes, if the models areā€¦ā€ (https://huggingface.co/posts/vicgalle/320544784279721): no description found


HuggingFace ā–· #general (40 messagesšŸ”„):

  • Decoding the Training Dilemma: @asprtnl_50418 highlighted the sheer scale of resources needed for pretraining models, referencing the Llama-2-7b model, which required about 184k GPU hours on A100 GPUs. They also mentioned alternative cost-effective methods like fine-tuning and using LoRA/QLoRA adapters to lessen hardware demands.

  • Strategies for Training and Evaluation Split: Users @enka55 and @the_aureo discussed the challenge of splitting data into training and evaluation sets for LLM training, with suggestions including scikit-learn’s train_test_split with the stratify parameter, and supplementing with knowledge bases via RAG for topics not covered in the training data.
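
A small sketch of that stratified split; the ā€œtopicā€ column is a hypothetical stand-in for whatever label should stay balanced across the two sets:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "text": ["q1", "q2", "q3", "q4", "q5", "q6"],
    "topic": ["billing", "billing", "billing", "tech", "tech", "tech"],
})

# stratify keeps the topic distribution identical in both splits.
train_df, eval_df = train_test_split(
    df, test_size=0.33, stratify=df["topic"], random_state=42
)
print(train_df["topic"].value_counts())
print(eval_df["topic"].value_counts())
```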

  • Feature Extraction Fundamentals Explained: @vipitis clarified that feature extraction refers to sequence embedding via encoder-only models like BERT, with uses in tasks like clustering. They also directed users to the MTEB leaderboard for relevant models and metrics.
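
A minimal sketch of feature extraction in that sense, mean-pooling an encoder-only model’s hidden states into one embedding per sentence, with bert-base-uncased as a stand-in:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

batch = tok(["a sentence to embed", "another one"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state      # [batch, seq_len, 768]

mask = batch["attention_mask"].unsqueeze(-1)       # zero out padding positions
embeddings = (hidden * mask).sum(1) / mask.sum(1)  # [batch, 768], one vector each
print(embeddings.shape)  # usable for clustering or similarity search
```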

  • The Quest for Model Evaluation Insights: @green_eye expressed frustration over the lack of accessible qualitative assessments of models beyond benchmarks, seeking more human-readable reviews that detail where models excel or fall short.

  • Troubleshooting Model Loading Issues: @newincoding faced difficulties loading a model which @sebastian3079 diagnosed as potentially being due to hardware limitations, recommending at least 32GB of RAM for handling models with 40 billion parameters.

HuggingFace ā–· #today-im-learning (1 message):

  • In Quest of a Data Set Evaluation Framework: User @rosebei3ngan3g expressed a need for frameworks to evaluate data sets specifically for large language models, highlighting the absence of such tools despite the availability of many frameworks for evaluating the models themselves. They questioned how data set evaluation should be approached without established frameworks.

HuggingFace ā–· #cool-finds (7 messages):

  • HuggingFace Datasets Under the Microscope: User @andysingal shared a GitHub project focusing on a large-scale analysis of dataset cards on HuggingFace. This project could be particularly insightful for anyone diving into dataset documentation in AI.
  • From Zero to ML Hero: User @pacificvoltage is exploring the basics of machine learning by reading the first chapter of ā€œUnderstanding Deep Learningā€ (udlbook), and marveled at the use of deepfake technology on a recent Machine Learning Street Talk interview with Noam Chomsky, which can be watched here on YouTube.
  • Binocular Vision on AI-Generated Text: @tea3200 introduced a paper from arXiv that presents Binoculars, a novel detector that claims to distinguish human from machine-generated text with over 90% accuracy, without requiring any training data or model-specific modifications.
  • SemEval2024 Shared Task Spotlight: User @vipitis mentioned a GitHub shared task for the SemEval2024-task8 competition, focused on multidomain, multimodel, and multilingual machine-generated text detection, potentially related to the ā€œBinocularsā€ approach just shared.
  • On the Flutter Wing with AI: @akindelemichael sparked interest with a new package for integrating ONNX models in Flutter apps, coinciding with a growing trend noted by @osanseviero for AI capabilities in Flutter, including a Flutter SDK for HuggingFace Inference APIs.

HuggingFace ā–· #i-made-this (12 messagesšŸ”„):

  • A Launchpad for Nemo Project on HuggingFace: @tonic_1 expressed enthusiasm for launching a Nemo model project and the idea of writing a detailed blog post on HuggingFace resonated well. @not_lain agreed, responding with a commitment to write a post as soon as possible.

  • WhisperSpeech Demo Hosted on HuggingFace: @tonic_1 introduced a WhisperSpeech demo on HuggingFace, which allows for multi-language text-to-speech and the creation of a voice print with a minimal audio input.

  • CheXRay Analysis in Development: @tonic_1 shared a link to CheXRay, a work-in-progress tool for analyzing Chest X-Rays, indicating active projects and development in medical imaging AI.

  • Community Blogpost Outreach by @lunarflu: @lunarflu reached out to @mateomd_dev suggesting that a community blog post could help increase reach for @mateomd_dev’s work, and provided a link to HuggingFace’s blog section. @mateomd_dev showed interest in refining their article for the HuggingFace audience.

  • Upcoming wav2vec2-bert Model Announcement: @yehors announced the pending publication of a wav2vec2-bert model based on the Common Voice 10 dataset, indicating the model is in the final stages of preparation.

HuggingFace ā–· #reading-group (3 messages):

  • Encouragement for Isamu: @lunarflu expressed support for a user named Isamu, telling them to take their time and including a heart emoji for emphasis.

  • Text-to-Video Model Lumiere Raises the Bar: @fishie22 discussed Google’s new Lumiere model, explaining its innovative use of a Space-Time UNET that maintains temporal consistency and can generate video at a notable 16fps for 80 frames. They provided a link to the research paper: Google’s Lumiere Research.

  • Positive Feedback on Medium Article Benchmarking: @starsupernova tweeted about a Medium article, praising its benchmarking as ā€œSuper greatā€ and adding a smiley face emoji to emphasize their positive feedback.

Links mentioned:

Lumiere: A Space-Time Diffusion Model for Video Generation: We introduce Lumiere — a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion — a pivotal challenge in video synthesis. To this end, we …


HuggingFace ā–· #diffusion-discussions (1 message):

spikespiegel5112: How to load LoRA model in local?
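
One hedged answer to the question above, assuming a Stable Diffusion pipeline via diffusers and a LoRA saved locally in safetensors format; the paths and filenames are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Point load_lora_weights at the local directory holding the LoRA file.
pipe.load_lora_weights("./my_lora_dir", weight_name="my_lora.safetensors")

image = pipe("a watercolor fox", num_inference_steps=25).images[0]
image.save("fox.png")
```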


HuggingFace ā–· #computer-vision (5 messages):

  • Brief Inquiry on LMM Architecture: @besiktas asked about the design choices behind using idefics/flamingo resampler/cross-attention in the LMM currently in training instead of a simpler approach like linear projection or a pretrained vision encoder.

  • Gemini Pro Vision AI Introduced: @ahmed3ibrahim discussed trying out the Swift API’s Gemini Pro Vision AI, mentioning its key features like handling multiple images in one request and providing a comprehensive API health report.

  • Curiosity About CVPR2024 Papers: @iloveh8 was looking for a way to see all papers, both rejected and accepted, for CVPR2024 but did not receive a response.

Links mentioned:

Gemini Pro Vision AI API Documentation (swift-api-swift-api-default) | RapidAPI: no description found


HuggingFace ā–· #NLP (15 messagesšŸ”„):

  • TorToiSe Pronunciation Engine Acknowledged: @mr_nilq mentioned that TorToiSe offers the best quality in TTS but is slow, sharing a link to a modified version that is 5x faster.
  • Seeking Advice on Training AI for Q&A: User @ysk.dev is considering options for training AI on 10k Q&A pairs, debating between Amazon Lex and training VDS, and inquiring about the hardware specs needed for running a server with long answers.
  • Help Requested for Transformer ImportError: User @srovnbh faced an error importing TFTrainer from the transformers package and received suggestions to ensure the correct version is installed.
  • Talk on Trusting ā€˜Black Box’ Models: @vipitis shared a link to a talk about evaluating ā€œblack boxā€ models, questioning the trust in models when users can’t see behind the API.
  • Windows Compatibility Issue for Bits and Bytes: @kingpoki realized the cause of their issue was the lack of Windows support in the bitsandbytes library.

Links mentioned:

talks.cam : Replicating and auditing black-box Language Models.: no description found



HuggingFace ā–· #gradio-announcements (1 message):

  • Gradio Hits 4.16 with Robust Features: @abidlabs announced the release of gradio 4.16 boasting major features such as native support for Polars Dataframe, a new Gallery component usable as an input, improved low-latency streaming for chatbots, and automatic documentation for custom components. This ā€œHUGE releaseā€ is detailed in their comprehensive changelog, available here.

Links mentioned:

gradio/CHANGELOG.md at main · gradio-app/gradio: Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! - gradio-app/gradio


LlamaIndex ā–· #announcements (1 message):

  • Webinar on LLMCompiler Incoming: @jerryjliu0 reminded everyone about a webinar happening in 10 minutes featuring the authors of the LLMCompiler paper, which details a framework for parallel function calls in agents. The framework, envisioned to boost performance and efficiency, can be explored in their paper (LLMCompiler Paper) and further resources like LlamaPack and a Notebook are available at their dedicated links.

Links mentioned:

LlamaIndex Webinar: Efficient Parallel Function Calling Agents with LLMCompiler Ā· Zoom Ā· Luma: LLMs are great at reasoning and taking actions. But previous frameworks for agentic reasoning (e.g. ReAct) were primarily focused on sequential reasoning, leading to higher…


LlamaIndex ā–· #blog (7 messages):

  • Slack Bot Tutorial Shared: IFTTT announced a new OSS repository with a step-by-step guide to build a Slack bot that can learn from conversations and answer organizational questions, written by @seldo. The bot is built on the @SlackHQ platform. Build your Slack bot.

  • Zilliz Cloud Pipeline Integrated with LlamaIndex: LlamaIndex highlighted their collaboration with @zilliz_universe on integrating the Zilliz Cloud Pipeline into LlamaIndex, enhancing retrieval services and multi-tenancy support. Check out the guest blog post.

  • LlamaIndex Supports New OpenAI Embedding Models: The LlamaIndex team has released version 0.9.38, which includes day 0 support for @OpenAI’s latest embedding models. For more details, see the release notes.

  • Good Prompting Out of the Box with LlamaIndex: IFTTT mentioned an often overlooked feature of LlamaIndex, emphasizing that it creates effective prompts by default, which can be customized if desired. Further insights available here.

  • LlamaIndex Now Available in TypeScript: Announcement from IFTTT that LlamaIndex.TS version 0.1.0 has been released, extending support for @OpenAI’s latest embeddings to TypeScript thanks to a quick contribution from @yi_ding. For TypeScript enthusiasts, visit LlamaIndex.TS 0.1.0 release.

  • Qdrant Engine Included in LlamaIndex.TS Release: The 0.1.0 version of LlamaIndex.TS also comes with added support for @qdrant_engine. The update was highlighted as a bonus feature in the TypeScript release. Check this feature out.

LlamaIndex ā–· #general (38 messagesšŸ”„):

  • LLM not available for LlamaIndex Text Inference Server: @cheesyfishes confirmed that LlamaIndex does not currently ship an LLM integration for the TextGenerationInference server, but mentioned that the Langchain one works with a wrapper.
  • Configuring Chat Engine with similarity_top_k: In response to @richard1861, @whitefang_jr provided a Python code snippet to configure the similarity retrieval count of the chat engine in LlamaIndex, using similarity_top_k=5 (a reconstruction follows this list).
  • Retrieval Challenges in Domain-Specific Use Cases: @lancerninja and @cheesyfishes discussed a more complex retrieval scenario involving rephrasing questions using an LLM before executing another retrieval, aiming for improved performance but concerned about increased response times due to multiple steps.
  • Anticipating Integration with New OpenAI Embedding Models: @ayfri shared a link to OpenAI’s announcement about new embedding models and API updates. @cheesyfishes responded, hinting at upcoming support for these new features in LlamaIndex.
  • Customizing Prompts for Contextualized Responses in LlamaIndex: @shri_j asked about obtaining answers from OpenAI when the query information isn’t in the provided context. @cheesyfishes directed toward modifying default prompts to allow for such functionality, sharing a link to documentation.
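
A hedged reconstruction of the similarity_top_k snippet referenced above (the original is not in the digest), assuming the llama_index 0.9-era API and a local ./data folder of documents:

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Retrieve the top 5 most similar chunks for each chat turn.
chat_engine = index.as_chat_engine(chat_mode="context", similarity_top_k=5)
print(chat_engine.chat("What do these documents say about pricing?"))
```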

LlamaIndex ā–· #ai-discussion (5 messages):

  • Exploring Zep’s Capabilities: User @yoursaviorjesus inquired if anyone has experience with Zep, pointing out its features like chat history memory and entity extraction. They provided a link to Zep’s documentation and various quick start guides: Zep Documentation.

  • Inquiring LlamaIndex’s Nature: @zeekg_46676 asked if LlamaIndex is a vector store or operates like Amazon Kendra which uses natural language search. @cheesyfishes clarified that LlamaIndex is more akin to Kendra and is versatile, capable of using any vector store or language model for various operations.

  • Demonstrating Self-Learning Knowledge Graph: @chiajy shared their work on a self-learning knowledge graph RAG workflow that features recursive retrieval, automated creation, and multi-hop reasoning, exemplified through a Harry Potter book demo. A detailed explanation and the ramifications of this knowledge graph can be found in their write-up: Harry Potter and the Self-Learning Knowledge Graph.

Latent Space ā–· #ai-general-chat (36 messagesšŸ”„):

  • No LLM Paper Club Recording: @kbal11 responded to @farrealdori with the information that sessions of the LLM Paper Club are not recorded to allow participants to share details about their work more freely, thus no replay is available.
  • Introducing Morpheus-1: @shivdinho shared a link to a tweet announcing Morpheus-1, described as the world’s first multi-modal generative ultrasonic transformer designed to induce and stabilize lucid dreams, and noted its innovative nature.
  • Go-Go-Labs Coding Sprint: @slono provided a link to a GitHub repo, showcasing that 5k lines of code were written in 4 days for yaml-custom-tags experiments, indicating swift progress towards project completion.
  • GPT-4 Turbo & Embedding Models Update: @dimfeld shared OpenAI’s announcement on the release of an updated GPT-4 Turbo preview model and new embedding models, while @swyxio linked notes on the matter from Twitter.
  • Martian’s LLM Leaderboard Launch: @cute_hamster_07119 announced the launch of Martian’s Model Router at https://leaderboard.withmartian.com/, a tool helping to evaluate various LLM inference products, with @coffeebean6887 and @fanahova discussing the documentation and open-source aspect of the project.

Latent Space ā–· #ai-event-announcements (1 message):

  • LLM Paper Club Asia Launches: @ivanleomk announced the kickoff of the LLM Paper Club in Asia, starting with a discussion on the ā€œAttention Is All You Needā€ paper. Interested individuals can sign up for future notifications and access the event here.

Latent Space ā–· #llm-paper-club (8 messagesšŸ”„):

  • Asia Paper Club Timing: @ivanleomk thanked everyone for participating in today’s paper club and mentioned that next week’s discussion may cover Self-Rewarding Language Models. They are open to other paper suggestions and note that @796917146000424970 or they will cover it if there are no volunteers.
  • Beta Test Feedback Request: @aimuggle expresses gratitude for participation and is requesting feedback to improve the paper club, which is still in a beta phase.
  • Clarification on Self-Instruction: @stealthgnome inquired whether ā€œself-instructā€ is the input for ā€œself-reward,ā€ suggesting an interest in discussing the interplay between these concepts.
  • Upcoming US Paper Club Schedule: @ivanleomk asked about the scheduled paper for next week’s US paper club, and @eugeneyan provided the Pythia paper as the topic of discussion, listing the authors and their arXiv links.
  • Appreciation for Pythia Paper Info: @ivanleomk showed appreciation for the details @eugeneyan provided for the forthcoming discussion on the Pythia paper.

Links mentioned:

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling: How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce \textit{Pythia}, a suite of 16…


DiscoResearch ā–· #mixtral_implementation (2 messages):

  • Mergekit Guidance for Mixtral Training: @philipmay shared a GitHub issue comment from the author of mergekit that may inform the DiscoResearch mixtral training, asking how finetuning should proceed after merging models with options like ā€œhiddenā€ or ā€œrandomā€.
  • Auxiliary Loss Key for MoE Training: @bjoernp acknowledged the potential helpfulness of the shared mergekit information, stressing that getting the auxiliary loss right is crucial for MoE (Mixture of Experts) training.

Links mentioned:

Mixtral branch: What option should I choose when I want to do some finetuning after the merge? Ā· Issue #116 Ā· cg123/mergekit: The parameter description of ā€œhiddenā€ and ā€œrandomā€ does not exactly explain what to do when I want to finetune later. Is it even useful (possible) to finetune after merging withā€¦


DiscoResearch ā–· #general (23 messagesšŸ”„):

  • Quality Data Filtering May Not Be King: @bjoernp shared a fascinating paper from arXiv which challenges the standard practice of filtering pretraining data for quality, suggesting that ā€œqualityā€ filtering doesn’t always correlate with improved model performance. The study proposes selecting data to maximize model performance on target tasks, avoiding biases of handpicked data quality notions. Read the abstract here.

  • Experimenting with Preference Signals for LLMs: User @hammadkhan suggested an experiment involving Supervised Fine-Tuning (SFT) where a prompt’s completion is changed from positive to negative, potentially influencing the learning of language models.

  • KTO: A Different Approach to Training Models: @bjoernp mentioned that Kahneman-Tversky Optimization (KTO) could be utilized for training models. It is likened to Direct Preference Optimization (DPO) but with binary signals, labeling completions as either desirable or undesirable.

  • Guidance on Using KTO with Datasets: In a detailed explanation, @hammadkhan outlined how the KTO loss can be maximized for model generation utility, contrasting it with DPO which requires preference-based paired data. Hugging Face’s TRL documentation and the paper by Rafailov et al., 2023, provide further context on the DPO Trainer and expected dataset formats. See the TRL documentation.
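
To make the contrast concrete, toy records showing the two dataset shapes: DPO wants paired preferences per prompt, while KTO only needs a binary desirable/undesirable label per completion. The DPO field names follow TRL’s convention; the KTO fields are illustrative:

```python
# DPO: paired data, one chosen and one rejected completion per prompt.
dpo_example = {
    "prompt": "Summarize the meeting.",
    "chosen": "Action items: ship the fix by Friday; Ana owns the rollout.",
    "rejected": "The meeting happened.",
}

# KTO: unpaired data, a single completion with a thumbs-up/down label,
# which is much easier to collect in production.
kto_example = {
    "prompt": "Summarize the meeting.",
    "completion": "Action items: ship the fix by Friday; Ana owns the rollout.",
    "label": True,  # desirable (True) vs. undesirable (False)
}
```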

  • Binary Labels for Continual Model Updates: @hammadkhan brought up ContextualAI’s suggestion of Kahneman-Tversky Optimisation (KTO), which uses binary good or bad labels for model updates, simplifying the labelling process in production environments.

  • OpenAI Launches GPT4-Turbo and Reduces GPT3.5 Prices: @rasdani highlighted an announcement from @OfficialLoganK about OpenAI launching GPT-4 Turbo, updating GPT-3.5 Turbo with significant price reductions, and adding new API features like scoped API keys and embedding dimension specifications. More details on OpenAI’s Blog.

Links mentioned:

  • DsDm: Model-Aware Dataset Selection with Datamodels: When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that int…
  • DPO Trainer: no description found
  • Tweet from Logan.GPT (@OfficialLoganK): Great news for @OpenAIDevs, we are launching: - Embedding V3 models (small & large) - Updated GPT-4 Turbo preview - Updated GPT-3.5 Turbo (*next week + with 50% price cut on Input tokens / 25% price …

DiscoResearch ā–· #embedding_dev (12 messagesšŸ”„):

  • New German Jina Model Announced: @sebastian.bodza reported on the upcoming release of the German Jina model ā€œjinaai/jina-embeddings-v2-base-deā€ on hf, which could be helpful for ranking purposes.

  • Exploring Question Generation with Mixtral: @sebastian.bodza shared examples of question generation on GitHub and mentioned using Mixtral in 4-bit GPTQ with vllm for this task.

  • Community Collaborative Efforts: @bjoernp showed interest in @sebastian.bodza’s work and offered support, particularly with generating positive and hard negative examples.

  • New OpenAI Embedding Models Released: @bjoernp pointed to the release of new OpenAI embedding models, which also feature improved multilinguality. The post includes a link with further information: Read more about it here.

  • Automatic Quality-Data Generation with Genie: @bjoernp shared a link to a study on the Genie method for automatically creating high-quality, data-grounded content, which may include useful filtering mechanisms.

DiscoResearch ā–· #discolm_german (6 messages):

  • Finetuning Success with DiscoLM German: @thomasrenkert reported success in finetuning DiscoLM German 7B v1 using unsloth, and is looking forward to DiscoLM German versions based on Mixtral-Instruct.
  • Middle High German Translation Data: In response to @hammadkhan’s inquiry, @thomasrenkert clarified that the finetuning was done on a custom dataset for translating Middle High German to Modern German.
  • Bjoernp Acknowledges DiscoLM Update: @bjoernp commended @thomasrenkert’s finetuning achievement with a brief message of approval.
  • Impressive Embeddings Efficiency Announced: @hammadkhan shared a tweet from @Nils_Reimers about upcoming embeddings that significantly outperform OpenAI’s on the MIRACL benchmark with only 256 dimensions, offering a potential 12x saving on vector database costs.

Links mentioned:

Tweet from Nils Reimers (@Nils_Reimers): @OttoZastrow @jerryjliu0 Yes, embeddings is a massive focus for us, with amazing launches upcoming. E.g. OpenAI 54.3 on MIRACL with 3072 dimensions versus our upcoming 256 dimensional-like model wit…


LLM Perf Enthusiasts AI ā–· #embeddings (2 messages):

  • OpenAI Unveils Next-Gen Embedding Models: User @potrock shared OpenAI’s announcement of newly launched embedding models, GPT-4 Turbo and moderation models, tools for API usage management, and upcoming reduced pricing on GPT-3.5 Turbo. The enhancements aim to refine developers’ control over API keys and provide insights into API usage.
  • Documentation for New Features Available: Accompanying the announcement, OpenAI has updated its documentation to guide users through the new embedding models and the updated GPT and moderation models. The documentation is a key resource for developers using these APIs.
  • Navigational Misstep in Message Posting: @shacrw noted a misdirection in the message posting, suggesting that the announcement should have been shared in a different channel, likely intended for a focused discussion on the new updates. The correct channel was indicated with a link.

Links mentioned:

New embedding models and API updates: We are launching a new generation of embedding models, new GPT-4 Turbo and moderation models, new API usage management tools, and soon, lower pricing on GPT-3.5 Turbo.


LLM Perf Enthusiasts AI ā–· #announcements (1 message):

mat_mto: Thanks Jeff! love all the work you’re doing so far


LLM Perf Enthusiasts AI ā–· #openai (16 messagesšŸ”„):

  • OpenAI Unveils New Models and Lower Prices: @potrock shared a blog post announcing new embedding models, updates to GPT-4 Turbo and moderation models, addition of API management tools, and soon-to-come lower pricing on GPT-3.5 Turbo.
  • A Win for Embedding Efficiency: @potrock highlighted the benefits of the new shortened embeddings, while @res6969 expressed eagerness to upgrade their system to the updated models, suggesting that a move to open-source embeddings is now unnecessary given these improvements.
  • OpenAI: The Simple Solution for Shipping Features: @res6969 reflected on the ease of using OpenAI for quickly implementing features, compared to managing independent open-source models.
  • The Dilemma of Convenience vs. Community Models: While @potrock acknowledged the convenience of OpenAI’s solutions, he also pointed out the availability of many great open-source embedding models that allow for personal fine-tuning.
  • Economic Trade-Offs in Model Selection: @shacrw and @michelcarroll discussed the cost benefits of using OpenAI’s newer, larger embedding models with dimension shortening, mentioning storage savings and comparable API costs that could lead to overall reduced expenditure.
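
A short sketch of the dimension-shortening trade-off discussed above, using the new embeddings endpoint’s dimensions parameter; the input text is a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the larger model for a shortened vector: storage shrinks 12x
# versus the native 3072 dimensions while retaining most of its quality.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="How do I rotate my API keys?",
    dimensions=256,
)
print(len(resp.data[0].embedding))  # 256
```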

Links mentioned:

New embedding models and API updates: We are launching a new generation of embedding models, new GPT-4 Turbo and moderation models, new API usage management tools, and soon, lower pricing on GPT-3.5 Turbo.


LangChain AI ā–· #general (12 messagesšŸ”„):

  • Welcome to the AI Galaxy: @quarknova, a newcomer from ENS interning at INRIA, expressed interest in using LangChain for their projects and queried the community for tips, contemplating the use of the GitHub version over the commercial one.

  • Crafting AI Personalities: @jstansbe inquired about the possibility of creating custom AI personalities like an ā€œElon Musk AIā€ without relying on external AI APIs. @ksolo__ responded with a resource, suggesting that the process is known as finetuning, and provided a link to a course on deep learning.

  • Shoutout to LangChain’s Efficiency: @johnnucleus applauded the LangChain community for enabling the swift creation of a chatbot with web search capabilities using LangChain and Streamlit, expressing amazement at the efficiency and simplicity.

  • Generating Synthetic Data with LLMs: @rajib2189 is exploring the use of Large Language Models (LLMs) for synthesizing data to train traditional machine learning models, while @johnny2x2 shared how he employs LLMs for RAG generation to produce SQL queries from a context and schema.

  • Working with PARQUET in LangChain: @benjaminbascary sought assistance for manipulating PARQUET files in LangChain, leading to @johnny2x2 providing a code snippet showing how to import and use PARQUET files as document sources, using pandas for loading and the DataFrameLoader from LangChain.
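
A hedged reconstruction of the PARQUET approach described above (the original snippet is not in the digest); the file path and column name are placeholders:

```python
import pandas as pd
from langchain_community.document_loaders import DataFrameLoader

df = pd.read_parquet("orders.parquet")  # placeholder path

# One Document per row; the named column becomes page_content and the
# remaining columns are carried along as metadata.
loader = DataFrameLoader(df, page_content_column="text")
docs = loader.load()
print(docs[0].page_content, docs[0].metadata)
```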

Links mentioned:

Finetuning Large Language Models: no description found


LangChain AI ā–· #langserve (3 messages):

  • LangServe Agent Examples Promoted: User @veryboldbagel shared links to LangServe agent examples, including one not listed in the main examples section at the LangServe main examples and a specific example for a configurable agent executor.
  • Custom Agent Construction with LCEL: @veryboldbagel clarified that for defining custom tools, an off-the-shelf OpenAI tools agent suffices, and further instructed on constructing a custom agent with LCEL, recommending the LangGraph for defining custom agent runtime with more expressive power.
  • Stream Response Issues in LangServe: @hiranga.g reported not receiving a stream response while using the example agent_with_history and experimenting with RemoteRunnable from langchain.js; they also mentioned a bug when using Agents with LangServe, where the suggested chain.streamLog() workaround did not yield the expected results.

LangChain AI ā–· #share-your-work (2 messages):

  • Exploring SQL chain limitations: @johnny2x2 shared insights on handling SQL queries with LangChain for a manufacturing company’s order delays. They found that SQL Chain struggles with large databases, but creating curated views within the database with descriptive names improves performance.
  • Refinements lead to better query management: By embedding questions that map to queries within a custom multi-vector retriever, @johnny2x2 initially found the local AI ran the examples too often, a challenge mitigated by using OpenAI to process SQL queries while keeping the data private with a local LLM.
  • Enhanced chain workflow with tool-oriented queries: Now abandoning local AI for SQL code generation, @johnny2x2 adopts a new strategy where each query acts as a tool in their task processing chain, leading to improved results in their workflow which involves a sequence of generating tasks, processing tasks with SQL tools, and evaluating information for task loops.

Datasette - LLM (@SimonW) ā–· #llm (3 messages):

  • Impending LLM Release Upgrade Preview: @simonw announced plans to release an update for LLM involving a significant upgrade to the openai library. Testers are invited, with details provided in a GitHub comment.

  • Anticipating the 0.13 Milestone: More information about the forthcoming LLM release, labeled as 0.13 Milestone, can be found in the dedicated GitHub milestone page.

  • Request for Readline Issue Resolution: @simonw is seeking assistance for a readline issue within LLM, where arrow keys yield ANSI codes instead of cursor navigation, as described in this GitHub issue.

Skunkworks AI ā–· #off-topic (1 message):

pradeep1148: https://www.youtube.com/watch?v=wlPxEq_Mtkc


Skunkworks AI ā–· #bakklava-1 (1 message):

arielnlee: Anyone working on bakklava-2?!