> AI Discords for 2/5/2024. We checked **20** guilds, **308** channels, and **5078** messages for you. Estimated reading time saved (at 200wpm): **418 minutes**.

The Chinese models (Yi, DeepSeek, and Qwen, and to a lesser extent Zhipu) have been quietly cooking up a storm. Qwen's release this week claims strong performance against the equivalent Mistral and Llama2 models, with up to 32k token context. The technical report also covers a number of evals on multilingual, RAG, agent planning, and code generation capabilities. The Qwen team is also showing serious dedication to the downstream ecosystem, releasing with HF transformers compatibility and official AWQ/GPTQ 4/8-bit quantized models.
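
For readers who want to try the release right away, here is a minimal sketch of loading one of the quantized chat checkpoints through HF transformers. The repo id and chat-template usage are assumptions based on the release notes (the AWQ path also requires the autoawq package installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-7B-Chat-AWQ"  # assumed repo id; GPTQ and full-precision variants are also published
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize retrieval-augmented generation in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```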


Table of Contents

[TOC]

PART 1: High level Discord summaries

TheBloke Discord Summary

  • Quantization Quest for 70B LLM: An exploration of quantizing a 70B LLM on vast.ai was shared with suggestions like creating a large swap file and potentially using USB-attached SSDs to circumvent powerful GPU requirements.

  • GPTZero Faces Scrutiny: Debates over the effectiveness of AI content detection tools like GPTZero highlighted their potential unreliability in detecting subtly augmented prompts.

  • Introducing Sparsetral: A new Sparse MoE model based on Mistral was introduced, emphasizing its efficient operation and selective weight application during forward passes, garnering interest and sparking inquiries into its training intricacies.

  • Merging Vs. Fine-Tuning Dilemma: There’s an ongoing debate on whether it’s more efficient to individually fine-tune models for separate datasets or to combine datasets and fine-tune a single model, with the community generally leaning towards the latter for coherence.

  • Knowledge Sharing in AI: Community members discussed a range of topics involving LLM performance and handling, including strategies to augment memory capabilities and sharing of problem-solving tactics, reinforcing the collaborative spirit within the guild.

  • Deep Dive into DPO and Character Generation: Insights were exchanged on merging adapters in the training process using Direct Preference Optimization (DPO), with a focus on role-playing character generation, as well as tactics to prevent overfitting in such models.

  • Eldritch ASCII Art Conversations: Attempts at creating ASCII art using various models prompted discussions on the evolving capabilities of language models in creative endeavors.

  • Enigmas of Model Merging: A desire to comprehend model merging led to sharing resources that delve into the tensor operations involved, accompanied by recommendations of tools like ComfyUI-DareMerge for the task.

  • Coding Discussions Span Character Memory and 3D Generation: Inquiries about setting up ChromeDB for character-specific long-term memory appeared alongside promotion of a text-to-3D gen AI project, concerns over OpenAI costs, and shared links to code-generating LLMs with detailed explanations.


Nous Research AI Discord Summary

  • Kanji Generation’s Complex Quest: The community discussed the challenges in training a model for Japanese Kanji generation, with @lorenzoroxyolo referencing a Stable Diffusion experiment for inspiration. The use of a controlnet was suggested by .ben.com as an alternative method for nuanced tasks such as this.

  • AI Scams Surface on Social Media: A surge in AI-related scams on platforms like Facebook prompted community members to discuss the importance of awareness and the detrimental impact fictional narratives can have on AI’s perception.

  • Meta’s Pioneering VR Prototypes at SIGGRAPH: Meta’s advancement in VR technology, specifically the development of new headset prototypes with retinal resolution and advanced light field passthrough, was presented at SIGGRAPH 2023 and shared by @nonameusr with relevant articles from Road to VR and a developer blog post.

  • Finetuning Frozen Networks and New AI Models: Discussions revolved around the effectiveness of fine-tuning normalization layers in frozen networks, as described in an arXiv paper, and the sharing of information on new models like bagel-7b-v0.4, DeepSeek-Math-7b-instruct, and Sparsetral-16x7B-v2 across various Hugging Face repositories, each with unique capabilities and suggested improvements.

  • Performance Review and Anticipated Release: The community scrutinized the performance of different models, including Qwen 1.5’s release which some found underwhelming compared to its predecessor. Additionally, an announcement was made about an unspecified release happening in 23.9 hours; fblgit unveiled a new model-similarity analysis tool on GitHub for community contribution.

  • Transformer Matrices and LLM Conversation Memory Debate: Practical engineering advice was shared, such as fusing QKV matrices in transformers for efficiency. Users also explored techniques for managing conversation history with LLMs, noting the potential use of langchain and the benefits of summarizing history or utilizing long-context models to navigate context size limitations. Concerns were raised over licensing changes between various Hermes datasets for commercial use.


Eleuther Discord Summary

  • Initiative to Build Foundational Models Kindles Interest: @pratikk10 seeks collaboration on creating foundational models, acknowledging diverse applications like text-to-text/image/video. However, @_3sphere highlights the prohibitive costs of such models, discussing the matter in the context of the newly released Qwen1.5, a 72B parameter chat model detailed in their blog post and repository.

  • Grapple with Interpreting LLMs: Debates on the effectiveness of interpretability in large language models (LLMs) ensue, drawing parallels to the human genome project and questioning the relationship between interpretability and intelligence. There’s a critical assessment of AGI claims and model capabilities, particularly the authenticity of performance on benchmarks such as the MMLU.

  • Scaling Law Reconnaissance: @stellaathena discusses possible efficiency improvements in scaling laws research, referencing Kaplan et al. and Hoffman et al. with interest in reducing the necessity for numerous runs. Contributions from @clashluke and others ponder the application of PCGrad in multi-task loss handling, the role of hypernetworks in generating LoRA weights, and the use of varying activation functions like polynomials, substantiated by a neural-style Python file and facebookresearch’s code.

  • Bootstrapping Pooled Variance for Rigorous Model Evaluation: @hailey_schoelkopf updates the lm-evaluation-harness to use pooled variance for standard error, chosen over the previous combined-variance approach as detailed here, prompting @stellaathena to recommend preserving both methods with expert-use warnings.

  • Discern Vector Semantics in Activation Functions: @digthatdata and @norabelrose dissect the conception of vectors as directions, operators, and Euclidean representations in the context of deep learning, complemented by the introduction of [model-similarities](https://github.com/fblgit/model-similarity), a tool for layer-wise parameter space analysis of different models.

  • Remedying LM Pre-training with Paraphrased Web Data: New initiatives like Web Rephrase Augmented Pre-training (WRAP), documented in an arxiv paper, aim to augment large model pre-training by improving data quality, with @elliottdyson suggesting a comparative study to gauge WRAP’s advantage over mere fine-tuning.


HuggingFace Discord Summary

  • AI Journey from Novice to Pro: @dykyi_vladk discussed their ML learning curve, focusing on specific models and techniques, while @lunarflu highlighted the importance of building demos and sharing them as a vital step to advance as an ML professional.

  • A100G Launch May Cause Server Hiccups: Server downtime raised questions about its relation to the A100G launch; @lunarflu offered to escalate the issue. For full AI model utilization, @meatfucker recommended distributing tasks across multiple GPUs.

  • Experts in Computer Vision Sought: @danielsamuel131 invited computer vision specialists to share their expertise with the community.

  • Papillon Flaunts NER & Sentiment Tool: An NER and sentiment analysis tool developed by @8i8__papillon__8i8d1tyr, based on Flair and FLERT, was shared; find it on GitHub.

  • LLaMA-VID Debuts for Long Videos: The LLaMA-VID model, designed to support hour-long videos, was introduced by @tonic_1. However, user concerns over empty model cards and lack of details may hinder usage. An arXiv paper on cost-efficient LLM strategies was also shared by @jessjess84.

  • Boosting Conversational AI with Fine-Tuning: @joeyzero seeks resources on conversational datasets for fine-tuning a chatbot. Meanwhile, @denisjannot struggled with fine-tuning Mistral 7b for YAML generation and eyed the Instruqt model for improvement. @meatfucker recommended few-shot learning techniques for making precise YAML modifications.

  • Ankush’s Finetuning Mastery: Ankush Singal’s finetuned model, based on previous work from OpenPipe, earned community kudos. The model is available at Hugging Face.

  • Schedule Flex for Reading Group: @ericauld may need to postpone a planned talk due to jury duty, with @chad_in_the_house expressing support. Keep an eye on the events calendar for potential changes.

  • The Meme Highlighting AI Progress: @typoilu presented an article detailing AI advancements through the lens of a meme in the “Mamba Series LLM Enlightenment,” but community engagement on the content remains to be seen. Read the article here.


LM Studio Discord Summary

  • Model Selection Mayhem: Amidst numerous models, @hades____ sought advice on picking the right one, with suggestions pointing to use-case focus over a universal solution.
  • DeepSeek-Math-7B Launches: DeepSeek-Math-7B was unveiled by @czkoko as the latest entry in the model arena, catered specifically to mathematical problem-solving and research, and available on GitHub.
  • LM Studio Polishes Performance: LM Studio v0.2.13 brings new features such as Qwen 1.5 support, pinning models and chats, and quality of life updates, downloadable at LM Studio’s website with open-sourced Qwen1.5 models on Hugging Face.
  • Hardware Discourse Heats Up: Operating system compatibility, GPU utilization, and speed optimizations in token generation dominated discussions, as users shared experiences with LMStudio on different hardware configurations; recommendations included using Ubuntu 22 and quantization methods, as illustrated in comparisons on YouTube.
  • Beta and Feature Beckoning: A beta preview of LM Studio v0.2.13 dropped, inviting feedback, while users appealed for an in-app model leaderboard; in the meantime they can reference rankings on the Hugging Face Leaderboard and OpenRouter. A GUI for server access and improved chat interfaces were also hot topics.
  • RAG System Question Creation Quest: @varelaseb sought techniques for generating questions in a RAG system, inquiring about Tuna without much background info available on it.

Mistral Discord Summary

  • LangChain’s Uncertain Future: Discord users expressed concerns over the longevity of LangChain’s utility, suggesting a need for stabilization after a week. Meanwhile, whether Mistral 8x7B is deterministic came under scrutiny; its behavior is probabilistic and adjustable via a temperature parameter, though no definitive resolution was reached.

  • Mistral’s Emoji-Terminating Quirk: A peculiar terminating behavior was observed in Mistral models, where responses ended with the “stop” finish_reason but still contained an emoji. The issue was noted across all three Mistral API models, signaling a potential area for debugging or insight into response construction.

  • AI’s Philosophical Conundrum: Philosophical implications of LLMs were deliberated, with participants promoting a deeper understanding of AI’s foundational principles. This discussion underscores the evolving complexity of AI impacts on broader intellectual fields.

  • Prompt Precision Practices: Discussion around refining prompts to improve accuracy was active, with the conversion of a PHP class to a JSON schema being one method shared. Methods for synthetic dataset generation, despite being a hot topic, remained closely guarded due to their value as a source of income.

  • Fine-Tuning Pad Pains: Concerns about padding during fine-tuning were voiced, pointing out that models fail to generate end-of-sentence tokens when the common tokenizer.pad = tokenizer.eos practice is used, suggesting fine-tuning setups need a more deliberate padding strategy (a minimal sketch follows this list).

  • Lean Chatbots and Starry Success: A Discord chatbot capable of multi-user interaction and supporting multiple LLMs, including Mistral, made headlines with its lean 200-line code and functionality like vision support and streamed responses, attracting attention with over 69 GitHub stars. GitHub - jakobdylanc/discord-llm-chatbot

  • Flags and Fast GIFs: Among lighter interactions, users shared flag emojis and humorous GIFs, hinting at a casual and engaging community dynamic. Specifically mentioned was a Sanic the Hedgehog GIF from Tenor, celebrated for its humor in the context of a language settings discussion. Sanic The Hedgehob GIF - Tenor
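
The padding issue above comes down to how the pad token is chosen. Here is a minimal sketch of the pattern in question and one common alternative, assuming a Hugging Face tokenizer (the model id and the <pad> token are purely illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # illustrative model id

# The practice discussed above: reuse EOS as the pad token. Because padding positions are
# usually masked out of the loss, the model then rarely sees EOS as a supervised target
# and may never learn to stop generating.
tokenizer.pad_token = tokenizer.eos_token

# One alternative (illustrative, not the only fix): add a dedicated pad token and resize
# the embedding table; assumes `model` is already loaded.
# tokenizer.add_special_tokens({"pad_token": "<pad>"})
# model.resize_token_embeddings(len(tokenizer))
```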


CUDA MODE Discord Summary

  • Enthusiastic Engineering for Dual GPU Builds: Community member @morgangiraud is assembling a dual build with 2 x 4070 TI SUPER GPUs and considering VRAM trade-offs between newer and older cards. They shared their build costing 4k in total through PCPartPicker.

  • Library Launch to Lighten LLM Load: @andreaskoepf highlighted FlashInfer, an open-source library aimed at boosting performance of LLM serving, by optimizing Self-Attention and other key operations.

  • Precision Predicament Perplexes PyTorch Programmer: @zippika encountered inaccuracies with dequantize and linear operations in PyTorch and sought the cause, speculating it may be rounding issues or related to disabled C++ flags like "__CUDA_NO_HALF_OPERATORS__".

  • GPU Kernel Conundrums in JAX: @stefangliga introduced Pallas, an experimental JAX extension for writing custom GPU/TPU kernels (a minimal kernel sketch follows this list), while @nshepperd shared insights on using pure CUDA kernels within JAX. @marvelousmit inquired about methods to print Triton kernels and the code JAX executes for kernel profiling.

  • Recorded Resourcefulness for CUDA Coders: Mark Saroufim assured users that despite technical delays, Lecture 4’s recording has been uploaded to YouTube, promising HD quality shortly after.

  • Fast.ai Favoritism Flows Freely: Users @einstein5744 and @joseph_en signal satisfaction with fast.ai’s educational resources, particularly regarding a course on diffusion models and the DiffEdit paper.

  • Weighing Words of Wisdom for March 9th Meet-up: @jku100 is tentatively set to speak on March 9th about their work with the torch.compile optimizations, which has shown promise in AI acceleration techniques.

  • Preparing for PMPP’s Fifth Lecture: @jeremyhoward scheduled lecture 5 for the weekend, linking the Discord event, while @lancerts queried about the ‘swizzled order’ concept discussed in a PyTorch blog and its coverage in the PMPP book.
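
As a taste of the Pallas API mentioned above, here is a minimal element-wise kernel loosely following the project's quickstart pattern; it assumes a GPU/TPU backend (Pallas also has an interpret mode for CPU debugging) and omits any block/grid tuning:

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Each program instance loads its block from the input refs and writes the elementwise sum.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.arange(8.0)
print(add(x, x))  # [ 0.  2.  4.  6.  8. 10. 12. 14.]
```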


OpenAI Discord Summary

  • ChatGPT File Upload Fiasco: Users, including @sherlyleta, @guti_310, and @lugui, have brought up ongoing issues with the ChatGPT file upload feature, which has been glitchy since the previous week. @lugui mentioned that resolution is on the horizon.
  • Firmware Fix Frenzy in Manufacturer Talk: Debate heated up over manufacturer responsibility for technical issues, with @aipythonista advocating for firmware updates as the solution rather than relying on content like Louis Rossman’s, citing potential brand bias.
  • Mistral GPT Alternatives Gather Steam: Amidst talks of local GPT-3.5 instances and the infeasibility pointed out by @elektronisade, @riaty recommended the open-source Mistral 8x7b for homelab diagnostics.
  • Trademark Tangles Trouble GPT Customizer: @zurbinjo faced trademark obstacles when naming a GPT, with clarification from @solbus about the prohibition due to OpenAI’s branding guidelines.
  • Perfecting PDF Presentations to AI: A brief exchange prompted by @wazzldorr explored whether AIs perform better processing PDFs or extracted text for scientific articles, with @lugui assuring AI’s capability to handle PDFs effectively.

LangChain AI Discord Summary

  • RunPod and BGE-M3 Catch Engineers’ Attention: In the general channel, @lhc1921 highlighted RunPod for competitively priced GPU nodes and introduced the BGE-M3 multi-functional embedding model, providing the GitHub repository and research paper. @kapa.ai detailed using OpenAIEmbedder with LangChain, citing LangChain’s JavaScript documentation and Python documentation.

  • Bearer Token and Setup Woes in LangServe Discussions: The langserve channel saw a sharing of tips on AzureGPT.setHeader with bearer token by @veryboldbagel, referencing Configurable Runnables documentation and APIHandler examples. @gitmaxd provided a guide for Hosted LangServe setup, while @lucas_89226 and @veryboldbagel engaged in troubleshooting discussions, with a suggestion to use LangServe GitHub discussions page for further help.

  • Showcasing Innovations and Job Openings in Share-Your-Work Channel: @siddish introduced AI Form Roast by WorkHack on Product Hunt for online form optimization, and @shving90 highlighted TweetStorm Express Flow for crafting tweets via a Twitter post. The Dewy knowledge base for RAG applications was presented by @kerinin with a blog post, while @hinayoka announced job opportunities in a crypto project. @felixv3785 showcased a Backlink Outreach Message Generator tool for SEO.


Latent Space Discord Summary

  • AI Team Formation Tactics: Discussions about setting up an internal AI engineering team suggested starting solo to showcase value before scaling up. Eugene Yan’s articles on real-time ML and team configurations were recommended, illustrating various organizational tactics including centralization and embedding data scientists into product teams.

  • DSPy Series Simplified: A video series on DSPy prompted requests for a more digestible explanation, indicating the community’s readiness to collaborate on grasping its concepts. The DSPy explained video was shared as a starting point for those interested in learning more.

  • GPT-4, the Procrastinator?: GPT-4’s perceived initial laziness became a topic of amusement, supported by Sam Altman’s tweets and several Reddit discussions that ultimately confirmed GPT-4 should now be “much less lazy,” according to the shared community feedback.

  • Philosophical Digital Library Envisioned: The concept of a digital library with AI philosophical agents led to suggestions of leveraging tools like Botpress and WorkAdventure for development, indicating an interest in merging philosophical discourse with AI technology.

  • Technical Setup Exchanges: Engineers like @ashpreetbedi shared their technical setups, which involve tools such as PyCharm and ScreenStudio, reflecting a shared interest in the practical aspects of engineering environments and tooling.


LAION Discord Summary

  • Call for Collaboration in Foundational Models: A discussion initiated by @pratikk10 invites interested parties to contribute to the creation of foundational models across different media, including text, image, and video, seeking exchanges with serious creators.

  • Bias Watch in Reinforcement Learning: RLHF’s introduction of significant biases is debated, with @pseudoterminalx and @astropulse noting the potentially counterproductive effect on base model development, while also observing a distinctive style in Midjourney’s images potentially rooted in such biases.

  • Tackling Textual Bias in Pixart: Conversations reveal challenges in unlearning textual biases from datasets, specifically version 5.1 of pixart. Critique is directed at the use of the JourneyDB dataset, with suggestions to find more robust alternatives for unbiased text modalities.

  • Innovative Reading of Ancient Texts: The Vesuvius Challenge 2023 Grand Prize announcement highlighted a successful method for reading 2000-year-old scrolls without unrolling them, using a TimeSformer model and a particle accelerator, although at a high cost of $40,000 per scroll.

  • Chinese Machine Learning Thrives Despite Restrictions: Discussions ponder the success of Chinese ML entities in light of GPU restrictions, noting their preemptive procurement of NVIDIA’s H100s and A100s before restrictions came into play, questioning the overall impact on technological progress.

  • Critique of Hugging Face’s OWLSAM: @SegmentationFault shared and commented on the performance of OWLSAM in a Hugging Face Space, indicating that the model lacked coverage in visual representation and accuracy in object detection.


LlamaIndex Discord Summary

  • LlamaIndex Gears Up for Big Release: A significant release of LlamaIndex is due this week, with cleanups signaling an important update for users planning to upgrade their LlamaIndex systems.

  • Boosting Multi-modal Applications on MacBooks: LlamaIndex’s recent integration allows building multi-modal applications on a MacBook, enhancing image reasoning. The related announcement and developments were shared in a tweet.

  • Home AI Triumphs at Hackathon with PDF Search Innovation: Home AI’s unique implementation of a RAG-powered search engine for home filtering won Best Use of PDF Parser at an in-person hackathon, details of which can be found in this tweet.

  • Hackathon Spurs LlamaIndex Enhancement: The hackathon saw participation from nearly 200 people, offering feedback for the LlamaIndex team, and a resource guide catering to developers was circulated.

  • Building Engineer-Conscious Chatbots: One discussion focused on creating a chatbot for engineers to interact with standards documents using LlamaIndex, supported by a GitHub project at GitHub - imartinez/privateGPT.

  • Navigating Vector Search Challenges: Users discussed methods to improve vector search results with Qdrant, sharing insights into embedding and score analysis, and highlighted a TypeScript code example for generating and comparing embeddings with Ollama.


OpenAccess AI Collective (axolotl) Discord Summary

  • New Player in Town - Qwen1.5: The OpenAccess AI community is abuzz with the release of Qwen1.5, promising higher performance with quantized versions. The release was accompanied by a comprehensive blog post and various development resources. However, some users already see room for improvement, noting the absence of a 30b model and the need for including standard deviation in benchmarks to account for noise.

  • GCP’s Competitive Edge: GCP is extending olive branches to enterprise customers with A100 instances available at $1.5-2 per hour on demand. This rate is significantly lower than what non-enterprise customers pay, spotlighting the strategies cloud providers use to manage their ecosystems of users and resellers, including deals like GCP’s L4 spot price of $0.2 per hour.

  • Bridging the Quantization Gap: Within the Axolotl development community, there’s discussion around Hugging Face’s suggestion to quantize before merging models. @dreamgen suggests that this approach could benefit performance, sparking conversation about the potential of Bayesian optimization within the Axolotl framework and reports of a smoother implementation process.

  • Axolotl’s Growing Pains: Axolotl users are reporting installation woes, noting dependency conflicts, specifically with torch and xformers. A suggested fix involves using torch 2.1.2. Also, there’s a call to simplify YAML configurations in Axolotl, indicating that ease of use and accessibility are high on the developer wishlist, along with creating a Hugging Face Spaces UI for a more beginner-friendly setup.

  • Crickets in RLHF: A lone message in the #rlhf channel, seemingly directed to a specific user, asks for configurations related to zephyer, leaving much to the readers’ imagination regarding context or importance, and offers too little to chew on for the tech-hungry engineer audience.


Perplexity AI Discord Summary

  • Pro Payment Problems: Users like @officialjulian experienced payment issues with the Pro upgrade—funds were deducted without service activation, suggesting an error with Stripe’s payment system, while @yuki.ueda faced unresponsive customer support over billing inquiries. It was recommended to reach out to [email protected] for assistance.

  • AI Ethics in Education Debated: @worriedhobbiton highlighted the need for AI to offer unbiased and culturally sensitive support, especially in educational contexts like Pasco High School, reflecting the challenges in serving diverse student populations.

  • Mismatched AI Research Responses: User experiences like @byerk_enjoyer_sociology_enjoyer’s underscore the limitations of AI in delivering relevant search outcomes, as shown in the shared Perplexity AI search result, which did not match the research query about source validation.

  • Speedy Summary Solutions Sought: @sid.jjj expressed the need to improve API response times when generating summaries, noting that current processes take about 10 seconds for three parallel links, underlining a performance benchmark concern for AI engineers.

  • Discrepancies Detected in AI Usage: Concerns were raised by @jbruvoll about inconsistencies between interactive and API use of Perplexity AI, directing to a specific Discord message for further detail, which highlights the importance of aligning AI behavior across different interfaces.


DiscoResearch Discord Summary

  • Danish Delight in Model Merging: A Danish language model utilizing the dare_ties merge method achieved 2nd place on the Mainland Scandinavian NLG leaderboard, introduced by @johannhartmann and detailed here.

  • Merge Models Minus Massive Machines: @sebastian.bodza noted that LeoLM models can be merged without GPUs, highlighting alternatives like Google Colab for performing model merging tasks.

  • German Giant - Wiedervereinigung-7b-dpo-laser: @johannhartmann unveiled a 7b parameter German model combining top German models, named Wiedervereinigung-7b-dpo-laser, which has high MT-Bench-DE scores.

  • Merging Models More than a Score Game: The conversation between @johannhartmann and @bjoernp suggested an improvement in actual use-cases, like chat functions, after merging models, beyond just achieving high scores.

  • Code Cross-Language Searchability Soars: Jina AI released new code embeddings that support English and 30 programming languages with an impressive sequence length of 8192, as shared by @sebastian.bodza. The model is available on Hugging Face and is optimized for use via Jina AI’s Embedding API.


Alignment Lab AI Discord Summary

  • Taming Llama2’s Training Loss: An engineer faced an unexpected training loss curve with Llama2 using SFT, which might be due to a high learning rate. A peer recommended switching to Axolotl and provided specific configuration examples that suggest increasing the learning rate.

  • Ankr Networking for Node Collaboration: @anastasia_ankr reached out to discuss node infrastructure with the team and was directed to contact @748528982034612226 for further dialogue.

  • Community Engagement: Users @xterthy and @aslawliet kept the community active with brief greetings, contributing to a friendly atmosphere.

  • Awaiting Direct Communications: @mizzy_1100 flagged a direct message for @748528982034612226’s attention, indicating important pending communications.

  • Celebrating Collaborative Spirit: @rusch compared the Discord server to an “amazing discordian circus,” highlighting its dynamic and entertaining nature for knowledge sharing and innovation.


Datasette - LLM (@SimonW) Discord Summary

  • Audacity Echoes with Intel’s AI: Audacity’s integration of Intel’s AI tools adds powerful local features: noise suppression, transcription, music separation, and experimental music generation, presenting a challenge to expensive subscription services.

  • Thesis Enhancement with LLM: On LLM integration, @kiloton seeks advice for handling PDFs and web searches, and whether chat histories can be transferred and stored across different models.

  • SQL, Simplified with Hugging Face: @dbreunig points to the integration potential of llm with Hugging Face’s transformers, highlighting the Natural-SQL-7B model for its advanced Text-to-SQL capabilities and deep comprehension of complex questions.
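
For context, wiring a Text-to-SQL model into a transformers pipeline could look roughly like the sketch below; the repo id and prompt layout are assumptions rather than the model card's documented format:

```python
from transformers import pipeline

# Assumed repo id for the Natural-SQL-7B model mentioned above; adjust to the actual model card.
sql_gen = pipeline("text-generation", model="chatdb/natural-sql-7b", device_map="auto")

prompt = (
    "### Task\nGenerate a SQL query that answers the question below.\n\n"
    "### Schema\nCREATE TABLE orders (id INT, customer TEXT, total REAL, created_at DATE);\n\n"
    "### Question\nWhat was total revenue in January 2024?\n\n"
    "### SQL\n"
)
print(sql_gen(prompt, max_new_tokens=96, do_sample=False)[0]["generated_text"])
```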


LLM Perf Enthusiasts AI Discord Summary

  • Qwen1.5 Debuts with Open Source Fanfare: Qwen1.5 has been introduced and open-sourced, offering base and chat models across six sizes. Resources shared by @potrock include a blog post, GitHub repository, a presence on Hugging Face, Modelscope, a user-friendly demo, and an invitation to join the Qwen Discord community.
  • Efficiency Breakthrough with Qwen1.5: The 0.5B Qwen1.5 model has shown promise by exhibiting performance on par with the much larger Llama 7B model, signaling a new wave in model efficiency optimization as shared by @potrock.

PART 2: Detailed by-Channel summaries and links

TheBloke ▷ #general (1293 messages🔥🔥🔥):

  • Model Quantization Guidance: @xmrig explored quantizing the 70B LLM model on vast.ai due to local resource limitations. Advice offered included creating a large swap file and potentially using external USB-attached SSDs, avoiding the need for a powerful GPU during the quantization process (@spottyluck, @rtyax, @stoop poops).

  • GPTZero Analysis: Users debated the effectiveness of AI content detection tools such as GPTZero, with suggestions that it may not reliably detect more subtly augmented prompts and that it’s seen as a student’s tool, making it far from product-ready (@mr.userbox020, .meathead, @kaltcit, @righthandofdoom, @itsme9316).

  • Sparsetral, a New Sparse MoE Model: @morpheus.sandmann shared a Sparse MoE model based on Mistral, underscoring efficient operation on high-end hardware. The sparse model uses only a portion of weights during forward passes, applying adapters selectively through a router, which sparked the community’s interest but also raised questions on the intricacies of its training and functionality (@netrve, @itsme9316).

  • Merging vs. Single Model Fine-tuning: @givan_002 inquired about the efficiency of fine-tuning separate models for different datasets versus fine-tuning one model on a combined dataset. The consensus leaned towards using one comprehensive set for coherence and optimization (@amogus2432).

  • Community Assistance with LLM Tasks: Users @kaltcit, @potatooff, and others discussed various topics from LLM performance to practical advice on using LLMs and related technologies, like swapping VRAM to augment memory, showcasing collaborative problem-solving and knowledge sharing within the community.

Links mentioned:


TheBloke ▷ #characters-roleplay-stories (457 messages🔥🔥🔥):

  • Dynamic Adapter Merging in Training: @jondurbin recommends merging the adapter from SFT only after DPO, rather than before, continuing with the adapter from SFT throughout the process. Insights came while discussing the Direct Preference Optimization (DPO) Trainer as detailed in the TRL documentation.

  • Discussions on DPO and Adapter Loading: @dreamgen shares information that the DPO Trainer expects a specific dataset format, citing examples from the Anthropic/hh-rlhf dataset (a minimal format sketch follows this list). Issues such as the attribute name train conflicting with the latest transformers version are addressed by @jondurbin.

  • CharGen v2 - A Model for Roleplaying Creatives: @kalomaze unveils CharGen v2, a model designed to generate character descriptions for role playing, featured on Hugging Face with a live version available here. The model creates character descriptions in a dialogue format, generating one field at a time to allow for partial re-rolls and reduce repetition.

  • Fine-Tuning Role Play Models with Diverse Data: Users discuss strategies for preventing overfitting in role play models, suggesting a mix of RP data with varied datasets like The Pile or MiniPile at the start of each epoch (@kalomaze). @stoop poops and @flail_. exchange views on enhancement tactics like incorporating assistant data with RP data to avoid dumb outputs.

  • Eldritch ASCII Art Endeavors: @c.gato and others experiment with generating ASCII art using various models like Mixtral, Miqu, and GPT-4. The conversation showcases attempts at creating simple ASCII art with varying degrees of success, highlighting the limited but improving abilities of language models in this creative task.
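
For anyone following along, the dataset format discussed above boils down to prompt/chosen/rejected triples. Here is a hedged sketch; the example rows are invented, and the trainer call is left commented because its exact keyword arguments vary across TRL versions:

```python
from datasets import Dataset

# DPO training pairs: for each prompt, a preferred completion and a rejected one.
pairs = Dataset.from_list([
    {
        "prompt": "Describe your character's appearance.",
        "chosen": "She wears a patched travelling cloak and carries a brass spyglass.",
        "rejected": "I am an AI and cannot describe characters.",
    },
])

# Illustrative trainer setup (TRL's DPOTrainer; treat kwargs as version-dependent):
# from trl import DPOTrainer
# trainer = DPOTrainer(model=model, ref_model=None, beta=0.1,
#                      train_dataset=pairs, tokenizer=tokenizer, args=training_args)
# trainer.train()
print(pairs[0])
```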

Links mentioned:


TheBloke ▷ #training-and-fine-tuning (2 messages):

  • Fine-tuning Llama2-7b-chat with website content: @gabrielelanzafame inquired about the possibility of fine-tuning Llama2-7b-chat using text scraped from a website. They want to train the model to generate copy in the brand’s tone or evaluate the brand tone in given copy.
  • Strategies for Fine-tuning with Multiple Datasets: @givan_002 asked whether it is more effective to fine-tune a separate model for each individual dataset—airoboros, hermes, limarp—and then merge them, or to combine all datasets and fine-tune a single model.

TheBloke ▷ #model-merging (3 messages):

  • Seeking Wisdom on Model Merging: @noobmaster29 expressed a desire to understand model merging at a deeper level, beyond just demo notebooks, seeking sources for reading or video explanations.
  • Diving Deep into Model Merging: @maldevide referenced their own gist as a thorough breakdown of model merging at the tensor operation level, aimed at providing a deeper understanding.
  • Tool Suggestion for Model Merging: @maldevide suggested using ComfyUI-DareMerge, a tool that facilitates model merging for SD1.5 and SDXL, as a convenient resource already present in their notebook.

Links mentioned:

GitHub - 54rt1n/ComfyUI-DareMerge: ComfyUI powertools for SD1.5 and SDXL model merging


TheBloke ▷ #coding (4 messages):

  • Creating Character Memory with ChromeDB: @vishnu_86081 inquired about implementing long-term memory for each character in a chatbot app using ChromeDB. They’re currently using the ooba web UI API for text generation and MongoDB for storing messages, and they seek guidance on setting up ChromeDB to keep each character’s messages separate (a minimal sketch follows this list).

  • Seeking Shared Links for neThing.xyz: @rawwerks requested the re-sharing of links for their text-to-3D gen AI project called neThing.xyz, expressing concerns over their OpenAI costs while offering free user trials.

  • Code-13B and Code-33B Links Reshared: @london shared links to Code-13B and Code-33B, two Large Language Models (LLMs) trained to generate code with detailed explanations, available on the Hugging Face platform. These models were trained using the datasets Python-Code-23k-ShareGPT and Code-74k-ShareGPT, with the former taking 42 hours and the latter 6 days & 5 hours to train.
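
Assuming the intended library is ChromaDB, one way to keep each character's memories separate is one collection per character. A minimal sketch (collection naming and the default embedding function are assumptions, not a prescribed setup):

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client for a real deployment

def remember(character: str, message: str, message_id: str):
    # One collection per character keeps long-term memories isolated from each other.
    col = client.get_or_create_collection(name=f"memory_{character}")
    col.add(documents=[message], ids=[message_id])

def recall(character: str, query: str, k: int = 2):
    # Retrieve the k most similar stored memories for this character only.
    col = client.get_or_create_collection(name=f"memory_{character}")
    return col.query(query_texts=[query], n_results=k)["documents"][0]

remember("alice", "Alice promised to meet the user at the harbor at dawn.", "m1")
remember("alice", "Alice dislikes loud taverns.", "m2")
print(recall("alice", "Where did Alice agree to meet?"))
```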

Links mentioned:


Nous Research AI ▷ #off-topic (28 messages🔥):

  • Quest for Kanji Mastery: User @lorenzoroxyolo expressed frustration with their attempts to train a model on Japanese Kanji generation, referring to a Stable Diffusion experiment by @hardmaru for inspiration. User .ben.com recommended considering a controlnet instead of stable diffusion for nuanced tasks like rendering complex images.

  • Model Training Challenges Discussed: Amidst the concerns about unsatisfactory kanji generation models, .ben.com suggested @lorenzoroxyolo could learn from the IDS repository to understand the structure of characters better and potentially optimize model training.

  • AI Breeding Game Theory Emerges: @bananawalnut69 speculated about an “AI breeding” game to generate child models using a method akin to the GAN approach, with @Error.PDF responding that reinforcement learning might align with the concept.

  • Scams Awareness Raised: Discussion about the prevalence of AI-related scams on Facebook led @Error.PDF to lament the misinformation and shallow perceptions fostered by fictional narratives, which results in references to Skynet or WALL·E in serious AI talks.

  • Meta Pushes VR Boundaries: @nonameusr shared a link to a Road to VR article and an accompanying developer blog post about Meta’s new VR headset prototypes with retinal resolution and advanced light field passthrough capability presented at SIGGRAPH 2023.

Links mentioned:


  • Boosting Frozen Networks via Fine-Tuning: @euclaise discussed the potential of fine-tuning only the normalization layers of frozen networks, hinting it could be a promising approach as an alternative to LoRA. This concept is based on the findings from a recent arXiv paper.

  • New Flavor of Bagel Unleashed: @nonameusr shared a link to a non-DPO fine-tune of Mistral-7b, known as bagel-7b-v0.4 on Hugging Face. This version is reported to be better for roleplay usage, and the model card outlines compute details and data sources, with a DPO variant expected soon.

  • DeepSeek Unveils Math-Savvy Model: @metaldragon01 introduced the DeepSeek-Math-7b-instruct model, alongside links to use cases and a paper detailing the model’s capabilities for mathematical reasoning using chain-of-thought prompts. Model details can be found on Hugging Face.

  • Sparse Modeling with Sparsetral: @mister_poodle provided a link to Sparsetral-16x7B-v2, a model trained with QLoRA and MoE adapters. The Hugging Face model card supplies key information on training, prompt format, and usage.

  • Sparsetral Ramblings on Reddit: @dreamgen pointed to a Reddit post about Sparsetral, a sparse MoE model derived from Mistral, alongside several links to resources and papers. The discussion also suggests improving Sparsetral by initializing experts from Mixtral, with the model available on Hugging Face.

Links mentioned:


Nous Research AI ▷ #general (514 messages🔥🔥🔥):

  • IPO Performance Revisited: @teknium commented that the IPO paper’s recommendations improve IPO, but @dreamgen noted DPO still outperforms IPO in tests on top of OpenHermes. @teknium acknowledged the information was outdated and that an update had been missed.
  • Quantization Sensitivity Discussed: @dreamgen highlighted that DPO is sensitive to beta settings, potentially problematic for users unable to run extensive beta sweeps. @teknium replied that Hermes Mixtral is also sensitive to beta, indicating the issue is broader.
  • Upcoming Release Anticipation: @main.ai announced that 23.9 hours remain until an unspecified release based on AoE time, linking to a tweet for confirmation.
  • Introduction of model-similarity Tool by fblgit: @fblgit presented a new tool for analyzing model similarities, capable of understanding weight differences and parameter alignment between various models. The tool is open for contributions on GitHub.
  • Quality Concerns over Qwen 1.5 Release: Amidst the Qwen 1.5 release, @nonameusr was underwhelmed, pointing to minimal benchmark improvements over Qwen 1. Meanwhile, @euclaise and @metaldragon01 discussed the prospects of smaller Qwen models and a potential Qwen-Miqu model merge.

Links mentioned:


Nous Research AI ▷ #ask-about-llms (36 messages🔥):

  • Fusing QKV Matrices Optimizes Performance: @carsonpoole clarified to @sherlockzoozoo that fusing the Query, Key, and Value (QKV) matrices in transformer models is mathematically identical to computing them separately but slightly faster, as it reduces the number of operations and memory loads required (a quick numerical check follows this list).
  • Understanding LLMs for Conversation Memory: In a conversation initiated by @lushaiagency, @4biddden mentioned using langchain to handle conversation history with LLMs, while .ben.com and @samuel.stevens discussed issues around context size and history breakdown, suggesting that summarizing history or using long-context models could be solutions.
  • Licensing Questions on Hermes 2.5 Dataset: @tculler91 expressed concerns about the licensing change between the OpenHermes and Hermes 2.5 datasets, looking for clarification for commercial use.
  • Configuring Special Tokens for OpenHermes: @gabriel_syme sought advice on token configurations for fine-tuning an OpenHermes model, and @teknium advised including `“
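
Here is a quick numerical check of the fused-QKV point above, showing that one matmul against a concatenated weight reproduces three separate projections (shapes are illustrative):

```python
import torch

d = 64
x = torch.randn(2, 10, d)                      # (batch, seq, hidden)
wq, wk, wv = (torch.randn(d, d) for _ in range(3))

# Three separate projections.
q, k, v = x @ wq, x @ wk, x @ wv

# One fused projection: a single matmul against the concatenated weight, then split.
w_qkv = torch.cat([wq, wk, wv], dim=1)         # (d, 3d)
q2, k2, v2 = (x @ w_qkv).split(d, dim=-1)

print(torch.allclose(q, q2, atol=1e-5),
      torch.allclose(k, k2, atol=1e-5),
      torch.allclose(v, v2, atol=1e-5))        # True True True
```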

Eleuther ▷ #general (363 messages🔥🔥):

  • Creating Foundational Models Discussion: User @pratikk10 expressed an interest in connecting with anyone considering creating their own foundational models for various applications, including text-to-text/image/video.
  • High Cost of Foundational Models Highlighted: In response to @pratikk10, user @_3sphere pointed out the expensive nature of developing foundational models.
  • Qwen1.5 Model Release and Details: User @johnryan465 shared links to Qwen1.5, a 72B parameter chat model, including its introduction, repositories, and demos (Qwen1.5 Blog Post, Qwen GitHub, Hugging Face Space).
  • Interpreting Large Language Models: Discussions spanning various users including @rami4400, @_3sphere, and @fern.bear debated the effectiveness and potential of interpretability in neural networks like LLMs, with comparisons to the human genome project and skepticism about whether interpretability aligns with the nature of intelligence.
  • Concerns About AGI Claims and Model Capabilities: Skepticism was expressed by @fern.bear, @vara2096, and @worthlesshobo regarding the performance claims of some models, like Qwen 1 and 2 on the MMLU benchmark, and the potential of overfitting or “cheating” on test sets.

Links mentioned:


Eleuther ▷ #research (61 messages🔥🔥):

  • Clarification on Multi-task Loss Handling: @clashluke discussed PCGrad’s role in improving estimates over manual per-gradient reweighting. They raised concerns about magnitude preservation in gradients using this method, referencing the official code from facebookresearch/encodec.

  • Scaling Studies Query: @stellaathena initiated a discussion on whether a post-hoc correlation could convert Kaplan et al.-style scaling laws study to Hoffman et al.-style, potentially reducing the number of training runs needed. They shared a link to the related Twitter conversation and blog post for further exploration.

  • Hypernetworks for LoRA Weights: Dialogue between @.rend, @thatspysaspy, and others covered the idea of using hypernetworks to generate LoRA weights for pretrained language models tailored to specific contexts. An issue on davisyoshida/lorax detailed a related interest, and @thatspysaspy offered a code sample for experimentation.

  • Discussing CNN Training Methodology: @jstephencorey voiced confusion about a video’s explanation of CNN training, sparking debate on how layers learn and the effectiveness of pruning. Users like @Hawk and @xylthixlm discussed the accuracy of the video’s claims, although opinions differed on the technicalities.

  • Polyquant Activation Function Debate: In a conversation about substituting traditional activation functions with alternatives like polynomials, @clashluke and others scrutinized a novel architecture’s performance on ImageNet without typical activation functions. @fern.bear questioned the nonlinearity of the proposed product function, while @clashluke highlighted its optimization potential.

Links mentioned:


Eleuther ▷ #scaling-laws (2 messages):

  • Exploring the Tensor Programs Issue: @lucaslingle explained that @.johnnysands’ mention of “wrong init” refers to the findings from Tensor Programs 4 and 5 papers, which show that the “standard parameterization” can cause infinite logits when the model width increases indefinitely.

Eleuther ▷ #interpretability-general (10 messages🔥):

  • Simple Model Analysis Tool Launched: @fblgit introduced model-similarities, a tool for computing per-layer cosine similarities in parameter space of different models.
  • Comparison with CCA/CKA: @xa9ax inquired about the insights offered by the model-similarity tool compared to Canonical Correlation Analysis (CCA) or Centered Kernel Alignment (CKA), and @digthatdata clarified that the tool specifically contrasts the parameter space.
  • Directions in Vector Space Explained: @norabelrose and @pinconefish discussed the notion of a “direction,” and @norabelrose clarified it’s a 1D subspace with orientation in vector space rather than just any unit vector.
  • Understanding Vectors Beyond Coordinates: @digthatdata outlined that in deep learning, a vector is understood as both a direction and a magnitude, not just a position in space, and further elaborated on how cosine similarity measures the angle between two vectors.
  • Vectors as Directions and Operators: @digthatdata continued to explain that a vector can function as an “operator,” illustrating with the vectors for KING, QUEEN, MAN, and WOMAN how the difference vector z between WOMAN and MAN acts semantically to alter the meaning from KING to QUEEN.
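
A toy illustration of the two points above, using made-up 3-dimensional "embeddings" rather than real word vectors:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity measures the angle between two vectors, ignoring magnitude.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors, chosen only so the analogy roughly holds.
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.9, 0.1, 0.8])
man   = np.array([0.1, 0.9, 0.1])
woman = np.array([0.1, 0.2, 0.8])

z = woman - man                   # the difference vector acting as an "operator"
print(cosine(king + z, queen))    # close to 1.0: adding z moves KING toward QUEEN
```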

Links mentioned:

GitHub - fblgit/model-similarity: Simple Model Similarities Analysis


Eleuther ▷ #lm-thunderdome (16 messages🔥):

  • Statistics Snafu Solved: @hailey_schoelkopf clarified that bootstrapping for standard error on MMLU matches the pooled variance formula, not the combined variance formula. Pooled variance is now used in the project in place of the previous combined-variance approach (a brief sketch of the distinction follows this list).

  • Statistical Dragons Ahead: @stellaathena suggested keeping both the old and new statistical formulas in their codebase for expert use, with a humorous warning comment added by @hailey_schoelkopf: # here there be dragons.

  • Harnessing Language Model Evaluation: @jbdel. prepared a clean fork containing updates for the lm-evaluation-harness in anticipation of a meeting, with all changes available in a commit on GitHub. The updated harness allows evaluation of language models with specific arguments and task settings.
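
For readers outside the thread, here is a rough sketch of the distinction being made; the exact aggregation in the harness may differ, this is just the textbook contrast between the two formulas:

```python
import numpy as np

def pooled_variance(sizes, variances):
    # Pooled variance: average the per-subtask variances, weighted by degrees of freedom.
    sizes, variances = np.asarray(sizes), np.asarray(variances)
    return float(((sizes - 1) * variances).sum() / (sizes - 1).sum())

def combined_variance(sizes, means, variances):
    # Combined variance: additionally counts the spread between the subtask means.
    sizes, means, variances = map(np.asarray, (sizes, means, variances))
    grand_mean = (sizes * means).sum() / sizes.sum()
    return float((sizes * (variances + (means - grand_mean) ** 2)).sum() / sizes.sum())

sizes, means, variances = [100, 250, 57], [0.62, 0.48, 0.71], [0.050, 0.080, 0.030]
print(pooled_variance(sizes, variances), combined_variance(sizes, means, variances))
```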

Links mentioned:


Eleuther ▷ #gpt-neox-dev (3 messages):

  • Seeking Pre-tokenized Validation/Test Datasets: @pietrolesci expressed dissatisfaction with the current validation set created using the pile-uncopyrighted dataset from Hugging Face. They inquired about the availability of pre-tokenized validation/test splits in a manner akin to the pre-tokenized training set on Hugging Face.

  • Introducing Web Rephrase Augmented Pre-training (WRAP): @elliottdyson shared an arxiv paper proposing WRAP, a method to improve large language model pre-training by paraphrasing web data into higher quality formats, which could potentially reduce compute and data requirements.

  • Comparing WRAP Efficacy to Fine-Tuning: @elliottdyson pondered if using WRAP would be more beneficial compared to just fine-tuning models on the same data and suggested that a comparative study on data processed with and without WRAP followed by fine-tuning could shed some light on its efficacy.

Links mentioned:

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling: Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such data requires an abundance o…


HuggingFace ▷ #general (321 messages🔥🔥):

  • AI Training and Career Progression Tips: User @dykyi_vladk shared their learning journey in ML over the past year mentioning specific models and techniques they’ve studied. @lunarflu recommended building demos and sharing them as a next step to becoming a professional.

  • Server Troubles Amid A100G Launch: @lolskt reported the server being down for nearly 12 hours and queried if it was related to the A100G launch. After a discussion, @lunarflu directed to send details to email and offered to forward the issue to the team.

  • Modifying Model Responses: User @tmo97 sought advice on prompting models to stop giving warnings. @lunarflu suggested techniques to modify prompts and discussed the balance of safety and user control in AI responses.

  • HuggingFace Fellowship & Model Uploading Queries: @not_lain answered multiple questions about using the HuggingFace platform, including the process to join the fellowship program and details about uploading custom pipeline instances and models.

  • Inference Performance and Hardware Utilization: User @prod.nova inquired why their 4 RTX A5000 GPUs weren’t being utilized to their full potential during generation. @meatfucker clarified that inference usually utilizes a single GPU and suggested running an instance on each card to distribute the task.

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):

  • Pyannote Praised for Performance: @marc.casals.salvador expressed admiration for Pyannote, a tool for speaker diarization, citing its excellent performance.
  • Introduction of Diarizationlm: @marc.casals.salvador brought to attention Diarizationlm, a module that simultaneously trains Automatic Speech Recognition (ASR) and diarization to improve annotations by correcting diarizations through the speech itself.

HuggingFace ▷ #cool-finds (5 messages):

  • Empty Model Cards on HuggingFace: @tonic_1 pointed out that model cards are empty on HuggingFace, which can be a hindrance for users seeking model information.

  • Interactive Gradio Demo Lacks Visibility: @tonic_1 shared excitement about a cool gradio demo but expressed concerns over the lack of details provided for this potentially overlooked model.

  • Innovative LLaMA-VID Model Introduced: A model card for LLaMA-VID was shared by @tonic_1, detailing an open-source chatbot that supports hour-long videos by using an extra context token. The model and more information can be found here.

  • DV Lab’s Under-the-Radar Model Hosted on HuggingFace: @tonic_1 mentioned discovering another compelling but underappreciated model by DV Lab on HuggingFace, expressing hope to be able to serve it.

  • Exploring Cost-Efficient LLM Usage: @jessjess84 shared an arXiv paper describing research on cost-efficient strategies for inference and fine-tuning of large language models (LLMs), including distributed solutions leveraging consumer-grade networks.

Links mentioned:


HuggingFace ▷ #i-made-this (3 messages):

  • Ankush’s Finetuning Feat: User @andysingal announced a finetuned model developed by Ankush Singal. The model is an optimization based on a previous model from OpenPipe and comes with an installation and usage guide.

  • Community Applause for New Finetuned Model: @osanseviero congratulated @andysingal for the creation of the new finetuned model, hailing it as very cool and fiery with a 🔥 emoji.

Links mentioned:

Andyrasika/mistral-ft-optimized-dpo · Hugging Face


HuggingFace ▷ #reading-group (3 messages):

  • Discover a Memetastic AI Milestone: User typoilu shared an article titled “A Meme’s Glimpse into the Pinnacle of Artificial Intelligence (AI) Progress in a Mamba Series LLM Enlightenment” claiming it contains insightful information about AI progress. There was no further discussion on the content. Read Here

  • Scheduled Talk May Need Rescheduling: @ericauld might have jury duty on Friday, suggesting postponing the planned talk to Friday the 16th. An adjustment in the event calendar may be needed.

  • Chad Brings Support with a Dash of Luck: In response to the potential rescheduling, @chad_in_the_house showed understanding and wished @ericauld good luck with jury duty if it happens.

Links mentioned:



HuggingFace ▷ #computer-vision (1 messages):

  • Invitation to Share Computer Vision Expertise: User @danielsamuel131 has made an open call for those with expertise in computer vision to come forward and share their knowledge. Interested individuals are encouraged to drop him a direct message.

HuggingFace ▷ #NLP (11 messages🔥):

  • Fine-Tuning Frenzy: @joeyzero is looking to fine-tune a chatbot with noisy old data and seeks resources on conversational data sets and tools for managing data sets easily. The user is open to suggestions including datasets ready for conversational use, robust editing tools, and any tutorials or tips for such NLP projects.

  • Papillon’s NLP Tool Shared: @8i8__papillon__8i8d1tyr shared a tool for Named Entity Recognition (NER) and Sentiment Analysis based on Flair and FLERT with a link to the GitHub repository.

  • YAML Fine-Tuning Challenges: @denisjannot experienced issues with fine-tuning Mistral 7b for YAML generation, where unintended parts of YAML are modified upon a second request to alter specific parts.

  • Trajectory for YAML Fine-Tuning: @denisjannot mentions plans to train the Instruqt model to see if it improves the YAML modification issue. There’s a request for ideas to enhance the fine-tuning process without including modification examples in the training dataset.

  • Few-Shot Learning Suggestion: @meatfucker advised including examples of the desired modification in the prompt to induce one-shot or few-shot learning, which should work well even if the examples aren’t in the training dataset. This is intended to guide the model for YAML modifications.
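
A small sketch of the few-shot approach @meatfucker describes, with entirely hypothetical YAML snippets standing in for real examples:

```python
# Hypothetical examples: (instruction, YAML before, YAML after).
EXAMPLES = [
    (
        "Set replicas to 3.",
        "apiVersion: apps/v1\nkind: Deployment\nspec:\n  replicas: 1",
        "apiVersion: apps/v1\nkind: Deployment\nspec:\n  replicas: 3",
    ),
]

def build_prompt(instruction: str, yaml_in: str) -> str:
    # Each example shows a before/after pair so the model learns to touch only the requested lines.
    parts = ["Modify only what the instruction asks for; leave every other line unchanged.\n"]
    for instr, before, after in EXAMPLES:
        parts.append(f"Instruction: {instr}\nInput YAML:\n{before}\nOutput YAML:\n{after}\n")
    parts.append(f"Instruction: {instruction}\nInput YAML:\n{yaml_in}\nOutput YAML:\n")
    return "\n".join(parts)

print(build_prompt("Change the image tag to v2.1.", "spec:\n  containers:\n  - image: app:v1.0"))
```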

Links mentioned:

bootcupboard/flair/SentimentalNERD.py at main · CodeAKrome/bootcupboard: It’s bigger on the inside than the outside!


LM Studio ▷ #💬-general (103 messages🔥🔥):

  • Confusion in Choosing the Right Model: User @hades____ expressed feeling overwhelmed by the hundreds of available models, looking for guidance on how to select the best one for their needs. Suggestions pointed to resources and the idea of focusing on specific use-cases rather than seeking a “one size fits all” model.

  • Model Compatibility Queries: Various users, including @dicerx and @foxwear, inquired about connecting LM Studio with specific applications and models. The conversation involved potential compatibility with iOS apps and multimodal models, seeking clarity on integration possibilities.

  • Resource and Template Requests: Users like @Jonatan, @funapple, and @ayelwen requested resources for code examples, prompt templates, and explanations of model differences, highlighting a community need for easily accessible and straightforward documentation.

  • Technical Assistance Sought for LM Studio: Participants @plaraje, @perkelson, @ts9718, and @joelthebuilder asked for technical help with issues ranging from prompt generation quirks to software functionalities in LM Studio, such as changes in zoom behavior and server connection guidance.

  • Jokes and Light-Hearted Comments Flow: Amidst technical discussions, users like @sica.rios, @rugg0064, and @wildcat_aurora infused humor into the conversation, joking about AI’s capabilities and making light of language misunderstandings while interacting with the AI models.

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (3 messages):

  • Brief Mention of Channel Reference: @egalitaristen mentioned a channel with the code <#1167546635098804284>, but provided no context or further details.
  • Inquiry About a Model Usage: @delfi_r_88002 inquired if anyone is using the model llava-v1.6-34b.Q4_K_M.gguf, but did not offer further information or context.
  • New Model DeepSeek-Math-7B Released: @czkoko announced the release of DeepSeek’s new model, DeepSeek-Math-7B. The model can be found on GitHub with the corresponding metadata provided from the page.

Links mentioned:

GitHub - deepseek-ai/DeepSeek-Math


LM Studio ▷ #announcements (1 messages):

  • LM Studio v0.2.13 Drops with Qwen 1.5: @yagilb announced the release of LM Studio v0.2.13, featuring support for Qwen 1.5 across a range of model sizes (0.5B, 1.8B, 4B, 7B, 72B). Users can download the new version directly from https://lmstudio.ai or update through the app.

  • Pin Your Favorites in LM Studio: The new LM Studio update allows users to pin models and chats to the top of their lists, making favorite tools more accessible.

  • Qwen’s New Models Now Open Source: Qwen1.5 models have been released and open-sourced, with sizes ranging from 0.5B to 72B, including base, chat, AWQ, GPTQ, GGUF models available on various platforms including Hugging Face and LM Studio.

  • Integration and Accessibility Enhancements: Qwen1.5 features quality improvements and integrates into Hugging Face transformers, removing the need for trust_remote_code. With APIs offered on DashScope and Together, the recommended model to try is Qwen1.5-72B-chat on https://api.together.xyz/.

  • Performance Tuning in LM Studio App: The latest LM Studio release eliminates subprocesses used to measure CPU and RAM on Windows, leading to improvements in the app’s performance.


LM Studio ▷ #🧠-feedback (9 messages🔥):

  • Shoutout for LM Studio’s Simplicity: @drawless111 shared a link to a Medium blog post praising LM Studio for allowing anyone to experience the power of large language models with a simple user interface and no technical expertise needed.
  • Feature Request Acknowledgment: @heyitsyorkie responded to @foobar8553’s call for a resume feature by pointing to a current feature request and asked to show support for it.
  • Appreciation for LM Studio App: @gli7ch.com expressed gratitude towards LM Studio app developers for making work with LLMs easier, especially in terms of integrating them with automation tasks.
  • Inquiry About Investment Opportunities: @ahakobyan. asked about investment opportunities which led to a witty exchange between @fabguy and @ptable, both humorously claiming they’d accept money for their fabricated and competing “foundations.”
  • Questioning Odd Zoom Shortcuts: @perkelson questioned the logic behind LM Studio using non-standard zoom shortcuts, differing from those commonly used in browsers.


LM Studio ▷ #🎛-hardware-discussion (40 messages🔥):

  • Troubleshooting LMStudio on Linux: @heyitsyorkie advised @aswarp to use the latest Ubuntu 22 to avoid glibc errors when running LMStudio, adding that the Linux build still has compatibility issues with older Ubuntu versions.
  • GPU Utilization Mastery for Chatbots: @heyitsyorkie reassured @shylor that near-full utilization of GPU memory without spilling into shared RAM is good, as too much use of shared RAM can degrade performance over time with LLM chatbot interactions.
  • Speeding Up Token Generation: @roscopeko discussed ways to reduce the time to the first token, mentioning a 6-second delay with a powerful setup; members like @aswarp and @alastair9776 suggested different approaches including picking another model, quantization, or trying out the AVX beta.
  • Shadow PC Machinery Under the Microscope: @goldensun3ds shared a YouTube video detailing a comparison of running LLMs on a Shadow PC compared to a powerful local PC setup, noting the Shadow PC’s slower performances in loading models.
  • Grappling with Hardware Resources & Model Configurations: @robert.bou.infinite detailed available GPU configurations for running LMStudio across various Nvidia offerings in data centers, outlining the implications of NVLINK technology and Kubernetes Pods compatibility, suggesting that multiple GPU setups can be leveraged effectively for LLM workloads.


LM Studio ▷ #🧪-beta-releases-chat (14 messages🔥):

  • New Beta Update Rolls Out: @yagilb announces LM Studio version 0.2.13 Preview - Build V3, featuring the ability to pin models and chats and performance improvements. The update is available for download with Mac and Windows links provided, and feedback is encouraged through a specified Discord channel. Download here.

  • Leaderboard for Best LLMs Request: @kyucilow requests a leaderboard tab for the best LLMs to make selection easier, @minorello suggests referencing that in a certain channel, while @heyitsyorkie and @re__x provide links to external resources featuring LLM rankings. Hugging Face LLM Leaderboard and OpenRouter Rankings.

  • GUI for Server Access Desired: @_jayross expresses a wish for a web server GUI for remote access of the LM Studio server component; @goldensun3ds offers a workaround using Parsec for remote access and inquires about Intel ARC GPU support in LM Studio.

  • Chat Interface Issues and Error Reporting: @wolfspyre encounters a potential issue where, after ejecting a model in LMS’ chat interface and modifying settings, the chat system seems to stop responding. The problem is being investigated for bugs or unexpected behavior.

  • Dependency Installation Tips Shared: @greg0403 suggests installing the NCBI BLAST+ toolkit to address a potential issue, recommending the use of sudo apt-get install ncbi-blast+.


LM Studio ▷ #autogen (1 messages):

lowkey9920: Try autogen studio. It’s two commands to get started with a UI


LM Studio ▷ #langchain (1 messages):

  • Inquiry into Methods for Question Creation: @varelaseb expressed interest in improving a RAG system and asked for advice on methods for question creation over a dataset. Specifically, they mentioned a lack of information on Tuna despite hearing good things about it.
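
One generic way to approach question creation over a dataset, sketched below: prompt an LLM to write questions that each chunk can answer, then keep the (question, chunk) pairs for retrieval evaluation or fine-tuning. The `complete` callable is a hypothetical stand-in for any chat/completion API; Tuna itself is not shown here.

```python
# Minimal sketch of synthetic question generation over a dataset for RAG.
# `complete` is a hypothetical stand-in for any chat/completion call.
from typing import Callable, List, Tuple

PROMPT = (
    "Write {n} short, self-contained questions that can be answered using "
    "ONLY the passage below. One question per line.\n\nPassage:\n{chunk}"
)

def generate_questions(
    chunks: List[str], complete: Callable[[str], str], per_chunk: int = 3
) -> List[Tuple[str, str]]:
    pairs: List[Tuple[str, str]] = []
    for chunk in chunks:
        reply = complete(PROMPT.format(n=per_chunk, chunk=chunk))
        for line in reply.splitlines():
            question = line.strip(" -0123456789.")  # drop list numbering, if any
            if question:
                pairs.append((question, chunk))  # (question, source chunk) pair
    return pairs
```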

Mistral ▷ #general (118 messages🔥🔥):

  • LangChain Lamentations: Discord users @akshay_1 and @mrdragonfox express dissatisfaction with LangChain, with the former suggesting its utility may only last a week before components need solidifying.
  • Search for Old Mac Compatible Interpreters: @zhiyyang seeks a model interpreter for Mac OSX versions under 11, with suggestions ensuing but no specific solutions provided.
  • Mixtral 8x7B Determinism Discussed: @zaragatungabumbagumba_59827 queries about the deterministic nature of Mixtral 8x7B during inference, and @mrdragonfox explains that the behavior is probabilistic, influenced by an adjustable temperature parameter.
  • Philosophical Perspectives on AI: A philosophy student @zaragatungabumbagumba_59827 inquires into the philosophical implications of LLMs, receiving recommendations from @mrdragonfox and others to explore fundamental AI concepts.
  • Synthetic Data Generation Secrets Stay Secret: In a discussion about dataset generation, @mrdragonfox mentions the utility of GitHub repositories like airoboros and augmentoolkit, but declines to share specific methods, highlighting synthetic data generation as a crucial income source.


Mistral ▷ #deployment (6 messages):

  • Prompt Enhancement Queries: @drprimeg1 inquired about the additional information @gbourdin includes in prompts to improve accuracy.
  • PHP to JSON Schema for Better Prompts: @gbourdin described their method of converting a PHP class to a JSON schema to refine prompts and offered to share the code privately.
  • Code Sharing Offer Accepted: @drprimeg1 expressed interest in seeing @gbourdin’s code structure for their prompts, prompting a private message with the details from @gbourdin.

Mistral ▷ #finetuning (1 messages):

  • Padding Dilemmas in Fine-Tuning: User @ramin2024 is seeking advice on how to properly set the padding token for fine-tuning, mentioning that the common practice of setting tokenizer.pad_token = tokenizer.eos_token sometimes results in the fine-tuned model not generating an end-of-sentence (eos) token. They shared that they and others have encountered this issue; a minimal workaround sketch follows.
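
A minimal sketch of one common workaround, assuming Hugging Face transformers (the base model name is an illustrative assumption): give the tokenizer a dedicated pad token so the real EOS token at the end of each training example is not treated as padding and masked out of the loss.

```python
# Minimal sketch (Hugging Face transformers; model name is illustrative):
# add a dedicated pad token instead of reusing EOS, so the EOS token at the
# end of each example still contributes to the training loss.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # assumed base model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "<|pad|>"})
    model.resize_token_embeddings(len(tokenizer))  # make room for the new token
```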

Mistral ▷ #showcase (1 messages):

  • Discord Chatbot Hits Star Milestone: User @jakobdylanc announced their Discord chatbot now has over 69 stars on GitHub. The bot supports multiple LLMs such as Mistral and offers multi-user chat, vision support, and streamed responses, all in just 200 lines of code; check out the repository.

Links mentioned:

GitHub - jakobdylanc/discord-llm-chatbot: Supports OpenAI, Mistral, ollama, oobagooba and more • Multi-user chat • Vision support • Streamed responses • 200 lines of code 🔥: Supports OpenAI, Mistral, ollama, oobagooba and more • Multi-user chat • Vision support • Streamed responses • 200 lines of code 🔥 - GitHub - jakobdylanc/discord-llm-chatbot: Supports OpenAI, Mistr…


Mistral ▷ #random (5 messages):

  • Expression through Flags: @jujuderp posted an emoji of the North Korean flag 🇰🇵, while @matmatgamer shared the emoji of the French flag 🇫🇷.
  • Shoutout to a Fire Playlist: @jakobdylanc gave a shoutout to <@421429282804465666> for playing some great tunes, describing it as bumping fire.
  • The Need for Speed: @bam4d responded with a sense of urgency, replying with gotta go fast.
  • Fast and Furry-ous: @bam4d shared a humorous GIF from Tenor featuring a quick, parody version of Sonic the Hedgehog known as Sanic, along with a note about language settings on Tenor’s site.

Links mentioned:

Sanic The Hedgehob GIF - Sanic The Hedgehob Running - Discover & Share GIFs: Click to view the GIF


Mistral ▷ #la-plateforme (2 messages):

  • Emoji Terminator Causes Unexpected Behavior: @jakobdylanc identified a peculiar issue where Mistral models terminate with finish_reason as “stop” but still include an emoji in the content of the response. They provided a detailed snippet of the chat response, highlighting this anomaly across all three Mistral API models.

CUDA MODE ▷ #general (29 messages🔥):

  • Enthusiasm for Dual GPU Builds: @morgangiraud is excited about assembling a new dual build with 2 x 4070 TI SUPER to gain experience with dual GPU training. They also consider the trade-off between more VRAM with older models like the 3090 and the desire for brand new cards.

  • Choosing the Right Components: Community members, including @__boatbuilder__ and @morgangiraud, are discussing the advantages of using PCPartPicker and resources like the /r/selfhosted and r/buildapc subreddits to assist in building their setups.

  • European vs US Pricing: @morgangiraud shares that there weren’t significant deals available for their PC components, attributing it to higher prices in Europe, and reveals a total cost of 4k for the setup. A complete parts list was provided on PCPartPicker.

  • Multi-GPU Setup Considerations: @jeremyhoward and _tvi_ weigh in on the multi-GPU conversation, discussing that a good motherboard can overcome the lack of P2P communication in consumer cards and the strategic advantage of upgrading with a second big card later.

  • Accelerating LLM Serving: @andreaskoepf highlights FlashInfer, an open-source library designed to improve the performance of Large Language Model serving, which targets optimizing Self-Attention and other transformative operations critical to LLMs.


CUDA MODE ▷ #cuda (27 messages🔥):

  • Floating Point Precision Troubles: @zippika reports inaccuracies when they dequantize and then apply an nn.Linear operation in PyTorch, finding that results differ significantly from expected fp16/fp32 calculations. The dequantize function reportedly works well, but subsequent operations lead to errors, and this is suspected to be connected to rounding differences between fp32 and fp64.

  • CUDA Synchronization or C++ Flag Issues?: In their debugging process, @zippika contemplates whether CUDA synchronization issues or the disabling of standard PyTorch C++ flags might be causing unexpected behavior when performing linear operations after dequantization. The mentioned disabled flags include "__CUDA_NO_HALF_OPERATORS__", "__CUDA_NO_HALF_CONVERSIONS__", "__CUDA_NO_HALF2_OPERATORS__", and "__CUDA_NO_BFLOAT16_CONVERSIONS__".

  • Quantization Error Monologue: @zippika expresses frustration over their implementation details in a CUDA .cu file, trying to understand why the function dequantizeBlockwise_impl_stream results in significant numerical discrepancies when used as a precursor to matrix multiplication operations. They suggest that perhaps transposing the weight matrix or a mistaken switch of the K/N dimension parameters might be contributing to the issue.

  • Dequantize Function Under Scrutiny: Debugging efforts by @zippika focus on the qlinear_impl function, where the dequantizeBlockwise_impl_stream function is used. @zippika notes that the dequantize function appears to function perfectly until utilized within their custom linear operation, after which the results diverge sharply from expectations.

  • Stability Across Functions Questioned: @zippika details differences between two PyTorch functions, mm_normal and mm_qlinear, with the former producing stable and expected results, while the latter is faster but appears unstable and is suspected to contain synchronization issues or other errors. The user shares snippets of the contrasting functions in search of feedback on the discrepancies.
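
A generic way to narrow down this kind of discrepancy is to compare the suspect half-precision path against an all-fp32 reference: a large error usually points at a transposed weight or swapped K/N dimensions rather than ordinary rounding. The sketch below is illustrative only and is not the actual kernel code discussed above.

```python
# Minimal sketch: compare a dequantize -> linear path against an fp32
# reference to separate precision effects from layout bugs.
import torch

def check_against_fp32(x_fp16, w_fp16, bias=None, atol=1e-2):
    # Reference: the same linear op computed entirely in fp32.
    ref = torch.nn.functional.linear(
        x_fp16.float(), w_fp16.float(), bias.float() if bias is not None else None
    )
    # Candidate: the fp16 path that mirrors the custom kernel's math.
    out = torch.nn.functional.linear(x_fp16, w_fp16, bias).float()
    max_err = (out - ref).abs().max().item()
    print(f"max abs error vs fp32 reference: {max_err:.6f}")
    return max_err < atol

# Example usage (CUDA):
# x = torch.randn(4, 64, device="cuda", dtype=torch.float16)
# w = torch.randn(32, 64, device="cuda", dtype=torch.float16)
# print(check_against_fp32(x, w))
```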

Links mentioned:

Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors: Tensor Cores have been an important unit to accelerate Fused Matrix Multiplication Accumulation (MMA) in all NVIDIA GPUs since Volta Architecture. To program Tensor Cores, users have to use either leg…


CUDA MODE ▷ #torch (18 messages🔥):

  • Calling All AI Innovators: @vim410 shared an exciting opportunity for developers to participate in NVIDIA’s Generative AI RTX Developer Contest to win a GeForce RTX 4090 GPU and other prizes. The contest encourages building innovative AI projects using NVIDIA’s technologies.

  • Optimizing GPTQ with Torch and Triton: @jku100 discussed enhancing GPTQ performance using torch.compile and domain knowledge, leading to a kernel that matches customized CUDA performance (a simplified sketch of the idea follows this list). They also provided a benchmarking discussion and expressed that their work shows the potential of combining Torch and Triton tools.

  • Invitation to Share Innovation: @marksaroufim responded positively to @jku100’s achievements, offering an opportunity to give a talk about their work, highlighting the significant potential of torch.compile for AI optimization.

  • Potential Talk Scheduled for March 9th: @jku100 tentatively agreed to present their work on March 9th, upon confirmation, to discuss further optimizations in torch.compile and the implications on AI acceleration techniques.

  • Skepticism on Contest Incentives: @naisdi commented critically on NVIDIA’s contest, implying the prize to cost ratio may not be worthwhile for participants, and comparing it unfavorably with NVIDIA’s marketing practices involving influencers.
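
For readers unfamiliar with the approach, the sketch below shows only the general idea: torch.compile can fuse a Python-level dequantize-then-matmul into generated Triton code. It assumes PyTorch 2.x with a CUDA device and uses int8 weights as a stand-in for real GPTQ 4-bit packing; it is not @jku100’s kernel.

```python
# Minimal sketch: torch.compile fuses the dequantize and matmul below into
# generated Triton code. Int8 weights stand in for real GPTQ packing.
import torch

@torch.compile
def dequant_matmul(x, q_weight, scale, zero):
    w = (q_weight.to(x.dtype) - zero) * scale  # per-row dequantization
    return x @ w.t()

x = torch.randn(8, 4096, device="cuda", dtype=torch.float16)
q_weight = torch.randint(-128, 127, (4096, 4096), device="cuda", dtype=torch.int8)
scale = torch.full((4096, 1), 0.01, device="cuda", dtype=torch.float16)
zero = torch.zeros((4096, 1), device="cuda", dtype=torch.float16)
print(dequant_matmul(x, q_weight, scale, zero).shape)  # torch.Size([8, 4096])
```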


CUDA MODE ▷ #beginner (3 messages):

  • Software Engineer Seeks MLOps Wisdom: @einstein5744, a software engineer from an ML company, is looking to enhance skills in MLOps and is currently experimenting with fast.ai and Diffusers library. They are open to suggestions and advice on delving deeper into machine learning operations.

  • Is it the fast.ai Diffusion Course?: @jeremyhoward inquired if @einstein5744 is engaged with the fast.ai diffusion course to gain a deeper understanding of machine learning techniques.

  • Course Gratitude from a Fast Learner: @joseph_en expressed gratitude for the fast.ai course and specifically mentioned finishing a review of the DiffEdit paper in chapter 11, appreciating the availability of such resources.


CUDA MODE ▷ #pmpp-book (2 messages):

  • Lecture 5 Event Scheduled: @jeremyhoward announced the addition of the event for lecture 5, which will be occurring this weekend, with a Discord event link provided for members to join.

  • Inquiry About Swizzled Order: @lancerts referenced the PyTorch blog on accelerating Triton and inquired whether the concept of swizzled order was covered in the PMPP book, suggesting they are yet to encounter it but may not have reached that part of the book.


CUDA MODE ▷ #youtube-recordings (4 messages):

  • Lecture 4 Recording Inquiry: @zonepg inquired about the availability of a recording for Lecture 4, as they were unable to attend the live stream due to timezone constraints.
  • Typical Upload Schedule Shared: @morgangiraud informed that recordings are usually uploaded at the beginning of the week.
  • Technical Issues Delay Lecture 4 Recording: @marksaroufim mentioned that there were technical issues which delayed the recording, but assured that the video would be posted that day.
  • Lecture 4 Recording Now Live: @marksaroufim announced the upload of the Lecture 4 recording and added that HD quality should be available in about an hour.

CUDA MODE ▷ #jax (5 messages):

  • Exploring Pallas Extension for JAX: @stefangliga shared a Pallas Quickstart Guide, highlighting that Pallas is an experimental extension that simplifies writing custom kernels for GPU and TPU within JAX (a minimal example kernel follows this list). Pallas operates at a lower level of abstraction, requiring consideration of memory access and computation across hardware accelerators, and translates to Triton on GPUs and to Mosaic on TPUs.

  • User Contemplation: @stefangliga posted a thinking face emoji (🤔), which may indicate a moment of reflection or a need for further clarification about the discussed material.

  • Using Pure CUDA Kernels in JAX: @nshepperd provided a link to a resource explaining how to use pure CUDA kernels in JAX, offering additional flexibility when working with custom operations (Custom Operations for GPUs).

  • Inquiry About Triton Kernels and JAX Execution Details: @marvelousmit asked if there’s a method to print out Triton kernels and the actual code JAX executes, expressing an interest in understanding the underlying calls when profiling, especially related to sgemm kernel launches.

  • Seeking JAX Reverse Engineering Tools: Further, @marvelousmit mentioned being accustomed to reverse engineering kernel code using Triton+Torch with inductor and inquired about the equivalent process for JAX.
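
To make the abstraction level concrete, here is a minimal element-wise kernel in the style of the Pallas quickstart; it is a sketch only and assumes an accelerator backend that Pallas supports.

```python
# Minimal Pallas sketch: an element-wise add kernel that Pallas lowers to
# Triton on GPU or Mosaic on TPU. Refs are read/written with array indexing.
# (Passing interpret=True to pallas_call lets the kernel run without an
# accelerator, which is handy for testing.)
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    o_ref[...] = x_ref[...] + y_ref[...]

def add(x, y):
    return pl.pallas_call(
        add_kernel, out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype)
    )(x, y)

x = jnp.arange(8, dtype=jnp.float32)
print(add(x, x))  # [ 0.  2.  4.  6.  8. 10. 12. 14.]
```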

Links mentioned:

Pallas Quickstart — JAX documentation: no description found


OpenAI ▷ #ai-discussions (31 messages🔥):

  • Troubles with ChatGPT File Uploads: Users @sherlyleta, @guti_310, and @lugui reported that the ChatGPT file upload feature has been malfunctioning since the end of last week, with @lugui assuring that it will be resolved soon.
  • Debating Manufacturer Responsibility: @aipythonista countered @johnnyslanteyes’s suggestion to view Louis Rossman’s content for a list of issues, arguing that firmware updates should address these concerns and that claims may be biased as the brand in question is considered to be of higher quality than competitors.
  • Localizing GPT-3.5 Discussion: @czarcast inquired about hosting a local instance of GPT-3.5 for homelab diagnostics, @elektronisade noted that GPT-3.5 is not available for such use, prompting @riaty to recommend the open-source alternative, Mixtral 8x7B.
  • Debating Model Efficiency and Smarts: @kotykd and @riaty discussed the viability of running the open-source model Mixtral 8x7B locally, acknowledging its high resource requirements but arguing its potential utility depending on the user’s needs.
  • Consideration of Novel 3D Language Model: @red_code proposed a concept of a 3D vector language model to represent words and characters, sparking a brief reaction from @darthgustav. questioning the feasibility in terms of performance.

OpenAI ▷ #gpt-4-discussions (50 messages🔥):

  • Trademark Troubles for Custom GPT: @zurbinjo encountered issues using the term “Midjourney” in a GPT name, and @solbus clarified that trademark rights prevent this usage according to OpenAI’s branding guidelines.
  • Persistent Technical Turbulence: Numerous users including @Aleks, @sherlyleta, @thatjay_, and @realspacekangaroo experienced persistent issues with file uploads and ChatGPT functionalities across various browsers, with some success after switching browsers.
  • Users Grapple with GPT-4 Glitches: @dexmeighan and _odaenathus reported that GPT-4 is getting stuck during conversations, affecting usage across different web browsers, with the situation gradually improving for some.
  • Customization Confusion for GPT Builders: Users expressed difficulties with custom GPT features, such as setting a voice (@diphon), enforcing instruction sequence (@woodenrobot), and changing the identity displayed (@scootmandu_ai).
  • Seeking Feedback and Exposure for Custom GPTs: @_loier looked for ways to test and obtain feedback for a custom GPT designed for role-playing games, leading to discussions about the best methods and platforms to accomplish this.

Links mentioned:

Brand guidelines: Language and assets for using the OpenAI brand in your marketing and communications.


OpenAI ▷ #prompt-engineering (2 messages):

  • PDF vs Extracted Text for AI: @wazzldorr asked for advice on the best method to provide a scientific article to the AI, pondering between a PDF format or extracting the text and inputting it directly into the chat. In response, @lugui assured that the AI should be able to handle the PDF without any issues.

OpenAI ▷ #api-discussions (2 messages):

  • PDF vs. Extracted Text Question: User @wazzldorr asked about the best way to address a scientific article, questioning whether to provide the whole document in PDF format or to extract the text and provide it in the chat directly.
  • PDF Handling Capability: In response, @lugui assured that the AI should be able to handle the PDF with no problems.

LangChain AI ▷ #general (28 messages🔥):

  • Cloud GPU Nodes at Competitive Prices: User @lhc1921 mentioned that RunPod offers competitively priced GPU nodes in the cloud, suggesting it may be a resource to consider for those requiring cloud computing services.

  • Introducing BGE-M3 Multi-Functional Embedding Model: @lhc1921 shared a link to a new embedding model known as BGE-M3 that excels in multi-functionality, multi-linguality, and multi-granularity. For more details, they provided the GitHub repository and the accompanying paper.

  • How to Use OpenAIEmbedder for Embeddings: In response to @natuto_uzumaki1808’s inquiry, @kapa.ai provided detailed instructions on generating embeddings after declaring an embedder, including code examples in both JavaScript and Python (a Python sketch of the pattern follows this list), and referred users to LangChain’s JavaScript documentation and Python documentation.

  • Comparing Data Preprocessing in Llama Index and LangChain: User @arrmlet sought insights on the differences in data preprocessing between llama-index and LangChain and shared a Stack Overflow question for community response.

  • Translating HTML with LangChain While Preserving Styles: @o3omoomin raised a query about translating HTML into another language using LangChain, emphasizing the importance of maintaining the original style and formatting.
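
For reference, the pattern in question looks roughly like the sketch below, based on LangChain’s documented embed_query and embed_documents methods; the exact import path varies by LangChain version, and an OPENAI_API_KEY is assumed to be set.

```python
# Minimal sketch: generating embeddings with LangChain's OpenAI embedder.
# Assumes OPENAI_API_KEY is set; import path may differ by LangChain version.
from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings()

# Embed a single query string.
query_vector = embedder.embed_query("What does BGE-M3 support?")

# Embed a batch of documents.
doc_vectors = embedder.embed_documents(
    ["BGE-M3 is a multi-functional embedding model.", "RunPod offers cloud GPUs."]
)
print(len(query_vector), len(doc_vectors))
```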


LangChain AI ▷ #langserve (33 messages🔥):

  • Bearer Token Troubles: @peterlandis inquired about how to pass in request headers using a bearer token when working with AzureGPT. In response, @veryboldbagel provided guidance referring to examples from the Configurable Runnables documentation and the APIHandler examples for full endpoint customization.

  • LangServe Learning Curve: A conversation between @lucas_89226, a self-described jaded former enthusiast, and @veryboldbagel focused on issues with setting up LangServe. @lucas_89226 faced errors when trying to use the sample code and the playground, which led to troubleshooting that included checking the client code, considering whether the LangServe server code had been altered, and verifying the correct OpenAI API endpoint configuration.

  • LangServe Setup Guide Announcement: @gitmaxd shared their experience with Hosted LangServe and offered a helpful guide complete with a deployment video and walkthrough for setting up a simple LangServe template.

  • Experiencing Errors with LangServe: @lucas_89226 reported encountering an error message while attempting to invoke a LangServe endpoint. In contrast, @veryboldbagel requested the full traceback and suggested that @lucas_89226 confirm if the error persists with the original unmodified server code.

  • Support Offer for Troubleshooting LangServe: As @lucas_89226 continued to troubleshoot their LangServe setup, @veryboldbagel remained engaged, offering additional support and advising @lucas_89226 to open an issue or start a discussion on the LangServe GitHub discussions page if needed.


LangChain AI ▷ #share-your-work (6 messages):

  • AI Form Roast Promises Form Optimization: @siddish showcased AI Form Roast by WorkHack, an AI tool aimed at analyzing online forms and providing feedback to enhance completion rates and user experience. They’re inviting feedback and support on Product Hunt.

  • OranScribe TweetStorm Express Flow Launch: @shving90 shared a tweet about OranScribe’s new feature that aids users in generating multiple versions of a tweet for diverse audience engagement, promising it only takes 10 minutes with their TweetStorm Express Flow. More can be learned from their Twitter post.

  • Introducing Dewy - Simplified RAG Applications: @kerinin introduced a knowledge base platform called Dewy, which simplifies getting Retrieval Augmented Generation (RAG) applications up and running. They provided a blog post for more details on Dewy’s abilities to automate document extraction, indexing, and retrieval.

  • Crypto Project Seeking Passionate Professionals: @hinayoka is on the lookout for professionals to fill various roles in an exciting crypto project, with vacancies including Web3 Developer, Game Developer, Web Developer, Moderator, and UI/UX Designer. Interested applicants are encouraged to reach out with their resume and portfolio.

  • Create Personalized AI Backlink Outreach Messages: @felixv3785 introduced a tool built with Vercel AI SDK and Langchain that generates personalized messages for backlink outreach. To test the free tool, visit Backlink Outreach Message Generator.


Latent Space ▷ #ai-general-chat (66 messages🔥🔥):

  • Sharing Setups & Tools: @ashpreetbedi shared details about their recording and development setup involving pycharm and screenstudio.
  • Strategizing AI Team Structures: @30sleeps discussed the idea of setting up an internal AI engineering team, possibly starting as a solo venture to demonstrate value before scaling up. @quicknick123 and @eugeneyan offered insights into team formations and considerations for scaling ML projects, with @eugeneyan recommending reading material on real-time ML and team configurations.
  • DSPy Series Breakdown Interest: @lightningralf sought a more accessible breakdown of a video series on DSPy, with @kbal11 and @coffeebean6887 expressing interest and suggesting the community could assist with understanding.
  • Lighthearted Banter on GPT-4: Members like @swyxio and @coffeebean6887 humorously discussed the perceived laziness of GPT-4, with links to relevant tweets and Reddit threads shared to highlight community feedback on the model’s performance.
  • Building Philosophical AI Agents: @dereklomas proposed creating an AI-powered digital library with philosophical agents interacting with each other, with @fanahova suggesting tools like Botpress and WorkAdventure for development.


LAION ▷ #general (61 messages🔥🔥):

  • AI Foundational Models: A Call to Collaborate: @pratikk10 is reaching out to individuals interested in creating foundational models, such as text to text/image/video. Conversations with serious thinkers and creators in this area are welcomed.
  • Bias in Reinforcement Learning: @pseudoterminalx discusses the concern that RLHF (Reinforcement Learning from Human Feedback) introduces significant bias into model weights, which is counterproductive for developing a base model. @astropulse echoes this sentiment, noting the distinctive style apparent in Midjourney’s v3-4 images, which could be a result of such biases.
  • Challenges in Unlearning Textual Bias: @thejonasbrothers shares the trials faced in attempting to unlearn text biases within pixart, especially stemming from version 5.1 datasets. @pseudoterminalx criticizes the utilization of the JourneyDB dataset, advocating that it should be discarded for more robust alternatives.
  • Revelations in Ancient Text: @itali4no shares details about the unveiling of the Vesuvius Challenge 2023 Grand Prize winners, who have successfully developed methods to read 2000-year-old scrolls without opening them. @nx5668 praises the transformative use of a TimeSformer model to detect ink in scroll scans, while noting the high cost of $40k per scroll for scanning using a particle accelerator.
  • Discussion on Chinese ML Expertise Amid Restrictions: @qwerty_qwer questions how Chinese entities are exceeding in machine learning despite GPU restrictions, which leads to a brief discussion about the sufficiency of GPU access and the implications of recent export restrictions. @kenjiqq informs that major Chinese corporations had procured substantial quantities of NVIDIA’s H100s and A100s before restrictions were enforced.


LAION ▷ #research (5 messages):

  • Hugging Face’s OWLSAM Spotlight: User @SegmentationFault shared a Hugging Face Space dedicated to OWLSAM, which pairs OWLv2 open-vocabulary detection with SAM (Segment Anything Model) segmentation.
  • Visual Representation Lacks Coverage: @SegmentationFault noted disappointment in OWLSAM’s performance stating it “did not capture a lot of things in some tests I made”.
  • OWLSAM Misidentifies Objects: @SegmentationFault also pointed out that during their tests, OWLSAM tended to “capture wrong objects too” indicating a potential issue with the model’s object detection accuracy.

Links mentioned:

OWLSAM - a Hugging Face Space by merve: no description found


LlamaIndex ▷ #announcements (1 messages):

  • Big LlamaIndex Release Incoming: User @jerryjliu0 announced that a substantial LlamaIndex release is expected this week, featuring many cleanups. Users planning to upgrade their LlamaIndex version should anticipate this upcoming update.

LlamaIndex ▷ #blog (6 messages):

  • LlamaIndex Enhances Multimodal Capabilities: The integration with @llama_index enables the building of complete multi-modal applications that can run on a MacBook, taking image reasoning to the next level. Find out more in this tweet.
  • Home AI Wins Best Use of PDF Parser at Hackathon: The first in-person hackathon highlighted winning projects like Home AI, which incorporates a RAG-powered search engine to filter homes by innovative criteria. Check out the winners in this tweet.
  • Hackathon Cultivates Innovation and Feedback: Nearly 200 people participated in the hackathon, forming teams and providing real-time feedback to the LlamaIndex team on the user experience. More details can be found in this announcement.
  • Valuable Hackathon Resource Guide Shared: LlamaIndex’s resource guide was well received at the hackathon, catering to a range from beginners to experts. The guide is available here.
  • New RAG CLI Tool Unveiled: A new CLI tool powered by @llama_index, Mistral-7B, and bge-m3 offers an LLM-powered grep for on-device file search and customization. Learn about this tool and its capabilities in this tweet.

Links mentioned:

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It’s the all-in-one workspace for you and your team


LlamaIndex ▷ #general (33 messages🔥):

  • Integrating Knowledge Graphs with Azure: User @senshi2904 asked if there is a way to store and access knowledge graphs created by llama knowledge graph index using Azure CosmosDB Gremlin. There were no suggestions or responses provided in the messages.
  • Improving Prompt Effectiveness for Llama 2: @wrapdepollo inquired about techniques to prevent Llama 2 from forgetting instructions when prompted with too much text, seeking tips and custom prompts from the community. No solutions were directly offered in the conversation.
  • Replit Bounties Inquiry: @d.j147 queried where to contact for Replit bounties, but no guidance was offered within the given messages.
  • Internal Document Summarization Challenge: @mysterious_avocado_98353 sought advice on how to summarize the most recent document regarding AAPL, with created_time as metadata, while @akshay_1 mentioned that a query expansion via a large language model (llm) may be required, offering to DM resources later.
  • Updating Vector Stores in LlamaIndex: @ramihassanein pointed out that vector stores like MongoDB and DeepLake might need updating to BasePydanticVectorStore, similar to the recent update of AstraDB. @cheesyfishes responded by encouraging contributions through pull requests (PRs) and updated that they address updates as they are pointed out.


LlamaIndex ▷ #ai-discussion (25 messages🔥):

  • Seeking Assistance for Chatbot Creation: @cay7man expressed interest in building a chatbot for engineers to query standards documents. @rick_03848 responded with a link to a GitHub project that uses LlamaIndex for document querying, which can be found at GitHub - imartinez/privateGPT.

  • Vector Search Woes Uncovered: @gavmor reported having issues with vector search results using Qdrant, finding non-relevant nodes returned higher than expected nodes. @cheesyfishes replied, suggesting the review of response.source_nodes for debugging.

  • Retrieval Tips and Code Sharing: @gavmor inquired about printing vector scores and comparing node similarity (a Python sketch of the same idea follows this list), eventually sharing TypeScript code for generating and comparing embeddings using Ollama and the query “How old is John?” as an example.

  • Architecture Quirks and Best Practices: @gavmor showed concern about the proper usage of embeddings and LLM objects in their retrieval setup, leading to a discussion about the implementation. The conversation included a critique by @cheesyfishes of LlamaIndex’s LLM object containing embeddings.

  • Planning for Future Retrieval Experiments: @gavmor mentioned their plans for further retrieval experiments, including pulling more nodes for reranking but expressed reluctance to change their chunking strategy.
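
A minimal debugging sketch along those lines, assuming a LlamaIndex-style query response (response.source_nodes with per-node scores) and an embedding model exposing get_query_embedding / get_text_embedding; the `response` and `embed_model` objects named here are placeholders.

```python
# Minimal sketch: inspect retrieval scores on a LlamaIndex response and
# compare two embeddings by cosine similarity.
import numpy as np

def show_source_nodes(response, preview_chars=80):
    # Print each retrieved node's similarity score and a short text preview.
    for n in response.source_nodes:
        print(f"score={n.score:.4f}  text={n.node.get_content()[:preview_chars]!r}")

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example (placeholders): compare a query embedding against a node embedding.
# q_vec = embed_model.get_query_embedding("How old is John?")
# n_vec = embed_model.get_text_embedding(some_node_text)
# print(cosine_similarity(q_vec, n_vec))
```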


OpenAccess AI Collective (axolotl) ▷ #general (41 messages🔥):

  • Qwen1.5 Model Released!: @bratao announced the release of Qwen1.5, providing links to the blog post, GitHub, Hugging Face, ModelScope, a demo, and the Discord server. The model comes in multiple sizes and includes quantized versions for better developer experience.

  • GCP Offers Competitive Rates for Enterprises: Users discussed Google Cloud Platform’s (GCP) pricing, with @nruaif revealing that enterprise customers can access A100 instances for approximately $1.5-2 per hour on demand, which is cheaper than the market rate for non-enterprise users.

  • Comparison and Benchmarks of Qwen 1.5: @yamashi mentioned that Qwen 1.5 appears better than Mistral based on benchmarks, expressing disappointment at the lack of a 30b model.

  • Complaints about Inconsistent Metrics in Benchmarks: @dreamgen and @yamashi discussed the benchmarks associated with the new models, suggesting that the inherent noise in benchmarks should be acknowledged in results, and that standard deviation should be reported, although often omitted.

  • Insight into Reseller Dynamics and Pricing: The conversation revealed how large cloud providers like AWS and GCP set higher non-enterprise prices, presumably to encourage resellers like RunPod, with further mention of spot price deals, such as the GCP L4 spot price at $0.2 per hour, as shared by @nruaif.

Links mentioned:

Introducing Qwen1.5: GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD Introduction In recent months, our focus has been on developing a “good” model while optimizing the developer experience. As we progress towards…


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (14 messages🔥):

  • Quantizing Before Merging: @dreamgen highlighted that Hugging Face suggests quantizing the base model before merging, as qlora does, which differs from the Axolotl script’s approach.
  • Axolotl Possibility For Bayesian Optimization: @mihai4256 inquired about including abstractions for Bayesian optimization in the Axolotl framework for hyperparameter tuning, to which @nruaif responded that it seems easy to implement and referenced Hugging Face’s documentation, which supports such optimization for training (a compact sketch of the underlying Trainer API follows this list).
  • Dependency Conflicts in Axolotl Installation: @nanobitz reported dependency conflicts during a clean installation of Axolotl, involving package versions of torch where xformers had to be commented out. @dctanner proposed a workaround using torch 2.1.2.
  • Ideas for Axolotl UI Experience in Hugging Face: @dctanner proposed the idea of a user interface within Hugging Face Spaces for Axolotl that could provide a beginner-friendly experience of configuring and running training using user’s own Hugging Face account’s GPU resources.
  • Axolotl YAML Configuration Simplification Request: @nanobitz noted an anomaly with ds2 saving potentially corrupt model.safetensors and also asked for contributions in removing redundant is_*_derived_model fields from all example YAML configurations in the Axolotl project.
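
For context, the Hugging Face-side support referenced above centers on Trainer.hyperparameter_search. The sketch below uses the Optuna backend with a tiny dummy dataset and a placeholder search space (requires optuna installed); Axolotl would need its own wiring around something like this.

```python
# Minimal sketch of Hugging Face Trainer.hyperparameter_search with Optuna.
# Model name, dataset, and search space are illustrative placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # small model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tiny dummy dataset, only to keep the sketch self-contained.
texts, labels = ["good", "bad", "great", "awful"] * 8, [1, 0, 1, 0] * 8
ds = Dataset.from_dict({"text": texts, "label": labels}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length", max_length=16)
)

def model_init():
    # hyperparameter_search needs a fresh model for each trial.
    return AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def hp_space(trial):
    # Optuna-style search space; extend with whatever should be tuned.
    return {"learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True)}

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hpo_out", num_train_epochs=1,
                           evaluation_strategy="epoch", report_to=[]),
    train_dataset=ds,
    eval_dataset=ds,
)
best_run = trainer.hyperparameter_search(
    direction="minimize", backend="optuna", hp_space=hp_space, n_trials=3
)
print(best_run.hyperparameters)
```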


OpenAccess AI Collective (axolotl) ▷ #rlhf (1 messages):

dangfutures: does anyone know how to set the configs for Zephyr <@257999024458563585>


Perplexity AI ▷ #general (25 messages🔥):

  • Pro Upgrade Payment Issue Reported: @officialjulian experienced a problem where funds were deducted for a Pro upgrade, but they received a “card has been declined” message. @mares1317 suggested contacting Perplexity’s support email for assistance.
  • Billing Support Sought on Discord: @yuki.ueda expressed trouble getting a response from support regarding a billing inquiry after two weeks. @ok.alex responded promptly, asking for a DM with the email to check the issue.
  • Confusion on Stripe’s Declined Payments: In response to @officialjulian’s issue, @icelavaman explained that the “card declined” message from Stripe implies no charge should have occurred.
  • Exploring AI Prompt Lengths in Collections: @twelsh37 inquired about prompt character limits as they had a prompt that was over by 2500 characters when adding to Collections.
  • Feedback on Claude’s Performance: @Catto mentioned that an earlier version of Claude was still available on Anthropic’s site and suggested Perplexity might consider reverting to it, citing dissatisfaction with Claude 2.1’s capabilities.


Perplexity AI ▷ #sharing (7 messages):

  • Ethical AI in Diverse Educational Contexts: @worriedhobbiton raised concerns about the application of AI at Pasco High School, questioning if AI can provide unbiased and culturally sensitive support for students, especially given their LatinX identity and socioeconomic challenges. They asked for strategies to ensure that AI offers ethical guidance.

  • Irrelevant AI Research Query Result: @byerk_enjoyer_sociology_enjoyer experienced an issue with the AI performing unrelated research when asked to assess the validity and authority of a source. They shared a Perplexity AI search result that did not meet expectations.

  • Seven Principles Inquiry: @fafu_10 provided a Perplexity AI search link without additional context or commentary.

  • AI Tools Showcase: User @vipinpg shared a Perplexity AI link about the best AI tools, however, no further context or discussion was provided.

  • Contemplating Digital vs. Biological Intelligence: @mares1317 posted a YouTube video featuring Geoffrey Hinton discussing if digital intelligence will replace biological intelligence, hosted by the University of Toronto and associated institutes.

  • A Moment with Master Yoda: @mares1317 shared a Yoda Star Wars GIF without further discussion or context.


Perplexity AI ▷ #pplx-api (4 messages):

  • Seeking Assistance: User @aiistheonlyway requested help, but did not provide details about the issue they’re facing.
  • Link to Potential Solution: @icelavaman shared a link, but the content or context of the link was not specified in the message.
  • Speeding Up Summaries: @sid.jjj is looking for ways to reduce the API response time for generating summaries from long transcripts. They note that summary generation for three links in parallel takes approximately 10 seconds.
  • Discrepancy Issue Raised: @jbruvoll asked for insights regarding a behavior discrepancy between interactive use and API usage, linking to a specific Discord message for reference.


DiscoResearch ▷ #general (27 messages🔥):

  • Dare_Ties Merge Achieves High Scores: @johannhartmann shared that they created a high scoring Danish language model using a dare_ties merge method, now ranked 2nd on the Mainland Scandinavian NLG leaderboard. Details can be found at munin-neuralbeagle-7b.

  • Merging Models Without GPUs: @sebastian.bodza mentioned that LeoLM models can be merged as desired without the need for GPUs, and that this is possible even on platforms like Google Colab.

  • Wiedervereinigung-7b-dpo-laser Unveiled: @johannhartmann introduced Wiedervereinigung-7b-dpo-laser, a 7b parameter German model that combines some of the best German models available, and discussed its high scores on MT-Bench-DE.

  • Analyzing the Effectiveness of Model Merging: In a discussion between @johannhartmann and @bjoernp, they explored whether high scores from model merging reflected an actual improvement in real-world use cases. @johannhartmann confirmed seeing improvements using chat functions.

  • Laser Treatment for Language Models Discussed: @johannhartmann experimented with laserRMT on their model, but found that the mtbench scores were higher before using the procedure. The discussion hinted at the complexity of model merging and the effects of different training processes on model performance.


DiscoResearch ▷ #embedding_dev (1 messages):

  • Jina AI Launches New Code Embeddings: @sebastian.bodza shared a link to Jina AI’s new code embedding model, which supports English and 30 programming languages for neural search applications. The model handles an 8192-token sequence length and is best utilized through Jina AI’s Embedding API.

Links mentioned:

jinaai/jina-embeddings-v2-base-code · Hugging Face: no description found


Alignment Lab AI ▷ #general-chat (14 messages🔥):

  • Llama 2 SFT Training Loss Issues Resolved?: @jellyroger5505 encountered a problem with a strange training loss curve while using SFT on Llama 2. @ufghfigchv initially suggested that the issue might be due to a high learning rate, but upon checking @jellyroger5505’s config, the suggested resolution was to switch to Axolotl and possibly increase the learning rate based on its config examples.

  • Ankr Seeks Connection for Growth and Development: @anastasia_ankr from Ankr reached out to connect with the engineering and/or business development team to discuss node infrastructure. @ufghfigchv pointed Ankr towards @748528982034612226 as the contact person.

  • A Simple Hello Can Mean a Lot: Both @xterthy and @aslawliet dropped brief greetings into the chat, fostering a welcoming community vibe.

  • Direct Message Awaiting a Response: @mizzy_1100 indicated the desire to communicate further, tagging @748528982034612226 and advising them to check their Direct Messages (DMs) for more information.

  • Encouraging the Circus of Ideas: @rusch playfully referred to the Discord server as an “amazing discordian circus,” promoting a sense of fun and collaboration amongst its participants.

Links mentioned:

axolotl/examples/llama-2/fft_optimized.yml at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.


Datasette - LLM (@SimonW) ▷ #ai (1 messages):

  • Audacity Integrates Intel’s AI Tools: @dbreunig announced that Audacity, the free audio editor, now includes a suite of free AI tools from Intel, challenging expensive subscription services. Key features are noise suppression, transcription using Whisper.cpp, music separation, and experimental generative features like music generation, all running locally on users’ computers.

Links mentioned:

Audacity now has free AI-powered sound tools from Intel - CDM Create Digital Music: The free audio editor now gets a suite of free AI tools from Intel, some competing with expensive paid subscription services. That covers useful stuff like noise suppression and transcriptions and mus…


Datasette - LLM (@SimonW) ▷ #llm (2 messages):

  • Exploring llm for Thesis Work: @kiloton has been using ChatGPT for thesis brainstorming and is pleased with the results when combining file uploads and web search features. They seek advice on working with PDFs and web searches through llm, wondering whether conversation histories can be ported to other models and retained locally.

  • Hugging Face Integration Interest: @dbreunig expressed interest in integrating llm with Hugging Face’s transformers, sharing a Natural-SQL-7B model that delivers strong Text-to-SQL performance and complex question understanding. The model reportedly surpasses peers in its category.

Links mentioned:

chatdb/natural-sql-7b · Hugging Face: no description found


LLM Perf Enthusiasts AI ▷ #opensource (2 messages):

  • Qwen1.5 Unleashes a New Model Generation: @potrock shared a comprehensive introduction of Qwen1.5, announcing the open-sourcing of base and chat models with six different sizes. Various resources were provided such as the blog post, links to GitHub, Hugging Face, Modelscope, a demo, and a Discord community.
  • Llama’s Potential Challenger: @potrock highlighted that the 0.5B Qwen1.5 model closely rivals the performance of Llama 7B. The revelation hints at significant efficiency improvements in the new model iteration.

Links mentioned:

Introducing Qwen1.5: GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD Introduction In recent months, our focus has been on developing a “good” model while optimizing the developer experience. As we progress towards…