> AI Discords for 2/21-23/2024. We checked **20** guilds, **318** channels, and **15439** messages for you. Estimated reading time saved (at 200wpm): **1430 minutes**.

Mistral came out swinging today, announcing Mistral-Large on La Plateforme and on Azure, trailing GPT-4 by about 5 percentage points on their aggregated benchmarks.

The community reception has been mildly negative.

And hopes are not high for open sourcing. Notably, Mistral are also claiming that the new Mistral-Small is “significantly better” than the openly released Mixtral 8x7B.


Table of Contents

[TOC]

PART 0: Summary of Summaries of Summaries

Evaluating LLM Performance and Cost-Efficiency:

The discussion in TheBloke Discord underscores the comparative analysis between Mistral Large and GPT-4 Turbo, with Mistral Large's performance on benchmarks like MMLU falling short despite similar cost implications, suggesting a reevaluation of cost-benefit for users and developers alike.

Technical Training Hurdles and Best Practices:

Challenges in implementing DeepSpeed to avoid out-of-memory errors and the application of DPO using the DPOTrainer highlight the technical intricacies and community-driven solution sharing, illustrating the ongoing efforts to optimize LLM training efficiency and practicality.

Advancements in AI Deception for Roleplay Characters:

The dialogue on creating AI characters capable of deception, especially with the application of survival goals, reflects the nuanced exploration of AI's narrative capabilities. The use of DreamGen Opus V1 despite tokenizer and verbosity issues underscores the creative pursuits in AI storytelling.

Intricacies of Model Merging:

The discourse led by community members on merging non-homogeneous models using strategies like linear interpolation and PEFT merging methods reveals a deep dive into the complexities and potential of enhancing LLMs through model integration, marking a significant area of exploration within AI development practices.

PART 1: High level Discord summaries

TheBloke Discord Summary

  • Evaluating LLM Performance and Cost-Efficiency: The performance of Mistral Large was compared unfavorably to GPT-4 Turbo by @timotheeee1, suggesting it may not be worth the similar cost given its weaker performance on benchmarks like MMLU.

  • Technical Training Hurdles and Best Practices: Issues with DeepSpeed OOM errors and discussions around practical implementations of DPO using the DPOTrainer from the trl Hugging Face library prompted sharing of insights and resources among users such as @cogbuji and @plasmator.

  • Advancing AI Deception in Roleplay Characters: Dialogues on creating AI characters that can convincingly lie highlighted the improvements when applying explicit survival goals. Challenges encountered when using the DreamGen Opus V1 model were discussed, along with tokenizer issues and verbosity in AI storytelling.

  • The Intricacies of Model Merging Explored: Discussions led by @jsarnecki and @maldevide delved into the complexities of merging non-homogeneous models and various strategies for successful mergers, like linear interpolation. The limitations and possibilities were articulated, drawing on resources like mergekit and advancements in PEFT merging methods outlined in a Hugging Face blog post.

  • Engineers Pine for the Past While Gazing Toward AI’s Future in Decompilation: Reminiscences of OllyDbg’s features by @spottyluck contrast with excitement for potential AI-assisted decompilation expressed by @mrjackspade. The suggestion to use a large volume of open-source projects for creating AI training data sets demonstrates forward-thinking for advancing AI capabilities in code reconstruction.


Mistral Discord Summary

Mistral Large Takes the Stage: The introduction of Mistral Large, a highly optimized language model with an 81.2% accuracy on MMLU and features such as multilingual capabilities and native function calling, stirred interest and discussion across the community. It’s available for use via platforms such as la Plateforme.

Technical Hurdles & Triumphs in LLM Deployment: Members shared experiences and exchanged technical advice on the challenges of deploying Mistral models, such as Mixtral 8x7B and Mistral-7B-Instruct, on various hardware setups including Tesla V100s and local machines with limited VRAM. Tips on adjusting layer sharing, precision levels, and dealing with freezing issues were exchanged, highlighting the technical nuances of high-performance model usage.

Fine-Tuning Finesse: The community discussed fine-tuning practices, emphasizing the need for experimentation and adequate data quantities, with suggestions pointing to around 4000 instances for specific tasks. There was also a focus on the right data format for fine-tuning with Mistral models, and the necessity of understanding advanced fine-tuning techniques like LoRA.

Contemplating Commercial Impacts & Open Access: Conversations around Mistral’s shift towards more business-oriented, closed-weight models like Mistral Small and Large surfaced concerns about the future of open models. However, many members are hopeful for the continued support of open model development despite big tech partnerships.

Mistral API Insights and Queries: Queries related to the Mistral API were numerous, ranging from concerns about data privacy, with confirmations that data isn’t used for model training, to functional inquiries about running Mistral on local machines without GPUs. There was also a discussion on third-party offerings and potential integrations for extending Mistral’s capabilities.

User-Driven Design and Application Ideas: The community actively shared ideas for new applications and enhancements, including the development of plugins and mobile apps that leverage Mistral. One user proposed adding a language level setting to Mistral’s Le Chat and there’s a buzz around the feature simplicity of Mistral-Next within Le Chat, which could indicate a user preference for streamlined AI products.


LM Studio Discord Summary

  • Troubleshooting LM Studio’s White Window Woe: User @steve_evian encountered an issue where LM Studio launched to a white window; @heyitsyorkie recommended clearing .cache and %APPDATA%, which resolved the issue.

  • Exploring Multilingual LLM Presence: @.deug queried about pre-trained multilingual LLMs with Korean support; @heyitsyorkie noted a scarcity of LLMs proficient in Korean to English translation, recommending combining LM Studio with an online translator like DeepL.

  • LM Studio API Refuses to Run Headless: @muradb inquired about headless operation for LM Studio API; @heyitsyorkie clarified the current version doesn’t support it, while @krypt_lynx expressed desire for open-source and headless features, confirmed to be unavailable by @heyitsyorkie.

  • Hyperparameter Evaluation Remains a Personal Choice: @0xtotem pondered the proper dataset for hyperparameter evaluation for a RAG model—with consensus leaning towards using the closest data available, as specific guidance was lacking.

  • GPU Wars: Nvidia Faces Off Against AMD in User Preferences: The suitability of AMD’s GPUs for running LLMs was debated; users showed a general preference for Nvidia due to the ease of AI applications setup, despite speculation about AMD working on CUDA alternatives.

  • A Collective Effort to Aid IQ Models in LM Studio: @drawless111 succeeded in making IQ models work and offered guidance on locating the relevant formats on Hugging Face; others discussed improvements and updates to various models and tools like llama.cpp.

  • Online Reinforcement Learning Without File System Access: @wolfspyre asked about LM Studio’s capability for local file system access; it was clarified that LLMs don’t have this capability, nor does LM Studio support executing commands from LLMs.

  • AutoGen Anomalies Addressed with a Classic Reboot: Users shared troubleshooting tips for AutoGen errors, including reinstalling packages and the reliable “turn it off and on again” strategy, as humorously depicted in a Tenor GIF.

  • Seeking Support for Langchain’s RAG Utilization: In a brief, single-message mention, bigsuh.eth inquired about using RAG within Langchain via LM Studio, but no discussions or answers followed.

  • Open-Interpreter Connectivity Conundrum Cracked: User @nxonxi dealt with connection errors and syntax mistakes when trying to run Open Interpreter with the --local flag; after troubleshooting, simple Python requests worked as a solution.


Perplexity AI Discord Summary

  • Discover Daily Podcast Unveiled: Perplexity AI, in partnership with ElevenLabs, launched the Discover Daily podcast. The episodes sourced from Perplexity’s Discover feed are available on podcast.perplexity.ai, featuring daily tech, science, and culture insights using ElevenLabs’ voice technology.

  • Sonar Models Spark Debate: New sonar-small-chat and sonar-medium-chat models along with their search-enhanced versions were introduced by Perplexity AI, leading to community comparisons with pplx-70b-online. Users reported incoherent responses from sonar models, requesting not to phase out pplx-70b-online due to its better performance and mentioning that fixes for sonar models were underway as per community insights and the API Updates.

  • Gibberish Responses from Sonar Models Under Scrutiny: Users like @brknclock1215 suggested possible mitigation of gibberish outputs by limiting response length, which contrasted with the stable output quality of pplx models even at longer lengths. Meanwhile, API users discussed fetching model details programmatically for improving user interface selections.
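As a rough illustration of that mitigation (not code from the thread), capping max_tokens on Perplexity's OpenAI-style chat endpoint would look something like this; the model name and API key are placeholders:

```python
# Hedged sketch: cap response length via max_tokens on Perplexity's
# OpenAI-compatible chat endpoint. Model name and API key are placeholders.
import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "sonar-medium-chat",
        "messages": [{"role": "user", "content": "Summarize today's AI news."}],
        "max_tokens": 256,  # shorter completions reportedly reduced gibberish
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```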

  • Engagements in #general Rife with AI Chat Model Discussions: The community engaged in various discussions including the retirement of Gemini in favor of possible Gemini Ultra, inconsistencies in model responses across different platforms, and leveraging Perplexity’s Pro capability for image generation.

  • Assorted Inquiries and Tests in Sharing: Members in the sharing channel delved into a mix of topics like exploring user guides for Perplexity topics, questioning the novelty of Lenovo’s technology, and sharing mixed use cases leveraging the AI for personal assistance and technical inquiries.


OpenAI Discord Summary

  • VPN Interference with OpenAI Services: @strang999 encountered an error using OpenAI services, which @satanhashtag attributed to potential VPN interference and suggested disabling web protection in VPN settings.

  • GPT-4 Context and Captcha Challenges: @orbart and @blckreaper are frustrated with ChatGPT’s reduced memory for narrative work, suspecting a decrease in tokens processed, while @little.toadstool and @necrosystv reported cumbersome captcha tests within ChatGPT.

  • Quest for Image-to-Video and Data Privacy Concerns: @sparkette was looking for a browser-based image-to-video generator and @razorbackx9x asked about AI for sorting credit report data, with @eskcanta cautioning against uploading sensitive personally identifiable information (PII).

  • Navigating Custom GPT and Assistant Differences: Users noted inconsistencies between Custom GPTs and Assistant GPTs in handling formatting and markdown, particularly when generating tables or images, with advice to refer to specific API configurations.

  • Anticipating Sora and Protecting Prompts: The community is curious about the capabilities of OpenAI’s Sora and discussed the feasibility of protecting custom prompts with @.dunamis. and @kyleschullerdev_51255 agreeing that complete protection isn’t possible, suggesting a layered web application for security instead.


LAION Discord Summary

  • Watch Out for Crypto Scams: One post in the #learning-ml channel from @josephsweeney11 appears to be a potential scam involving making $40k in 72 hours and should be approached with extreme caution.

  • Understanding Transformers’ Learning Capabilities: In the #learning-ml channel, @phryq. inquired about experiments training transformers to understand size relationships to enhance image generation, using hypothetical objects.

  • New Snap Video Project Unveiled: A new project termed Snap Video was discussed in the #general channel, addressing challenges in video generation with a transformer-based model, and sharing the project link and related research paper.

  • Debate Over Optimal CLIP Filtering Techniques: In the #research channel, the discussion revolved around whether CLIP filtering is suboptimal compared to image-text pair classifiers, with reference to a recently published DFN paper in the conversation.

  • Gradient Precision: bfloat16 vs fp32 Debate: Conversations in the #research channel have touched on the use of autocasting on TPUs with bfloat16 gradients and compared its performance against the default fp32 gradients in PyTorch’s autocast behavior.

  • Sharing of AI Research Papers and Methods: Across the channels, participants shared insights and resources on various AI research topics such as state space architecture, Transformer optimization, AI-generated text detection, and discussions around making LLMs significantly cheaper, with links to resources like Mamba-ND, among others.


HuggingFace Discord Summary

  • Democratization of AI Hardware Sparks Intense Debate: In the discussion surrounding the potential of creating proprietary TPUs and the democratization of hardware, parallels to the car and RAM industries elicited skepticism regarding tech promises from companies like Samsung. The importance of such advancements was underscored, given their impact on AI capabilities and access.

  • Towards Accessible and Practical AI Solutions: Several initiatives, including the creation of Galaxy AI, offering free API access to models such as GPT-4, GPT-3.5, and Galaxy AI’s Gemini Pro, to the presentation of surya, an OCR and line detection project that supports over 90 languages, are aimed at making AI tools more accessible and practical for various applications, as explained in this GitHub repository.

  • Neural Network Innovations & Model Finetuning Challenges: From introducing support for WavLMForXVector in browsers to reviewing the Peft library’s new merging methods for LoRA, there’s a clear focus on model deployment and improving AI performance. Finetuning difficulties, whether with Flan T5 producing incoherent output or a zigzag loss graph in Qwen1.5-0.5B, remain pivotal points of discussion.

  • Cross-disciplinary AI Projects Garner Attention: Projects that integrate AI with specific disciplines, such as Unburn Toys, an open-source AI toolbox, or the TTS Arena for comparing TTS models, signify a cross-functional approach to AI development. This is complemented by the release of datasets for niche applications like philosophical Q&A, available on Hugging Face here.

  • Knowledge Sharing and Collaborative Growth in AI Communities: Whether it’s a query on imitation learning for robotics, the use of AnimeBackgroundGAN, the issues related to multi-language OCR, or the Japanese Stable Diffusion model’s approach to training in a new language, it’s evident that AI communities serve as invaluable forums for sharing knowledge, solving problems, and fostering collective progress in the field.


Eleuther Discord Summary

  • Batching Blues with GPT-4: @rwamit raised concerns about processing time increasing from 2s to 60s per iteration when implementing batching to query GPT-4 via a LangChain wrapper, ballooning the task from 5 to 96 hours for 5-6k records.

  • Intrigue in Initialization: A particular code piece in Gemma’s PyTorch implementation involving RMSNorm sparked discussions about the significance of the addition of +1 to the normalization process.
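For context, the construct under discussion, paraphrased from Gemma's PyTorch reference code: the scale parameter is zero-initialized, so the +1 makes the layer behave as a plain RMSNorm at initialization.

```python
# Paraphrased (and simplified) from Gemma's PyTorch implementation: the weight
# is initialized to zeros, so "1 + weight" starts out as an identity scale.
import torch
from torch import nn

class GemmaRMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.zeros(dim))  # note: zeros, not ones

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return norm * (1 + self.weight)  # the "+1" that sparked the discussion
```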

  • EfficientNet’s Efficacy: A debate arose over EfficientNet’s merits, with @vapalus defending its use in segmentation tasks despite criticism from @fern.bear regarding its marketing versus performance.

  • Mistral Large Debuts: The release of Mistral Large was announced, a model acclaimed for its strong text generation performance and availability on la Plateforme and Azure. Check out Mistral Large for additional insights.

  • DPO Paper and SFT: Clarity was sought by @staticpunch about model_ref initialization in DPO, with confirmation that Supervised Fine-Tuning (SFT) on preferred completions should precede DPO, as discussed in the DPO paper.

  • Diving Deeper into GRUs: @mrgonao showed curiosity about why gated units like GRUs are termed as such, yet explanations regarding their etymology remained elusive within the channel.

  • The Search for Smarter Search: The “Searchformer” paper Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping describes how a Transformer-based model can outperform traditional A* methods in solving puzzles, offering an innovative approach to search problems.

  • RLHF and the Simplicity Debate: A paper advocating for simpler REINFORCE-style optimization over Proximal Policy Optimization (PPO) for RLHF triggered discussions on the efficiency of fundamental methods in RL for language models. The paper is accessible here.

  • Watermarking Frameworks Face-off: A survey of the text watermarking landscape for large language models was shared, covering techniques for embedding detectable signals in generated text and analyses of such watermarks’ robustness.

  • Tales of GPT-NeoX and Python: Amidst hesitation about upgrading to Python 3.10, conversations in development veered towards preferences for a custom training loop over GPT-NeoX, showing an active engagement with the finer points of AI development optimization.

  • Multilingual Matters in Tokenization: Queries about optimizing the Mistral tokenizer for better multilingual representation underscored ongoing efforts to enhance language model capabilities beyond English, indicating a focus on global applicability.


LlamaIndex Discord Summary

  • Create-llama Eases Full-Stack Development: The newest create-llama release integrates LlamaPack, streamlining the construction of full-stack web apps through the inclusion of advanced RAG concepts with minimal coding. The announcement was shared in a tweet by @llama_index.

  • Counselor Copilot Leverages Advanced RAG: Counselor Copilot project, highlighted in a tweet, distinguishes itself by utilizing advanced RAG to assist crisis counselors, showcasing a use case as a co-pilot rather than a basic chatbot.

  • RAG Retrieval Enhanced by Summaries: To improve RAG retrieval, a technique using sub-document summaries helps tackle global concept awareness problems that arise from naive chunking. This approach is detailed in a tweet discussing the consequential boost in contextual awareness of each chunk.
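A framework-agnostic sketch of the idea follows; the summarize and embed helpers are hypothetical stand-ins for your LLM and embedding calls, not LlamaIndex APIs:

```python
# Hedged sketch of sub-document summaries for RAG: attach a summary of the
# parent document to every chunk so retrieval sees the global context that
# naive chunking loses. summarize() and embed() are hypothetical helpers.
def build_chunks(document: str, chunk_size: int = 1000) -> list[dict]:
    summary = summarize(document)  # hypothetical LLM summarization call
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    return [
        {
            "text": chunk,
            "summary": summary,  # global context travels with the chunk
            "embedding": embed(f"{summary}\n\n{chunk}"),  # hypothetical embed call
        }
        for chunk in chunks
    ]
```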

  • LlamaParse Masters Complex PDF Parsing: LlamaParse has been introduced as a powerful tool for parsing PDFs with complex tables and figures, crucial for high-quality RAG applications. Accurate table representations aid the LLM in providing correct answers, as stated in a tweet.

  • Challenges with Kafka’s Protagonists in AI: In a discussion regarding generating a book review for Kafka’s “Metamorphosis,” @daguilaraguilar faces trouble with the AI incorrectly highlighting “Grete” as the protagonist instead of “Mr. Samsa,” referencing their code.

  • Insights into Financial Document Analysis and Context Management: SEC Insights brings advanced capabilities for analyzing financial documents, and there is a call within the community for benchmarks related to best practices in context management for large-window LLMs such as GPT-4 turbo and Gemini 1.5.


Latent Space Discord Summary

Sora’s Consistency Questioned: In a correction to a WSJ video, @swyxio pointed out that OpenAI’s Sora maintains consistency in videos longer than one minute by interpolating from a start image.

NVIDIA’s GEARing Up: NVIDIA announced a new research group, GEAR (Generalist Embodied Agent Research), co-founded by Dr. Jim Fan, focusing on autonomous machines and general-purpose AI.

AI-Generated Podcasts Hit the Airwaves: Perplexity has launched an AI-generated podcast, drawing content from their Discover feed and employing ElevenLabs’ voices for narration.

One Line of AI Code with Cloudflare: Cloudflare’s new AI Gateway has been introduced, featuring easy integration via a single line of code for AI analytics and insights.

AI Takes on Data Analysis with GPT-4-ada-v2: A new tool, ChatGPT Data Analysis V2, enhances data analysis by offering targeted replies and a data-grid overlay editor, possibly implementing interactive charts and leveraging gpt-4-ada-v2.

LLM Paper Club T5 Session Recap: A recent LLM Paper Club session led by @bryanblackbee dissected the T5 paper with discussions encapsulated in shared Notion notes. Open inquiries included model vocabulary, fine-tuning processes, and architecture differences for NLP tasks.

Local Model Enthusiasts Convene in AI in Action Club: The “AI in Action” event highlighted local model exploration, tooling discussions for local AI models, and references to model fine-tuning with LoRAs deploying tools like ComfyUI. The Latent Space Final Frontiers event was announced, inviting teams to push the boundaries of AI with an application link here.


OpenAccess AI Collective (axolotl) Discord Summary

  • Gradient Clipping Woes & DeepSpeed Query: An issue with gradient clipping set to 0.3 was discussed with suspicions of temporary spikes; meanwhile, a GitHub issue about HuggingFace’s Trainer supporting DeepSpeed Stage 3 incited feedback on usage and updates. Axolotl’s cache clearing techniques were also shared, using huggingface-cli delete-cache.

  • Strategic Shifts at Mistral AI?: Discussions surfaced regarding a strategic partnership between Microsoft and Mistral AI, centering on potential implications for open-source models and the commercial direction of Mistral AI. Links to a Twitter post and news article were shared for further insight.

  • Ease of Access with Axolotl’s Auto-Install: The Axolotl project saw improvements with the introduction of auto_install.sh to simplify installations, showing commitment to non-Python developer support. A Twitter post sought community support for the CUDA mode series with the potential assistance of Jeremy Howard.

  • GPUs, Dockers, and Newbies: Technical issues regarding GPUs, such as long training times and high loss, Docker container complications, and the desire for a beginner-friendly Axolotl tutorial were prominent. Hugging Face’s reported checkpoint save error issue #29157 and Axolotl’s GitHub #1320 were among the key references.

  • Community Highlights Korean Expansion & RAG Features: A fine-tuned phi-2 model without a model card was announced, EEVE-Korean models were touted for extended Korean vocabulary, and R2R Framework for RAG system development was introduced. The supporting arXiv technical report and various Hugging Face models were provided to the community.

  • Runpod Hits a DNS Hitch: A NameResolutionError on Runpod when trying to reach ‘huggingface.co’ was reported, suggesting DNS resolution issues possibly involving proxy settings.


CUDA MODE Discord Summary

  • CUDA Under Fire: Computing legend Jim Keller criticized NVIDIA’s CUDA architecture in a Tom’s Hardware article, suggesting it lacks elegance and is cobbled together. Meanwhile, the introduction of ZLUDA, which enables CUDA code to run on AMD and Intel GPUs, was open-sourced with hopes to challenge NVIDIA’s AI dominance (GitHub link).

  • Gearing Up with GPUs: Debates surfaced regarding GPU choices for AI with the 4060 ti being the cheapest 16GB consumer GPU and the 3090 offering 24GB VRAM as a stronger alternative for LLM tasks. Discussions were also vibrant around second-hand GPU buying strategies and potential technical remedies when issues arise.

  • Quantized Computation Conversations: Clarity surfaced on how computations in quantized models maintain accuracy, and discussions around implementing efficient CUDA kernels through torch.compile by detecting patterns were prominent. The speed of CUDA kernel compilation was also a topic, with methods to reduce compile times from over 30 seconds to under 2 seconds proposed (repository link).
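A toy illustration of the first point (symmetric int8 quantization, not any specific kernel from the discussion): weights are stored as integers plus a scale, and compute effectively sees the dequantized values, so the error stays small relative to the weights.

```python
# Toy sketch of symmetric per-tensor int8 quantization, to illustrate why
# quantized models retain accuracy. Illustrative only, not from the thread.
import numpy as np

w = np.random.randn(4096).astype(np.float32)
scale = np.abs(w).max() / 127.0           # one scale for the whole tensor
q = np.round(w / scale).astype(np.int8)   # quantize: store int8 + scale
w_hat = q.astype(np.float32) * scale      # dequantize for (or fused into) matmul

print("max abs error:", np.abs(w - w_hat).max())  # small relative to |w|
```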

  • Triton Tinkering: Interest piqued in Triton, a tool for enabling Jax support via Pallas and its comparison to CUDA for multi-GPU/node execution. There were calls for experts to explain the lower-level workings of Triton, its foundation in LLVM and MLIR, and to create benchmarks for its quantized matmul kernel.

  • Flash Attention Finessed: Within ring attention discussions, a zigzag_ring_flash_attn_varlen_qkvpacked_func implementation showed speed improvements. A Hugging Face document detailed memory efficiency benefits (Flash Attention Visual), and benchmarks indicated a 20% speed up over classical flash attention (benchmark link).

  • CMU’s Paper on Efficient LLM Serving: A paper from CMU on efficient methods in deploying generative LLMs was shared, titled “Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems” (arXiv link), surveying over 150 works on techniques including non-autoregressive generation and local attention variants.

  • Learning Efficiency Through MIT: An MIT course on efficient AI computing was unveiled, covering model compression, pruning, and quantization, providing hands-on experience with models like LLaMa 2, and touching on quantum machine learning topics (course link).

  • CUDA-MODE Lecture Announcements and Learnings: Lecture 7 on Quantization titled Quantization CUDA vs Triton was announced, emphasizing the discourse on efficient techniques in AI computations with quantization at the forefront. Lecture content was supplemented by YouTube videos and easily accessible slide presentations, fostering continued education in the community (YouTube Lecture 6, Lecture 7).

  • Job Prospects and Queries: Nvidia was confirmed to be looking for CUDA and C++ experts, inviting applicants to DM their CV for JobID: JR1968004. Questions around hiring status for companies like Mistral were floated, underlining the employment buzz within the AI engineering sector.


LangChain AI Discord Summary

  • Exploring Function Calls in AI Models: Engineer @saita_ma_ is looking for ways to execute function calls with local models such as OpenHermes, inspired by what CrewAI achieved. Meanwhile, @kenwu_ shared a Google Colab seeking assistance on agent and function calling using Cohere API and LangChain.

  • LangChain Integration in Various Projects: The creation of a personalized chatbot implementing OpenAI, Qdrant DB, and Langchain JS/TS SDK was shared by @deadmanabir, while @david1542 introduced Merlinn, a machine learning tool to support on-call engineers. Furthermore, @edartru. offered Langchain-rust, a crate that allows Rust developers to use large language models in programming.

  • Tutorial Resources Promote DIY AI Projects: A recent YouTube tutorial shows viewers how to create a ChatGPT-like UI using ChainLit, LangChain, Ollama, & Gemma. @rito3281 wrote about using LLMs for finance analysis in the insurance industry, and @tarikkaoutar posted a video on creating a multi-agent application involving LangGraph.

  • Sarcasm Detection and Timeout Extensions in LLMs: There was a suggestion to tag phrases with “sarcasm” for better LLM detection post-fine-tuning, but further discussion on the mechanics was not provided. A query about extending the default 900-second timeout was raised, yet no subsequent solutions or elaborations were found.

  • Emerging Tools and Use Cases Explored by Developers: @solo78 invites collaborative discussions on AI implementations in the insurance sector’s finance function. An AI-powered resume optimizer that helped secure interviews at tech companies was introduced by @eyeamansh.


Datasette - LLM (@SimonW) Discord Summary

  • ChatGPT Multilingual Mishaps: Users noted that gpt-3.5-turbo sometimes mistranslates document titles, with one instance changing “Taking Advantage of the Internet” to “Sacándole Provecho a Internet”. The suggested workaround is a system prompt specifying “Always use English” to prevent such language-detection errors.
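A minimal sketch of that workaround using llm's Python API (the CLI equivalent is the -s/--system option); the model alias and prompt are illustrative:

```python
# Hedged sketch: pin the output language with a system prompt so the model
# doesn't switch languages based on a document's title. Assumes an OpenAI key
# is already configured for llm; model alias and prompt are illustrative.
import llm

model = llm.get_model("gpt-3.5-turbo")
response = model.prompt(
    'State the title of this document: "Sacándole Provecho a Internet"',
    system="Always use English.",
)
print(response.text())
```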

  • Prompt Crafting Nostalgia & Fixes: @tariqali discussed the benefits of old school prompt crafting for better control in light of chatbot “time out” issues. Meanwhile, @derekpwillis and @simonw conversed about devcontainer configuration, with @simonw recommending the addition of llm models to the setup.sh script, which @derekpwillis implemented to solve certain bugs.

  • Aspirations for LargeWorldModel on LLM: There is interest in running the LargeWorldModel on LLM, possibly leveraging GPU instances for PyTorch models, as discussed by @simonw. He referenced the models’ availability on the Hugging Face repository.

  • Groq Inference Plugin Debuts: @angerman. released a Groq inference plugin, llm-groq, with the community showing support and curiosity regarding its performance capabilities.

  • llm-groq Plugin Hits PyPI: Following advice from @0xgrrr, @angerman. published his llm-groq plugin to PyPI, facilitating easier installation via llm install. He shared his publishing experience and drew comparisons between Haskell and Python community practices.


LLM Perf Enthusiasts AI Discord Summary

  • Bold Claims on AI Hallucination Footprint: Richard Socher’s tweet hinting at possible solutions for AI hallucinations sparked discussions around embedding models and validation mechanisms to improve AI’s factual accuracy.

  • Introducing a New Wikipedia: Globe Explorer, a tool leveraging GPT-4 to generate customizable Wikipedia-style pages, has launched and gone viral, with a push to top Product Hunt’s daily list; additional details are on Product Hunt.

  • FireFunction V1 Ignites Excitement: The release of FireFunction V1 by @lqiao promises GPT-4-level output quality with faster, more effective function calling, announced alongside new structured output modes such as JSON; it drew interest in discussions comparing function-calling approaches, as detailed in FireFunction’s blog post.

  • Fine-Tuning Adventures with gpt-4-turbo: The query on embedding techniques for improved data extraction and classification tasks using gpt-4-turbo for 1-shot learning stirred interest in effective fine-tuning practices.

  • Anki’s AI Flashcard Revolution Still Pending: The integration of GPT-4 for producing Anki flashcards revealed successes and limitations, such as verbose outputs and challenges with visual content integration, featured in an analytical Tweet by Niccolò Zanichelli.

  • Peering into Feather’s Purpose: OpenAI’s Feather, whose icon hints at a writing tool, garnered interest through historical snapshots and its apparent role in hiring for SME data labeling and coding annotation, alongside advancements like “gpt-4-ada-v2” and its enhanced data-analysis features, as discussed in Semafor’s article and Tibor Blaho’s tweet.


DiscoResearch Discord Summary

  • Callbacks Feature in Hugging Face Trainer: Sebastian Bodza discussed using custom callbacks with the Hugging Face trainer, noting that they are currently exclusive to PyTorch and essentially “read-only”, with limited control exercised through the TrainerControl object.
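As a rough illustration (not Bodza's code), a custom callback observes trainer state at each step and exerts its only influence through the TrainerControl flags:

```python
# Minimal custom TrainerCallback sketch: it can read args/state freely, but
# only influences training via TrainerControl flags. Condition is illustrative.
from transformers import TrainerCallback

class StopEarlyCallback(TrainerCallback):
    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step >= 100:             # illustrative stop condition
            control.should_training_stop = True  # the "control" escape hatch
        return control
```

It would be registered by passing callbacks=[StopEarlyCallback()] when constructing the Trainer.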

  • Benchmarks Emerge in German Emotional Intelligence: EQ-Bench now supports the German language, courtesy of updates from Calytrix, with gpt-4-1106-preview topping the German EQ-Bench preliminary scores, details found at the EQ-Bench GitHub repository. However, concerns were raised about the validity of the translated benchmarks, suggesting emotional understanding nuances might be lost, potentially skewing results due to English-centric reasoning patterns.

  • Misgivings on Probability-Based LLM Evaluations: Bjoernp recommended an arXiv paper revealing the inherent limitations in probability-based evaluation methods for LLMs, specifically regarding multiple-choice questions and their alignment with generation-based predictions.

  • Introducing Layered Sentence Transformers: Johann Hartmann unveiled Matryoshka Embeddings via a Hugging Face blog post, detailing their advantages over regular embeddings, and confirmed their integration into the Sentence Transformers library, enhancing the toolkit for users.
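The core property, in a hedged sketch (the model name is a placeholder for any Matryoshka-trained checkpoint): embeddings remain usable when truncated to a prefix of their dimensions and renormalized.

```python
# Hedged sketch of the Matryoshka property: truncate embeddings to a dimension
# prefix and renormalize. Model name is a placeholder, not a recommendation.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("some-matryoshka-embedding-model")  # placeholder
full = model.encode(["Matryoshka embeddings nest coarse-to-fine information."])

small = full[:, :256]                                  # keep the first 256 dims
small /= np.linalg.norm(small, axis=1, keepdims=True)  # renormalize for cosine use
```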

  • Clarity on RAG Approach for German Dataset: Johann Hartmann and Philip May deliberated the evaluation methodology for a German retrieval context understanding dataset, with May clarifying that it’s crucial to assess if an LLM can identify relevant information in multiple retrieved contexts. The dataset is a work-in-progress and currently lacks public accessibility.


AI Engineer Foundation Discord Summary

  • Hackathon Team Formation Heats Up: @reydelplatanos and @hiro.saxophone teamed up for an upcoming hackathon, with @hiro.saxophone bringing experience in ML engineering, particularly in multimodal RAG. Meanwhile, @ryznerf. also showed interest in joining a hackathon team, emphasizing eagerness to participate.

  • Collaboration Across Disciplines: Back end developer @reydelplatanos has partnered with ML engineer @hiro.saxophone for the hackathon, representing a fusion of backend and machine learning skills in their new team.

  • Hackathon Registration Rush: @silverpiranha and @jamthewallfacer discussed registration for an event, with @silverpiranha eventually confirming successful registration and suggesting a potential team-up.

  • Drones Managed by Code: @.yosun introduced a hackathon project idea about controlling drones through function calls, referencing a method from the OpenAI Cookbook, and shared a code snippet as an illustration.


Alignment Lab AI Discord Summary

  • Gemma-7B Gets Conversation Manners: @imonenext has integrated special tokens <start_of_turn> and <end_of_turn> into the Gemma-7B model to facilitate turn-taking in conversational AI. The model with these enhancements is now available for training and fine-tuning on Hugging Face.
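For reference, the turn-taking layout these tokens enable (Gemma's chat convention); in practice the tokenizer's chat template would produce this rather than hand-built strings:

```python
# The conversational layout enabled by <start_of_turn>/<end_of_turn>,
# hand-rolled here purely for illustration.
prompt = (
    "<start_of_turn>user\n"
    "Why is the sky blue?<end_of_turn>\n"
    "<start_of_turn>model\n"  # generation continues from here
)
```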

Skunkworks AI Discord Summary

  • Seeding Insights for Stochastic Precision: @stereoplegic highlighted an article on the significance of random seeds in deep learning, particularly for Python’s PyTorch users. The article Random Numbers in Deep Learning; Python & the PyTorch Library was lauded as a “shockingly good read” for those keen to explore or fine-tune the underlying mechanics of randomness in model training.
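The standard seeding boilerplate the article covers looks roughly like this; full determinism can additionally require the cuDNN flags shown:

```python
# Typical reproducibility boilerplate for PyTorch experiments.
import random
import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
torch.backends.cudnn.deterministic = True  # trades speed for reproducibility
torch.backends.cudnn.benchmark = False
```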

PART 2: Detailed by-Channel summaries and links

TheBloke ▷ #general (1013 messages🔥🔥🔥):

  • Mistral Large Not Worth the Performance?: @timotheeee1 suggested that Mistral Large, with a similar cost as GPT-4 Turbo, is not justifiable given its slightly inferior performance on benchmarks like MMLU. The cost effectiveness is questioned.
  • Megatokens Make Their Debut: @itsme9316 humorously coined the term “megatoken” during a discussion about token costs, sparking a series of light-hearted responses including “lol” from several users such as @technotech.
  • The Great Model Debate: A lengthy debate ensued regarding whether LLMs can truly “reason.” Users like @kalomaze and @kaltcit exchanged views on language models’ capabilities to perform reasoning or whether what they exhibit can only be termed as quasi-reasoning.
  • Open Source Hopes Dashed for Mistral Large?: Dialogue surrounding Mistral’s commitment to open sourcing its large models showed frustration, with users like @_dampf lamenting the change and expressing a lack of surprise at the news.
  • Experiencing Technical Difficulties: Users like @kaltcit reported issues with models like academiccat dpo, experiencing errors and segfaults during measurement, hinting at instability or unpredictability in some AI models.

Links mentioned:


TheBloke ▷ #characters-roleplay-stories (275 messages🔥🔥):

  • Achieving Deception in AI: @superking__ and others discussed the challenges of programming a character to lie convincingly, as larger models like Mixtral perform better with explicit goals such as “survive at any cost”.
  • The Tale of a Shapeshifting Android: Despite @superking__’s efforts to create a character card for an android hiding its identity, the AI blew its cover until tasked with the goal “survive at any cost”, which led to an improvement in secretive behavior.
  • Opus V1 Models and Technical Challenges: Participants such as @dreamgen and @kquant navigated issues around DreamGen Opus V1, tokenizer problems, and optimal model settings for better performance.
  • Model Issues with Verbosity and Looping: Several users, including @superking__ and @dreamgen, discussed instances where the AI would write unnecessarily long sentences or enter looping patterns, with shared experiences and potential fixes.
  • Discussion on Character Roleplay: @keyboardking successfully created a character card that managed a gender disguise narrative, showcasing current AI capabilities in managing nuanced roleplay scenarios.

Links mentioned:


TheBloke ▷ #training-and-fine-tuning (71 messages🔥🔥):

  • Seeking DPO Implementation Advice: @cogbuji is on the hunt for a practical implementation of DPO (Direct Preference Optimization) and considers using the DPOTrainer from the trl Hugging Face library as a reference (a minimal sketch follows this list). Various members, such as @dirtytigerx, engage in the discussion, offering insights and resources.
  • Fine-Tuning Versus Training Dilemmas: @cognitivetech expresses concerns about the efficiency of fine-tuning full LLMs and the potential loss of information. The user considers using gguf for fine-tuning and also explores leveraging the official QA-Lora implementation for instruct fine-tuning.
  • Dealing with DeepSpeed OOM Issues: @plasmator struggles to set up DeepSpeed Zero due to out-of-memory errors, despite calculations indicating sufficient resources.
  • Storytelling LLMs and Comic Book Training Set: @hellblazer.666 inquires about how to train smaller models for storytelling, specifically using comic book texts as a dataset. They also share the Augmentoolkit repository as a potential tool for converting their data into a suitable format for training.
  • Training Methods and Model Selection: In a comprehensive discussion, @dirtytigerx and @hellblazer.666 discuss various training methods for LLMs, including full fine-tuning, PEFT-techniques like LoRA, as well as the usage of retrieval-augmented generation (RAG). They conclude that starting with a base model fine-tuned for storytelling might be the best approach for @hellblazer.666’s project.
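A minimal sketch of the DPOTrainer setup referenced above; exact arguments vary by trl version, and the model name, dataset, and hyperparameters are illustrative:

```python
# Hedged sketch of Direct Preference Optimization with trl's DPOTrainer.
# Model/dataset names and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("my-sft-model")      # SFT'd policy
ref_model = AutoModelForCausalLM.from_pretrained("my-sft-model")  # frozen reference
tokenizer = AutoTokenizer.from_pretrained("my-sft-model")

# The dataset must provide "prompt", "chosen", and "rejected" text columns.
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

trainer = DPOTrainer(
    model,
    ref_model,
    args=TrainingArguments(output_dir="dpo-out", per_device_train_batch_size=1),
    beta=0.1,  # strength of the KL penalty against the reference model
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```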

Links mentioned:


TheBloke ▷ #model-merging (37 messages🔥):

  • Challenges in Novel Model Merging: User @jsarnecki inquired about merging non-homogeneous models like llama-2-13b and Mistral-7b using mergekit, which @maldevide confirmed is not possible. The discussion evolved towards exploring merging techniques that could help @jsarnecki reach their objective.
  • Optimizing for Use-Cases: @maldevide prompted @jsarnecki to consider whether they were experimenting for capability discovery or targeting specific use-cases, further providing insights into successful merges on Hugging Face’s models.
  • Techniques for Merging Homogenous Models: @alphaatlas1 mentioned git-rebasin as a potential option for merging models with identical size/layout and discussed limitations such as the lack of a good technique for merging different base models.
  • Advanced Merging Tactics Discussed: The conversation shifted to various merging strategies, including linear interpolation (see the sketch after this list), additive merging, and stochastic sampling as shared by @maldevide. The complexity of model merging techniques and their applicability to different model types was highlighted.
  • DARE Ties Merging Insights: Diffusion models were noted to have challenges with DARE ties merging, as mentioned by @alphaatlas1, who also referenced a particular Hugging Face blog post. However, @maldevide shared a successful experience, pointing to a different implementation on GitHub.
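As a toy illustration of the simplest of those strategies, here is linear interpolation between two checkpoints that share an architecture; model names are placeholders:

```python
# Hedged sketch of linear-interpolation merging. Works only for models with
# identical architecture/state-dict layout; names are placeholders.
import torch
from transformers import AutoModelForCausalLM

alpha = 0.5  # interpolation weight between the two parents
a = AutoModelForCausalLM.from_pretrained("model-a").state_dict()
b = AutoModelForCausalLM.from_pretrained("model-b").state_dict()

merged = {
    k: alpha * a[k] + (1 - alpha) * b[k] if a[k].is_floating_point() else a[k]
    for k in a
}

out = AutoModelForCausalLM.from_pretrained("model-a")
out.load_state_dict(merged)
out.save_pretrained("model-merged")
```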

Links mentioned:


TheBloke ▷ #coding (6 messages):

  • DotPeek Scope Clarification: @al_lansley inquired about the languages that DotPeek supports, and @spottyluck confirmed it’s limited to just C#.
  • Nostalgia for OllyDbg’s Features: @spottyluck lamented the lack of a true successor to OllyDbg, particularly its “animate into” feature, noting its limitations with 64-bit code, which render it nearly obsolete.
  • Eager Anticipation for AI in Decompilation: @mrjackspade expressed excitement for the potential of AI-assisted decompilation to simplify the reverse-engineering process.
  • Frustration with Reconstructing Code: @mrjackspade shared their frustration over manually reconstructing obfuscated decompiled code, hinting at the tedious nature of the process.
  • Idea for AI Training Data Sets: @mrjackspade suggested an approach to creating training data sets for AI decompilation by using a large volume of open-source projects and their outputs.

Mistral ▷ #general (1198 messages🔥🔥🔥):

  • Mistral Large vs Next Performance: Users like @yasserrmd and @chrunt compared the capabilities of Mistral Large and Next. Large seemingly outperforms Next in certain benchmarks, while Next is favored for its concise responses.
  • Hardware Requirements for AI: Discussions led by @mrdragonfox and @tu4m01l highlighted the impracticality of running large AI models like Mistral Large on CPUs, suggesting the use of APIs for efficiency.
  • Corporate Partnerships and Open Models: Concerns were voiced by users such as @reguile about the future of open models following the Microsoft partnership with Mistral. Some, like @foxlays, hope Mistral continues to support open model development.
  • Speculations on GPT-3.5 Turbo Parameters: Debates around the actual size of GPT-3.5 Turbo were stirred by a since-withdrawn Microsoft paper, with @i_am_dom and @lyrcaxis discussing its validity and efficiency.
  • Mistral's Market Positioning and Strategy: @blacksummer99 shared insights into Mistral's efforts to differentiate from OpenAI and its positioning as a European leader in the AI field.

Links mentioned:


Mistral ▷ #models (209 messages🔥🔥):

  • GPU Essentials for Server Builds: @lukun inquired about which models could run on a server with no GPU, and @tom_lrd and @_._pandora_._ explained that a GPU is necessary for reasonable performance, even with smaller models. For larger models, having a GPU with at least 24 GB VRAM, such as a 3090/4090, is recommended by @mrdragonfox. They also provided a detailed test gist on GitHub to illustrate the performance at different conditions.

  • The Cost of Scaling Up: Users @dekaspace, @mrdragonfox, and others discussed the specs for a server build to run language models. @mrdragonfox suggested 24 GB VRAM as a baseline and noted that models over 70B parameters would require a substantial investment in specialized hardware, mentioning Groq’s custom ASIC deployment as a costly approach.

  • Questions About Mistral’s Direction: Several users, including @redbrain and @blacksummer99, expressed concerns over Mistral’s seemingly new business-oriented direction, with closed-weight models like Mistral Small and Mistral Large, diverging from their previously open model reputation. The community speculated about upcoming releases and potential for open weight models in the future.

  • Benchmarks of Mistral Models: @bofenghuang conducted tests with Mistral’s models on a French version of MT-Bench, publishing results that placed Mistral Large at a notable position behind GPT-4. They shared their findings on Hugging Face Datasets and a browser-based space for further inspection.

  • Hopes for Open Access to New Models: Community sentiment as shared by @saintvaseline, @_._pandora_._, and others reflects a mix of hope for future open-access models and skepticism due to the involvement of large tech firms like Microsoft. Some members, including @tom_lrd, @m._.m._.m, and @charlescearl_45005, anticipated Mistral to eventually offer some lesser-quality open models while speculating on the implications of commercial partnerships.

Links mentioned:


Mistral ▷ #deployment (56 messages🔥🔥):

  • Request for Support Unanswered: @fangh flagged that they sent an email last week and haven’t received a response, seeking an update from @266127174426165249.
  • Running Mixtral on Local Machine Query: @c_ffeestain inquired if they can run Mixtral 8x7B on their local machine with 32GB RAM and 8GB VRAM, and is currently using a version on HuggingFace.
  • GPU Compatibility and Configuration Advice: @_._pandora_._ explained that in theory, Mixtral could be run on @c_ffeestain’s machine but would be extremely slow. They also offered help with finding the number of layers shared with the GPU to improve performance.
  • Exploring Model Quants and Layer Sharing: @c_ffeestain noted after downloading the model that generating one token takes about 5-10 seconds. They are in the process of adjusting how many layers are shared with their GPU (see the sketch after this list), but encounter issues detecting their AMD GPU.
  • Inference and Fine-tuning on a Tesla V100: @dazzling_maypole_30144 experienced an out-of-memory error trying to deploy Mistral-7B-Instruct on a Tesla V100. @mrdragonfox and @casper_ai suggested that the V100 might not have enough memory for this task and recommended either alternatives like T4 or A10 GPUs or running the model in AWQ format for better compatibility.
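A hedged sketch of the layer-offloading knob being tuned above, via llama-cpp-python; the file path and layer count are illustrative, and AMD support depends on how the library was built:

```python
# Offload a set number of transformer layers to the GPU with llama-cpp-python.
# Path and n_gpu_layers are illustrative; tune the count to fit your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct.Q4_K_M.gguf",
    n_gpu_layers=20,  # layers offloaded to the GPU; 0 = CPU only, -1 = all
    n_ctx=4096,
)
out = llm("Q: What is 2 + 2? A:", max_tokens=8)
print(out["choices"][0]["text"])
```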

Links mentioned:


Mistral ▷ #ref-implem (6 messages):

  • Inquiry about Mistral Data Normalization: User @severinodadalt from Barcelona Supercomputing Center asked whether the Mistral data has been normalized and, if so, how it was implemented. Noting the absence of information on the topic, the user suspects that no normalization has been applied.
  • No Base Model Normalization Details: In response to @severinodadalt’s inquiry about data normalization, @mrdragonfox noted that no base model will provide such information.
  • Performance Variance in Different Precision Levels: @bdambrosio asked whether inference speed would change when running Mixtral 8x7B locally in full fp16 compared to the current 8-bit exl2, especially with more VRAM available. The question arises from noticing differences between 6.5- and 8-bit precision levels.
  • Precision Levels Affect Performance: In response, @mrdragonfox confirmed that differences are noticeable, and that performance measurement tools like turboderp generally assess perplexity (ppl), suggesting that the precision level does indeed affect performance.
  • Quantization and Context Accuracy: @mrdragonfox also pointed out that quantization can slightly degrade context accuracy when performing tasks with models like Mistral.

Mistral ▷ #finetuning (185 messages🔥🔥):

  • Fine-Tuning Data Quantities and Expectations: @pteromaple inquired about the amount of data needed for fine-tuning, questioning if 4000 instances would suffice. While @egalitaristen suggested it depends on the specificity of the fine-tuning, highlighting that for narrow tasks, this might be enough, the discussion concluded that trial and error could be the best approach.

  • Data Format Dilemmas for Fine-Tuning: @pteromaple sought advice on the correct data format for fine-tuning ‘Mistral-7B-Instruct-v0.2’ with Unsloth and queried about the impact of data format on training results, revealing their current use of Alpaca format. @_._pandora_._ recommended creating a custom prompt format and warned about potential issues when fine-tuning Mistral 7B Instruct with non-English languages.
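For reference, an Alpaca-format record and the single training prompt it is typically flattened into; the template wording varies by toolkit, and this example is illustrative rather than taken from the thread:

```python
# Illustrative Alpaca-format record and a common way it is flattened into one
# training prompt. Template wording varies between fine-tuning toolkits.
record = {
    "instruction": "Translate the sentence to French.",
    "input": "The weather is nice today.",
    "output": "Il fait beau aujourd'hui.",
}

prompt = (
    "Below is an instruction that describes a task, paired with an input.\n\n"
    f"### Instruction:\n{record['instruction']}\n\n"
    f"### Input:\n{record['input']}\n\n"
    f"### Response:\n{record['output']}"
)
```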

  • Mistral’s Mysterious Output After Fine-Tuning: @mr_seeker reported a peculiar issue where a fine-tuned model outputs /******/ and loses coherence when prompted with non-dataset-like data. Suggestions from @mrdragonfox and others pointed towards the model’s routing layer, with an indication that successful fine-tuning may require understanding the intricacies of the model’s architecture beyond just applying techniques such as LoRA.

  • Serverless Fine-Tuning and Model Hosting Discussed: @stefatorus questioned the possibility of Mistral offering serverless fine-tuning functionalities in the cloud and discussed related offerings by companies like Hugging Face and OpenAI. RunPod was also brought up as a potential cost-effective solution, but the viability for those with budget constraints was a concern.

  • LoRA Parameters Puzzle: @tom891 faced challenges in determining the appropriate LoRA parameters for their 200k sample dataset for Mistral 7B fine-tuning. Despite guidance from @mrdragonfox and others emphasizing the necessity of understanding the underlying theory and urging independent exploration over spoon-fed answers, the user continued to seek direct suggestions for effective parameter configurations.
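For readers hunting for a starting point like @tom891, a common peft LoRA configuration looks roughly like this; the values are widely used defaults to tune from, not recommendations from the thread:

```python
# Hedged sketch of a LoRA setup with Hugging Face peft. Rank, alpha, and
# target modules are common starting values, not thread recommendations.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
config = LoraConfig(
    r=16,              # adapter rank
    lora_alpha=32,     # scaling factor applied to the adapter update
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapters train
```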

Links mentioned:

Serverless GPUs for AI Inference and Training: no description found


Mistral ▷ #announcements (2 messages):

  • Meet Mistral Large: @sophiamyang announced the launch of Mistral Large, a new optimised model with top-tier reasoning, multilingual capabilities, native function calling, and a 32k-token context window. Boasting 81.2% accuracy on MMLU, it stands as the second-ranked model in the world and is available via la Plateforme and Azure.

  • La Plateforme Premieres le Chat Mistral: @sophiamyang introduced le Chat Mistral, a front-end demonstration showcasing the capabilities of the Mistral models. Discover its potential at Chat Mistral.

Links mentioned:

  • Au Large: Mistral Large is our flagship model, with top-tier reasoning capacities. It is also available on Azure.

Mistral ▷ #showcase (24 messages🔥):

  • Join @jay9265’s Live Coding Stream: @jay9265 is live streaming on Twitch, inviting everyone interested to join.
  • LLMs for Problem Formulation Assistance: @egalitaristen suggests that LLMs can be utilized to help formulate problems or tasks, reminding @jay9265 that explaining the issue to an LLM is a way to seek assistance.
  • Lower Temperature for Structured Code: For tasks that involve structured code like JSON, @egalitaristen advised @jay9265 to reduce the generation temperature to around 0.3 for less “creativity” and more accuracy (see the sketch after this list).
  • WhatsApp Chrome Plugin by @yasserrmd: @yasserrmd has developed a Chrome plugin that uses Mistral API to generate WhatsApp formatted text, with more details available on LinkedIn.
  • AI Inference Benchmarking Analysis: @yasserrmd shared insights from benchmarking AI inference performance across platforms like Groq using Mistral, OpenAI ChatGPT-4, and Google Gemini, providing a LinkedIn post for more information.
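A hedged sketch of that temperature advice against Mistral's chat-completions endpoint; the model name and API key are placeholders:

```python
# Lower temperature for structured output, via Mistral's REST chat endpoint.
# Model name and API key are placeholders.
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "mistral-small-latest",
        "messages": [{"role": "user", "content": "Return {\"ok\": true} as JSON."}],
        "temperature": 0.3,  # less "creativity", fewer malformed structures
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```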

Links mentioned:

Twitch: no description found


Mistral ▷ #random (17 messages🔥):

  • Pricing Woes in the Chatbot Landscape: @sublimatorniq brought up the subject of perplexity, likely referring to pricing or complexity in chatbot services. @mrdragonfox suggested that the race to offer the lowest prices cannot continue indefinitely, with unit economics needing to make sense for businesses.
  • Groq’s Competitive Pricing Promise: @shivakiran_ highlighted Groq’s promise of $0.27/million, likely referring to the price per million tokens processed.
  • Sustainability of Low Prices Questioned: @mrdragonfox pointed out that sustaining low prices for the sake of competition isn’t a financially sound strategy, as it doesn’t equate to profitability, especially with new players willing to absorb even more costs.
  • Critique of Initial Pricing Strategies in Tech: @egalitaristen expressed concern over companies that start with low initial pricing only to later introduce “real” pricing that can be multiple times higher, warning that it may drive most of the user base to seek alternatives.
  • Pistachio Day Proclaimed on Discord: @privetin shared a celebration of Pistachio Day with a link to nutsforlife.com.au and fun facts about the benefits of pistachios, including their protein content and sleep-inducing melatonin.

Links mentioned:


Mistral ▷ #la-plateforme (66 messages🔥🔥):

  • Privacy and Hosting Clarifications Sought: User @exa634 enquired about whether data passing through the Mistral API is used for model training and about the geographical location of the hosting. It was confirmed by @akshay_1 and @ethux that the data is not used for training and that servers are located in Sweden, as mentioned in Mistral’s privacy policy.

  • Mixtral 8x7B Freezing Issue: User @m.kas reported a bug where Mixtral 8x7B freezes when trying to generate content for the year 2024. The user @1015814 suggested checking for an accidental end token, but @m.kas clarified no such token was set.

  • Expectation of Function Calling on Mistral Platform: Users @nioned and @mrdragonfox brought up the topic of function calling on the platform, hinting that third-party providers may offer solutions and expressing optimism that Mistral will implement it in due time.

  • API Key Activation Delays Addressed: User @argumentativealgorithm experienced a delay with their API key activation after adding billing information. @lerela confirmed that a short waiting period is common before the key becomes operational, which resolved the user’s issue.

  • Speech to Speech App Query: User @daveo1711 asked about using Mistral Large for a speech-to-speech application, to which @akshay_1 replied that Mistral only supports text and suggested checking out other models for the desired functionality.

Links mentioned:


Mistral ▷ #le-chat (69 messages🔥🔥):

  • Mistral Chat’s Popularity Issues: Users @lerela, @mr_electro84, and others have noted that Le Chat is experiencing difficulties likely due to high traffic and popularity. @mr_electro84 reported platform outages, including the API console.

  • Confusion Over Mistral Chat’s Pricing: @_._pandora_._ and @wath5 discussed whether Le Chat is free, with some users believing they are using paid credits while others, including @margaret_52502, stated it’s free.

  • Enthusiasm and Suggestions for Mistral’s Potential: User @aircactus500 proposed various enhancements for Mistral, from a mobile app with social networking elements to a search engine and even a 3D virtual assistant. They mentioned the idea of a language level setting for le Chat which sparked interest in the community.

  • Conversation About Mistral-Next: Users @__oo__, @_._pandora_._, and @tom_lrd discussed a feature within Le Chat called Mistral-Next, highlighting its conciseness and simplicity compared to the large model, with hopes for its availability as an open-weights model.

  • Developing Concepts for Mistral Chat Applications: User @aircactus500 is conceptualizing features for an app tailored to Le Chat, including the ability to select the AI’s conversational style. They have expressed excitement over having a French AI community platform, feeling it enhances idea generation without needing to translate thoughts.


LM Studio ▷ #💬-general (608 messages🔥🔥🔥):

  • LM Studio White Window Issue: User @steve_evian reported an issue where LM Studio only displays a white window upon launch. @heyitsyorkie suggested clearing .cache and %APPDATA% before reinstalling, which resolved the problem for @steve_evian.

  • LM Studio Multilingual Model Query: User @.deug asked for recommendations on pre-trained multilingual LLMs that include Korean language support. @heyitsyorkie responded that there are few LLMs adept in translating Korean to English consistently and advised using an online translator like DeepL in combination with LM Studio.

  • Presets Reverting in LM Studio: User @wyrath commented about LM Studio’s UX, pointing out that when starting a “New Chat,” selected presets revert to defaults, necessitating manual reselection each time. The discussion provided workarounds and the possibility of this being a bug.

  • LM Studio API and Local Hosting: User @muradb inquired about running the LM Studio API on a server without a graphical environment. @heyitsyorkie clarified that LM Studio doesn’t support headless running and didn’t comment on future plans for this feature.

  • Request for Open Source and Headless LM Studio: User @krypt_lynx regretfully noted LM Studio’s closed source nature, also expressing that community contributions could add missing features such as headless operation. @heyitsyorkie confirmed that LM Studio is indeed closed source.
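Although headless operation isn't supported, the running desktop app can expose an OpenAI-compatible local server; a hedged sketch of calling it from Python, where the default port is shown and the model field is a placeholder since LM Studio serves whichever model is loaded:

```python
# Hedged sketch: talk to LM Studio's local server, which mimics the OpenAI
# chat-completions API. Requires the desktop app running with the server
# enabled; base URL/port and model name are illustrative defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="local-model",  # LM Studio uses whichever model is loaded
    messages=[{"role": "user", "content": "Hello from the local server"}],
)
print(resp.choices[0].message.content)
```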

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (98 messages🔥🔥):

  • Hyperparameter Evaluation Dilemma: @0xtotem inquired whether hyperparameters for a RAG should be evaluated on their own dataset or if a similar dataset would suffice. It remains unresolved as a personal choice depending on the closest available data.
  • Dolphin Model Dilemma: @yahir9023 struggled to create a Dolphin model prompt template in LM Studio, sharing an external file with further explanation since the full template was impractical to post directly in Discord.
  • Model Memory Challenge: @mistershark_ discussed the difficulty of keeping multiple large language models in VRAM simultaneously and confirmed the availability and capabilities of ooba. @goldensun3ds asked for clarification, and @mistershark_ explained the need for significant hardware, sharing the GitHub link to ooba.
  • Translation Model Inquiry: @goldensun3ds asked about the best model for Japanese-to-English translation, considering models like Goliath 120B and suggesting a potential Mixtral model. No definitive answer was given, though the discussion drew attention to the user’s powerful hardware setup.
  • Mixed-Expert Models: @freethepublicdebt queried if there will be future models with different mixtures of expert precisions (FP16, 8bit, and 4bit), which could promote generalization and GPU efficiency. No response was given regarding the existence or development of such models.


LM Studio ▷ #🧠-feedback (8 messages🔥):

  • Fire Emoji for the Fresh Update: User @macfly expressed enthusiasm for the latest update, complimenting its look and feel.

  • Acknowledging a Needed Fix: @yagilb acknowledged an unspecified issue and assured that it will be fixed, apologizing for any inconvenience.

  • High Praise for LM from a Seasoned User: @iandol, who previously used GPT4All, praised LM for its excellent GUI and user-friendly local server setup.

  • Download Dilemma in China: @iandol reported difficulties downloading models due to being in China and inquired about proxy support to facilitate downloads.

  • Seeking Dolphin 2.7 Download Support: @mcg9523 faced challenges downloading Dolphin 2.7 in LM Studio and was advised by @heyitsyorkie to switch to “compatibility guess” and collapse the readme for better visibility.


LM Studio ▷ #🎛-hardware-discussion (178 messages🔥🔥):

  • Quest for CUDA Support in AMD: User @nink1 reminisced about AMD’s growth and speculated that smart folks at AMD might be working on creating their own CUDA support, citing enterprise trends towards cost-effective solutions. The fact that ZLUDA was open-sourced hints at potential internal advancements at AMD.

  • Choosing the Best GPU for LLMs: Amidst discussions on AMD vs. Nvidia GPUs, users like @baraduk, @wolfspyre, and @heyitsyorkie debated Radeon RX 7800 XT’s suitability for running LLM models, with a general preference for Nvidia due to easier setup for AI applications, notably with ROCm on AMD requiring additional effort.

  • To NVLink or Not to NVLink: Participants like @slothi_jan, @dave2266_72415, and @nink1 explored the pros and cons of NVLink for multi-GPU setups. While NVLink could theoretically boost performance compared to using standard PCIe slots, practical considerations like cost and compatibility are significant factors.

  • Mac vs. Custom PC for Running LLMs: User @slothi_jan sparked discussions on whether to purchase a Mac Studio or a custom PC with multiple RTX 3090 GPUs for AI model use. Opinions varied, but factors like speed, cost, ease of use, and future-proofing were key considerations with valuable input from users like @heyitsyorkie, @rugg0064, and @nink1, who noted the surprisingly good performance of Apple’s M3 Max.

  • Troubleshooting PC Shutdowns During LLM Use: @666siegfried666 sought assistance with their PC (featuring a 5800X3D CPU and 7900 XTX GPU) shutting down during use of LM Studio. @heyitsyorkie suggested testing with other compute-intensive tasks to identify whether the issue is with LM Studio or the PC hardware itself.


LM Studio ▷ #🧪-beta-releases-chat (27 messages🔥):

  • Celebrating “IQ” Models Working: @drawless111 enthusiastically confirmed that IQ1, IQ2, and IQ3 models are working in LM Studio, praising Yags and the team. They highlighted IQ1’s impressive footprint: a 70-billion-parameter model running in 14.5 GB of VRAM at 11.95 t/s.

  • Searching for “IQ” Formats Revealed: @drawless111 provided a step-by-step guide to finding “IQ” formats on Hugging Face via searches such as “gguf imat” or “gguf imatrix”, and advised avoiding quantizations calibrated on random text for higher quality.

  • LLM Local File System Access Query: @wolfspyre inquired about local file system access for running models, wondering if a directory like /tmp is accessible, but later @fabguy clarified that LLMs don’t have such capabilities, and LM Studio does not support executing commands from LLMs.

  • No Model Tokenization Speed Stats API Yet: @wolfspyre asked if there’s an API to get model tokenization speed stats, to which @yagilb replied briefly with a “Not yet”.

  • Llama 1.6 Update Rolled Out: Users @n8programs and @heyitsyorkie discussed and celebrated the update of llama.cpp to version 1.6 in LM Studio, describing it as EPIC.


LM Studio ▷ #autogen (9 messages🔥):

  • AutoGen Anomaly Squashed: User @thebest6337 initially reported a mysterious error with AutoGen but resolved the issue by “uninstall[ing] and reinstall[ing] every autogen python package”.
  • Good Samaritan Reminder: @heyitsyorkie encouraged @thebest6337 to share the solution to their problem with AutoGen to assist others, leading to the discovery of the fix.
  • When in Doubt, Reboot!: In response to @thebest6337’s fix, @heyitsyorkie humorously posted a Tenor GIF link, implying that the classic “turn it off and on again” method is a universally applicable solution: Tenor GIF.
  • Slow Responding Local Models: User @gb24. queried about the slow response time (approx. five minutes) from a local model, implying it is an unusually long delay as the task was not code intensive.

Links mentioned:

  • It Problem Phone Call GIF - Have You Tried Turning It Off And On Again (Tenor)


LM Studio ▷ #langchain (1 messages):

bigsuh.eth: Hello, can I use LM Studio and use RAG in langchain?


LM Studio ▷ #open-interpreter (7 messages):

  • Connection Issues for nxonxi: User @nxonxi encountered an httpcore.ConnectError: [Errno 111] Connection refused when attempting to run open-interpreter with the --local command after installing LM Studio.
  • Syntax Error Strikes: The same user received an error stating {'error': "'prompt' field is required"}, which turned out to be due to a syntax error in their request payload.
  • Simple Python Request to the Rescue: @nxonxi confirmed that while LM Studio is not working from Open Interpreter (OI), it is operational via a simple Python request.
  • Endpoint URL Troubleshooting: @1sbefore suggested checking the endpoint URL, mentioning that for TGWUI it is http://0.0.0.0:5000/v1 and advised @nxonxi to possibly remove /completions or /v1/completions from the URL being used in requests as a possible solution.
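
Pulling the above together, the “simple Python request” route might look like the sketch below. The base URL and port are assumptions (LM Studio’s server defaults to localhost:1234), and the "prompt" field is exactly what the earlier error message was asking for:

```python
import requests

# LM Studio's OpenAI-style legacy completions endpoint (default port assumed).
url = "http://localhost:1234/v1/completions"

# /v1/completions expects a "prompt" field; omitting it yields the
# {'error': "'prompt' field is required"} response seen above.
payload = {
    "prompt": "Q: What is the capital of France?\nA:",
    "max_tokens": 32,
    "temperature": 0.2,
}
resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```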

Perplexity AI ▷ #announcements (1 messages):

  • Perplexity Partners with ElevenLabs: @ok.alex announced the launch of the Discover Daily podcast, a collaboration with ElevenLabs, pioneers in Voice AI technology. Find the podcast on your favorite platforms for a daily dive into tech, science, and culture, with episodes sourced from Perplexity’s Discover feed.
  • Discover Daily Podcast Elevates Your Day: Listening to the latest episodes of Discover Daily is recommended during your daily commute or in that spare moment of curiosity. The episodes are available at podcast.perplexity.ai and are enhanced by ElevenLabs’ voice technology.

Links mentioned:

  • Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
  • Discover Daily by Perplexity: We want to bring the world’s stories to your ears, offering a daily blend of tech, science, and culture. Curated from our Discover feed, each episode is designed to enrich your day with insights and c…

Perplexity AI ▷ #general (348 messages🔥🔥):

  • Perplexity AI Unveils New Models ‘Sonar’: The Perplexity AI Discord community discussed the recent introduction of Sonar models (sonar-small-chat and sonar-medium-chat) and their search-enhanced versions. These models claim improvements in cost-efficiency, speed, and performance. Users speculate, based on test interactions, that Sonar Medium may possess a knowledge cutoff around December 2023 (source).

  • Goodbye Gemini: The community briefly mourned the removal of Gemini from the list of available models on Perplexity, with some users clamoring for its return or the potential introduction of Gemini Ultra.

  • Perplexity AI and Imagery: It was clarified that Perplexity Pro does have the capability to generate images, albeit with some operational issues under scrutiny. Users are directed to online resources and Reddit for assistance (Reddit post).

  • Mobile-specific Responses from AI Models: There was a discussion about whether AI chat models respond differently on mobile devices compared to PCs, with some users noticing concise answers from models like Gemini when accessed through the app (system prompt).

  • Alleged Deals and Discrepancies: In the mix of conversations, various unrelated topics were raised such as a supposed 6-month free trial of Perplexity Pro tied to a card service, which was confirmed to be legitimate by a moderator, and an inquiry about whether Mistral is partnering with Microsoft following a historical collaboration with Google.

Links mentioned:

  • Phind
  • Au Large: Mistral Large is our flagship model, with top-tier reasoning capacities. It is also available on Azure.
  • API Updates February 2024: Announcing Our Newest ModelWe are excited to announce the launch of our latest Perplexity models: sonar-small-chat and sonar-medium-chat, along with their search-enhanced versions, sonar-small-online …
  • Microsoft partners with Mistral in second AI deal beyond OpenAI: Microsoft makes another AI investment.
  • PerplexityBot: We strive to improve our service every day. To provide the best search experience, we need to collect data. We use web crawlers to gather information from the internet and index it for our search engi…
  • Perplexity Blog: Explore Perplexity’s blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
  • Perplexity: Perplexity is the leading real-time AI answer engine. Perplexity Pro supercharges research with unlimited file uploads, guided AI search with Copilot, and dedicated support.
  • Adiós Google | Hola Perplexity: You won’t believe what this search engine can do thanks to Artificial Intelligence. We still don’t know what would have become of Perplexity if not for Jeff Bezos, Nvidia and D…
  • Discover Daily by Perplexity on Apple Podcasts: News · 2024
  • Images & media: Explore Perplexity’s blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
  • Billing and Subscription: Explore Perplexity’s blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
  • Reddit - Dive into anything: no description found

Perplexity AI ▷ #sharing (23 messages🔥):

  • Exploring Topics on Perplexity AI: Users in the “sharing” channel are sharing various links to Perplexity AI topics ranging from reviews of the Xiaomi 14 series (@icelavaman), discussions about PerplexityAI and ElevenLabs (@icelavaman), to analyses of “why put Mistral” in AI models (@mydpi).
  • Curiosity About Global Events: Some users are looking into timely events and items like the first US moon mission in years (@sanjaymenon), Lenovo’s transparent laptop concept (@vipul7031), and Starshield in Taiwan (@cy_alex).
  • Model Comparisons and Technical Queries: Tech enthusiasts are delving into comparisons such as iPhone models (@ming9993) and questions about tech strategies like the use of eigenlayer nodes (@novice9708).
  • Personal Assistants and Learning with Perplexity AI: Individuals are leveraging Perplexity AI for personal discovery and study, with searches about American athletes (@commuting5048) and making personal collections such as “Make your own” (@_yoojungin).
  • Miscellaneous Interests Spotlight: Interests in the channel are diverse, with users such as @chob_hee seeking mathematical calculations, @mistercare looking into recommended tools (in German), and @veryoriginalname123 expressing a personal statement (“I am a…”).

Perplexity AI ▷ #pplx-api (339 messages🔥🔥):

  • Sonar Models Debut: Perplexity AI introduced new models: sonar-small-chat and sonar-medium-chat, as well as their online counterparts with enhanced search capabilities. Users, including @thedigitalcat and @digital_despot, expressed a preference for the pplx-70b-online model, which appears to offer more coherent answers.
  • Comparing Sonar and pplx-70b: @jaicraft suggested that sonar-medium should outperform pplx-70b but others, including @sergevar and @thedigitalcat, reported receiving incoherent or “gibberish” responses from the sonar models.
  • Prefer pplx-70b Over Sonar Medium: Users like @thedigitalcat requested the pplx-70b-online model not be phased out due to its superior performance. @ok.alex from Perplexity AI acknowledged issues with sonar-medium-online and mentioned that a fix was being worked on.
  • API Usage Improvements Discussed: @ericosk sought a programmatic way to fetch model details, expressing a use case for populating a UI with model choices. Additionally, users like @thedigitalcat and @brknclock1215 discussed the impact of using or omitting system prompts in API calls.
  • Gibberish Output from Sonar Models: @brknclock1215 noted that limiting output length could mitigate the gibberish responses from sonar models, but @thedigitalcat shared that pplx models were unaffected by lengthy output. @thedigitalcat provided a screenshot demonstrating a non-human-readable response from sonar-medium-online.
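
For context, the pplx-api follows the OpenAI chat-completions shape. A hedged sketch of the kind of call being compared above; the model name reflects the discussion and may change, and the PPLX_API_KEY environment variable is an assumption to adapt:

```python
import os
import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"},
    json={
        "model": "sonar-medium-online",
        "messages": [
            # Including or omitting a system prompt changed output quality for some users.
            {"role": "system", "content": "Be precise and concise."},
            {"role": "user", "content": "What did Mistral announce this week?"},
        ],
        # Limiting output length reportedly mitigated the gibberish responses.
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```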


OpenAI ▷ #ai-discussions (183 messages🔥🔥):

  • VPN May Interfere with OpenAI Services: User @strang999 experienced the “Something went wrong” error. @satanhashtag suggested VPN services could interfere, even when not actively used, and recommended disabling web protection in VPN settings.

  • Image-to-Video Generation Tools Sought: @sparkette asked for a browser-based image-to-video generator that doesn’t use a credit system. @lugui proposed snappyvideo.ai, although it didn’t fit the unmetered usage criteria.

  • Anticipation for Sora’s Capabilities: Users @rreitsma and @madame_architect discussed the potential of OpenAI’s Sora for creating informative TV shows or personalized language courses, highlighting its advanced simulation features.

  • Mixed Experiences with Copilot: @pruo and @madame_architect shared experiences with Copilot, an in-app chatbot by Microsoft, indicating that while @pruo found it valuable, @madame_architect felt quality was lacking compared to previous AI iterations.

  • Gemini Users Face Social Pressure: @pruo expressed frustration at being shamed for using Google’s Gemini AI system, seeking to use it without judgment. @tariqali responded by emphasizing the problem with the AI, not the users, and the merits of not relying on a single AI system.


OpenAI ▷ #gpt-4-discussions (103 messages🔥🔥):

  • ChatGPT’s Context Limits Stir Frustration: @orbart expressed disappointment with a perceived reduction in ChatGPT’s ability to remember long texts for narrative work, suspecting a “nerf” in capabilities. @blckreaper corroborated the feeling, suggesting a reduction in tokens processed from files, from 15K to approximately 8K.

  • Captcha Conundrum Throws Users In Loops: @little.toadstool and @necrosystv reported undergoing repeated and frustrating 20-stage captcha tests within ChatGPT, disrupting the user experience and prompting questions about the service’s current issues.

  • The Search for Math and PDF Solutions: Users like @candonlyc and @yami1010 discussed the lack of a MathPix ChatGPT Plugin and the challenges associated with OCR capabilities for mathematical content, leading to suggestions of using external resources or APIs for enhancement.

  • Protecting Custom Prompts a Slippery Slope: Users @.dunamis. and @kyleschullerdev_51255 exchanged ideas about safeguarding prompts, with the consensus being that complete protection isn’t feasible and a layered web application approach might offer better security.

  • Curiosity About GPT-4’s Fine-Tuning and Discoverability: @kxlja inquired whether AI models on the Discover page are selected by hand or through other criteria, and @liangdev asked about accessing the GPT-4 model for fine-tuning, probing into the availability of such an option.


OpenAI ▷ #prompt-engineering (209 messages🔥🔥):

  • Assistant vs Custom GPT Nuances: @brunoalec noted inconsistencies when using Custom GPTs from the OpenAI store as GPT Assistants in the API. @rendo1 clarified that Assistants can generate tables in ‘code block’ format and markdown formatting is not supported in the Assistants UI, unlike ChatGPT UI which converts markdown into visual elements.
  • Improving Search Functionality: @kevinnoodles faced issues with ChatGPT searches returning no valid results or denying access. There was no solution proposed in the chat.
  • Text Classification Task Query: @crifat questioned whether to use fine-tuning or Assistant for a text classification problem. @eskcanta suggested first trying with the base model to check the error rate.
  • Prompt Optimization for Code Tasks: @tawsif2781 inquired about the best way to prompt for converting JavaScript to TypeScript in a project. There was no specific guide provided in the chat.
  • ChatGPT Responsiveness Issues: @ianhoughton44 reported ongoing issues with ChatGPT responses being unhelpful or non-compliant for more than a week but didn’t receive any troubleshooting advice in the discussion.


OpenAI ▷ #api-discussions (209 messages🔥🔥):

  • In Search for GPT UI Optimization: User @joemama8222 sought advice on improving UI design for HTML code but didn’t mention specifics or share a solution.
  • Prompt Truncation Woes Rampant: @jimmysapp expressed continual issues with prompt truncation and missing responses to custom instructions in ChatGPT, with the problem persisting across both browser and phone app. User @madame_architect recommended clearing cookies and rebooting, while others like @eskcanta speculated on possible confusion within the AI due to content policy.
  • AI Function Inquiry Met with Developer Expertise: User @agi_dude asked about function calling with a specific setup for programming documentation queries; guidance was provided by @eskcanta and @madame_architect, with the latter redirecting to API documentation suggesting Assistant API usage.
  • Debate Over Image Prompt Reproduction Abilities: @bombasticfard inquired about replicating specific images with AI prompts, @bambooshoots suggested a strategy using Wright’s Pixel Painter Pro CustomGPT, and @cqoker shared success using the term “anime 2d model/format” to produce desired image styles.
  • Confusion on AI Abilities Between Custom and Assistant Models: User @brunoalec noted differences between Custom GPTs and Assistant GPTs regarding table formatting, DALL-E usage, and markdown functionality, with @rendo1 explaining that Assistants cannot natively format markdown or generate images directly without specific API configurations.
  • Credit Report Data Handling Addressed: @razorbackx9x inquired about an AI that can sort credit report data into Excel. @eskcanta strongly cautioned against uploading sensitive PII data, reinforced by @s_p_e_c asking for official clarity on privacy policies, and @madame_architect advocating for cleaning data before use.


LAION ▷ #general (624 messages🔥🔥🔥):

  • Transformer-based Video Model Discussion: @yoavhacohen mentioned a new project called Snap Video that addresses challenges in video generation by employing a framework that accounts for spatially and temporally redundant pixels and a new transformer-based architecture. He shared the project link and the related research paper.

  • Concerns About Generative Video Models: User @qwerty_qwer expressed skepticism about the meaningfulness of generative video models unless they are released by large organizations, suggesting that research students lack the necessary compute resources for impactful releases.

  • Seeking Open Source Projects for Contribution: @k_ek_w introduced themselves as a data scientist with 1 year of experience looking for open source AI and ML projects to contribute to.

  • Image Captioner Demonstration: @yoavhacohen provided examples comparing captions from their team’s image captioner against LLaVA and Google Captioner for different images, highlighting the differing levels of detail in caption descriptions.

  • LoRA Land Release: User @helium__ announced the release of LoRA Land, a collection of Mistral-7b models fine-tuned on various tasks. They noted the models’ superior performance and cost efficiency, and shared a webinar link for more information.


LAION ▷ #research (67 messages🔥🔥):

  • CLIP Filters vs HQ Classifiers Debate: @top_walk_town pointed out the importance of the DFN paper for showing that CLIP filtering is suboptimal compared to using high-quality image-text pair classifiers.

  • BFloat16 Gradient Discussion: @yoavhacohen affirmed the use of autocasting on TPUs with bfloat16, while @top_walk_town and @chad_in_the_house discussed PyTorch’s autocast behavior, where the backward pass defaults to fp32 (see the sketch after this list).

  • Model Parameter Discrepancies: @thejonasbrothers noted confusion over Google releasing Gemma as a 7B model when a full parameter count comes closer to 9B.

  • Gradient Precision Trade-offs: @chad_in_the_house reported that training with bf16 gradients is faster but yields worse results compared to fp32 gradients.

  • Research Papers and Methods Sharing: Multiple research papers and AI-related methods were shared by users @said2000, @thejonasbrothers, and others, touching on state space architecture, optimization of Transformer models, and the detection of AI-generated text’s “radioactivity”. Additionally, @vrus0188 shared a YouTube video discussing the potential for AI to make LLMs significantly cheaper.
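
On the bfloat16 bullet above, a minimal PyTorch sketch of the autocast pattern under discussion: the forward pass and loss run under autocast, while backward runs outside the context and gradients land in the parameters’ fp32 dtype:

```python
import torch

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(8, 512, device="cuda")

# Eligible forward ops run in bf16 inside the autocast region...
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()

# ...but backward is called outside it, and gradients match the fp32 parameters.
loss.backward()
print(model.weight.grad.dtype)  # torch.float32
opt.step()
```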


LAION ▷ #learning-ml (2 messages):

  • Beware of Potential Scams: User @josephsweeney11 offered to help 10 people make $40k in 72 hours with a 10% commission via Telegram @auto_trade_admin. This kind of message could be a scam and users should exercise caution.
  • Experimenting with Transformer Learning Capabilities: @phryq. is curious if anyone has explored the learning capabilities of transformers through experimental training, such as understanding and applying size relationships between made-up objects to generate images. They provided specific examples to question if the model can deduce that a “krog” should be rendered four times as large as a “mmmmmchakaboooboolight.”

LAION ▷ #paper-discussion (1 messages):

said2000: https://arxiv.org/abs/2402.05892


HuggingFace ▷ #general (182 messages🔥🔥):

  • AI Hardware Endeavors and Speculations: Users discussed the potential of developing proprietary TPUs and the availability of particular nanometer manufacturing processes, highlighting how this democratization could grant freedom akin to the car industry. The conversation referenced comparisons to the RAM industry’s price practices, indicating skepticism about tech promises from companies like Samsung.

  • Ongoing AI Debates: Community members voiced opinions on the impact of AI and capitalism, with some debating whether open-source efforts could rival giants like Intel or Nvidia. Discussions reflected concerns about the loss of jobs and wealth inequality tied to technology advancements, balanced by the practicalities of AI product development to secure individual financial well-being.

  • Inquiries and Assistance on Model Utilization: Users sought help for a range of topics, including the use of specific models on certain GPUs and integrations, limitations related to model sizes and memory constraints, the management of datasets, and finding resources for projects. The community contributed with suggestions such as using llama.cpp for model parallelization and employing CPU offloading with accelerate for large models.

  • Exploring Practical Applications and Collaborations: From seeking partnerships for neural network projects to finding efficient strategies to work with open-source models, users exchanged ideas and advice. They covered areas like machine learning, object detection, language models, and the use of serverless GPU services for cost-effective research and development.

  • Technical Support and Problem-Solving: The backend issues of Hugging Face services, such as inference-api serverless timeouts, were discussed, with user experiences highlighting fluctuating performance. Community members also addressed problems with data serialization, style customization in components, and concerns about GPU support for different models.


HuggingFace ▷ #today-im-learning (8 messages🔥):

  • Imitation Learning Inquiry: User @alefram sought advice on starting to learn about imitation learning for robotics. No specific resources or tips were provided in response to this query.
  • Deep RL Course Participation: @meriem_baziz expressed their intent to take a deep RL course and asked for advice. Again, the community did not provide visible feedback or guidance.
  • Random Insights on LinkedIn: @stereoplegic shared a LinkedIn article that provides insight into working with random seeds in PyTorch, which they recommended as an informative read.
  • CLI Packaging Enigma: User @vipitis is learning how to package CLI entry points with pyproject.toml, venturing into the intricacies of Python project packaging.
  • V-JEPA Paper Under the Spotlight: @subham5089 authored and shared a blog post explaining the V-JEPA paper released by Meta, likening the model to BERT for multimodal learning, before being reminded by @cakiki to avoid cross-posting in multiple channels.
  • Gemma Model Local Deployment: @ariondas promoted a LinkedIn post outlining how to access Google’s Gemma model on a local Ubuntu machine.

HuggingFace ▷ #cool-finds (33 messages🔥):

  • Deep Unsupervised Learning Course Spring 2024: User @omrylcn shared a link to the Berkeley CS294-158 SP24 course on Deep Unsupervised Learning, mentioning that it will cover Deep Generative Models and Self-Supervised Learning, similar to a previous offering.

  • The Emergence of Large Action Models: @fernando_cejas highlighted a blog post discussing Large Action Models (LAMs), advanced AI systems capable of performing human-like tasks within digital environments by mixing language capabilities with task execution.

  • Introducing Galaxy AI with Accessible Models: User @white_d3vil introduced Galaxy AI platform offering free API access to various AI models including GPT-4, GPT-3.5, and their proprietary Gemini-Pro. The platform and models are available for testing in projects as per the site.

  • Exploring VLM Resolution Challenges and Solutions: @osanseviero recommended two blog posts from HuggingFace discussing the challenges of resolution in vision-language models (VLMs) and presenting a new approach to overcome this issue. It features a demo and relevant models available on the HuggingFace hub.

  • Scale AI’s Rise in the Data Labeling Market: User @valeriiakuka shared an article from Turing Post about Scale AI’s journey to becoming one of the highest-valued companies in the data labeling market, marking its 8th anniversary. The article is part of a series discussing AI Infrastructure Unicorns and can be found here.


HuggingFace ▷ #i-made-this (24 messages🔥):

  • Bringing Speaker Embeddings to the Browser: User `@davidre95` announced a pull request for adding support for WavLMForXVector to transformers.js, enabling speaker embeddings models to run in browsers. The related PR can be found on [GitHub here](https://github.com/xenova/transformers.js/pull/603), and the compatible onnx models are available on [Hugging Face](https://huggingface.co/D4ve-R/wavlm-base-plus-sv).
  • .NET Library for ONNX Inference: User `@sa_ddam213` introduced a C# .NET library for ONNX model inference without requiring Python, with the code available on [GitHub here](https://github.com/saddam213/OnnxStack).
  • Open-Source AI Project Unveiled: User `@flameface` shared a link to Unburn Toys, an open-source AI project which is a collection of useful tools, whose code repository can be found on [GitHub here](https://github.com/flameface/unburn-toys).
  • Interactive TTS Model Comparison: User `@realmrfakename` showcased a Hugging Face Space named TTS Arena, which allows users to compare TTS models by listening to samples and voting, available on [Hugging Face here](https://huggingface.co/spaces/TTS-AGI/TTS-Arena). Feedback and pointers to an open TTS tracker were offered by `@pendrokar`.
  • Philosophical Q&A Dataset Compiled: User `@nabereon` published a dataset of 133,799 philosophy questions and answers, available on [Hugging Face here](https://huggingface.co/datasets/sayhan/strix-philosophy-qa), and welcomed feedback.
  • Gradio App for Code-Free AI Experimentation: User `@nishantsethi_62323` shared their first Gradio app on Hugging Face Space, designed for experimenting with ideas without writing code, accessible on [Hugging Face here](https://huggingface.co/spaces/nsethi610/ns-gradio-apps).
  • Fine-Tuning LLMs Made Easier: User `@ameerazam` provided resources for finetuning large language models (LLMs) with less than 7 billion parameters, sharing a repository with code on [Hugging Face here](https://huggingface.co/ameerazam08/gemma-jokes).


HuggingFace ▷ #reading-group (26 messages🔥):

  • Neural Circuit Diagrams Presentation Announced: @chad_in_the_house notified the group that @1191190979580022875 would present on “Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures”. The meeting would be held at 7 pm EST.
  • Meeting Commenced with Tools and Research Discussion: @chad_in_the_house shared that Mathcha.io is the tool used for creating diagrams while discussing a paper. A blog post on mixtral by @vtabbott_ was also highlighted for future parsing work.
  • Presentation Video Posted on YouTube: @chad_in_the_house posted the presentation video on YouTube with the title Hugging Face Reading Group 14: Neural Circuit Diagrams and promised to update GitHub with additional content.
  • Upcoming PR Presentation Teaser: The next week’s presentation, hinted by @chad_in_the_house, will be by @563068096747798529 on a PR to the peft library, focusing on new merging methods for LoRA, accompanied by visual illustrations from two arXiv papers (2306.01708 and 2311.03099).
  • Scheduling and Attribution for Upcoming Talk: @chad_in_the_house and @prateeky2806 coordinated scheduling for the next talk through when2meet, with @prateeky2806 attributing primary work on the PR to @871797575454425159 and @504681610373758977.


HuggingFace ▷ #diffusion-discussions (15 messages🔥):

  • Inquiry about AnimeBackgroundGAN: @mfd000m asked how to use the model akiyamasho/AnimeBackgroundGAN and whether they should clone a repo or use libraries like transformers or diffusion. No specific solution was provided in the subsequent messages.
  • Finetuning Diffusion Models for New Languages: @alielfilali01 queried about the possibility of finetuning a diffusion model on a different language corpus instead of a new image style. @chad_in_the_house responded, sharing a link to the Japanese Stable Diffusion model which uses a two-stage training procedure tailored for the Japanese language.
  • Loss Zigzag in Model Finetuning: @khandelwaal.ankit is trying to finetune Qwen/Qwen1.5-0.5B with a specific dataset but is encountering a zigzag loss graph despite trying various hyperparameters. There were no further clarifications or suggestions concerning this issue.
  • Latent Outputs with the Diffusers Library: @shinyzenith discussed the use of output_type='latent' in the stable_diffusion_pipeline from the diffusers library, assuming it yields sampled latent spaces for given prompts. They shared a technical concern about getting NaN values when calculating KL divergence due to negative weights and pondered normalizing the weights, but were unsure if it would distort their analysis.
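
For anyone following the latent-output thread, a small sketch of what output_type="latent" does in a diffusers pipeline (the model ID and prompt are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# output_type="latent" skips the VAE decode and returns the sampled latents.
out = pipe("a watercolor fox", num_inference_steps=30, output_type="latent")
latents = out.images  # shape [1, 4, 64, 64] for 512x512 generation
print(latents.shape, latents.dtype)
```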


HuggingFace ▷ #computer-vision (23 messages🔥):

  • Emotions in Focus: @rodricota_ mentioned they are building an emotion recognition model and wanted to discuss some issues, while @justinm8449 chimed in stating they’ve already built such a model.
  • BLIP2 for Image Sequences Inquiry: @seanb2792 asked if BLIP2 could process image slices from a 3D model that share context, given their dependence on each other, soliciting thoughts on whether to use a different model for this task.
  • Seeking Robust OCR Models for Complex Characters: @icecoldt369 was looking for OCR models adept at handling foreign languages with complex characters, citing dissatisfaction with results from classic LSTM models. They engaged in dialog with @cursorop, discussing the necessity of finetuning and model limitations with lesser-used languages such as Khmer.
  • OCR Model for Multiple Languages Discussed: @cropinky shared a GitHub link to surya, an OCR and line detection project that supports over 90 languages, which has been gaining attention recently.
  • Computer Vision Model Benchmarks and Project Ideas Exchanged: @coffeevampir3 sought out benchmarks for vision models, to which @cropinky recommended the extensive list on Papers With Code. Moreover, @solution3746 requested ideas for a final year computer vision project and received a suggestion to count people from CCTV footage.


HuggingFace ▷ #NLP (109 messages🔥🔥):

  • Fine-Tuning Follies: @jimmyfromanalytics is facing issues fine-tuning Flan T5 for generating positive and negative comments on a niche topic and seeks advice. The model is outputting incoherent sentences after fine-tuning, suggesting difficulty in prompt engineering.
  • BERT vs. LLM for Text Classification: @arkalonman asks for sources comparing fine-tuning a larger LLM like Mistral 7B or Gemma 7B with a standard BERT variant for text classification. @lavi_39761 advises that encoder models are more suited and efficient for classification purposes.
  • Puzzling Finetuning Failures: @frosty04212 reports an issue with fine-tuning an already fine-tuned RoBERTa model for NER, encountering 0 and NaN loss values. The issue seems resolved after reinstalling the environment.
  • DeciLM Training Dilemmas: @kingpoki is trying to train DeciLM 7b with qlora but encounters a performance warning related to embedding dimension not set to a multiple of 8. Users discuss possible reasons for the warning.
  • Whisper Project Queries: @psilovechai is looking for a local project with an interface like Gradio to train and process transcribing audio files using Whisper. They receive suggestions for GitHub repositories that could offer a solution.


HuggingFace ▷ #diffusion-discussions (15 messages🔥):

  • Introduction to the diffusion-discussion: @mfd000m is new to the discourse on diffusion models and is seeking advice on how to use the model akiyamasho/AnimeBackgroundGAN, asking whether they should clone a repository or use libraries like transformers or diffusion.
  • LM Studio Confusion: @tmo97 mentions LM Studio briefly, triggering a query from @mfd000m asking what it is, indicating unfamiliarity with the term or tool.
  • Looking for Guidance in Cross-Language Model Finetuning: @alielfilali01 inquires about fine-tuning a diffusion model on different languages rather than image styles, noting a lack of experience with diffusers and an interest in community knowledge on the subject.
  • Challenges in Model Fine-Tuning: @khandelwaal.ankit is experiencing difficulties fine-tuning the Qwen/Qwen1.5-0.5B model with a specific dataset, indicating an inconsistent loss graph despite trying various hyperparameters.
  • Sharing Success Stories with Japanese Stable Diffusion: In response to the language fine-tuning query, @chad_in_the_house shares the Japanese Stable Diffusion model card, explaining the two-stage training procedure as a potential blueprint for similar endeavors.


Eleuther ▷ #general (190 messages🔥🔥):

  • Prompt Template Batching Dilemma: User @rwamit is seeking advice on implementing batching with the langchain wrapper to query GPT-4 due to cost concerns. They shared their method of duplicating a prompt template to process multiple records at once but face processing time increasing drastically (from 2s/it to 60s/it), ballooning from 5 hours to 96 hours for 5-6k records (an alternative batching sketch follows this list).

  • Gemma PyTorch Code Curiosities: A discussion led by users such as @miaumo and @ad8e revolved around a particular piece of code in Gemma’s PyTorch implementation involving RMSNorm with a curious addition of +1. Speculations were made about initialization and the importance of this detail (a sketch of the variant follows this list).

  • EfficientNet Debate: @vapalus argued that while EfficientNet might not be ideal for a range of tasks, it performs well as a backbone in segmentation tasks for structured inputs. This followed a critique of EfficientNet by @fern.bear, who expressed strong dissatisfaction with the model’s marketing and actual performance.

  • Mistral Large Model Released: Announcement shared about the release of Mistral Large, described as a cutting-edge text generation model with strong benchmark results. The announcement highlighted that the model was available through la Plateforme and Azure (Mistral news).

  • DPO Paper Clarification Request: @staticpunch inquired about the initialization process of model_ref as described in the DPO paper, believing that the suggestion was to conduct Supervised Fine-Tuning (SFT) on preferred completions first, followed by DPO. @elad7318 and @alstroemeria313 provided clarification confirming this understanding.
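
On the batching dilemma above: stuffing many records into one prompt trades per-call overhead for much slower generations, which matches the 2s/it-to-60s/it jump @rwamit saw. An alternative sketch, assuming LangChain’s Runnable interface (the prompts and concurrency cap are illustrative), fans the records out as concurrent requests instead:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)

# One record per request; .batch() runs them concurrently rather than
# paying sequential per-record latency. max_concurrency caps parallelism.
records = ["record one ...", "record two ...", "record three ..."]
prompts = [f"Classify the sentiment of this record: {r}" for r in records]

results = llm.batch(prompts, config={"max_concurrency": 8})
for message in results:
    print(message.content)
```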
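
And on the Gemma +1 curiosity: the addition pairs naturally with a zero-initialized scale, so the layer starts out as a plain RMSNorm. A minimal sketch of the variant (dimension and epsilon are illustrative, not Gemma’s exact values):

```python
import torch

class GemmaStyleRMSNorm(torch.nn.Module):
    """RMSNorm with the `* (1 + weight)` scaling seen in Gemma's PyTorch code."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Zero init means (1 + weight) starts at exactly 1: an identity scale.
        self.weight = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * (1.0 + self.weight)
```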


Eleuther ▷ #research (84 messages🔥🔥):

  • Seeking Knowledge on Gated Units Like GRU: @mrgonao inquired about good resources explaining why gated units such as GRU are named as they are, suggesting an interest in the etymology or conceptual reasoning behind the “gated” terminology. No responses provided any links or explanations.

  • Paper Inquiry on Digit-Level Tokenization for Mathematics: @stellaathena asked for the title of a paper concerning digit-level tokenization in mathematics, and @random_string_of_character provided a link to a paper on the topic by Siavash Golkar et al., available at arxiv.org/abs/2310.02989.

  • Searchformer Paper Generates Buzz: @jckwind shared a link to the Searchformer paper arxiv.org/abs/2402.14083, which discusses how a Transformer model trained to simulate the search dynamics of A* search can solve Sokoban puzzles with higher efficiency than traditional methods.

  • RLHF Pitting Simplicity Against PPO: @0x_paws linked to a paper arxiv.org/abs/2402.14740 that advocates for simpler REINFORCE-style optimization over Proximal Policy Optimization (PPO) in the context of Reinforcement Learning from Human Feedback (RLHF), igniting a discussion on the potential of basic methods in RL for language models.

  • Introducing Watermarking Framework: In response to @hyperion.ai’s query about state-of-the-art text watermarking, @catboy_slim_ and @ai_waifu referred to the watermarking paper “A Watermark for Large Language Models” which suggests embedding signals in generated text arxiv.org/abs/2301.10226, while @dmayhem shared a link to a paper discussing the impossibility of creating robust watermarking schemes under certain assumptions arxiv.org/abs/2311.04378.


Eleuther ▷ #interpretability-general (18 messages🔥):

  • Exploring Linguistic Lens Tuning: @butanium shares a hypothesis that training a tuned linguistic lens on Chinese would teach it to translate from English to Chinese, suggesting that if the model originally “thought” in English, this would be the result.
  • Looking into Language Tokens: @butanium predicts that even for English tasks, Chinese tokens would become more present, indicating a possible underlying shift due to lens tuning.
  • Code Conundrum with Language Plots: @mrgonao is trying to adjust code to replace “en” tokens with “zh” tokens in plots to understand the Chinese lens better, but time constraints delay a deep dive into the issue.
  • Dataset Dilemma During Translation Task: @mrgonao notes strange behavior with the generated datasets for translation tasks, with incorrect language associations, and clarifies their own error upon discussion with @butanium. The issue is documented on GitHub.
  • Investigating Multilingual Model Representations: @mrgonao shares a visual analysis of language lenses by considering a neutral language pair (French to German), while @norabelrose suggests that language saliency might correlate with corpus frequency. The analysis is based on the llama-2-7b model with plans to compare with llama-2-13b.

Eleuther ▷ #lm-thunderdome (30 messages🔥):

  • Help Wanted: Investigation into lm_eval Hanging Issues: @flobulous is having trouble with lm_eval hanging indefinitely after running evaluations, specifically when using the vllm model. They shared the command and codebase commit f78e2da45f034a23b1b13cde3235105b0f55d830 for assistance.

  • Inconsistent LLM Evaluations Revealed: @.rand0mm pointed to a study shared by @AlhamFikri, highlighting the inconsistencies between multiple-choice (MCQ) and free-text evaluations of LLMs. The study is detailed in this paper on arXiv.

  • Reproducing Open LLM Leaderboard Results with lm-eval: @hailey_schoelkopf provided detailed instructions on how to replicate Open LLM Leaderboard results using lm-eval, emphasizing a specific commit and uniform settings as outlined in the Open LLM Leaderboard’s HF space (a rough Python-API sketch follows this list).

  • Demand for Better Code-Level Usage of lm-eval: @ariel2137 inquired about a potential extension and improvements to the “code-level usage” interface of lm-eval. @hailey_schoelkopf expressed openness to enhancing the usage experience and invited feedback and suggestions.

  • The Need for Support in Multilingual Evaluations: Conversations initiated by @.johnnysands about multilingual evaluations led to the suggestion of duping configs for new languages. @.rand0mm mentioned that the MMLU had been translated into French using GPT-3.5 turbo, available on Hugging Face datasets.
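
As a companion to the leaderboard-reproduction instructions above, lm-eval also exposes a Python entry point. A rough sketch; the model, tasks, and few-shot counts are illustrative and should be matched to the leaderboard’s documented settings and pinned commit:

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mistralai/Mistral-7B-v0.1,dtype=bfloat16",
    tasks=["arc_challenge"],  # the leaderboard fixes a few-shot count per task
    num_fewshot=25,
    batch_size=8,
)
print(results["results"])
```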

Links mentioned:

Tweet from Alham Fikri Aji (@AlhamFikri): Many LLM evaluations use a restrictive multiple-choice (MCQ) format, but in practice, these LLMs are used in a more open-ended, free-text format 🔎Our new study reveals that their probability-based M…


Eleuther ▷ #gpt-neox-dev (6 messages):

  • Python 3.10 Upgrade Hesitation: @catboy_slim_ expressed hesitance in upgrading to Python 3.10 due to concerns over test coverage, implying a lack of urgency for this change.
  • GPT-NeoX Development Curiosity: @catboy_slim_ expressed interest in the reasons behind certain development choices, while @80melon stated a preference for a custom training loop over continuing interest in GPT-NeoX.
  • Dealing with Configuration Errors: @jdranpariya encountered a ValueError while trying to disable deepspeed in the config, indicating potential issues with NeoXArgs validation when adjusting settings.
  • Optimization for Multilingual Tokenization: @rand0mm inquired about the best data sources for extending the Mistral tokenizer to more effectively represent other languages, pointing to efforts to improve multilingual capabilities.

LlamaIndex ▷ #blog (5 messages):

  • Create-llama Launches LlamaPack Integration: The @llama_index announced the newest create-llama release, which facilitates building full-stack web apps with just two lines of code by using LlamaPack. This feature exemplifies the ease of integrating advanced RAG concepts into projects. Tweet about create-llama

  • Counselor Copilot Project Highlighted: A tweet by @llama_index featured the Counselor Copilot project as a socially impactful RAG application, serving as an assistant for crisis counselors. The project is also a reference for using advanced RAG as a co-pilot rather than a naive chatbot. Tweet introducing Counselor Copilot

  • Comprehensive RAG Pain Points Cheat Sheet: A video walkthrough was shared by @llama_index featuring @wenqi_glantz, discussing her “12 RAG Pain Points and Solutions” blog post in depth to address issues at every stage of RAG deployment. The post serves as an essential cheatsheet for those working with RAG. Tweet about RAG walkthrough

  • Improving RAG Retrieval with Sub-Document Summaries: @llama_index shared a technique to enhance RAG retrieval performance by using sub-document summaries to combat the global concept awareness issue in naive chunking. By injecting summaries as metadata, each chunk gets contextual enhancement. Tweet discussing chunking trick

  • LlamaParse Overcomes Table Representation Challenges in PDFs: The @llama_index tweet introduced LlamaParse, a PDF parser adept at handling embedded tables and figures, which is crucial for building high-quality RAG applications. Accurate table representation ensures the LLM receives clear information, leading to correct answers. Tweet about LlamaParse


LlamaIndex ▷ #general (234 messages🔥🔥):

  • Exploring Custom LLMPrompt Templates: @andreipopg is trying to understand how to use a custom prompt with the SubQuestionQueryEngine. The user gets tips like “use the RouterQueryEngine for selecting specific data sources” and is advised that “SubQuestionQueryEngine uses a prompt for generating sub-questions,” which can be customized (GitHub example).

  • Troubleshooting Install Issues: @chbla. is facing problems with llama_index installation, specifically with set_global_handler and Settings. @whitefang_jr suggests a full reinstall with pip uninstall llama-index, which resolves @chbla.’s issue.

  • RAG vs. No-RAG Evaluation: @addo__ is looking to evaluate GPT-3.5 with RAG on a dataset, as compared to using no RAG. @whitefang_jr provides a solution using the FaithfulnessEvaluator from LlamaIndex for the no-RAG option.

  • Local LLM Integration Inquiry: @miteshgarg_61244 seeks to use local offline fine-tuned LLM models with LlamaIndex’s NLSQLTableQueryEngine and SQLTableRetrieverQueryEngine. @whitefang_jr recommends setting the local LLM as a global default in Settings and possibly deploying the model on a local server using FastAPI (see the sketch after this list).

  • LlamaIndex Chat Engine Details: @vett93 wants to know the differences between index.as_query_engine() and index.as_chat_engine() after observing varying results using different LLMs. @whitefang_jr explains that index.as_query_engine() queries data for a response, while index.as_chat_engine() considers conversation history for stateful interactions.
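
A compact sketch tying the last two answers together, assuming llama-index 0.10-style imports and Ollama as a stand-in local LLM (any LLM wrapper works, and a local embedding model may be needed to avoid the OpenAI default):

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.ollama import Ollama

# Make a local model the global default so every engine built afterwards uses it.
# Settings.embed_model can be pointed at a local embedder the same way.
Settings.llm = Ollama(model="mistral", request_timeout=120.0)

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

# Stateless: each query retrieves and answers independently.
query_engine = index.as_query_engine()
print(query_engine.query("What does the report conclude?"))

# Stateful: conversation history is carried between turns.
chat_engine = index.as_chat_engine()
print(chat_engine.chat("Summarize the report."))
print(chat_engine.chat("Now list its three main caveats."))
```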


LlamaIndex ▷ #ai-discussion (9 messages🔥):

  • Misunderstood Metamorphosis Protagonist: @daguilaraguilar is struggling to generate a book review where “Mr. Samsa” is recognized as the protagonist instead of Grete. Their code example incorrectly identifies the main character in Kafka’s “Metamorphosis”.

  • AI’s Kafka Confusion: @daguilaraguilar shared output from their script which mistakenly outputs “Grete” as the protagonist for the book “Metamorphosis” by Franz Kafka, despite expecting “Mr. Samsa.”

  • Understanding V-JEPA’s Role in Multimodal Learning: @subham5089 wrote a blog about the V-JEPA paper released by Meta, discussing its significance for multimodal learning and drawing comparisons to BERT in text-based LLMs.

  • Introducing SEC Insights for Financial Analysis: @forbes99 introduced SEC Insights, a tool designed for analyzing complex financial documents, with features like cross-document inquiries and paragraph-level citations, aiming to enhance business intelligence.

  • Context Management in Large Window LLMs: @jonas69301 is in search of benchmarks or evaluations on the best practices for providing extensive context to large context window coding LLMs, such as GPT-4 turbo and Gemini 1.5, with concerns about the order, repetition, and structure of the information.

  • Open-Source Text Generation with Llama2 Model: @theexecutor5677 is seeking advice for an open-source text generation application that integrates CSV and PDF inputs with the Llama2 model, and is also interested in combining the approach with RAG (Retrieval-Augmented Generation).



Latent Space ▷ #ai-general-chat (79 messages🔥🔥):

  • Swyx Debunks WSJ’s Sora Video: @swyxio corrects a claim from a WSJ video on OpenAI’s Sora, stating that Sora can maintain consistency over >1min videos by interpolating from a start image, contrary to WSJ’s assertion of impossibility.
  • NVIDIA Gears up with GEAR: @guardiang shares news of NVIDIA’s new research group, GEAR, co-founded by Dr. Jim Fan, aimed at creating autonomous machines with a general-purpose AI.
  • Perplexity Powers Podcast with AI: @swyxio points out Perplexity’s AI-generated podcast which pulls content from their Discover feed, employing ElevenLabs’ voices for narration.
  • Cloudflare Launches AI Gateway: @henriqueln7 spotlights Cloudflare’s AI Gateway, offering one-line-code insights and controls for AI applications, including analytics, caching, and rate limiting.
  • Detecting the Details in Data Analysis Tool: @swyxio highlights a ChatGPT Data Analysis V2 tool utilizing gpt-4-ada-v2, featuring a data grid overlay editor, targeted replies, and possibly interactive charts.


Latent Space ▷ #ai-announcements (9 messages🔥):

  • T5 Paper Discussion Imminent: @ivanleomk announced an LLM Paper Club session led by @bryanblackbee to discuss the T5 paper, starting in 5 minutes with a link to join the discussion here.
  • Wishing for a Replay: @swyxio expressed regret for missing the T5 paper discussion and humorously suggested the need for a recording of the session.
  • AI in Action Event Kickoff: @kbal11 alerted members about the “AI in Action” event featuring @yikesawjeez and focusing on local models, providing a link here for immediate attendance.
  • Compliments for a Smooth Session: @swyxio complimented @kbal11 for nicely running the “AI in Action” session with @yikesawjeez.
  • Community Celebrates a Milestone: @fanahova shared a birthday celebration message, thanking everyone for being part of the community, followed by @rubenartus complimenting the celebration cake and hat.


Latent Space ▷ #llm-paper-club-west (16 messages🔥):

  • Get Your LLM Paper Club Notes Here: @bryanblackbee shared a Notion link containing notes pertaining to the LLM Paper Club.
  • Invitation to Engage in LLM Discussions: @ivanleomk invited participants to join the discussion by either speaking up during the session or by dropping questions and topics in the chat.
  • Inquiry into Model Vocabulary and Text Constraints: @mattoshimasu raised questions about whether new models are utilizing a smaller vocabulary set, the length of texts, and the number of verbs.
  • Understanding NLP Fine-Tuning for Newcomers: @healthymonkey inquired about the fine-tuning process for NLP tasks, using T5 and sentiment classification as examples.
  • Architectural Differences in NLP Tasks Discussed: @hanzo4958 questioned the effectiveness of encoder-decoder versus decoder-only architecture for traditional NLP tasks.
  • Paper Club Participants Express Gratitude: Multiple participants including @healthymonkey, @hanzo4958, @edwin_75513_08956, @lord_idiot, and @youngphlo thanked the hosts for the detailed session and helpful notes.

Links mentioned:

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It’s the all-in-one workspace for you and your team


Latent Space ▷ #ai-in-action-club (136 messages🔥🔥):

  • Exploring Latent Space in Local Models: @dsquared70 inquired about preferred local models, sparking a conversation about local AI model exploration. @420gunna mentioned the curiosity to experiment with image/video generative models and LoRA locally, to which @markredito advised checking out resources like comfyui and A1111.

  • Diving into Model Fine-Tuning with LoRAs: @kishore.reddy and @markredito discussed deploying and stacking multiple LoRAs to fine-tune generative models on the same GPU, referencing tools such as ComfyUI and platforms like civit.ai which host a community sharing models and merged models.

  • Latent Space Final Frontiers Event Highlighted: @kbal11 shared information about the Latent Space Final Frontiers event, which focuses on teams pushing the boundaries of AI and features research/startup competitions judged by industry experts. Details and event application can be found here.

  • Local Model Interaction Tools Discussed: @markredito, @420gunna, and @swyxio discussed LM Studio and Ollama as tools to pull down language models and interact with them locally. Additionally, @swyxio mentioned gemma.cpp from Google for model wrapping with streamlined user interfaces.

  • Humor Infused in Tech Banter: The conversation took a light-hearted turn with jokes about the juxtaposition of high GPU capacity and low internet bandwidth, as highlighted by @swyxio and @kbal11. This demonstrates the community’s ability to infuse humor into technical discussions.


OpenAccess AI Collective (axolotl) ▷ #general (52 messages🔥):

  • Gradient Clipping Query and Solution: @c.gato inquired about a potential issue with gradient clipping not working, despite being set to 0.3 in the config, after observing a spike. @nruaif suggested the spike may just be temporary, recommending to check if clipping is properly implemented.

  • DeepSpeed Stage 3 Support Discussion: @mihai4256 shared a GitHub issue raising concerns about HuggingFace’s Trainer supporting DeepSpeed Stage 3, with @noobmaster29 and @nanobitz providing feedback on the usage and recent updates.

  • Axolotl Model Storage and Cleanup: @c.gato sought assistance on where Axolotl stores downloaded models and how to clean up space. @mihai4256 advised checking the TRANSFORMERS_CACHE directory and shared steps using huggingface-cli delete-cache to clear the cache (a programmatic sketch follows this list).

  • Mistral AI’s Strategic Partnership Draws Attention: The news about a “strategic partnership” between Microsoft and Mistral AI, including investments and the release of a new AI model, sparked conversation with users like @yamashi and @casper_ai discussing the implications for open-source model availability and the perceived commercial direction of Mistral AI.

  • Axolotl and Mistral Discussions: A mix of technical support covered issues and updates on Axolotl, Mistral AI, and token-classification training features, including @mihai4256 asking for clarification on dependency installation for non-Python devs and @kearm mentioning a new support PR.
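
As a programmatic companion to the cache-cleanup tip above, huggingface_hub can scan and prune the same cache from Python; the revision hash below is a placeholder:

```python
from huggingface_hub import scan_cache_dir

# Inspect what's on disk in the Hugging Face cache (HF_HOME / TRANSFORMERS_CACHE).
cache = scan_cache_dir()
for repo in sorted(cache.repos, key=lambda r: r.size_on_disk, reverse=True):
    print(f"{repo.repo_id}: {repo.size_on_disk / 1e9:.2f} GB")

# Build a deletion plan for specific revisions, review the space it frees,
# then execute -- the same flow as `huggingface-cli delete-cache`.
plan = cache.delete_revisions("abcdef1234567890")  # placeholder revision hash
print("Would free:", plan.expected_freed_size_str)
plan.execute()
```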


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (14 messages🔥):

  • GPTQ and EXL Requirement Clarified: @nanobitz responded to @curiositix that they need gptq or exl, indicating that the suggested Google’s Gemma C++ inference engine does not meet their requirements.
  • Axolotl’s Auto-Install Goodness: @stoicbatman announced the creation of auto_install.sh to simplify Axolotl setup (Pull Request #1329), and @kearm expressed support for the initiative, urging a review.
  • Seeking Review for Installation Script: @stoicbatman requested a review for the newly introduced auto_install.sh, highlighting its goal to ease the installation process, especially for those not using Docker.
  • Tweeting for Community Support: @casper_ai created a Twitter post to garner attention for the CUDA mode series, potentially with help from Jeremy Howard.
  • Document Clarification with Axolotl PR: In a query to @caseus_, @yamashi provided a link to clarify which document @208256080092856321 was referring to in a discussion regarding Mistral LoRA within the Axolotl project.


OpenAccess AI Collective (axolotl) ▷ #general-help (121 messages🔥🔥):

  • GPU-Powered Mystery: @kearm discusses an issue with high loss and extended training times using 4 Nvidia RTX 3090 graphics cards. Despite the powerful setup with a Threadripper Pro, the operation led to an estimated training time of 340 hours.

  • Gotta Troubleshoot ‘Em All: Various members, including @kearm and @nanobitz, delve into technical troubleshooting, trying to identify and solve issues related to high loss during training and checkpoint failures. Configurations, deepspeed versions, and potential fixes are discussed, with @kearm experiencing persistent issues despite downgrading deepspeed.

  • 300 Seconds of Slowness: @dreamgen asks for assistance regarding slow merging of models, specifically mixtral, and unexpected non-utilization of the GPU. The discussion revolves around syncing to main, possible memory issues, and potential Docker-related solutions.

  • Docker Dilemma: @kearm attempts to run Axolotl within Docker but faces errors, including GPU connection issues on Ubuntu and a specific error when attempting to run the Docker image. @nanobitz points out the need for the NVIDIA container toolkit, and @stoicbatman offers a command template for @kearm to facilitate GPU recognition by Docker (a quick in-container check is sketched after this list).

  • Newbie’s Navigator Needed: @grahama expresses interest in an easy-to-follow, end-to-end tutorial for beginners wanting to use Axolotl to fine-tune models like mixtral 7b. @nanobitz indicates that the project README contains a quickstart section that can guide users from setup to inference.
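
The exact command template wasn’t captured here, but as a rough sanity check (an assumption, not the template @stoicbatman shared), one can verify from inside the container that Docker actually exposed the GPUs:

```python
# Run inside the Axolotl container after starting it with the NVIDIA
# container toolkit enabled (e.g. a `docker run` invocation with GPU access).
import torch

if not torch.cuda.is_available():
    raise SystemExit(
        "No CUDA device visible - check that the NVIDIA container toolkit "
        "is installed and the container was started with GPU access."
    )

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.1f} GB")
```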

OpenAccess AI Collective (axolotl) ▷ #rlhf (5 messages):

  • Uncertainty Over Training Progress: @noobmaster29 remarked that the train loss appeared quite low, potentially indicating good performance.
  • Confusion Over Epoch Results: @noobmaster29 expressed confusion as running a full epoch yielded worse results than stopping at 50%, challenging expectations on model training outcomes.
  • Difficulty Assessing Without Evaluation Metrics: @noobmaster29 stated the importance of evaluation, noting it’s hard to judge the model’s performance without it.
  • Acknowledging Assistance: @noobmaster29 thanked @kaltcit for their help, to which @kaltcit responded with “np.”

OpenAccess AI Collective (axolotl) ▷ #community-showcase (3 messages):

  • Fine-tuning Achievements with Phi-2: @finetuningllms presents a 2.78B-parameter finetune of phi-2, mentioning it was finetuned using Axolotl and promising an upcoming model card with an image. The model, which they describe as high-performing, can currently be viewed here.

  • Expand Your Language Model’s Vocabulary: @seungduk announced the release of EEVE-Korean models built with Axolotl, offering optimized Large Language Models (LLMs) with expanded Korean vocabulary. Variants including 10.8B and 2.8B parameter models can be viewed with instructions for use and community engagement on Hugging Face.

  • Korean Language LLM Enhancement Exposed: Published alongside the models, a technical report shared by @seungduk details an efficient method for expanding non-English vocabularies in language models and demonstrates their enhanced capabilities in both Korean and English text understanding. Find their research and findings on arXiv.

  • RAG System Development Simplified: @emrgnt_cmplxty introduced R2R, a semi-opinionated framework designed to streamline the transition from experimental Retriever-Answer Generator (RAG) models to production-ready systems. R2R promises ease of deployment, adaptation, and maintenance for production RAG pipelines, and more details can be found on their GitHub repository.

OpenAccess AI Collective (axolotl) ▷ #runpod-help (1 messages):

  • Trouble in Runpod Town: User @tom891 reported an error occurring on runpod involving a NameResolutionError when trying to access ‘huggingface.co’. The error suggests a temporary DNS resolution failure, positing a potential proxy issue.
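
A quick diagnostic sketch for this class of failure (an illustration, not a confirmed fix): check whether the pod can resolve the hostname at all before suspecting the proxy.

```python
import socket

# Test DNS resolution for huggingface.co from inside the runpod instance.
try:
    addrs = socket.getaddrinfo("huggingface.co", 443)
    print("DNS OK:", sorted({a[4][0] for a in addrs}))
except socket.gaierror as err:
    print("DNS resolution failed:", err)
    print("Inspect /etc/resolv.conf and any HTTP(S)_PROXY settings on the pod.")
```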

CUDA MODE ▷ #general (61 messages🔥🔥):

  • CUDA Criticized by Computing Legend: @itali4no shared an article where Jim Keller criticized NVIDIA’s CUDA architecture, comparing it unfavorably with x86 and suggesting it lacks elegance due to being cobbled together over time. The full Tom’s Hardware article details his viewpoints.

  • Debate Over GPU Choices for AI: @cropinky. suggested that while a 4060 Ti may be the cheapest 16GB consumer GPU and has low power draw, it’s generally not enough for LLM tasks compared to options like used 3090s with 24GB VRAM, as indicated by @andreaskoepf emphasizing VRAM importance. A discussion about buying second-hand GPUs for AI tasks spotlighted the gamble involved and potential remedies if issues arise, including changing thermal pads or paste.

  • Precise Computations in Quantized AI Models Discussed: @andreaskoepf and @zippika had an in-depth discussion about how computations in quantized models (4-bit/8-bit) typically happen at higher precision like 16-bit to maintain accuracy, with dequantization before matrix multiplication (see the sketch after this list). @marksaroufim contributed by clarifying the terms used for different quantization strategies, like weight-only quantization and the ambiguity in distributed settings.

  • In-Person Attendance at GTC Conference: @vim410 and @andreaskoepf suggested organizing either a watch party for Jensen’s Keynote or an in-person meetup for those attending the upcoming GTC conference. @_t_vi_ confirmed attendance along with Mike Ruberry and shared excitement for presenting their work.

  • ZLUDA Project Opensourced: @ju_rstr shared news about ZLUDA, a tool that allows NVIDIA’s CUDA code to run on AMD and Intel GPUs, which has been open-sourced after AMD and Intel withdrew support. The developer behind ZLUDA, Andrzej Janik, hopes his project will challenge NVIDIA’s AI dominance, and more information can be found on ZLUDA’s GitHub page.
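
A minimal PyTorch sketch of the dequantize-before-matmul pattern discussed above, assuming int8 weight-only quantization with per-output-channel scales (sizes are arbitrary):

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 64, dtype=torch.float16)    # fp16 activations
w = torch.randn(128, 64, dtype=torch.float16)  # fp16 weights to quantize

# Weight-only int8 quantization with one scale per output channel.
scale = w.abs().amax(dim=1, keepdim=True) / 127.0
w_int8 = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)

# Dequantize back to fp16 *before* the matmul, so the multiply-accumulate
# itself runs at 16-bit precision, as described in the discussion.
w_fp16 = w_int8.to(torch.float16) * scale
y = x @ w_fp16.t()

ref = x @ w.t()
print("max abs error vs. unquantized:", (y - ref).abs().max().item())
```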

CUDA MODE ▷ #triton (6 messages):

  • Triton as a Gateway to Jax: @srush1301 discussed the Triton implementation, mentioning it allows for Jax support via Pallas, and expressed a desire for a simpler version for researchers to modify.
  • Triton vs CUDA Multi-GPU Support Inquiry: @taekmin.kim asked if Triton is better than CUDA for multi-GPU or node execution, looking for insights into its distributed computing capabilities.
  • Call for Triton Experts: @andreaskoepf voiced the need for an expert to explain Triton, especially its lower-level workings, its foundation in LLVM and MLIR, and its future potential.
  • Benchmarking Triton’s Quantized Matmul Kernel: @andreaskoepf proposed creating an isolated benchmark setup for Triton’s quantized matmul kernel, to be shared during a talk to encourage experimentation and comparison with CUDA (a skeleton is sketched after this list).
  • Sharing Benchmark Code: @andreaskoepf suggested including the Python file for the aforementioned benchmark setup in the lectures repository for accessibility.
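
A skeleton for such an isolated benchmark, assuming triton.testing.do_bench and using torch.matmul as a stand-in for the kernel under test:

```python
import torch
from triton.testing import do_bench

M, K, N = 4096, 4096, 4096
a = torch.randn(M, K, device="cuda", dtype=torch.float16)
b = torch.randn(K, N, device="cuda", dtype=torch.float16)

# Baseline: cuBLAS fp16 matmul via torch. Swap the lambda for the quantized
# Triton matmul kernel being measured to get a like-for-like comparison.
ms = do_bench(lambda: a @ b)
tflops = 2 * M * K * N / (ms * 1e-3) / 1e12
print(f"{ms:.3f} ms  ->  {tflops:.1f} TFLOP/s")
```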

CUDA MODE ▷ #cuda (22 messages🔥):

  • CUDA vs. Python Rounding Error Issues: @zippika encountered more rounding errors when implementing nn.Linear operations in C++ than in Python, due to certain NVIDIA cub compilation flags. A comparison of the C++ and Python code was shared illustrating the differences that lead to inaccuracies; the Python version was deemed more accurate.

  • Code Synchronization in Tensor Quantization: @zippika noted the correspondence between dequantize_torch_fp4 in C++, and dequantize_fp4_codebook_invoke_qtype in Python, which have similar functionalities but different argument ordering.

  • Speed Testing BNB vs. TorchFP4: @zippika performed speed tests on the Mistral-7b-instruct-v0.2 model, indicating TorchFP4 had a higher tokens-per-second rate than BNB (a measurement harness is sketched after this list).

  • Readme Improvements for torch-bnb-fp4 Library: @zippika updated the library’s readme, which now includes a Hugging Face example script for speed testing.

  • CUDA with OpenGL vs Vulkan: @morousg answered @g.huy’s query about combining CUDA with OpenGL, saying it is possible but NVIDIA focuses more on CUDA with Vulkan. Vulkan is recommended over OpenGL for greater efficiency and capabilities.
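
A rough tokens-per-second harness in the spirit of that comparison (the timing scaffold is an assumption; the quantization backends being compared are omitted for brevity):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-Instruct-v0.2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="cuda"
)

inputs = tok("Explain flash attention briefly.", return_tensors="pt").to("cuda")
torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```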

CUDA MODE ▷ #torch (9 messages🔥):

  • Exploring Efficient Kernel Advertisement: @hdcharles_74684 discussed the complexity of making various CUDA kernels accessible, mentioning the release of int_mm via out_dtype as clunky and noting the lack of support for int4 in PyTorch. They highlighted a method of integrating efficient kernels through torch.compile by detecting certain patterns, referencing their work on a 4-bit Triton kernel.

  • The Limits of torch.compile: @hdcharles_74684 pointed out the limitations of PyTorch’s torch.compile, especially in the context of creating efficient kernels from simple operations. They plan to address gaps in available kernels, with a focus on weight-only int8 quantization for batch sizes greater than one.

  • Speeding Up CUDA Kernel Compilation: @briggers proposed a method for reducing cpp_extension.load_inline compile times, seen in cuda-mode-session-4.ipynb, from over 30 seconds to under 2 seconds by using cpp_extension.load and avoiding unnecessary header files. A GitHub repository was shared to demonstrate the improved approach, splitting code into separate .cpp and .cu files (see the sketch after this list).

  • Request for Precompiled Headers (PCH) Guidance: @jeremyhoward requested help with implementing precompiled headers in C++, mentioning it has been years since his last deep involvement with C++.

  • Potential Inefficiency in Recompiling Extensions: @briggers discussed the limitations of using ninja to compile extensions, where it recompiles both wrapper and CUDA code even when only algorithm tweaks in the .cu file are made. @_t_vi_ contributed that avoiding C++ files during compilation might not be a substantial gain and questioned current PyTorch support for that method.
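
The shape of the load()-based approach, as a sketch (file and extension names here are hypothetical placeholders, not the ones from the shared repository):

```python
from torch.utils.cpp_extension import load

# Unlike load_inline, which regenerates and recompiles everything, load()
# points ninja at on-disk sources, so only files whose contents changed get
# recompiled - typically just the .cu kernel file during iteration.
ext = load(
    name="my_kernels",                      # hypothetical extension name
    sources=["wrapper.cpp", "kernels.cu"],  # hypothetical source files
    extra_cuda_cflags=["-O3"],
    verbose=True,
)
# ext.<fn> then exposes whatever the wrapper registered via pybind11.
```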

CUDA MODE ▷ #announcements (1 messages):

  • Lecture 7 on Quantization: @andreaskoepf announced that CUDA-MODE Lecture 7, titled Quantization CUDA vs Triton, was scheduled to begin shortly, at 20:00 UTC on February 24, 2024 (Unix timestamp 1708804800).

CUDA MODE ▷ #algorithms (4 messages):

  • CMU’s Paper on Efficient LLM Serving: @ericauld shared a link to a paper from CMU, titled “Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems”, focusing on the challenges and methodologies in deploying generative large language models (LLMs) efficiently.
  • Print to Understand: @marksaroufim expressed their intention to print the mentioned CMU paper, indicating their interest in its content.
  • Survey Abstract Highlighted: @andreaskoepf provided a direct link to the abstract of the CMU survey paper on arXiv, highlighting the need for efficient LLM serving from a machine learning system (MLSys) perspective.
  • Survey Content Breakdown: @marksaroufim shared key insights after reading the survey, noting standout techniques like non-autoregressive generation, speculative decoding, MoE architectures, local attention variants, and different forms of parallelism, illustrating the paper’s breadth in surveying over 150 referenced works.

Links mentioned:

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems: In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computati…


CUDA MODE ▷ #suggestions (1 messages):

  • Efficiency in AI Learning Unveiled: @mortezism shared a course link from MIT focusing on efficient AI computing techniques including model compression, pruning, quantization, and more. The course offers hands-on experience with large language models like LLaMA 2 and covers cutting-edge topics such as quantum machine learning.

Links mentioned:

MIT 6.5940 Fall 2023 TinyML and Efficient Deep Learning Computing


CUDA MODE ▷ #jobs (3 messages):

  • Mistral’s Hiring Status Inquiry: @onuralp. asked whether Mistral is actively hiring in the Bay Area or if hiring is role-specific, similar to DeepMind. No public answers were given in the discussion.
  • Nvidia CUDA/C++ positions open: @dasher519 inquired about job opportunities at Nvidia for CUDA and C++ experts. @vim410 confirmed that they are hiring and directed applicants to DM their CV for JobID: JR1968004.

CUDA MODE ▷ #beginner (11 messages🔥):

  • Conundrum with OpenCV in Google Colab: @dpearson is facing difficulties using #include <opencv2/opencv.hpp> in Google Colab with nvcc4jupyter. They are seeking alternatives for testing CUDA code on images within a Jupyter notebook environment.
  • Discovering CUDA through YouTube: @bikash_p recommends a YouTube lecture by Jeremy and a related Colab notebook to execute CUDA code using the PyTorch CPP extension, highlighting the seamless integration with ninja for compilation.
  • ACX Community Cross-Pollination: Both @ringofbetelgeuse and @_red.j expressed surprise at discovering CUDA MODE, acknowledging they joined from the ACX community.
  • Python Enthusiast’s AI Aspiration: @ilovepython3 voices their aspirations to fine-tune AI models, despite self-proclaimed poor math skills, and queries about prerequisites for engaging with CUDA MODE.
  • Guidance for a Budding AI Enthusiast: In response to @ilovepython3’s query regarding where to start, @jeremyhoward suggests tackling the fast.ai course first to build foundational knowledge before diving into CUDA.

CUDA MODE ▷ #pmpp-book (3 messages):

  • Confusion Over Grid Diagram in PMPP Book: @bikash_p questioned a discrepancy in the PMPP book, where the code specifies dim3 dimGrid(2,2,1), but the accompanying diagram shows two separate grids. They wondered if the diagram should instead show a single grid with four blocks.
  • Clarification on Kernel Function Calls and Grids: @alexanderrgriffing responded to @bikash_p, clarifying that the figure represents multiple kernel function calls, with each call launching its own grid of thread blocks; hence, two kernel calls result in two separate grids (see the sketch after this list).
  • Appreciation for Community Support: @bikash_p expressed gratitude for the explanation provided by @alexanderrgriffing regarding the schematic representation of grids in CUDA code context from the PMPP book.
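
To make the point concrete, a Python analogue of the book’s CUDA C launch configuration using numba.cuda (an illustrative sketch, not the book’s code):

```python
import numpy as np
from numba import cuda

@cuda.jit
def kernel(out):
    # Record which block touched each element of an 8x8 output.
    x, y = cuda.grid(2)
    if x < out.shape[0] and y < out.shape[1]:
        out[x, y] = cuda.blockIdx.x + 10 * cuda.blockIdx.y

out = cuda.to_device(np.zeros((8, 8), dtype=np.int32))

# dim3 dimGrid(2,2,1) / dim3 dimBlock(4,4,1) in numba's launch syntax.
# Each call below launches its *own* grid of four blocks, so two kernel
# calls produce two separate grids - exactly what the PMPP figure depicts.
kernel[(2, 2), (4, 4)](out)
kernel[(2, 2), (4, 4)](out)
print(out.copy_to_host())
```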

CUDA MODE ▷ #youtube-recordings (3 messages):

  • Optimization Education Straight from YouTube: @marksaroufim shared Lecture 6 focusing on Optimizing Optimizers with both a YouTube video and the accompanying slides in a Google Docs presentation.
  • Gratitude Expressed by @filippob82: @filippob82 expressed thanks for the shared educational content on CUDA optimization.
  • Taking Quantization Further: @andreaskoepf provided a link to Lecture 7 titled Advanced Quantization on YouTube (watch here) and thanked @325883680419610631 for recording, cutting, and uploading the lecture, with additional slides available on Dropbox.

CUDA MODE ▷ #smol-hw (8 messages🔥):

  • Contemplating Random Numbers: User @marksaroufim posted a range of numbers with no context, sparking curiosity from @nshepperd about the origin of these values.
  • Contributions to Quantization: @drisspg shared progress on quantization techniques with notes on reproduction, and provided a link to their GitHub repository with relevant code.
  • Doubts about Quantile Alignment: @drisspg voiced skepticism about whether the quantiles align with expectations, mentioning a notebook with related concerns but not providing a link to it.
  • Exploring Quantization Strategies: @marksaroufim highlighted a PyTorch core team repository focused on quantization and pruning of GPU models and referred to a PyTorch blog post detailing optimizations in generative AI accelerations.

CUDA MODE ▷ #ring-attention (45 messages🔥):

  • Tweaking Attention for Speed: @zhuzilin96 implemented a zigzag_ring_flash_attn_varlen_qkvpacked_func, which showed a speed improvement although less than anticipated. They later mentioned hardcoding bf16 was due to personal preference rather than necessity.
  • Flash Attention Finessed: @iron_bound shared an explanation and visual from Hugging Face’s documentation about Flash Attention, highlighting its benefits for memory efficiency and training speed by leveraging SRAM over HBM.
  • Zigzag Ring Speed-Up Measured: @zhuzilin96 posted a benchmark script showing a roughly 20% speed up in zigzag ring attention over classic flash attention, but admitted that their earlier screenshot wasn’t correctly warmed up.
  • Ring to the Max: @andreaskoepf discussed maximizing the benefits of RingAttention for larger batch sizes, noting that it’s crucial to measure when the ring-attn-block computation outweighs memory transfer time. Meanwhile, @jamesmel contributed a minor PR for requirements and @andreaskoepf clarified that the Cuda Mode fork is mainly for backup purposes.
  • In-Depth Optimization Discussions: @w0rlord and @andreaskoepf engaged in discussions about softmax base-2 tricks and flash attention function accuracy with respect to sequence lengths. @andreaskoepf shared a notebook regarding the trick (sketched below) and observed that flash attention gave correct results only for longer sequences.
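
The base-2 trick in a nutshell: since exp(x) = 2^(x·log2 e) and exp2 is cheaper on GPUs, the constant can be folded into the scores with no change to the softmax output. A minimal sketch (not the shared notebook itself):

```python
import torch

torch.manual_seed(0)
scores = torch.randn(4, 16)

LOG2E = 1.4426950408889634  # log2(e)

ref = torch.softmax(scores, dim=-1)

# exp(x) == 2**(x * log2(e)); the constant cancels in the normalization,
# so the base-2 formulation is mathematically exact.
p = torch.exp2(scores * LOG2E)
out = p / p.sum(dim=-1, keepdim=True)

print("max abs diff vs. torch.softmax:", (out - ref).abs().max().item())
```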

LangChain AI ▷ #general (39 messages🔥):

  • Function Calling Dilemma with Local Models: @saita_ma_ is seeking an easy way to do function calling with local models like OpenHermes and has found resources lacking, despite knowing it’s possible as demonstrated by CrewAI.
  • Langchain Tutorials Hit YouTube: @datasciencebasics shares a YouTube tutorial on creating a Chat UI using ChainLit, LangChain, Ollama & Gemma, which allows viewers to create a ChatGPT-like UI locally.
  • Colab Corner: @kenwu_ is looking for help with agent and function calling using Cohere API and LangChain; shared their Google Colab notebook for collaboration and assistance.
  • Sarcasm Detection in LLMs: @juepachon sparks a conversation on whether tagging phrases with “sarcasm” could help an LLM to understand and detect sarcasm better after fine-tuning.
  • Usescraper Launch and Blog Post: @dctanner announces UseScraper.com, a new tool for crawling website content, and wrote a blog post on how it ties in with LangChain.

LangChain AI ▷ #langserve (3 messages):

  • Cancelled Error Confusion: User @cryptossssun encountered an asyncio.exceptions.CancelledError but did not provide further details about the context or the code involved.
  • Query about Extending Timeout Limits: @howtonotgiveafuck is looking for a way to extend the timeout beyond the default 900 seconds. No solutions or further discussion on the topic were provided within the scope of the messages.

LangChain AI ▷ #share-your-work (11 messages🔥):

  • Build Custom Chatbots with Ease: @deadmanabir shared a guide on crafting personalized chatbots that maintain a conversation history. The technology stack includes OpenAI, Qdrant DB, and Langchain JS/TS SDK, with more details available on Twitter.

  • Insights on AI in the Insurance Industry: @solo78 expressed interest in exchanging use cases and implementing AI, particularly in the finance function within the insurance sector.

  • Merlinn AI Empowers Engineers: @david1542 introduced Merlinn, a project that aids on-call engineers in incident investigations and troubleshooting, utilizing Langchain under the hood.

  • Langchain on Rust: @edartru. shared Langchain-rust, a new crate enabling Rust developers to write programs with large language models, with the source code available on GitHub.

  • Novel Resume Optimizer Launch: @eyeamansh developed an open-source resume optimizer using AI, which proved successful in securing calls from tech giants like NVidia and AMD. The tool is designed to reduce cost and effort and can be found on GitHub.

LangChain AI ▷ #tutorials (7 messages):

  • Build Your Own Chat UI: A newly shared YouTube video demonstrates how to create a Chat UI using ChainLit, LangChain, Ollama, & Gemma, enabling viewers to set up a ChatGPT-like interface locally on their computer.
  • LLMs Illuminate Quarterly Reports: @rito3281 has crafted a detailed article discussing how Large Language Models (LLMs) can assist in parsing through a company’s quarterly report, predicting future growth, and identifying risks and market opportunities, using LangChain, Qdrant, and Mistral AI.
  • Ollama’s New Embeddings on Colab: @schimazing shares a modification that utilizes Ollama’s new embeddings completely hosted on Google Colab, as highlighted in this Twitter post, with no API keys required.
  • Decoding the AI Process: In response to @rajib2189’s inquiry about the underlying mechanisms of AI, @speuce clarified that the process is perplexity-based rather than relying on stopwords or stemming.
  • LangGraph, Calls, and Scraping Simplified: @tarikkaoutar presents a YouTube video that explains how to combine LangGraph, function calls, and a web scraper to create a multi-agent application, encouraging shares to broaden reach.

Datasette - LLM (@SimonW) ▷ #ai (4 messages):

  • LOL: ChatGPT Goes Multilingual with Data: @derekpwillis shared an anecdote where using chatgpt-3.5-turbo for data extraction tasks resulted in some document titles being translated into Spanish, such as “Taking Advantage of the Internet” becoming “Sacándole Provecho a Internet”.
  • The Multilingual Bug Strikes Again: @simonw compared this behavior to a known issue where ChatGPT, coupled with Whisper voice, sometimes misinterprets British accents as Welsh and responds in Welsh.
  • Quick Fix Suggestion: @simonw suggested a workaround of using a system prompt specifying “Always use English” to avoid erroneous language detection (a minimal sketch follows below).
  • Ready to Implement the Language Patch: @derekpwillis acknowledged the bug and expressed the intention to implement the “Always use English” prompt to address the issue.
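
What that workaround looks like in practice, as a minimal sketch with the OpenAI Python client (the prompt contents are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Pin the output language with a system prompt so extraction results
# stay in English regardless of the source document's language.
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Always use English."},
        {"role": "user", "content": "Extract the title of this document: ..."},
    ],
)
print(resp.choices[0].message.content)
```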

Datasette - LLM (@SimonW) ▷ #llm (30 messages🔥):

  • Reviving Old School Prompt Crafting: @tariqali reminisced about the pre-RLHF approach of using extensive prompts to guide text generation, finding it reminiscent of providing chatbots with a transcript to resume conversations. He finds this method offers more control, especially useful for instances like incomplete chatbot messages caused by “time out” issues.

  • Simplifying Devcontainer Setups and Workarounds: @derekpwillis mentioned having to tinker with the devcontainer.json file, while @simonw suggested adding llm models to the setup.sh script as a bug workaround. @derekpwillis later confirmed the implementation of the proposed fix.

  • LargeWorldModel Running on LLM: @simonw expressed interest in seeing LargeWorldModel running in LLM and discussed the possibility of using GPU instances to accommodate PyTorch models from their Hugging Face repository.

  • Plugin for Groq Inference by Angerman: @angerman. shared his creation of a Groq inference plugin, llm-groq, contributing another inference provider for experimentation. @0xgrrr cheered on the addition, inquiring about the performance claims.

  • Publishing to PyPI for Easier Plugin Installation: @angerman. learned to publish his llm-groq plugin to PyPI following @0xgrrr’s advice, enabling simpler installation using llm install. @angerman. confirmed successful publishing and shared his impressions comparing Haskell and Python community practices.

LLM Perf Enthusiasts AI ▷ #general (6 messages):

  • Examining Hallucination Mitigation: User @res6969 shared a tweet by @RichardSocher discussing a potential solution to the hallucination problem in AI. The tweet alludes to successful reference incorporation, stirring curiosity in the research community.
  • Speculating on Anti-Hallucination Techniques: @res6969 speculated that the approach to curb hallucinations involves a validating mechanism coupled with cutting-edge embedding models. This suggests a growing interest in enhancing AI’s factual accuracy.
  • Introducing Globe Explorer: User @sincethestudy announced the launch of Globe Explorer, a tool that creates a customizable Wikipedia-style page on any topic using GPT-4, heralding a new era in information discovery.
  • Globe Explorer Seeks Product Hunt Supremacy: In an effort to top Product Hunt’s daily list, @sincethestudy urged the community to upvote Globe Explorer. Promises of exclusive access to a “pro” version were offered to supporters.

LLM Perf Enthusiasts AI ▷ #finetuning (1 messages):

  • Fine-Tuning with Full Documents or Extracts?: @pantsforbirds is achieving great results with 1-shot data extraction using gpt-4-turbo by embedding entire documents into the prompt. They seek advice on whether to embed full example documents or just relevant sections in their finetuning dataset for a more complicated extraction/classification task.

LLM Perf Enthusiasts AI ▷ #opensource (3 messages):

  • FireFunction V1 Sparks Interest: @sourya4 asked for top choices for function calling with open-weights models. They then shared a link to @lqiao’s announcement about FireFunction V1, poised to deliver GPT-4-level structured output and decision-routing at higher speeds, noting open-weights availability and commercial usability with a supporting blog post.

  • Structured Output for Better Development: The announcement from @lqiao further introduced JSON mode and grammar mode for all language models, ensuring structured outputs and reducing time spent on system prompts, detailed in a second blog post (a usage sketch follows after this list).

  • Hackathon for Hands-on Experience: @yikesawjeez mentioned current preferred tools for function calling, including gorilla openfunctions and others, but flagged an upcoming hackathon focused on FireFunction as a potential game-changer in determining a new favorite.
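
What JSON mode looks like against an OpenAI-compatible endpoint, as a hedged sketch (the Fireworks base URL and model path are assumptions based on their public docs, not taken from the announcement itself):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
    api_key="...",  # your Fireworks API key
)

# response_format constrains the completion to valid JSON.
resp = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v1",  # assumed model path
    response_format={"type": "json_object"},
    messages=[
        {"role": "user", "content": 'Return {"city": ..., "country": ...} for Paris.'},
    ],
)
print(resp.choices[0].message.content)
```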

Links mentioned:

Tweet from Lin Qiao (@lqiao): 🔥 Structure is all you need. 🔥 We’re excited to announce: - FireFunction V1 - our new, open-weights function calling model: - GPT-4-level structured output and decision-routing at 4x lower lat…


LLM Perf Enthusiasts AI ▷ #offtopic (5 messages):

  • Introducing Globe Explorer: @joshcho_ shared a tweet by @sincethestudy introducing Globe Explorer, likening it to a customizable Wikipedia page for anything and hailing it as a herald of a new age in information discovery. They encouraged people to try it at explorer.globe.engineer.
  • Journey of Viral Spread: @joshcho_ humorously noted that a request for widespread sharing of Globe Explorer was unnecessary, as it had already become viral.
  • Launch of R2R for RAG Systems: @emrgnt_cmplxty announced the launch of R2R, a framework to facilitate the rapid development and deployment of production-ready Retriever-And-Generator (RAG) systems, and provided a link to the GitHub repository. They emphasized the framework’s simplicity and its aim to set a new benchmark for ease of use in production environments.

LLM Perf Enthusiasts AI ▷ #collaboration (3 messages):

  • Anki and LLM Collaboration Potential: User @degtrdg shared a tweet discussing the performance of various LLMs, including GPT-4 and GPT-3.5, in generating flashcards for spaced repetition tools like Anki, noting that there is still room for improvement.
  • GPT-4 Generates Verbose but Useful Anki Cards: User @thebaghdaddy found success with GPT-4 in creating Anki cards by first organizing information into a table format covering various aspects, such as mechanisms and side effects for a list of drugs, and then prompting GPT-4 to create cards from the table, resulting in slightly verbose but useful content.
  • Anki and LLMs: The Visual Limitation: @thebaghdaddy noted a limitation when integrating LLMs with Anki: the inability to include images, which are beneficial for study methods like image occlusion.

Links mentioned:

Tweet from Niccolò Zanichelli (in SF in May) (@nc_znc): Interesting analysis evaluating the capabilities of different LLMs (GPT-4, GPT-3.5 and some open ones) w.r.t. generating spaced repetition flashcards conditioned on some explanatory text. Clear improv…


LLM Perf Enthusiasts AI ▷ #openai (5 messages):

  • Feather Flock Together: User @.kiingo linked to Feather OpenAI sparking speculation about its purpose. @justahvee responded, suggesting the service seems related to writing based on its icon.
  • Unearthing Feather’s Past: @dare.ai clarified that Feather has been in use since 2022 and is not new, providing a snapshot link from The Wayback Machine.
  • Feather’s Role in Training AI Models: In another message, @dare.ai noted Feather’s use for SME data labeling and coding annotation, critical for training models, and cited an article from Semafor regarding OpenAI’s hiring practices.
  • GPT-4 Ada’s Analytical Advancements: User @res6969 shared a tweet from @btibor91 about a new GPT-4 model known as “gpt-4-ada-v2,” which features a data grid overlay editor, options for ‘targeted replies’, and potential interactive charts, defining the updated version as “ChatGPT Data Analysis V2”.

DiscoResearch ▷ #general (4 messages):

  • Exploring Custom Callbacks in Training: @sebastian.bodza discussed the potential of using custom callbacks with the Hugging Face trainer, noting that they are currently a PyTorch-only feature and are “read only,” except for the control they exert via TrainerControl (see the sketch after this list).
  • LLMs and the Query of English Centricity: @_jp1_ pointed out an insightful paper on the English-centric thought process in open large language models (LLMs). He suggests it has significant implications for multilingual applications and shared his perspective with a link to his tweet.
  • Scrutinizing LLM Probability-Based Evaluations: @bjoernp shared an arXiv paper that discusses the limitations of probability-based evaluation methods for LLMs, especially for multiple-choice questions, addressing a problem also encountered in the DiscoLM series research. The study casts doubts on the effectiveness of such evaluations as they may not align with generation-based prediction.
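
A minimal sketch of such a callback: it only observes state, and intervenes solely through the sanctioned TrainerControl flags (the threshold and behavior are illustrative):

```python
from transformers import TrainerCallback

class LossSpikeCallback(TrainerCallback):
    """Read-only callback: observe logs, steer training only via TrainerControl."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and logs.get("loss", 0.0) > 10.0:  # arbitrary spike threshold
            print(f"step {state.global_step}: loss spike {logs['loss']:.2f}")
            control.should_save = True  # request a checkpoint via TrainerControl
        return control

# Usage: trainer = Trainer(..., callbacks=[LossSpikeCallback()])
```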

DiscoResearch ▷ #benchmark_dev (10 messages🔥):

  • Emotional intelligence benchmark extends to German: EQ-Bench has received German language support and efficiency improvements from @.calytrix, making it faster and less resource-intensive. The update is available on the EQ-Bench GitHub repository.
  • Preliminary scores for German EQ-Bench revealed: @.calytrix listed initial scores for models on the German version of EQ-Bench, with gpt-4-1106-preview scoring the highest at 81.91, followed by various models, including gpt-3.5-turbo-0125 and different versions of Mistral and Laser.
  • Concerns about the validity of translated EQ-Bench: @_jp1_ expressed skepticism about the effectiveness of the EQ-Bench German translation, suggesting that nuances in emotional understanding might not translate well, potentially leading to similar results across different language benchmarks due to shared English-centric reasoning.
  • Translation seen as non-detrimental to benchmark efficacy: @.calytrix asserted that the discriminative power of EQ-Bench is retained despite potential translation issues, backed by parallel scores between English and German benchmarks, which suggest that the test is effective even if not perfect.
  • Debate on the cultural nuances in EQ-Bench translations: @_jp1_ posited that a model’s ability to understand German-specific emotional nuances could lead to different results in bilingual benchmarks, a theory @.calytrix found compelling but remained skeptical on whether different cultural thinking could significantly influence benchmark rankings.

Links mentioned:

GitHub - EQ-bench/EQ-Bench: A benchmark for emotional intelligence in large language models


DiscoResearch ▷ #embedding_dev (2 messages):

  • Introducing Matryoshka Embeddings: @johannhartmann shared a Hugging Face blog post introducing Matryoshka Embeddings, explaining their utility, how to train them using Sentence Transformers, and showcasing a demo of their capabilities. The blog provides a detailed comparison between Matryoshka embeddings and regular embeddings (the core idea is sketched below).
  • Sentence Transformers now feature Matryoshka: Additionally, @johannhartmann mentions that Matryoshka Embeddings are now incorporated into Sentence Transformers, broadening the toolkit for users of this library.
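
The core Matryoshka property, sketched: the first k dimensions of a Matryoshka-trained embedding are themselves a usable embedding after renormalization (the model name is a placeholder; it must be a model trained with a Matryoshka loss):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder Matryoshka-trained model.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
emb = model.encode(["Matryoshka embeddings nest like dolls."])

k = 64
small = emb[:, :k]                                            # keep first k dims
small = small / np.linalg.norm(small, axis=1, keepdims=True)  # renormalize
print(emb.shape, "->", small.shape)
```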

Links mentioned:

🪆 Introduction to Matryoshka Embedding Models


DiscoResearch ▷ #discolm_german (6 messages):

  • Dataset Accessibility Inquiry: @thomasrenkert inquired about accessing the context_understanding dataset on Hugging Face.
  • Work-In-Progress Dataset Details: @bjoernp responded that the dataset, which is a work-in-progress for a benchmark on retrieval context understanding, is not ready for broad sharing and lacks public documentation.
  • Understanding RAG Evaluation: @johannhartmann questioned the approach of asking which context is used to answer a question in the ger-rag-eval instead of checking for a proper answer.
  • Clarifying RAG Evaluation Methodology: @philipmay explained that in a RAG setting, multiple contexts are retrieved, and it’s important to test whether the LLM can locate the relevant information within them.
  • Acknowledgment of Explanation: @johannhartmann acknowledged the point made by @philipmay regarding the RAG evaluation approach.

AI Engineer Foundation ▷ #general (12 messages🔥):

  • Looking for Hackathon Teammates: @reydelplatanos is seeking teammates for an upcoming hackathon. @hiro.saxophone responded with an offer to team up, mentioning their experience as an ML engineer and previous work on a multimodal RAG.

  • Registration Woes and Team Optimism: Both @silverpiranha and @jamthewallfacer expressed they are awaiting registration confirmation for an event. @silverpiranha then shared excitement about the high participation and eventual successful registration, inviting @jamthewallfacer to team up.

  • Back End Meets ML Engineering for Hackathon: @reydelplatanos, identifying as a backend developer, accepted @hiro.saxophone’s offer to form a team for the hackathon, signifying a new partnership.

  • Looking for Additional Hackathon Members: @ryznerf. joined the conversation late but is eager to participate in the hackathon and is looking to join a team.

  • A High-Flying Coding Idea: @.yosun shared a fun hackathon idea about using function calling for piloting a drone, citing an example from the OpenAI Cookbook. They provided a snippet of code illustrating function definitions for drone operation.
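
The original snippet isn’t reproduced here, but drone-control function definitions in the OpenAI tools format look roughly like this (function names and parameters are hypothetical):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "takeoff_drone",
            "description": "Initiate the drone's takeoff sequence.",
            "parameters": {
                "type": "object",
                "properties": {
                    "altitude_m": {
                        "type": "number",
                        "description": "Target altitude in meters.",
                    }
                },
                "required": ["altitude_m"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "land_drone",
            "description": "Land the drone at its current position.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]
# Passed as `tools=tools` to a chat.completions.create call, letting the
# model emit structured calls such as takeoff_drone(altitude_m=30).
```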

Links mentioned:

Fine tuning for function-calling | OpenAI Cookbook


Alignment Lab AI ▷ #oo (1 messages):

  • Gemma Introduces Conversation Control Tokens: @imonenext enhanced the Gemma-7B model with special tokens for turn-taking in conversations. The new tokens <start_of_turn> and <end_of_turn> are designed for better instruction/RL fine-tuning and can be accessed on Hugging Face.
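
A sketch of how the added tokens are used to format a turn (the prompt text is illustrative; the format mirrors Gemma’s turn-based chat template):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("imone/gemma-7b-with-it-tokens")

# One user turn followed by the opening of the model's turn.
prompt = (
    "<start_of_turn>user\n"
    "Summarize ring attention in one sentence.<end_of_turn>\n"
    "<start_of_turn>model\n"
)
ids = tok(prompt, return_tensors="pt").input_ids
print(ids.shape)
```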

Links mentioned:

imone/gemma-7b-with-it-tokens · Hugging Face


Skunkworks AI ▷ #general (1 messages):

  • Understanding Random Seeds in Deep Learning: @stereoplegic shared an article deemed a “shockingly good read” from LinkedIn, focusing on the use of random numbers in deep learning, specifically in Python using the PyTorch library. They recommended it to those interested in understanding or working with random seeds: Random Numbers in Deep Learning; Python & the PyTorch Library.
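
As context, the usual PyTorch seeding recipe looks like this (a generic sketch, not code from the article):

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    """Common recipe for reproducible runs across Python, NumPy, and PyTorch."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds the CUDA RNGs
    # Trade speed for determinism in cuDNN convolutions.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(42)
print(torch.rand(2), np.random.rand(2), random.random())
```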