There’s been a lot of speculation about Miqu, the surprisingly good open-access model (not open source, since it shipped with no license), which scores 74 on MMLU (vs 75 for mistral-medium) and 83.5 on EQ-Bench, an emotional intelligence benchmark that correlates strongly with MMLU. After plenty of debate over whether the leak was genuine, the CEO of Mistral has now come out and confirmed it.
So technically we can’t use this model, but it’s an interesting leak for sure.
Table of Contents
[TOC]
PART 1: High level Discord summaries
TheBloke Discord Summary
- Miqu Takes the Lead in Model Discussions: Engineers are buzzing about Miqu, comparing it favorably against Llama-2-70B-chat and Mixtral, with Miqu notably excelling in instruction-following and critiquing tasks. The debate extends to sampling methods, where dynatemp and min-p are being dissected for their utility in enhancing model confidence levels and outcomes, sparking a dialogue on what qualifies as “superior” results in AI performance.
- Developers Dive Into Niche Browser Uses and UI Preferences: In less model-centric conversations, developers exchange views on optimal browsers for both development and personal use, ranging from Internet Explorer to more niche choices like Vivaldi and Docker-on-Ubuntu setups on Arch systems. This spirals into a discussion of Discord’s UI aesthetics, particularly the dark vs. light theme debate and the color choices exclusive to Discord Nitro.
- Creative Use of Chat Models Unleashed in Role Play Channels: Chat models found creative use in role-playing, with models like BagelMistery Tour v2 and Psyfighter v2 standing out for their role-playing finesse. Technical discussions sprang from the functional fp16 dequant of Miqu, where a notable fp16 conversion of Miqu-1-70b was shared, highlighting both advancements and challenges in model quantization.
- Training and Fine-Tuning Discussions Illuminate a Path for Novices and Experts Alike: Tips on budget-friendly model training with Unsloth and optimizing Mistral 7B’s training process were the highlight, along with calls for comprehensive tutorials on fine-tuning Hugging Face models, hinting at a collective need for accessible, advanced guidance in model optimization and customization.
- Coding Conversations Reflect on Development Culture and Challenges: The observation that reading code is harder than writing it resonated with developers, alongside critiques of contemporary web development practices, including over-reliance on external libraries and Not Invented Here syndrome. There is growing concern about fostering developers as mere “framework technicians” rather than problem solvers, pointing to a deeper need for grounding in computer science principles across the programming community.
Nous Research AI Discord Summary
- Activation Beacon Paves Way for Unlimited Context: The Activation Beacon method has been discussed as a groundbreaking approach to overcoming LLM context-length limitations, enabling models trained on 4K contexts to generalize to 400K contexts with linear inference-time growth. This method, which incorporates “global state” tokens, could radically change how LLMs manage memory consumption. Read the paper and check out the implementation.
- SQLCoder-70B Excels in Text-to-SQL Generation: Introduced as the new leader in Postgres text-to-SQL conversion, SQLCoder-70B, fine-tuned on AIatMeta’s CodeLlama-70B, has set a new standard for LLMs in SQL generation. The model is now accessible on Hugging Face, offering significant advancements for related tasks. Explore SQLCoder-70B.
- Miqu Model Shatters Expectations on EQ-Bench: The Miqu model has beaten previous benchmarks, scoring 83.5 on EQ-Bench, surpassing Mistral Medium and sparking community discussions about its potential origins and its standing as perhaps the best publicly available model yet. Details on Miqu and its performance can be found on Hugging Face.
- MoE’s Scaling Laws and Impact on Model Efficiency: Two key papers were highlighted in the discussion of MoE scaling laws, shedding light on the efficiency and performance benefits of Mixture-of-Experts models. The exploration of these models reflects significant interest in improving compute utilization and model performance. Reference: Scaling Laws for Autoregressive Generative Modeling and ST-MoE: Designing Stable and Transferable Sparse Expert Models.
- Advancements and Needs in Vision-Language Models (VLMs): The community expressed a pressing need for more efficient and accessible inference libraries for VLMs, highlighting ongoing work on batch LoRA inference and potential extensions to support VLMs more effectively. This development aims to improve accessibility and computational efficiency, addressing the current scarcity of resources in this domain.
LM Studio Discord Summary
- LM Studio API Connection Woes Resolved: @.ben.com clarified to @.mchinaga that issues in `chat_gpt_bot.py` were due to failing to reach the API server, not a bug, with @dagbs advising on correct API base setting and response key configuration (see the client sketch after this summary). This common pitfall underlines the necessity of accurate endpoint configuration for successful API interactions.
- Innovative Text to Video Puppeteering Explored: LM Studio users, led by @toddmiller, delved into the feasibility of converting text scripts into video puppets, discussing current limitations and the potential need for models beyond LM Studio’s current capabilities for such sophisticated tasks.
- GPU Acceleration Optimizations for LM Studio: Detailed advice was shared on GPU acceleration settings for an RTX 4070 by @fabguy, highlighting the importance of adjusting `n_gpu_layers` to enhance performance without straining the CPU. This insight underpins the critical balance between GPU utilization and overall system efficiency in AI applications.
- Compatibility and Execution Challenges with LM Studio Across Various Hardware: The discussion covered strategies for running LM Studio on diverse platforms, particularly Linux library issues, GPU selection, and ARM CPU compatibility concerns. Notably, mobile platforms like Android and iOS pose significant compatibility hurdles, reinforcing the importance of platform-aware development in AI tools.
- Emerging Trends and Performance in AI Model Discussion: The community reported mixed performance and unique behaviors in models like Tiny Models, CodeLlama 70B, and MIQU, with an amusing incident of Tiny Models producing absurd jokes post-reboot. The discourse extended to practical discussions of LangChain’s potential integration with LM Studio for dataset generation, emphasizing ongoing innovation and troubleshooting in AI model utilization and development.
These summaries highlight key discussions on technical challenges, model performance, and innovative applications within the LM Studio community, reflecting a vibrant dialogue on AI technology’s frontiers among practitioners.
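To make the endpoint fix above concrete, here is a minimal sketch of pointing an OpenAI-style Python client at a local LM Studio server. The URL and port are LM Studio’s usual defaults and the model identifier is a placeholder, so adjust both to your setup; this is an illustration, not the bot’s actual code.

```python
# Minimal sketch: route an OpenAI-style client to a local LM Studio
# server instead of api.openai.com. URL/port are LM Studio's usual
# defaults; the model name is a placeholder, since the local server
# answers with whatever model is currently loaded.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # local LM Studio endpoint
    api_key="lm-studio",                  # placeholder; local servers ignore the key
)

resp = client.chat.completions.create(
    model="local-model",  # placeholder identifier
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```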
OpenAI Discord Summary
- VRAM Choice Conundrum for AI Beginners: aqua_dawn_67525 is deliberating between 16GB and 24GB of VRAM for starting AI projects, with a backup thought of migrating to cloud computing for more robust requirements. Meanwhile, toror vouched that 16GB of VRAM proves quite capable for contemporary optimized models, shedding some light on the practical sufficiency of lower VRAM for starters.
- Limits and Confusions in GPT Plus Unveiled: Users like @iyons and @myroslava_35196_08143 encountered unexpected message-limit warnings on GPT Plus, igniting confusion given that they hadn’t reached the advertised 40-message threshold. This issue pointed to broader concerns regarding communication and support from OpenAI for its users.
- GPT Mentions Usher in New AI Collaboration Possibilities: The introduction of GPT mentions, allowing GPTs to share context and actions, has stimulated community excitement due to its potential for enhanced AI composability. However, despite the innovative leap, users like @darthgustav. and @blckreaper are still mapping out the practical capabilities and limitations of inter-GPT communication.
- GPT’s Gaming and Creative Challenge: In the realm of word games and creative projects, challenges like effectively managing GPT in grid-based word games were highlighted, with community members suggesting advanced strategies involving 2D arrays and Python for better game management (see the sketch after this summary). Additionally, the integration of DALL-E 3 and GPT, facilitating projects that span text and visual generation, represents a pioneering step in AI-assisted creative endeavors, although it currently requires manual orchestration.
- Community Tackles API Complexities and Feature Limitations: The community has been actively troubleshooting and brainstorming around the use of OpenAI’s APIs for complex project workflows, such as chaining different models for multi-step processes. Despite the excitement around features allowing inter-model communication, discussions revealed that such integrations still require manual navigation due to current technological constraints.
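As one concrete reading of the 2D-array suggestion above, the toy snippet below keeps a word grid in plain Python and renders it as text, so the model only has to reason over an explicit board instead of tracking it implicitly; the grid size and helper function are hypothetical.

```python
# Toy sketch of the 2D-array approach for grid-based word games:
# track the board in Python and hand the model a rendered view.
grid = [["." for _ in range(5)] for _ in range(5)]

def place_word(word: str, row: int, col: int, horizontal: bool = True) -> None:
    # Write each letter into the grid, moving right or down.
    for i, letter in enumerate(word):
        r, c = (row, col + i) if horizontal else (row + i, col)
        grid[r][c] = letter

place_word("MIQU", 2, 0)
print("\n".join(" ".join(row) for row in grid))
```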
Eleuther Discord Summary
- Ratchet Revolutionizes ML in Browsers: Ratchet, a novel ML framework for browsers, promises optimized speed and developer experience using Rust and WebGPU, as showcased in whisper-turbo.com.
- EleutherAI’s Research Triumph: EleutherAI celebrates the acceptance of 6 out of 10 papers at ICLR, spotlighting advancements like “Llemma: An Open Language Model for Mathematics” and marking a significant achievement for authors and contributors.
- Sparse Fine-Tuning Outshines LoRA: A new method for sparse fine-tuning of large language models offers a more parameter- and memory-efficient alternative to (q)LoRA, potentially revolutionizing instruction tuning, as evidenced in this study and its implementation (a toy sketch of the general idea follows this summary).
- CUDA and CUDNN on PPC64LE Present Challenges: Limited NVIDIA CUDA container support for the `ppc64le` platform, combined with cuDNN installation issues and struggles building wheels, underscores the difficulty of setting up AI development environments on less common architectures.
- Tokenizing Strokes for Vector Graphic Synthesis: StrokeNUWA introduces a method for tokenizing strokes to facilitate vector graphic synthesis, indicating a novel approach in multimodal AI research.
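For intuition on how sparse fine-tuning differs from LoRA’s low-rank updates, here is a toy sketch that touches only a small fraction of a weight tensor per step. The largest-gradient selection rule is an illustrative stand-in, not the linked study’s actual algorithm.

```python
# Toy sparse fine-tuning step: update only the `density` fraction of
# weights with the largest gradient magnitudes, leaving the rest frozen.
# Illustrative only; the paper's selection/growth rules are more refined.
import torch

def sparse_update(param: torch.nn.Parameter, lr: float = 1e-4,
                  density: float = 0.01) -> None:
    grad = param.grad
    k = max(1, int(grad.numel() * density))
    flat = grad.abs().flatten()
    idx = torch.topk(flat, k).indices          # positions allowed to move
    mask = torch.zeros_like(flat)
    mask[idx] = 1.0
    param.data -= lr * (grad.flatten() * mask).view_as(grad)

w = torch.nn.Parameter(torch.randn(256, 256))
(w.sum() ** 2).backward()
sparse_update(w)
```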
Mistral Discord Summary
- Mistral 7B VRAM Requirements and Workarounds: The community discussed the feasibility of running Mistral 7B models on 6GB-VRAM GPUs like the 1660 Ti, with mixed results. While out-of-memory issues were common, a quantized model on Hugging Face was identified as capable of running within the 5.5GB VRAM limitation, providing a solution for users with lower-end hardware.
- Optimizing Mistral Performance Across Hardware: Users reported varied performance of Mistral 7B on both low-end and high-end systems, including the RTX 4090. The discussions emphasized the importance of optimizing GPU utilization and considering resources like Colab notebooks for efficiently loading models without fully consuming VRAM.
- Fine-Tuning for Specific Output Preferences: In the finetuning channel, there was a request for advice on getting more concise responses from the LLM, aiming for direct answers like “4” to straightforward questions. Enhanced finetuning strategies were suggested, including increasing the number of steps beyond 60 and possibly lowering the learning rate to 2e-5 for improved model performance (see the sketch after this summary).
- Enterprise-Level Web UI for Mistral Debuts: The showcase channel introduced uMdali, an open-source project providing a Web UI for the Mistral API. The tool supports connections to Ollama, OpenAI, and Gemini, aiming to serve as an “Enterprise Chat Front End”.
- Collaboration and Internships with Mistral: The community was encouraged to contribute to Mistral’s public documentation on GitHub for collaborative improvement. Queries about internship opportunities at Mistral highlighted the competitive nature and high qualifications required to join the Mistral team, suggesting a vibrant and engaged developer community.
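Expressed as Hugging Face TrainingArguments, the channel’s two suggestions might look like the hedged sketch below; the step count, batch settings, and output directory are illustrative placeholders rather than a validated recipe.

```python
# Sketch of the suggested finetuning tweaks: train well past the original
# 60 steps and drop the learning rate to 2e-5. All other values here are
# illustrative defaults, not recommendations from the discussion.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./mistral-7b-finetune",   # hypothetical path
    max_steps=300,                        # well beyond the 60 steps that stalled
    learning_rate=2e-5,                   # the lower rate suggested
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    logging_steps=10,
)
```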
HuggingFace Discord Summary
- Code Llama 70B and Sentence Transformers v2.3.0 Unleashed: The Hugging Face community announced the release of Code Llama 70B, a new AI chat model, and the updated Sentence Transformers v2.3.0, boasting bug fixes and performance enhancements. Check out Code Llama 70B here and the Sentence Transformers release notes here.
- Multimodal LLM Dataset for the Malaysian Context: The Multimodal Malaysian LLM dataset, which aims to advance LLM training with multimodal inputs including translated LLaVA instructions, is now available on HuggingFace as part of the mesolitica collection.
- Innovative AI Tools Created by Community Members: Community developers have introduced new tools: a Gradio app for transforming Excel/CSV files into database queries, accessible on HuggingFace Spaces, and a novel application of AI to Magic: The Gathering for multi-label classification of card colors, which you can explore here.
- CUDA Troubleshooting and Unknown Compiler Option Challenges: A user on an RTX 2080 Ti reported issues while configuring GPU acceleration, encountering an `nvcc fatal: Unknown option 'fPIC'` error, indicating compatibility complications with the `nvcc` compiler, as detailed in this GitHub issue.
- Discussions on LLMs and Diffusion Models Deepen: The community explored various topics, from improving the robustness of QA datasets beyond DPO and the capability of LoKr and LoHa for inference with the “loading with peft” method, to dissatisfaction with a 70B coder chat model’s limited knowledge of the 🤗 Diffusers library. Members also discussed the challenge of replicating specific art styles using Stable Diffusion, with an attempt to capture an anime style outlined here.
OpenAccess AI Collective (axolotl) Discord Summary
- Axolotl’s Continuous Advancements Draw Acclaim: The axolotl framework is praised by @nafnlaus00 for enhancements leading to lower VRAM usage, faster training, and improved outcomes, sparking discussions about sharing these successes on platforms like Twitter. Additionally, @dreamgen shared their experiences with hardware and VM enhancements that sped up their AI projects, emphasizing the crucial role of hardware in development.
- MIQU-1-70b Makes a Leap: The dequantization of MIQU-1-70b from q5 to f16 and its PyTorch integration caught the community’s attention, provided by @dreamgen with a link to the model on Hugging Face. This breakthrough is particularly noted for its potential applications and ease of use.
- Technical Troubles and Triumphs in Development and Deployment: From VRAM usage concerns in axolotl’s new implementations to Docker dilemmas resolved for better deployment practices, the community is heavily engaged in troubleshooting and sharing solutions, such as the `pip install -e .` command to fix module errors. On the development side, @stefangliga’s exploration of LoftQ and its approximation using SVD is a highlight, showcasing the community’s inventive spirit.
- Llamacpp Lights Up Community Showcase: @mistobaan shared an innovative project involving function calls using llamacpp, leveraging various community tools and models. This experiment, detailed through a shared Gist and highlighted resources from GitHub and Hugging Face, represents a prime example of collaborative innovation within the community.
- Deployment Concerns Call for Community Insight: In deployment-help, an oblique mention by yamashi hints at challenges with parallel requests, indicating ongoing discussions or troubleshooting within deployment contexts. While minimal in detail, it spotlights areas of AI engineering that require attention.
Perplexity AI Discord Summary
- Mistral Medium’s API Access Sparks Interest: Community members, led by @arcinarci, are eagerly awaiting API access for Mistral Medium, signaling growing demand for broader API capabilities.
- Perplexity’s Free Trial Potentially Paused: Discussions initiated by @aarav7024 suggest that the 7-day free trial offer might have been discontinued, leading to confusion among new users.
- Perplexity Enhances App Development Creativity: Users like @gammagames found Perplexity effective for generating creative content, such as names and addresses for app development, spotlighting Perplexity Labs as a resource for exploring the AI’s functionalities.
- Exploring Efficient Integration of Perplexity into Web Applications: A detailed guide was shared for integrating the Perplexity API into web apps, including handling text inputs for chat interactions and referencing the documentation for API token creation (a hedged client sketch follows this summary), although it was noted that file uploads are not currently supported.
- Pioneering Local Model Training and Execution with Ollama: The feasibility of local training and execution of large models without high-end hardware was discussed, pointing to tools like Ollama for local model utilities, alongside community-driven support for API access issues such as troubleshooting a 401 authentication error.
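For readers following the integration guide, a hedged sketch of a server-side call is below; it assumes Perplexity’s OpenAI-compatible chat endpoint at api.perplexity.ai, and the model name is a placeholder to be checked against the current docs.

```python
# Hedged sketch of calling the Perplexity API from a backend. Assumes the
# OpenAI-compatible endpoint; the model identifier is a placeholder.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.perplexity.ai",  # Perplexity's API host
    api_key="YOUR_PERPLEXITY_API_KEY",     # token created in account settings
)

reply = client.chat.completions.create(
    model="pplx-70b-online",  # placeholder; check the docs for current models
    messages=[{"role": "user", "content": "Summarize today's AI news."}],
)
print(reply.choices[0].message.content)
```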
LLM Perf Enthusiasts AI Discord Summary
- Triton and CUDA Discussions Converge on Performance Tuning: Triton discussions emphasized the language’s limited flexibility for low-level CUDA features and data control at the GPU block level, with Triton’s synchronization capabilities needing enhancement. Participants recommended delving into Triton’s implementation through its original paper for a deeper understanding.
- Vectorized Memory Access in CUDA Discussed: The significance of vectorized memory access for optimizing performance in CUDA programs was underscored, with an NVIDIA blog post cited as a key resource. Additionally, a unique approach to simplifying CUDA programming using Numba was shared through a Twitter post by @HaseoX94.
- Learning Resources and Offers for GPU Access Provided: Offers to run code on A100/H100 GPUs and a dual-GPU machine were made to the community, aiming to facilitate testing and performance measurements. CUDA beginners were guided to comprehensive resources, including a book recommendation, free GPU access at lightning.ai, and the YouTube channel CUDA MODE for CUDA learning.
- CUDA Programming Basics and Setup Guidance: Queries about CUDA programming basics and setup, particularly for an RTX 3070 laptop, sparked discussions. Advice ranged from book recommendations and environment setup to using Visual Studio for CUDA integration on Windows, with a noted preference for Torch over TensorFlow when working with Conda.
- CUDA Timing and Memory Management: Technical exchanges focused on CUDA memory indexing and the use of CUDA events for accurate timing (see the sketch after this summary). It was clarified that synchronization is crucial for timing measurements and for understanding the behavior of operations like `cudaMemcpy`, reinforcing the idea that many CUDA calls execute asynchronously and require explicit synchronization for meaningful performance metrics.
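The timing pattern discussed above looks like the following sketch when driven from PyTorch’s CUDA event API (a CUDA-capable GPU is assumed): record events around the work, then synchronize before reading the elapsed time, because kernel launches return before the GPU finishes.

```python
# CUDA-event timing sketch via PyTorch; requires a CUDA-capable GPU.
import torch

x = torch.randn(4096, 4096, device="cuda")
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
y = x @ x            # kernel launch is asynchronous
end.record()

torch.cuda.synchronize()  # wait for the GPU before reading the timer
print(f"matmul took {start.elapsed_time(end):.2f} ms")
```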
LangChain AI Discord Summary
- LangChain Forks Make a Comeback: After a mysterious issue where forks of the LangChain repository weren’t recognized as forks on GitHub, causing PRs to vanish, the situation has been resolved. Contributors should manually reopen any still-affected PRs as detailed in the GitHub discussion.
- Expertise Sought for Custom Tool Parameters: Georg.ort seeks expert consultation on defining essential and optional parameters for a custom tool, offering payment for valuable insights, with communication links provided within the general channel.
- Innovative AI Tool Announcements Stir Excitement: Pre-launch and launch announcements, including Oranscribe on Product Hunt, SkillForge V1’s demonstration of agent skill creation on YouTube, and JACoB’s introduction as a production-ready AI coding assistant at jacb.ai, sparked interest and anticipation.
- LangServe’s Access and Resource Management Challenges: Discussions around LangServe touched on efforts toward quicker access for educational purposes and the need for an additional hardware-resource-management layer to prevent server crashes during high LLM usage.
- Building and Prompting AI with Precision: Shared resources, such as a guide on using Qdrant’s Multitenancy for a RAG chatbot, a video on empowering AI developers, and insights on prompt engineering for open-source LLMs, reveal an active pursuit of precision and innovation in AI development and application.
LlamaIndex Discord Summary
- LlamaIndex and Replit Join Forces for RAG Bounties: LlamaIndex has announced a $2,000 bounty in collaboration with Replit for the development of open-source templates leveraging advanced Retrieval-Augmented Generation (RAG). Dive into this opportunity here.
- Insightful Exploration of RAG at LlamaIndex: A guest post by @CobusGreylingZA features LlamaIndex’s recent foray into handling complex queries through RAG, detailing the integration of multi-agent coordination and chain-of-thought reasoning with re-ranking from Cohere. Gain valuable insights here.
- LlamaIndex Discourse: From Fine-Tuning to Query Enhancements: Discussions in the LlamaIndex community span technical topics from fine-tuning embeddings using text and metadata and transforming CSVs to JSON for better data handling, to integrating pre-trained models from Hugging Face for embedding fine-tuning within LlamaIndex’s infrastructure.
- Integration Challenges and Platform Connectivity: Community members delve into practical issues like plugging an improved model into LlamaIndex’s `SubQuestionQueryEngine` (see the sketch after this summary), deploying AI applications with AWS SageMaker, and the intricacies of llama packs with existing databases and file formats in conversation-building scenarios.
- Tracking AI’s Imagination with the Hallucination Leaderboard: andysingal shared a resource for those interested in measuring and combating hallucinations in AI outputs, pointing to the hallucination leaderboard available here.
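For the SubQuestionQueryEngine integration mentioned above, a hedged sketch is shown below; the import paths follow llama_index’s pre-0.10 layout, and the data directory and tool names are hypothetical.

```python
# Hedged sketch: wiring an index into LlamaIndex's SubQuestionQueryEngine
# (pre-0.10 import paths; directory and tool names are hypothetical).
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

tools = [QueryEngineTool(
    query_engine=index.as_query_engine(),
    metadata=ToolMetadata(name="docs", description="project documentation"),
)]
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
print(engine.query("Compare the setup steps described across the documents."))
```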
LAION Discord Summary
- AI Crafts Warhammer 40k’s Grim Dark Future: A fan-created Imperium Of Man - Warhammer 40k trailer, made with AI generative tools, has been commended for its remarkable visual effects, notably the fire-and-explosions segment at 0:54, viewable here. The discussion around AI video generation tools suggests that despite some uncanny results, these tools offer strong temporal consistency and potential for creative industries.
- AI’s Creative Limitations Unveiled: A Terminus model output shared in the discussion illustrates how AI-generated content can be exceptional yet occasionally flawed, underscoring the limitations inherent in current training datasets. The visual example can be seen here.
- Querying the Capabilities of AI in Image Generation: An inquiry surfaced about the comparative efficiency and advancements of DALL-E 2 - PyTorch versus Stable Diffusion for AI image generation, pinpointing the community’s growing interest in understanding the nuances of these powerful tools.
- MoE-LLaVA Framework Steps Up LVLMs: The introduction of MoE-tuning and the MoE-LLaVA framework, detailed in a paper, offers a novel approach to improving the efficiency of Large Vision-Language Models by invoking only the top-k experts during deployment, promising high-parameter models at maintained computational cost (a toy top-k routing sketch follows this summary). The framework is further explored in Hugging Face’s implementation, accessible here.
- Advances in Multilingual AI and Ethical AI Codes: CodeLlama 70b Instruct illuminates the delicate balance between ethical predispositions and efficiency in code generation, available on Hugging Face, while MAGBIG, a new multilingual text-to-image benchmark shared here, aims to broaden the linguistic applicability of AI models, spotlighting the AI community’s strides toward inclusivity and responsible AI development.
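To illustrate the top-k expert mechanism described for MoE-LLaVA, here is a toy routing layer: a router scores experts per token and only the k best are evaluated, so active compute stays roughly constant as total parameters grow. This is purely illustrative, not the paper’s implementation.

```python
# Toy top-k expert routing: only the k highest-scoring experts run per
# token, so active compute stays near-constant while parameters grow.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model)
                                     for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights, idx = torch.topk(self.router(x).softmax(-1), self.k)
        out = torch.zeros_like(x)
        for slot in range(self.k):            # each token's top-k choices
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e      # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(TopKMoE(64)(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```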
DiscoResearch Discord Summary
- Unlocking German Language Capabilities with DiscoLM German 7b and RAG Datasets: The release of DiscoLM German 7b and the GermanRAG datasets marks a significant step in improving German language model performance, introducing comprehensive datasets for RAG finetuning and broadening applications in native-language processing.
- Prometheus Meets Mistral for Enhanced English Models: An inquiry about progress on the Prometheus Mistral model for English applications hints at ongoing efforts to develop cutting-edge language models.
- Code Llama 70B and Llama Factory Insights: Meta’s launch of Code Llama 70B garners attention, alongside discussions of Llama Factory’s recommended practices for parameter tuning, underscoring the continuous evolution of code-generation AI.
- Boosting Retrieval Performance with Multilingual BGE-M3 and ColBERT: Innovations like BGE-M3 and ColBERT demonstrate advances in embedding techniques, offering multilingual support and improved search through nuanced retrieval. Practical advice was shared for BGE-large users to include prompts in queries for better retrieval outcomes.
- Diverse Strategies for German-language AI Development Emerge: Discussions span data augmentation techniques like Web Rephrase Augmented Pre-training (WRAP), explorations of German Orca DPO datasets, and novel dataset initiatives that leverage GPT-4 for enriched training material, signaling a vibrant ecosystem of German-language AI research and development.
Latent Space Discord Summary
- VFX Studios Eye AI Integration: A tweet by @venturetwins reveals that major VFX studios, including one owned by Netflix, are now seeking professionals skilled in stable diffusion technologies. This new hiring direction underscores the increasing importance of generative imaging and machine learning in storytelling, as evidenced by a job listing from Eyeline Studios.
- New Paradigms in AI Job Requirements Emerge: The rapid evolution of AI technologies such as Stable Diffusion and Midjourney is humorously noted as potentially becoming a standard demand in future job postings, reflecting a shift in employment standards within the tech landscape.
- Efficiency Breakthroughs in LLM Training: Insights from a new paper by Quentin Anthony propose a significant shift toward hardware-utilization optimization during transformer model training. The approach, which views models through their GPU kernel call sequences, aims to address prevalent inefficiencies in the training process.
- Codeium’s Leap to Series B Funding: A complimentary tweet celebrates Codeium’s progress to Series B, a milestone highlighting the growing optimism and projections around the company’s future.
- Hardware-Aware Design Boosts LLM Speed: A discovery highlighted in a tweet from @BlancheMinerva, and detailed further in the paper at arXiv:2401.14489, outlines a hardware-aware design tweak yielding a 20% throughput improvement for 2.7B-parameter LLMs, previously overlooked by many due to adherence to GPT-3’s architecture.
- Treasure Trove of AI and NLP Knowledge Unveiled: For those keen to deepen their understanding of AI models and their historical and conceptual underpinnings, a curated list shared by @ivanleomk brings together landmark resources, offering a comprehensive starting point for exploration in AI and NLP.
Alignment Lab AI Discord Summary
- Lilac Garden Launches, Revolutionizes Dataset Transforms: Lilac Garden, a new cloud service for accelerated dataset transforms, has been announced by @nikhil_thorat, featuring LLM-powered clustering as its first service. The announcement and details can be found on Twitter.
- Explore Precomputed OpenOrca Clusters: The OpenOrca dataset, with embeddings and clusters precomputed, is now available on Lilac Garden, providing an advanced toolkit for dataset analysis. Users can explore the dataset via this direct link.
- Founding Engineer Wanted at WashU Startup: DoubleMint is seeking a founding engineer for a new venture in collaboration with Washington University in St. Louis, emphasizing proficiency in Next.js, TailwindCSS, and Supabase. The project has secured a $50,000 Letter of Intent and is poised for rapid scaling.
LLM Perf Enthusiasts AI Discord Summary
- Gratitude Expressed in LLM Perf Enthusiasts: User @an1lam simply expressed their gratitude with a “Thanks!” in the discussion.
- Seeking Insights on Gemini Pro: @res6969 asked for insights or results from anyone who has experimented with Gemini Pro in a production setting, aiming to understand its performance and applicability.
AI Engineer Foundation Discord Summary
- Join the Open Source Movement at AI Engineer Foundation: @hackgoofer has called on @everyone to submit and recommend open-source projects for the AI Engineer Foundation, emphasizing the importance of community involvement. Here’s the Guide to Submit Projects for anyone interested in contributing.
PART 2: Detailed by-Channel summaries and links
TheBloke ▷ #general (1209 messages🔥🔥🔥):
- Miqu Model Discussion Intensifies: Users have been comparing Miqu with other models like Llama-2-70B-chat and Mixtral, finding Miqu exceptionally good at taking criticism and following instructions. Miqu outperforms even frankenmerge 120b models according to some.
- Explorations in AI Sampling and Samplers: There’s ongoing discussion about different sampling methods, with a specific focus on the practicality of dynatemp and min-p in improving results (a simplified min-p sketch follows this list). The dialogue circles around the challenge of defining what constitutes “better” results and the potential overconfidence of models in their estimations.
- Navigating Browser Choices for Development: Beyond AI, there’s chatter about browser choices for development and personal use. Internet Explorer, Brave, Vivaldi, and Docker for Ubuntu on Arch systems were mentioned, alongside mentions of arcane tech like NCSA Mosaic and VRML browser plugins.
- Discord UI and Themes: Conversation touched on Discord’s UI, specifically the preferences for dark vs. light theme, and the limited color choices available without Discord Nitro. The discussion briefly highlighted the impact of color choices on visual perception and neuro-linguistics.
- General AI Enthusiasm and Critique: Users expressed concerns about AI models refusing to engage in certain topics or responding with disclaimers. There was also a lighter conversation on AI models’ ability to replicate catgirl behavior, showcasing the wide range of interests within the AI community, from technical to playful.
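For reference, here is what min-p sampling means in practice, as a simplified sketch rather than any particular library’s implementation: keep only tokens whose probability is at least min_p times that of the most likely token, renormalize, and sample.

```python
# Simplified min-p sampling: drop tokens far below the top token's
# probability, then sample from the renormalized survivors.
import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.05) -> int:
    probs = torch.softmax(logits, dim=-1)
    keep = probs >= min_p * probs.max()        # threshold scales with confidence
    probs = torch.where(keep, probs, torch.zeros_like(probs))
    probs = probs / probs.sum()                # renormalize survivors
    return int(torch.multinomial(probs, num_samples=1))

token_id = min_p_sample(torch.randn(32000))    # e.g. a Llama-sized vocabulary
```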
Links mentioned:
- Stitch Sad Sad Stitch GIF - Stitch sad Sad stitch - Discover & Share GIFs: Click to view the GIF
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters: The “pretrain-then-finetune” paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to…
- codemateai/CodeMate-v0.1 · Hugging Face: no description found
- Mosaic (web browser) - Wikipedia: no description found
- Simple JSON Parser — Lark documentation: no description found
- TOGETHER: no description found
- This Character AI Alternative With No Filters Just Released a Free 70B Model - Miku GG: This Character AI Alternative with no filters - Miku GG, has released a new update. And the best part is that you get to access their 70B Model for absolutel…
- vikhyatk/moondream1 · Hugging Face: no description found
- Eggwuh It Come With Eggroll GIF - Eggwuh It come with eggroll Eggroll - Discover & Share GIFs: Click to view the GIF
- Reddit - Dive into anything: no description found
- Parsers — Lark documentation: no description found
- Reddit - Dive into anything: no description found
- Grammar Reference — Lark documentation: no description found
- Tiny Elvis 1.5 (TELV150) : Matthew T. Smith : Free Download, Borrow, and Streaming : Internet Archive: Tiny E sits at the bottom of your Windows desktop, pops to his feet to comment on your huge icons, cursors, etc. Fun! Plays waveform audio (WAV) voice…
- CUDA: Faster Mixtral prompt processing by JohannesGaessler · Pull Request #4538 · ggerganov/llama.cpp: On master Mixtral prompt processing is currently always done with a batch size of 1 due to MoE. This PR makes it so that instead for batch sizes > 1 the src1 columns are made contiguous so that the…
- Reddit - Dive into anything: no description found
TheBloke ▷ #characters-roleplay-stories (285 messages🔥🔥):
- Exploration and Utilization of Chat Models: The discussions revolved around finding and applying various chat models for roleplaying and general tasks. @funtimedaddyohyea touched on exploring BagelMIsteryTour-v2-8x7B-GGUF among others, while @frammie and @_dampf prefer BagelMistery Tour v2 and Psyfighter v2, respectively, for roleplaying purposes.
- Conversations on Quantization and Model Performance: Debate over a functional fp16 dequant of Miqu sparked technical discussions. @doctorshotgun shared an fp16 conversion of Miqu-1-70b, highlighting significantly lower perplexity than previous conversions, while still facing challenges with `exl2` quanting.
- Miqu vs. Other Models: Users like @mrdragonfox and @goldkoron discussed Miqu’s performance in roleplay (RP) and its superiority in character comprehension over Mixtral and Yi-34b, citing anecdotal evidence of its effectiveness.
- Exploration of AI for Creative Content: @c.gato mentioned an experimental approach of generating RP answers from GPT-4 using Yahoo Answers data, aiming to diversify chatbot responses. This highlights the community’s continuous search for more dynamic and human-like interactions.
- Technical Challenges and Community Experimentation: Various users, including @doctorshotgun and @dreamgen, discussed the technical aspects and challenges of working with models like Miqu, from quantization issues to exploring model efficiencies and potential improvements through fine-tuning and testing against benchmarks.
Links mentioned:
- 152334H/miqu-1-70b-sf · Hugging Face: no description found
- ycros/BagelMIsteryTour-v2-8x7B-GGUF · Hugging Face: no description found
- PotatoOff/Michel-13B · Hugging Face: no description found
- llama.cpp/examples/speculative at master · ggerganov/llama.cpp: Port of Facebook’s LLaMA model in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
TheBloke ▷ #training-and-fine-tuning (17 messages🔥):
- Training on a Budget with Unsloth: @superking__ mentioned that it’s possible to train models with just 16GB of memory using Unsloth, and pointed out that Colab offers free resources to run examples. This could be a valuable tip for developers with limited resources.
- Optimization Tips for Mistral 7B Training: In a discussion on training Mistral 7B, @bishwa3819 shared struggles with train loss not decreasing despite specific LoRA configurations. @dirtytigerx responded, suggesting trying to overfit the model first as a troubleshooting step, and noted that the provided graph, showing training for only 80 steps, might reflect insufficient data.
- High Hardware Demands for Yarn-Scaled Models: The conversation between @blackl1ght and @sao10k highlighted high cost as the primary reason for the lack of instruct tunes for large-scale models, like yarn-scaled 128k models, pointing to the scalability challenges in machine learning projects.
- Seeking Tutorial for Fine-tuning Hugging Face Models: @chovii requested recommendations for comprehensive guides or tutorials on fine-tuning Hugging Face models, particularly expressing difficulty using Trainer with the TheBloke/TinyLlama-1.1B-Chat-v1.0-AWQ model. This underscores the need for accessible information for newcomers to model fine-tuning.
- Direct Guidance Requests: Both @givan_002, in search of high-quality datasets for role-play, and @222gate, seeking advice on quantizing multimodal models, underscore the demand for specialized guidance and resources in the ML community. These inquiries show the varied and specific nature of challenges faced by practitioners.
TheBloke ▷ #model-merging (4 messages):
- Community Shout-Out: @kquant expressed gratitude towards @284810978552578050 for providing feedback that helped improve their work. No specific details about the work were provided.
- Hope for Engagement: @kquant shared a hopeful message about the interest level of their content, without specifying the topic.
- Sharing a Research Document: @kquant posted a Google Docs link regarding k-NN in Mixture of Experts, though the content is not described within the message.
- Code Snippet Cut Off: @kquant mentioned that a portion of code was accidentally cut off in their documentation, noting that readers may face difficulties implementing the discussed methods without it.
Links mentioned:
k-NN In Mixture of Experts: no description found
TheBloke ▷ #coding (19 messages🔥):
- Reading vs. Writing Code: @zachmayer briefly mentioned a common sentiment among developers: reading code is harder than writing code, sparking a nod among coders everywhere.
- Comparative Analysis of AI Coders: @technomancer73 tested an unnamed AI code generator and found it more comprehensive in its answers than Bard, manually comparing generated code based on past prompts.
- The Web Development Paradox: @wbsch and @dirtytigerx discussed the contradiction in web development culture, citing both Not Invented Here (NIH) syndrome and over-reliance on external libraries as prevalent issues. They touched on JavaScript’s historical lack of a standard library, highlighting the `left-pad` incident as a symptom of deeper experience and knowledge gaps among developers.
- Programmers as Framework Technicians: @dirtytigerx lamented the emergence of a generation of programmers trained primarily to use frameworks and APIs, without the foundational skills needed to tackle novel, complex challenges. This sparked a conversation about the importance of understanding basic computer science principles and how the lack of deeper specialization can be both a job-security factor and a concern for those in leadership roles.
- Struggles of Spec Writing and Project Management: @wbsch and @dirtytigerx shared their frustrations with team and project management, particularly the challenge of communicating basic computer science concepts to team members and the art of translating common sense into technical specifications. The conversation touched on both the practical and existential dilemmas faced by those leading development teams.
Nous Research AI ▷ #ctx-length-research (1 messages):
dreamgen: “global state” tokens sound similar to attention sinks, did not read the paper though
Nous Research AI ▷ #off-topic (26 messages🔥):
- Exploring Large Language Models: @tempus_fugit05 sought resources on understanding Large Language Models (LLMs), and @teknium recommended a YouTube tutorial by Karpathy titled “[1hr Talk] Intro to Large Language Models”, a general-audience introduction to LLMs like ChatGPT. Watch here.
- Upcoming Paper on Chatbot Roleplay Personality Evaluation: @lorenzoroxyolo announced they are publishing a paper focused on chatbot roleplay personality evaluation and are seeking ways to advertise it. They mentioned benchmarking several models, though not exhaustively, and encouraged following @lrzneedresearch on Twitter for updates.
- Sharing Fun with GIFs: @Error.PDF shared a couple of humorous GIFs, including one of a cat (Cat Nyash GIF) and another featuring a turtle and a dog (Turtle Dog GIF), adding a light-hearted touch to the conversation.
- Teasing and Jokes Among Members: After @Error.PDF posted a sleeping emoji, @teknium humorously suggested that no one cared, triggering a playful exchange that included laughter and affirmative emojis from @Error.PDF.
Links mentioned:
- Cat Nyash GIF - Cat Nyash Meow - Discover & Share GIFs: Click to view the GIF
- miqudev/miqu-1-70b · Add miku.mp4 to the readme: no description found
- [1hr Talk] Intro to Large Language Models: This is a 1 hour general-audience introduction to Large Language Models: the core technical component behind systems like ChatGPT, Claude, and Bard. What the…
- Turtle Dog GIF - Turtle Dog - Discover & Share GIFs: Click to view the GIF
Nous Research AI ▷ #benchmarks-log (14 messages🔥):
- Discussing the Impact of Assembly Knowledge: @euclaise mentioned that knowledge of assembly might not be greatly beneficial for typical code benchmarks, and noted being unaware of benchmarks that assess assembly skills specifically.
- Debate on Chain of Thought (CoT) Methodology: @euclaise suggested using the Chain of Thought (CoT) approach for fairness in evaluation, while @teknium responded that it would not be a fair comparison since he has not used CoT before and would need to reevaluate all models.
- BBH Benchmark and CoT: @euclaise highlighted that the BBH benchmark is specifically designed to test the CoT methodology, advising always using CoT for those assessments.
- Future Model Evaluation Strategies: @euclaise recommended adopting CoT for all future model assessments to ensure consistent and fair evaluations.
- Stablelm Zephyr’s Performance on BBH with CoT: @euclaise noted with surprise that Stablelm Zephyr scored 0.9% on the BBH benchmark using CoT, an unusually low result.
Nous Research AI ▷ #interesting-links (15 messages🔥):
- Unlimited Context Length Breakthrough with Activation Beacon: @nonameusr highlighted a significant advancement with the Activation Beacon method for extending LLM context, potentially solving the context-length limitation by generalizing a model trained on 4K context to 400K. The method proposes adding “global state” tokens to maintain fixed memory consumption and ensure linear inference-time growth (a toy sketch follows this list). Read more about the research paper and explore the implementation code.
- SQLCoder-70B Revealed: @if_a shared news about SQLCoder-70B, a new model outperforming all publicly accessible LLMs for Postgres text-to-SQL generation, made available on Hugging Face. The model, fine-tuned on AIatMeta’s CodeLlama-70B, showcases the potential for significant advancements in SQL generation tasks. Access SQLCoder-70B on Hugging Face.
- Temporary Halt on Memphis-CoT Due to Bug: @euclaise warned users about a bug discovered in the Memphis-CoT training code and advised against making quants, merges, or any alterations until retraining is completed. The model, aimed at improving reasoning-focused outcomes, is based on human data and undergoing an iterative corrective finetuning procedure.
- Nous Research Unveils New Open Source Model Evaluation System: @manojbh shared Nous Research’s announcement of a novel system for evaluating open-source models through a subnet on Bittensor, combating the limitations of traditional benchmarking that leans on public datasets. The system aims to offer a dynamic, fair, and continuously evolving evaluation platform. Explore the Nous Subnet Leaderboard.
- Call for Caution in LLMs’ Use for Cryptography: @deki04 relayed a cautionary note from @moyix on the limitations of LLMs in handling cryptography tasks, suggesting that despite advances in AI and machine learning, domains like cryptography remain challenging for these models.
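As a rough mental model of the “global state” idea (and only that; this is not the Activation Beacon authors’ code), the sketch below condenses each chunk of activations into a fixed number of learned beacon states, so memory stays constant as the context streams in.

```python
# Toy "beacon" compressor: learned global-state tokens attend over each
# incoming chunk and return a fixed-size summary. Illustrative only.
import torch
import torch.nn as nn

class BeaconCompressor(nn.Module):
    def __init__(self, d_model: int, n_beacons: int = 16):
        super().__init__()
        self.beacons = nn.Parameter(torch.randn(n_beacons, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, chunk: torch.Tensor) -> torch.Tensor:
        # chunk: (batch, chunk_len, d_model) activations for one segment.
        queries = self.beacons.unsqueeze(0).expand(chunk.size(0), -1, -1)
        summary, _ = self.attn(queries, chunk, chunk)
        return summary  # (batch, n_beacons, d_model), independent of chunk_len

print(BeaconCompressor(64)(torch.randn(2, 128, 64)).shape)  # (2, 16, 64)
```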
Links mentioned:
- Tweet from Brendan Dolan-Gavitt (@moyix): LLMs: not very good at cryptography
- Tweet from Nous Research (@NousResearch): Today we are announcing our latest project, an effort to provide a new evaluation system for open source models. Traditional benchmarking leans heavily on public datasets which can be easy to game and…
- Tweet from Yam Peleg (@Yampeleg): If this is true it is over: Unlimited context length is here. Activation Beacon, New method for extending LLMs context. TL;DR: Add “global state” tokens before the prompt and predict auto-r…
- euclaise/Memphis-CoT-3B · Hugging Face: no description found
- Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback: The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment al…
- Tweet from Rishabh Srivastava (@rishdotblog): We just opened sourced SQLCoder-70B! It outperforms all publicly accessible LLMs for Postgres text-to-SQL generation by a very wide margin. SQLCoder is finetuned on @AIatMeta’s CodeLlama-70B mode…
- OpenRouter: Language models ranked and analyzed by usage across apps
Nous Research AI ▷ #general (502 messages🔥🔥🔥):
- Debating AI and Human Language Efficiency: @nonameusr sparked a discussion about the inefficiency of AIs using human language for both comprehension and cross-model communication, suggesting that adapting AI to a more efficient form of communication might be better.
- Miqu Shines on the Benchmarks: @n8programs highlights that Miqu stands out by achieving an 83.5 on EQ-Bench, which he claims surpasses Mistral Medium, pointing out that Miqu could be the best openly accessible model available, despite skepticism from the community.
- AI Community Buzzes About Miqu’s Performance and Origin: The AI Discord community is abuzz with talk of Miqu’s performance on benchmarks like MMLU and EQ-Bench, with some questioning whether it’s a Mistral Medium leak and discussing its potential as a top-performing open-source model.
- Quantization and Compression Strategies Explored: Discussions about quantization strategies such as 2-bit QLoRA, the challenges of maintaining performance metrics like GSM8K score during finetuning, and the technicalities of dequantization to improve AI models’ efficiency and accessibility took center stage (a toy dequantization sketch follows this list).
- Subnet Discussions and GPU Leasing Queries: There were inquiries about subnet 6 functionality and whether it’s possible to lease GPUs through the Akash network for model serving in conjunction with running inference via the subnet, indicating keen community interest in optimizing resource use for AI development.
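As background for the dequantization talk, the toy below shows the core idea in the q5-to-f16 direction: blockwise integer codes plus a per-block scale expand back into higher-precision floats. Real GGUF quant formats are considerably more involved.

```python
# Toy blockwise dequantization: small integer codes times a per-block
# scale recover approximate f16 weights. Real q5 formats add offsets,
# packing, and finer block structure.
import numpy as np

def dequantize_block(codes: np.ndarray, scale: np.float16) -> np.ndarray:
    return codes.astype(np.float16) * scale

codes = np.array([-3, 0, 7, 12], dtype=np.int8)
print(dequantize_block(codes, np.float16(0.02)))
```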
Links mentioned:
- 152334H/miqu-1-70b-sf · Hugging Face: no description found
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models: Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accura…
- miqudev/miqu-1-70b · Hugging Face: no description found
- Growing Living Rat Neurons To Play… DOOM?: Head to https://squarespace.com/thethoughtemporium to save 10% off your first purchase of a website or domain using code: thethoughtemporium_________________…
- The REAL cost of LLM (And How to reduce 78%+ of Cost): I want to give you step by step guide on how to reduce LLM cost by 70%, and unpack why it is costing so much nowFree HubSpot AI For Marketers Course: https:/…
- Tweet from Yam Peleg (@Yampeleg): Apparently the underground adapters for running server-grade A100 are no longer under the ground.. People on /r/LocalLLaMA are starting to step up their home GPU setup game.. hard.. 😆
- Druski GIF - Druski - Discover & Share GIFs: Click to view the GIF
- Darktide Adeptus Mechanicus GIF - Darktide Adeptus mechanicus Mechanicus - Discover & Share GIFs: Click to view the GIF
- GitHub - NVIDIA/TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
- jondurbin/bagel-2.8b-v0.2 · Hugging Face: no description found
- 1. Installation): PygmalionAI’s large-scale inference engine. Contribute to PygmalionAI/aphrodite-engine development by creating an account on GitHub.
- 2. Usage): PygmalionAI’s large-scale inference engine. Contribute to PygmalionAI/aphrodite-engine development by creating an account on GitHub.
- UserBenchmark: Nvidia RTX 2080-Ti vs 3090: no description found
Nous Research AI ▷ #ask-about-llms (58 messages🔥🔥):
- Exploring the Scaling Laws of MoE: @vikas.p responded to @joey00072’s query about papers on MoE scaling laws by sharing two significant papers (Scaling Laws for Autoregressive Generative Modeling and another paper’s PDF) that explore the efficiency and performance of Mixture-of-Experts (MoE) models.
- Discussing Nous Models’ Large Context Windows: @rememberlenny inquired about the trade-offs of Nous models, such as NousResearch_YarnMistral128k, with 200k context windows. The conversation highlighted concerns about the capacity to scale position embeddings and the potential for position truncation due to Λ-shaped context windows.
- Scaling Position Embeddings in Large Context Models: Both @teknium and @bloc97 contributed to the discussion on scaling position embeddings, with bloc97 explaining the advantages and disadvantages of not truncating position embeddings in models like YaRN, which allows attention across the entire context window.
- Desire for Better Inference Libraries for VLMs: @gabriel_syme and @carsonpoole discussed the need for more accessible and efficient inference libraries for vision-language models (VLMs), highlighting the current lack of such resources and the efforts being made by CarsonPoole and Max_paperclips toward this goal.
- Technical Insights into Batch Inference and Framework Capabilities: @carsonpoole detailed some of the underlying work on inference libraries, particularly features like batch LoRA inference, converting dense models to LoRAs, and the potential for incorporating machine learning model (MLM) extensions, indicating ongoing development to support VLMs more effectively.
Links mentioned:
- repeng/notebooks/emotion.ipynb at main · vgel/repeng: A library for making RepE control vectors. Contribute to vgel/repeng development by creating an account on GitHub.
- Unified Scaling Laws for Routed Language Models: The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditio…
- ST-MoE: Designing Stable and Transferable Sparse Expert Models: Scale has opened new frontiers in natural language processing — but at a high cost. In response, Mixture-of-Experts (MoE) and Switch Transformers have been proposed as an energy efficient path to eve…
LM Studio ▷ #💬-general (184 messages🔥🔥):
- Troubleshooting LM Studio’s Code and Connection Issues: @.mchinaga sought assistance with an issue in `chat_gpt_bot.py`, suspecting a bug, but @.ben.com clarified the problem was due to not reaching the API server. Users discussed various errors, including invalid response keys and endpoint issues, with @dagbs advising on correct API base setting and response key adjustments.
- Exploring Text Script to Video Puppet Solutions: @toddmiller inquired about a model or app for converting text scripts to video puppets, drawing parallels with other OpenAI and related technology equivalences. The conversation evolved into discussing the limitations of existing models for this purpose, with @dagbs suggesting LM Studio might not be suitable for such advanced video manipulation.
- GPU Acceleration Tips for Optimal Performance: @rahulg1981 asked about the best GPU acceleration settings for an RTX 4070 to improve performance without overloading the CPU. @fabguy provided detailed recommendations for adjusting `n_gpu_layers` based on model size, offering a practical approach to optimizing GPU usage (see the sketch after this list).
- LM Studio’s Model Compatibility and Execution Challenges: Various users shared their struggles and solutions around running LM Studio on different hardware setups and operating systems, from Linux library issues and selecting the correct GPU for acceleration to ARM CPU compatibility on Mac and unsupported platforms like Android and iOS.
- Discussions on LM Studio Features and Future Improvements: Members discussed LM Studio’s functionality and future development directions, including running multiple models, sorting chat histories, and integrating local models. @yagilb provided insights into version updates, bug fixes, and workarounds for using local models within LM Studio’s framework.
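The same layer-offloading knob LM Studio exposes can be sketched with llama-cpp-python as below; the model path is hypothetical, and the right `n_gpu_layers` value depends on how much VRAM the card has free.

```python
# Offloading transformer layers to the GPU with llama-cpp-python,
# the same knob LM Studio surfaces as n_gpu_layers. Model path is
# hypothetical; raise n_gpu_layers until VRAM is nearly full.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical file
    n_gpu_layers=24,  # layers offloaded to the GPU; -1 offloads all of them
    n_ctx=4096,       # context window to allocate
)

print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])
```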
Links mentioned:
- 👾 LM Studio - Discover and run local LLMs: Find, download, and experiment with local LLMs
- Gordon Ramsay Chef GIF - Gordon Ramsay Chef Its Raw - Discover & Share GIFs: Click to view the GIF
- Amd Ryzen GIF - AMD Ryzen Radeon - Discover & Share GIFs: Click to view the GIF
- The unofficial LMStudio FAQ!: Welcome to the unofficial LMStudio FAQ. Here you will find answers to the most commonly asked questions that we get on the LMStudio Discord. (This FAQ is community managed). LMStudio is a free closed…
- GitHub - brett-baudin-consulting/uMdali: Enterprise Chat Front End: Enterprise Chat Front End. Contribute to brett-baudin-consulting/uMdali development by creating an account on GitHub.
LM Studio ▷ #🤖-models-discussion-chat (61 messages🔥🔥):
- Tiny Models Go Absurd After Reboot: @pudlo reported that highly quantized tiny models, after a reboot, started producing hilariously absurd jokes, making them unintentionally funny.
- Suggestion for a “Top Models of the Month” Channel: @666siegfried666 proposed creating a channel to highlight the top models of the month, sparking a lively discussion on making model recommendations more accessible and organized. Suggestions included voting systems and admin-only posting to ensure readability.
- Challenges with CodeLlama 70B Model: Multiple users, including @unskilless and @dave000000, reported problems with the CodeLlama 70B model, noting it was “terribly broken” for specific tasks. However, @heyitsyorkie suggested using the “Codellama Instruct” preset for better results.
- MIQU Model Generates Mixed Reviews: Conversations about the MIQU model highlighted varying experiences, with @n8programs praising its domain-specific knowledge and stating it sits between Mistral Medium and GPT-4 in capability. However, @ptable found its performance on their setup not significantly better than Mixtral and highlighted speed issues.
- Requests for Help with Functionary Model: @vbwyrde sought assistance with using the Functionary model from Hugging Face for CrewAI, emphasizing the model’s ability to execute functions intelligently but expressing uncertainty about the correct presets or prompt formats for optimal usage.
Links mentioned:
- meetkai/functionary-small-v2.2-GGUF · Hugging Face: no description found
- Slimeline Tmt GIF - Slimeline Tmt Slime - Discover & Share GIFs: Click to view the GIF
- Ouch Slow Mo GIF - Ouch Slow Mo Slow Motion - Discover & Share GIFs: Click to view the GIF
LM Studio ▷ #🧠-feedback (5 messages):
- Model Compatibility Confusion Cleared: @rasydev faced an error when loading a model in LM Studio. @heyitsyorkie clarified that LM Studio is only compatible with GGUF models, pointing out that the `safetensors` file downloaded by rasydev is not compatible.
- CodeLlama Models Misunderstanding: In a follow-up query, @rasydev asked whether the `codellama/CodeLlama-70b-hf` model works with LM Studio. @heyitsyorkie responded that CodeLlama models, being raw PyTorch models, do not work with LM Studio by default, and recommended searching for GGUF quants by TheBloke to ensure compatibility.
LM Studio ▷ #🎛-hardware-discussion (99 messages🔥🔥):
- Retro Tech in Modern Train Systems: @hexacube shared that German railway automation still runs on MS-DOS and Windows 3.1, sparking discussions on the simplicity and efficiency of older systems for certain applications. This segued into a wider conversation about the appreciation of coding efficiency in the demoscene community and speculation on government IT investments.
- Performance Insights on Minimal Tech: Discussions revealed scenarios where minimal hardware (e.g., 125MHz, 8MB RAM) efficiently runs specific applications like observatories, with @hexacube mentioning successful operation on an old i5 mini PC before upgrading for more intensive processing. This highlights the ongoing relevance of seemingly outdated hardware in certain niches.
- AI and Gaming Evolving Together: Several users, including @cihiris and @goldensun3ds, speculated on the future intersection of AI and gaming, with possibilities ranging from AI NPCs to whole games generated on the fly by AI. There was enthusiasm about AI’s potential to revolutionize game development and player interaction.
- Hardware and AI Development Constraints: The conversation touched on considerations for running large language models (LLMs) and AI-related tasks, including the importance of GPU over RAM for speed, the potential of multi-GPU setups for different in-game tasks, and the intriguing concept of hardware designed specifically for AI acceleration.
- Navigating the Best Hardware Setup for AI Applications: Users including @pudlo and @heyitsyorkie debated the merits of investing in high-powered GPUs versus ample RAM for running AI models, with consensus leaning toward the significant performance gains of advanced GPUs. Links to resources like a LocalLLaMA LLM GPU Buying Guide were shared, offering insights on hardware selection tailored for AI development.
Links mentioned:
- Cats Matrix GIF - Cats Matrix Neo - Discover & Share GIFs: Click to view the GIF
- Hacking Computer Screen GIF - Hacking Computer Screen Green Screen - Discover & Share GIFs: Click to view the GIF
- Reddit - Dive into anything: no description found
- Liquid AI raises $37.6M to build ‘liquid’ neural networks - SiliconANGLE: Liquid AI raises $37.6M to build ‘liquid’ neural networks - SiliconANGLE
LM Studio ▷ #🧪-beta-releases-chat (18 messages🔥):
- Troubleshooting Exit Codes: @fabguy suggested that those experiencing Exit Code 1 errors check the FAQ in the pinned messages for solutions, which are often related to C++ Redist issues.
- Anticipation for New Release: @oldandnew. shared the community's anticipation and eagerness to see whether a new beta release will be announced soon.
- Queries on Download Resume Feature: @greg0403 inquired about the possibility of adding a download-resume feature, with @dagbs and @senecalouck pointing him towards specific channels for further information.
- Reporting and Diagnosing Model Errors: @epicureus reported a model error with exit codes and detailed system information, leading to a dialogue with @yagilb, who appreciated the error report and committed to fixing the issue after attempting to load the problematic model themselves.
- Quick Fix for Model Loading Issue: After discussing with @yagilb, @epicureus identified that OpenHermes 2.5 works but other models fail, hinting at potential RAM detection problems. @yagilb sought further information on which specific models were failing and stated that a fix is incoming now that the issue has been identified with the Dr Samantha 7B model shared by @epicureus.
Links mentioned:
TheBloke/Dr_Samantha-7B-GGUF · Hugging Face: no description found
LM Studio ▷ #autogen (1 messages):
- Wizard Coder 15b struggles with terminating outputs: @strangematter shared their experience using Wizard Coder 15b for Python code generation, finding that although it produced coherent, satisfactory code, it had difficulty terminating its outputs. They inquired whether anyone has had better success with another model for code generation tasks.
LM Studio ▷ #langchain (2 messages):
- LangChain Integration with LM Studio: @circulustreme inquired about the possibility of integrating LangChain with a local LM Studio instance running Mixtral to generate a dataset of 100 responses. @yagilb affirmed it's possible and referred to a previous message for instructions on the connection process.
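Since LM Studio's local server speaks the OpenAI wire protocol, the connection is typically just a matter of pointing LangChain's OpenAI chat wrapper at localhost. A minimal sketch, assuming a recent langchain-openai install and LM Studio's default port of 1234 (the model name is a placeholder; the server uses whichever model is loaded):

```python
from langchain_openai import ChatOpenAI

# LM Studio exposes an OpenAI-compatible server, by default on port 1234
llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # any non-empty string; the local server ignores it
    model="local-model",  # placeholder; LM Studio serves the loaded model
)

# e.g. generate a small dataset of responses
dataset = [llm.invoke("Give me one surprising fact.").content for _ in range(100)]
```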
OpenAI ▷ #ai-discussions (3 messages):
- Choosing VRAM for AI tasks: aqua_dawn_67525 is contemplating whether to get 16GB or 24GB of VRAM for AI projects as a beginner, wondering if 16GB will be sufficient for a few years.
- Moving to Cloud for More Power: aqua_dawn_67525 is considering moving to the cloud for more computing power if personal hosting proves too expensive.
- Personal Experience with 16GB VRAM: toror shared that 16GB of VRAM on a 4080 GPU is ample for many modern, optimized models, providing a data point for aqua_dawn_67525's consideration.
OpenAI ▷ #gpt-4-discussions (170 messages🔥🔥):
- GPT Plus Users Encounter Message Limit Confusion: Several users, including @iyons and @myroslava_35196_08143, expressed confusion and frustration over hitting a message-limit warning on GPT Plus despite not reaching the supposed 40-messages-per-3-hours limit. They reported receiving no response from support concerning this issue.
- Introduction of GPT Mentions Sparks Excitement: The introduction of GPT mentions, as announced by @kumquatexpress, has generated excitement among users. This new feature allows for the sharing of context and custom actions between mentioned GPTs in a conversation, promising enhanced composability and flexibility in AI applications.
- Effective Strategies for Long Conversations Shared: @darthgustav. shared tips for managing long conversations with GPT, advising periodically asking for summaries to maintain context and use the conversation token budget effectively.
- Users Explore the Potentials and Limitations of GPT Mentions: Users like @darthgustav. and @blckreaper explored and debated the capabilities of GPT mentions, discussing the ability to switch contexts between different GPTs and the constraints related to each GPT's knowledge base.
- Calls for Expanded Functionality in GPT Plus: @peter07082 reported difficulties accessing new features such as Explore GPTs in Australia, despite being an early GPT Plus subscriber. They, along with other users, faced challenges obtaining support or clear answers from OpenAI's help system.
OpenAI ▷ #prompt-engineering (32 messages🔥):
- Exploring GPT for Word Games: .berns inquired whether prompts can be used effectively for grid-based word games, noting GPT's prior struggles with such tasks. 7877 and eskcanta provided insights, suggesting the use of a 2D array and Python tools for tracking information to potentially overcome these issues, while recognizing the challenge's complexity.
- New Era with DALL-E 3 and GPT Integration: darthgustav. shared their positive experience with a new feature allowing GPTs to call on each other using the @ sign, particularly highlighting successful interactions between a Custom GPT and DALL-E 3. They emphasized this integration's potential, noting it preserved entire contexts including code and visual prompts, marking a significant step forward in chat-realm engineering.
- Clarification on Inter-GPT Communication Limits: novumclassicum sought advice on using the @ sign to chain multiple GPTs for a complex task involving creation, proofing, and translation. solbus and bambooshoots clarified that this feature currently only works in-conversation, not within the instructions field of a GPT, meaning tasks must be managed manually, step by step.
- Powerful Potential of Full Context Preservation: darthgustav. discussed the groundbreaking potential of full context preservation in requests between GPTs. This opens doors for seamless multi-disciplinary projects, such as writing and illustrating a children's book in a single session without switching tools or tabs.
- Insight on Manual Process for Now: Despite the enthusiasm for the new @ sign feature's potential, novumclassicum discovered that, for now, their project (a complete chapter lesson for language teachers, including illustrations) would require manual calls to various GPTs. darthgustav. offered guidance on navigating this "labyrinth," suggesting there's a way to leverage the feature effectively even with current limitations.
OpenAI ▷ #api-discussions (32 messages🔥):
- Exploring GPT's Potential in Word Games: @.berns expressed concern about GPT's effectiveness in word games like hangman, noting it often makes mistakes such as making up words. In response, @7877 and @eskcanta offered solutions involving 2D arrays and Python tools for better tracking and letter placement within games, highlighting both the challenges and potential strategies for success.
- DALL-E 3's New Features Frustrate and Fascinate: @darthgustav. shared their testing experience with the new feature that allows calling different GPT models using the @ sign, illustrating how it preserves full context across different models and functionalities. They highlighted the power of seamlessly combining custom GPTs and DALL-E 3 for image generation, noting both the limitations and surprising capabilities of this feature.
- Master GPT Routines and GPT Calls Clarified: @novumclassicum asked about programming master GPT routines that call subroutines in other GPT models for processes like creating, proofing, and translating texts. @solbus and @bambooshoots clarified that while the concept is intriguing, the functionality to automate this process through instruction fields does not currently exist, requiring manual intervention at each step.
- Potential for Creative Project Integration: @darthgustav. and @novumclassicum discussed the exciting potential of using the new GPT and DALL-E features for comprehensive projects, like writing children's books or creating full chapter lessons with illustrations. Despite current limitations and the necessity of manual operation, they remain optimistic about future developments that could streamline such creative endeavors.
- Community Engagement and Problem-Solving: Several users, including @darthgustav. and @solbus, demonstrated community-driven troubleshooting and sharing of experiences with recent OpenAI features. Their dialogue underscores the community's role in discovering, testing, and providing feedback on the evolving capabilities of OpenAI's models.
Eleuther ▷ #general (74 messages🔥🔥):
- ML in the Browser Takes a Leap with Ratchet: @frazermc asked about machine learning inference engines in the browser, leading @carsonpoole to introduce Ratchet, a cross-platform browser ML framework using Rust and WebGPU. It powers whisper-turbo.com and promises quantization support and optimization for speed and developer experience.
- Flash Attention 2 Under the Microscope: @nshepperd sparked a technical discussion about a potential use-after-free issue in Flash Attention 2, where the code "works" possibly because the memory is not overwritten before kernel execution. This oddity opened up a detailed dialogue about tensor memory management in PyTorch with @nlab_enthusiast.
- EleutherAI Celebrates ICLR Acceptances: @stellaathena shared the exciting news that 6 of 10 EleutherAI-affiliated papers were accepted to ICLR, listing accepted publications like "LLeMA: An Open Language Model for Mathematics" and congratulating first-time authors and contributors.
- AI+Music Intersection Explored with New Survey: @loubb encouraged community members to help evaluate AI-driven music models through a survey at http://survey.loubbrad.com:8501/. The announcement, supported by @stellaathena, aims to gather insights on the latest model developments.
- Interest in GitHub's Annual Growth Data and Anthropic's Interpretation Efforts: Users expressed curiosity about GitHub's year-over-year stats on commits and pull requests, and @digthatdata delved into Anthropic's research on "OV" circuits, sharing links to transformer-circuits.pub as a resource for understanding emerging interpretability work.
Links mentioned:
- Whisper Turbo: Transcribe any audio file - completely free!
- distil-whisper/distil-large-v2 · Hugging Face: no description found
- GitHub - pyf98/DPHuBERT: INTERSPEECH 2023: “DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models”: INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models" - GitHub - pyf98/DPHuBERT: INTERSPEECH 2023: "DPHuBERT: Joint Distillation and …
- GitHub - FL33TW00D/whisper-turbo: Cross-Platform, GPU Accelerated Whisper 🏎️: Cross-Platform, GPU Accelerated Whisper 🏎️. Contribute to FL33TW00D/whisper-turbo development by creating an account on GitHub.
- GitHub - FL33TW00D/ratchet: A cross-platform browser ML framework.: A cross-platform browser ML framework. Contribute to FL33TW00D/ratchet development by creating an account on GitHub.
- Circuits Updates - January 2024: no description found
- Circuits Updates — May 2023: no description found
Eleuther ▷ #research (96 messages🔥🔥):
- Cluster Performance vs. Accuracy Paradox: @llm_enjoyer sparked a discussion on the expectation that better-clustered embeddings (measured by metrics like the Davies–Bouldin and Calinski–Harabasz indices) should lead to higher classification accuracy. However, they observed the opposite in their experiments, with models scoring better on clustering metrics performing worse in accuracy, leaving them puzzled. Davies-Bouldin Index Wiki, Calinski-Harabasz Index Wiki
- Exploring the Limits of muP-Trained Models: A series of inquiries about the largest models trained with muP revealed a 3B Cerebras model as the largest specific case cited, along with a speculative mention of GPT-4 potentially being trained with muP. Through this discussion, @jstephencorey, @ad8e, and @thatspysaspy explored which large-scale models might have benefited from muP.
- Sparse Fine-Tuning (SFT) Beats LoRA in Latest Research: @random_string_of_character shared a breakthrough in sparse fine-tuning for large language models like Llama 2, presenting a method that is both parameter- and memory-efficient while outperforming the (q)LoRA approach. The research suggests a significant advance in instruction-tuning performance, with both the paper and code available for further exploration.
- Potential for "Citations" Using RAG Q&A Explored: @carsonpoole proposed generating detailed "citations" by analyzing attention maps during RAG Q&A sessions, prompting @johnryan465 and @kharr.xyz to point out relevant tools and approaches used previously for similar goals, such as BertViz and fine-tuning strategies that produce line citations.
- Mixture of Softmaxes Proposed for LLM Training: @alstroemeria313 shared an intriguing method of training models by blending the logits from the last k layers of a transformer, inspired by the idea of avoiding the "softmax bottleneck". The method, which seems promising at small scale, softmaxes each set of logits and then blends the resulting distributions according to softmaxed mixture weights.
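As a rough illustration of the blending step described above (a sketch of the idea from the summary, not @alstroemeria313's actual code): each layer's logits become a distribution, and the output is a convex combination of those distributions.

```python
import torch
import torch.nn.functional as F

def mixture_of_softmaxes(layer_logits, mix_weights):
    # softmax each layer's logits into a distribution over the vocab
    probs = torch.stack([F.softmax(l, dim=-1) for l in layer_logits])  # (k, B, V)
    # softmax the mixture weights so the blend is a convex combination
    w = F.softmax(mix_weights, dim=0)                                  # (k,)
    return torch.einsum("k,kbv->bv", w, probs)                         # rows sum to 1

# toy example: logits from the last k=3 layers, batch of 2, vocab of 10
logits = [torch.randn(2, 10) for _ in range(3)]
weights = torch.nn.Parameter(torch.zeros(3))  # learned, one scalar per layer
out = mixture_of_softmaxes(logits, weights)
print(out.sum(dim=-1))  # each row sums to 1 (up to float error)
```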
Links mentioned:
- Tweet from Edoardo Ponti (@PontiEdoardo): We scaled sparse fine-tuning (SFT) to LLMs (such as Llama 2) by making it both parameter- and memory-efficient! (q)SFT instruction tuning performance is often better than (q)LoRA with comparable spee…
- SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning: no description found
- Transfer Learning for Text Diffusion Models: In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to s…
- The Normal Blog - Infinite Context LLMs: Going Beyond RAG with Extended Minds: In this blog we discuss how the transformer architecture naturally extends over external memories, and share empirical results which leverage this capability to succeed where RAG has struggled. These …
- Tweet from Edoardo Ponti (@PontiEdoardo): We experimented with instruction tuning with different mixtures (Flan v2, GPT4 Alpaca, Tülu v2), model scales (Llama 7B and 13B), and quantization (4 bits). We found that SFT outperforms LoRA and oth…
- Davies–Bouldin index - Wikipedia: no description found
- Calinski–Harabasz index - Wikipedia: no description found
- GitHub - cambridgeltl/composable-sft: A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings.: A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings. - GitHub - cambridgeltl/composable-sft: A library for parameter-efficient and composable transfe…
Eleuther ▷ #lm-thunderdome (3 messages):
- Efficient Dataset Caching Technique Shared: @hailey_schoelkopf shared a helpful solution for caching datasets for offline use with HF datasets, referencing a GitHub issue and the documentation on using local datasets in EleutherAI's lm-evaluation-harness (see the sketch below).
- PyPI Project Ownership Transfer Request: @hailey_schoelkopf requested a transfer of owner-level access for the lm_eval PyPI project to the haileyschoelkopf or eleutherai usernames, aiming to manage maintainer permissions more effectively.
- Ownership Transfer Confirmed: @bmk1476 promptly responded, confirming the completion of the ownership transfer for the lm_eval PyPI project.
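The general pattern for offline caching with HF datasets looks roughly like this (a minimal sketch; the dataset name is illustrative, and HF_DATASETS_OFFLINE is exported in the shell on the offline machine):

```python
from datasets import load_dataset, load_from_disk

# Run once with network access: snapshot the dataset to local disk.
load_dataset("hellaswag", split="validation").save_to_disk("data/hellaswag-val")

# On the offline machine (launched with HF_DATASETS_OFFLINE=1 in the
# environment), load the snapshot without touching the Hub.
ds = load_from_disk("data/hellaswag-val")
print(len(ds))
```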
Links mentioned:
- Is there a way to cache the building of datasets? · Issue #1344 · EleutherAI/lm-evaluation-harness: Hey all, once again, excellent framework. I’m using the framework programatically. I wrote code to do this myself, but I want to make sure I’m not being stupid and wanted to check that the pac…
- lm-evaluation-harness/docs/new_task_guide.md at main · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
Eleuther ▷ #multimodal-general (2 messages):
- Exploring Tokenizing Strokes for Vector Graphic Synthesis: @stellaathena mentioned a theoretical discussion about tokenizing strokes for vector graphic synthesis, linking to a relevant study, StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis. The concept appears to align with this study's focus.
Eleuther ▷ #gpt-neox-dev (7 messages):
- CUDA Container Woes on PPC64LE: @catboy_slim_ discovered that the only NVIDIA CUDA container supporting ppc64le is the UBI (Red Hat) one, and lamented RHEL's peculiar stance on Python versions, particularly its non-recognition of 3.10.
- CUDNN Installation Issues: @catboy_slim_ further noted that the cuDNN installation on the UBI NVIDIA image appears to be broken.
- Challenges with PPC64LE Wheel Building: @catboy_slim_ expressed frustration and a likelihood of giving up on building a wheel for ppc64le, hinting at the severity of the challenges faced.
- Apex Cross-Compilation Dilemmas: @catboy_slim_ discussed the complexity and undesirability of setting up a cross-compilation build for Apex, highlighting the significant difficulties and dependencies involved in NeoX and DeepSpeed without Apex.
- Necessity of On-Device Apex Building: Concluding the discussion, @catboy_slim_ stated that building Apex on-device seems unavoidable in some cases, and emphasized the impracticality of prebuilding general-purpose binaries given NVCC's limitations around cross-compilation.
Mistral ▷ #general (147 messages🔥🔥):
- Mistral on a Budget: Users debated the possibility of running Mistral 7B models on GPUs with 6GB of VRAM, like the 1660 Ti. @batot4968 reported failures due to out-of-memory issues, while @mrdragonfox clarified that it works, though not in fp16, and insisted that most users need at least 24GB of VRAM to play with AI locally.
- Finding the Right Model for Restricted VRAM: Amidst the discussion, @batot4968 found a model on Hugging Face quantized to run in around 5.5GB of VRAM, suitable for a 1660 Ti. The conversation highlights the need to choose quantized models when VRAM is limited (a minimal loading sketch follows after this list).
- Performance Queries on High-End Systems: Concerns were raised about the performance of Mistral 7B models on high-end systems like the RTX 4090, with @batot4968 observing inconsistent speeds. @i_am_dom recommended focusing on GPU utilization to solve the issue, implying CPU usage should be minimized for optimal performance.
- Clarifications on Running Models: As users like @mikifireblue sought clarity on running Mistral models without fully loading them into VRAM, @i_am_dom pointed towards resources like Colab notebooks for guidance on efficient model loading.
- Tech Enthusiasts Navigate the AI Landscape: Discussions from users new to the scene, like @krahs., who expressed interest in integrating Mistral AI into game design, underline the community's exploration and experimentation with AI models. The dialogue showed a mix of enthusiasm and a need for detailed guidance in navigating the model landscape efficiently.
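For the restricted-VRAM case above, a common approach is a GGUF quant served through llama-cpp-python with only part of the network offloaded to the GPU. A minimal sketch, assuming a downloaded Q4 quant of the Mistral 7B Instruct GGUF mentioned in the links below (the filename and layer count are illustrative and should be tuned to the card):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # illustrative filename
    n_gpu_layers=24,  # offload only some layers; lower this if VRAM runs out
    n_ctx=4096,
)
out = llm("[INST] What is 2 + 2? [/INST]", max_tokens=16)
print(out["choices"][0]["text"])
```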
Links mentioned:
- TheBloke/Mistral-7B-Instruct-v0.2-GGUF · Hugging Face: no description found
- GitHub - mistralai/mistral-src: Reference implementation of Mistral AI 7B v0.1 model.: Reference implementation of Mistral AI 7B v0.1 model. - GitHub - mistralai/mistral-src: Reference implementation of Mistral AI 7B v0.1 model.
- Reddit - Dive into anything: no description found
Mistral ▷ #finetuning (2 messages):
- Seeking Brevity in LLM Responses: @brentnhunter asked how to make the LLM's responses shorter. Despite instructions for brevity and limits on token count, responses remained undesirably long; they specifically wanted straightforward answers like "4" to questions such as "what is 2 plus 2".
- Fine-Tuning Tips for Model Improvement: @friendly911 suggested running more steps, considering 60 too few for significant data sizes, and recommended decreasing the learning rate to perhaps 2e-5 for better model performance.
Mistral ▷ #showcase (1 messages):
- Open Source Web UI for Mistral Released: @darkstar1011 has launched an open source project named uMdali aimed at providing a Web UI for the Mistral API. This project also supports connections to Ollama, OpenAI, and Gemini, positioning itself as an "Enterprise Chat Front End".
Links mentioned:
GitHub - brett-baudin-consulting/uMdali: Enterprise Chat Front End: Enterprise Chat Front End. Contribute to brett-baudin-consulting/uMdali development by creating an account on GitHub.
Mistral ▷ #la-plateforme (4 messages):
- Open Call for Contributions on Mistral's GitHub: @sophiamyang encourages the community to submit Pull Requests (PRs) to Mistral's public documentation on GitHub, inviting collaboration and contributions.
- Direct Support for Notebook Inquiry: @sophiamyang addressed Patrick with an apology for the oversight and promised to review the submitted notebook, highlighting Mistral's responsive engagement with its community.
- Internship Inquiries at Mistral Spark Curiosity: User @bepis4552 inquired about the possibility of applying for an internship at Mistral, indicating interest in joining the team.
- Tough Competition for Mistral Internships: In response to the internship inquiry, @sublimatorniq pointed out the high qualifications of Mistral's developer relations team and suggested that landing an internship could require exceptional talent and a bit of luck.
Links mentioned:
GitHub - mistralai/platform-docs-public: Contribute to mistralai/platform-docs-public development by creating an account on GitHub.
HuggingFace ▷ #announcements (1 messages):
- Code Llama 70B Launches: @lunarflu announces the release of Code Llama 70B, the community's latest AI chat model. Try it out here.
- Sentence Transformers v2.3.0 Released: @tomaarsen introduces Sentence Transformers v2.3.0, featuring bug fixes, performance enhancements, and more efficient model loading. Release notes available here.
- Introducing Serverless Object Detection: @whitphx shares a Gradio-Lite and transformers.js.py collaboration for a serverless object-detection app. Check out the app and code here.
- AutoTrain Advances to Local-First: @abhi1thakur declares AutoTrain is now "local-first", enabling local training with a UI through a simple pip install. Instructions available here.
- Hugging Face and Google Cloud Partnership: A strategic partnership between Hugging Face and Google Cloud aims to democratize AI using open models and technologies. More details on the partnership can be found here.
Links mentioned:
- HuggingChat: Making the community’s best AI chat models available to everyone.
- Tweet from Omar Sanseviero (@osanseviero): Code Llama 70B is here! 🚀🦙🤖 Find the models in transformers format - Base https://hf.co/codellama/CodeLlama-70b-hf - Python https://hf.co/codellama/CodeLlama-70b-Python-hf - Instruct https://hf.c…
- Big Code Models Leaderboard - a Hugging Face Space by bigcode: no description found
- Tweet from tomaarsen (@tomaarsen): The long-awaited Sentence Transformers v2.3.0 is now released! It contains a ton of bug fixes, performance improvements, loading custom models, more efficient loading, a new strong loss function & mor…
- Tweet from Yuichiro (@whitphx): Gradio-Lite (Serverless @Gradio) + http://Transformers.js.py's object detection pipeline = Serverless Object Detection App! Write only Python code with Gradio and Transformers, then host it as a …
- The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models: no description found
- Tweet from Moritz Laurer (@MoritzLaurer): .@huggingface TGI is now compatible with @OpenAI’s python client/HTTP interface: put any open-source LLM in a TGI container on your own hardware and call it via the OpenAI python client 😎 Step 1…
- Tweet from abhishek (@abhi1thakur): AutoTrain is now local-first! 💥 This means you can install autotrain-advanced using pip and run trainings using the UI locally 🚀 In Hugging Face Spaces, just attach the GPU you like to your AutoTrai…
- makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch: no description found
- Tweet from Xenova (@xenovacom): Depth Anything is now available in 🤗 Transformers.js! At just 25M parameters, the small version of the model runs great locally. Here’s a demo I created which performs monocular depth estimation…
- Tweet from Vaibhav (VB) Srivastav (@reach_vb): Alrighty! W2V-BERT 2.0: Speech encoder for low-resource languages! 🔥 With < 15 hours of audio, you can beat Whisper and get your own SoTA ASR model! > Pre-trained on 4.5M hours of data. > …
- LevelBot - a Hugging Face Space by huggingface-projects: no description found
- Tweet from Sayak Paul (@RisingSayak): We have started opening up a mix of different opportunities for contributions in the 🧨 diffusers repo more structurally. If you’re interested in contributing, look for issues, starting with the…
- Tweet from lunarflu (@lunarflu1): Now everyone can see your @huggingface posts on your profile! 🔥🧙♂️✨
- Tweet from Mishig Davaadorj (@mishig25): Updated web-search experience on HuggingChat. We’ll keep shipping real big updates on HuggingChat real fast. Looking forward to Llama 3 & beyond
- Tweet from Katie Link (@katieelink): Did you know you can control access to your @huggingface models and datasets with gates? 🔐 This can be critical for healthcare/biomedical models & datasets which might require access approvals, trai…
- Tweet from Quentin Lhoest (@qlhoest): The 🤗 Hugging Face Hub now natively supports ⚡⚡ WebDataset ⚡⚡ It’s by far the best dataset format for Streaming data for AI model training, let me explain👇🧵
- Hugging Face and Google partner for open AI collaboration: no description found
HuggingFace ▷ #general (97 messages🔥🔥):
- Exploring Multimodality in LLMs: @thegenerativegeneration is catching up on LLMs and multimodal understanding, particularly interested in how models handle multiple images and videos simultaneously for context. They also asked about practical experiences with LLMs that have 3D understanding, seeking any relevant surveys or resources on the topic.
- Reducing LLM Costs: @jasonzhou1993 shared a YouTube video titled "The REAL cost of LLM (And How to reduce 78%+ of Cost)" discussing strategies to significantly reduce LLM operational costs.
- Seeking Tips on Twitter Posts for Open Source/ML Projects: @vipitis sought advice on writing announcement-style Twitter posts for open source/ML topics, eliciting tips and feedback from the community. Recommendations included analyzing high-quality announcements and considering visual storytelling as an effective tool.
- CUDA Memory Allocation Inquiry: @felixsanz and @pixxelkick discussed PyTorch's torch.cuda.max_memory_allocated function and how it correlates with the memory usage reported by NVIDIA's nvidia-smi tool. Confusion arose over the discrepancy between reported and actual GPU memory allocation (see the sketch after this list).
- Inquiry about TTS AI Performance on Low-Spec Hardware: @yengecbey asked the community about the feasibility of running text-to-speech AI on a laptop with an i5 processor and 8GB of RAM, indicating an interest in understanding the hardware requirements for TTS applications.
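On the max_memory_allocated question, the usual explanation is that PyTorch's counters track tensor allocations only, while nvidia-smi also sees the CUDA context (often several hundred MB) plus memory the caching allocator has reserved but not handed out. A quick way to compare the three numbers:

```python
import torch

x = torch.randn(1024, 1024, device="cuda")  # ~4 MiB of fp32

print(torch.cuda.memory_allocated() / 2**20, "MiB in live tensors")
print(torch.cuda.max_memory_allocated() / 2**20, "MiB peak tensor usage")
print(torch.cuda.memory_reserved() / 2**20, "MiB reserved by the caching allocator")
# nvidia-smi reports more than any of these: it also counts the CUDA context.
```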
Links mentioned:
- The REAL cost of LLM (And How to reduce 78%+ of Cost): I want to give you step by step guide on how to reduce LLM cost by 70%, and unpack why it is costing so much nowFree HubSpot AI For Marketers Course: https:/…
- GitHub - mermaid-js/mermaid: Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown: Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown - GitHub - mermaid-js/mermaid: Generation of diagrams like flowcharts or sequence diagrams from…
- Tweet from Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr): Happy to share a new paper I worked on!: “Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers” abs: https://arxiv.org/abs/2401.11605 website: https://c…
- Tweet from EleutherAI (@AiEleuther): We are excited to join other leaders in artificial intelligence in partnering with @NSF to launch the National AI Research Resource (NAIRR), a shared infrastructure that will promote access to critica…
- Tweet from Jerry Wei (@JerryWeiAI): New @GoogleAI+@Stanford paper!📜 Symbol tuning is a simple method that improves in-context learning by emphasizing input–label mappings. It improves robustness to prompts without instructions/relevan…
- Tweet from Stella Biderman (@BlancheMinerva): Have you ever wanted to do an experiment on LLMs and found that none of the existing model suites met your needs? At @AiEleuther we got tired of this happening and so designed a model suite that cente…
HuggingFace ▷ #cool-finds (1 messages):
- Unveiling the Multimodal Malaysian LLM Dataset: User @andysingal shared a link to a multimodal Malaysian LLM dataset hosted on Hugging Face, offering a resource for developing LLMs with a focus on the Malaysian context. This dataset, part of the mesolitica collection, includes translated LLaVA instructions and aims to enhance language-model training with multimodal inputs.
Links mentioned:
Multimodal Malaysian LLM dataset - a mesolitica Collection: no description found
HuggingFace ▷ #i-made-this (3 messages):
- Excel/CSV files directly to database magic: @impl66 created a Gradio app that transforms Excel/CSV files into database tables and allows users to query them easily. Check out the app on HuggingFace Spaces.
- Tackling AI existential fears: @mateomd_dev discusses the often-sensationalized fear of AI gaining consciousness and turning against humanity in the latest issue of their newsletter, Recurrent Neural Notes. For a deep dive into the topic, visit RNN #8 - Will AI Become Evil?.
- Magic: The Gathering meets AI: @joshuasundance introduces what might be the first Magic: The Gathering model on HuggingFace, capable of multi-label classification of card color identity based on card name and text. To explore this innovative use of AI for deck building, visit mtg-coloridentity-multilabel-classification.
Links mentioned:
- Tables - a Hugging Face Space by sid27: no description found
- RNN #8 - Will AI Become Evil?: The Role of Consciousness in the Dangers of AI
- joshuasundance/mtg-coloridentity-multilabel-classification · Hugging Face: no description found
HuggingFace ▷ #reading-group (3 messages):
- New to the Reading Group Channel: @marc.casals.salvador asked how the channel operates and whether meetings are set up to discuss readings.
- Discord Calls for Presentations: @chad_in_the_house responded to @marc.casals.salvador, noting that Discord calls are organized when presenters are available, with the next session planned for Friday around 1-2 pm EST.
HuggingFace ▷ #diffusion-discussions (5 messages):
- Seeking robustness beyond DPO for QA datasets: @blackbox3993 is exploring how to enhance a question-answering dataset with negative answers using DPO (Direct Preference Optimization). They are curious about alternative methods to make their model more robust and accurate.
- Inference support query for LoKr and LoHa: @forsana asked whether LoKr or LoHa adapters support inference via the "loading with PEFT" method, indicating a specific interest in deployment techniques.
- CUDA plugin registration errors in Google Colab: @straze007 encountered multiple CUDA-related errors while attempting to fine-tune a text-to-image model using LoRA in Google Colab, specifically with the cuDNN, cuFFT, and cuBLAS plugins.
- Disappointment with the 70B coder's chat capabilities: @pseudoterminalx shared a link (hf.co chat) expressing disappointment with the 70B coder chat model, noting it lacks knowledge of the 🤗 Diffusers library.
- Struggle to replicate a specific art style in Stable Diffusion: @troyfix aims to recreate an anime-style art form using vanilla Stable Diffusion and provides a detailed prompt to capture the sketchy, rough texture of the target style. They link to an example (akimasaweb) that embodies the aspirational style.
Links mentioned:
works 徳永明正-航空イラストなど-: no description found
HuggingFace ▷ #NLP (7 messages):
- GPU Acceleration Troubles Detected: User @sgp began troubleshooting after realizing that GPU acceleration was not functioning at all despite their configuration efforts.
- RTX 2080 Ti in the Mix: They disclosed using an RTX 2080 Ti for their experiments, hinting at their high-end hardware setup.
- LLama GitHub Issue Consulted: Seeking solutions, @sgp found and shared a GitHub issue about llama-cpp-python's GPU-support problems, which suggested potential fixes.
- Encountered nvcc Compilation Error: Following the proposed GitHub solutions, @sgp hit a compilation error, nvcc fatal : Unknown option 'fPIC', likely because nvcc does not accept host-compiler options directly (they generally need to be forwarded, e.g. as -Xcompiler -fPIC).
- Possible Misconfiguration Leads to Larger Issues: The troubleshooting attempts led to a complication in which @sgp indicated their efforts might have inadvertently broken the existing setup, affecting the functioning of their GPTQ model.
Links mentioned:
LLama cpp problem ( gpu support) · Issue #509 · abetlen/llama-cpp-python: Hello, I am completly newbie, when it comes to the subject of llms I install some ggml model to oogabooga webui And I try to use it. It works fine, but only for RAM. For VRAM only uses 0.5gb, and I…
HuggingFace ▷ #diffusion-discussions (5 messages):
- Query on Creating Negative Answers for Datasets: blackbox3993 is looking for ways to enhance a question-answering dataset with negative answers, asking whether DPO is a viable method for this. They also ask for alternatives to DPO to make the model more robust and accurate.
- LoKr or LoHa Support Inquiry for Inference: forsana asks about support for inference using LoKr or LoHa with the PEFT loading method, seeking clarity on the available options.
- TensorFlow Errors in Google Colab: straze007 experiences multiple errors while trying to fine-tune a text-to-image model using LoRA in Google Colab, pointing to compatibility issues with TensorFlow plugins.
- Disappointment with 70B Coder Chat Experience: pseudoterminalx shares a link expressing dissatisfaction with the 70B coder chatbot's lack of knowledge of the 🤗 Diffusers library.
- Seeking the Perfect Anime Art Style with Stable Diffusion: troyfix struggles to recreate a specific anime art style using vanilla Stable Diffusion, providing a detailed prompt and an example photo but not achieving the desired sketchy, rough look.
Links mentioned:
works 徳永明正-航空イラストなど-: no description found
OpenAccess AI Collective (axolotl) ▷ #general (23 messages🔥):
- Axolotl Updates Bring Joy: @nafnlaus00 shared their delight in the continuous improvement of axolotl and its dependencies, which yield lower VRAM usage, faster training, and better outcomes without needing dataset or configuration adjustments. The feedback was appreciated, leading to a conversation about sharing the positive experience on Twitter and the importance of reproducibility.
- Discussions on Hardware and VM Enhancements: @dreamgen explained that part of their speedup in AI projects came from increased wattage (300W vs. 700W), while @dangfutures expressed a need for more VM RAM on GPUs, emphasizing the importance of hardware in AI development.
- The Excitement around MIQU-1-70b Dequantization: @dreamgen presented the dequantization of MIQU-1-70b from q5 to f16 and its adaptation to PyTorch, including a link to the model on Hugging Face. The conversation included the usage code snippet for implementation details and encouraged community members to explore the development (a loading sketch follows after this list).
- Hardware Speculation and Rack Considerations: @dreamgen shared that a 4xMI250 server was priced at 70K, but also pointed out the logistical consideration of needing space for a server rack, highlighting budget and space as critical factors in infrastructure planning for AI projects.
- Speculation and Inquiry into Mistral Medium's Authenticity: Discussion emerged around the credibility and performance of Mistral Medium, with @le_mess and others questioning its authenticity and benchmark results, and @dreamgen offering access to the Mistral API to those wanting to run their own tests.
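The Hugging Face card for the dequantized model includes a usage snippet; a minimal sketch along the same lines (the [INST] prompt format and generation settings are assumptions based on the model's Mistral lineage, and fp16 70B weights need on the order of 140GB spread across GPUs/CPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "152334H/miqu-1-70b-sf"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"  # shard across devices
)

ids = tok("[INST] Hello, who are you? [/INST]", return_tensors="pt").input_ids
ids = ids.to(model.device)
print(tok.decode(model.generate(ids, max_new_tokens=48)[0], skip_special_tokens=True))
```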
Links mentioned:
152334H/miqu-1-70b-sf · Hugging Face: no description found
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (7 messages):
- VRAM Usage Concern in New Implementation: @caseus_ highlighted a known issue where a recent implementation consumes 2x the VRAM, a concern under investigation by the paper's author. @suikamelon expressed disappointment, having hoped to test it locally.
- Potential DIY Solution to LoftQ by StefanGliga: @stefangliga suggested possibly re-implementing LoftQ, reasoning that it may be just an alternative initialization technique for LoRA.
- Exploring a First-Order Approximation of LoftQ: @stefangliga shared a code snippet that runs SVD on the difference between the original and dequantized weights, which they believe amounts to a first-order approximation of LoftQ (sketched below).
- Debate on How to Address Implementation Concerns: @caseus_ argued that it would be more beneficial to fix the VRAM issue upstream and, failing that, to address it within Axolotl.
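A rough sketch of that first-order idea (not @stefangliga's actual snippet; the quantize/dequantize round-trip is a hypothetical placeholder for whatever scheme is in use, e.g. NF4): take the residual the quantizer loses, SVD it, and keep the top-r factors as the LoRA initialization.

```python
import torch

def loftq_like_init(W, quant_roundtrip, rank):
    """Approximate LoftQ init: fit LoRA factors to the quantization residual."""
    residual = W - quant_roundtrip(W)         # what quantization throws away
    U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
    s = S[:rank].sqrt()
    B = U[:, :rank] * s                       # (out, rank)
    A = s.unsqueeze(1) * Vh[:rank, :]         # (rank, in)
    return A, B                               # so that W_q + B @ A ≈ W

# toy check with a fake "quantizer" that rounds to 2 decimal places
W = torch.randn(64, 32)
A, B = loftq_like_init(W, lambda w: (w * 100).round() / 100, rank=8)
print((W - ((W * 100).round() / 100 + B @ A)).norm())  # below the raw residual norm
```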
OpenAccess AI Collective (axolotl) ▷ #general-help (65 messages🔥🔥):
- Docker Dilemmas Solved with pip install: @duke001. encountered a "No module named 'axolotl.cli'" error when using the Docker image for Axolotl. The issue was resolved by @nanobitz's guidance to cd /workspace/axolotl and run pip install -e ., highlighting the importance of proper volume mounting in Docker use.
- Inference on Merged Models with vLLM: @diabolic6045 asked about running inference on a merged qlora llama2 7B model with vLLM, to which @nanobitz advised ensuring sufficient vRAM for the operation. Discussion also covered observations about vLLM's speed versus quality in text generation.
- Training Troubles Tackled: @jorelosorio faced a ZeroDivisionError during model training with a small dataset. @nanobitz recommended adjustments to num_epochs and micro_batch_size, and subsequently suggested gradient_accumulation_steps: 1 to prevent the error, likely because the computed number of optimizer steps rounds down to zero for tiny datasets.
- Adding Slang to Tokenizers Explained: @arcontex sought advice on training a conversational model with Axolotl using a corpus containing country-specific slang. @nanobitz explained how to add new tokens to the tokenizer using the built-in tokens: key in the YAML config, illustrating the process with chatml tokens and discussing when adding words to the tokenizer is most beneficial (see the sketch after this list).
- Conversational Model Training Clarified: @arcontex asked about special considerations for declaring a dataset in YAML when training a conversational model with Axolotl. @nanobitz suggested remapping to sharegpt for ease, outlining how to define the dataset in YAML for conversational training with data in the OpenAI format.
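At the Hugging Face level, axolotl's tokens: YAML key corresponds, as far as we understand the internals (treat this as an assumption), to adding tokens and resizing the embedding matrix, roughly:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# add chatml markers (or slang terms) as whole tokens so they aren't split
added = tokenizer.add_tokens(["<|im_start|>", "<|im_end|>"], special_tokens=True)
if added:
    model.resize_token_embeddings(len(tokenizer))  # new rows get trained later
```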
OpenAccess AI Collective (axolotl) ▷ #community-showcase (1 messages):
- Exploring Function Calls with llama.cpp: @mistobaan shared a creative experiment making function calls with llama.cpp, showcasing a collaborative effort built on community tools and models: a Gist for the Colab notebook, ggerganov's llama.cpp, the llama-cpp-python wrapper by @abetlen, and @calebfahlgren's natural-functions model on Hugging Face.
Links mentioned:
Tweet from Fabrizio Milo (@fabmilo): Had fun experimenting on function calling outside #OpenAI API. Sharing my #colab [1] that leverages @ggerganov’s llama.cpp[2] / @abetlen llamacpp python wrapper [3] + @LangChainAI wrapper + @ca…
OpenAccess AI Collective (axolotl) ▷ #runpod-help (9 messages🔥):
- Troubleshooting axolotl module not found: @gonejiggy encountered an error with the axolotl module not being found in the RunPod Docker image, temporarily fixed with pip3 install -e '.[deepspeed]' and pip uninstall flash_attn. However, they were confused because the issue had not been present the previous day.
- Possible Docker Volume Mounting Issue: @caseus_ and @propback discussed a potential problem where Docker might not mount axolotl in the workspace directory properly, since network volumes are mounted to /workspace by default, which can overwrite the container's /workspace.
- Jupyter Server Extension Warning and Errors: @rss101 shared server logs showing warnings about deprecated Jupyter server-extension functions and a critical error stating /workspace is outside root contents directory, resulting in the HTTP service on port 8888 never becoming ready.
OpenAccess AI Collective (axolotl) ▷ #deployment-help (1 messages):
yamashi: Parallel req
Perplexity AI ▷ #general (12 messages🔥):
- Curiosity About Mistral Medium API Access: @arcinarci asked when API access for Mistral Medium might be available, indicating community interest in wider API access.
- Free Trial Confusion Cleared: User @aarav7024 was puzzled about why they couldn't access the 7-day free trial. This sparked a discussion suggesting that Perplexity may have halted the 7-day free-trial offer.
- Understanding the Mobile App's Limitations: @gooddawg10 asked whether multiple uploads were possible on Android. @ok.alex confirmed that the file and image upload feature is not yet available in the app but said it would be added in future releases.
- Seeking Guidance on Image Generation: @stoop6981 sought advice on using the Image Generation Model effectively after encountering issues. @ok.alex directed them to a helpful thread for more detailed guidance.
Perplexity AI ▷ #sharing (10 messages🔥):
- Creative Use of Perplexity for App Development: @gammagames explored using Perplexity to generate names and addresses for a delivery-themed Pomodoro app, finding the tool highly effective for creative content creation.
- Discovering Perplexity Labs: @rowalth highlighted the existence of Perplexity Labs, where users can experiment with various functionalities of the AI.
- Bias and Decision Making in App Development: @zinovi.eu reflected on personal biases against Apple Inc. while considering iOS app development, ultimately deciding against it despite a constructive inquiry with Perplexity.
- From Music Classification to Ice Cream Machines: @fungifriendly47 embarked on a journey with Perplexity from music classification to discovering ice cream machines on a B2B site, illustrating Perplexity's diverse utility.
- Finding the Right Laptop for Gaming in India: @aninokuma95 conducted a thorough search for laptops with good GPUs under $600, identifying options like the Lenovo Yoga 7i (2023) and Acer Nitro 5, and highlighting the importance of component consideration for gaming performance.
Links mentioned:
- Meet Aravind from India who quit OpenAI to disrupt Google - conversation with Marina Mogilko: How will the New-Gen Search Engine look like? Let’s find out together with Aravind Srinivas who came from India to USA to disrupt the online search with AIGe…
- Reddit - Dive into anything: no description found
- Reviews by ngoshawk: no description found
- Apple Releases macOS Ventura 13.0.1 Update With Bug Fixes: Yes, between 13.0.1 and Malwarebytes releasing a Ventura compatible version the issue is fixed.
- Reddit - Dive into anything: no description found
- Best laptops under $600 in 2024: Looking for a new laptop but don’t want to go beyond $600? We have some of the best options for you available for purchase right now!
- Choosing a low/medium-end gaming laptop under 600$: Hello. Question asked a countless times, yet here it is again. I’ve been a PC user for my whole life so far but with the college arrival it’s time to make a change. Any additional info abo…
- Fastest RTX 4050 Gaming Laptop for $600! Acer Nitro 5 Review: Check Acer Nitro 5 Prices: https://geni.us/JLqV5💲Find the best gaming laptop deals at my site https://gaminglaptop.dealsAcer’s Nitro 5 is the fastest gaming…
- The 4 Best Laptops For Programming - Winter 2024: Reviews: The best laptop for programming we’ve tested is the Apple MacBook Pro 14 (M2, 2023). This high-end mobile workstation has a sturdy all-aluminum build, a portable, compact design, and all-day batte…
Perplexity AI ▷ #pplx-api (61 messages🔥🔥):
- Integrating Perplexity in Web Apps Made Easy: @dogemeat_ helped @andreafonsmortigmail.com_6_28629 by sharing a starting point for integrating the Perplexity API into a web application. Instructions and relevant documentation can be found here and API token creation here. However, it's noted that the pplx-api does not support file uploads for chat interactions.
- Perplexity API and File Handling Inquiry: @andreafonsmortigmail.com_6_28629 asked about pplx-api's ability to handle file uploads and text summarization. @clay_ferguson clarified that while direct file handling is not supported, users can extract text from files for inclusion in prompts, effectively allowing summarization within the given text limits.
- Discovering Cody, the AI Coding Assistant: @thereverendcognomen shared insights on using Cody, a free AI coding assistant that knows the user's entire codebase, suggesting it as a model for future integrations and expressing interest in consolidating AI-related expenses under one platform. More information on Cody can be found here.
- Potential for Local Model Training Questioned: @gritknox and @thereverendcognomen discussed the feasibility of training and testing large models without high-end local devices. The conversation highlighted the ability to run certain models locally on basic hardware and the utility of platforms like Ollama for local model execution and training.
- API Troubleshooting and Community Support: @mafia_boii faced a 401 authentication error while trying to access the Perplexity API. Community support led by @clay_ferguson provided troubleshooting steps, including falling back to basic shell commands to isolate the issue, and confirmed the validity of the snippet in the documentation.
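For reference when isolating a 401 like this one, the simplest possible call looks roughly like the following (the model name is illustrative; check the Supported Models page linked below). A 401 almost always means the Authorization header is missing, malformed, or carries an invalid key.

```python
import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # the usual 401 culprit
    json={
        "model": "mistral-7b-instruct",  # illustrative; see Supported Models
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```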
Links mentioned:
- Cody AI - Visual Studio Marketplace: Extension for Visual Studio Code - Code AI with codebase context
- GitHub: Let’s build from here: GitHub is where over 100 million developers shape the future of software, together. Contribute to the open source community, manage your Git repositories, review code like a pro, track bugs and fea…
- Upload 3 files · microsoft/phi-2 at 7e10f3e: no description found
- Getting Started with pplx-api: no description found
- Supported Models: no description found
- Ollama: Get up and running with large language models, locally.
- GitHub - ollama-webui/ollama-webui: ChatGPT-Style Web UI Client for Ollama 🦙: ChatGPT-Style Web UI Client for Ollama 🦙. Contribute to ollama-webui/ollama-webui development by creating an account on GitHub.
- quantizr/src/main/java/quanta/service/node/PplxAiService.java at 246e7e2b3510b033b1c133e8702fcc7da1b325a0 · Clay-Ferguson/quantizr: Quanta is an open-source CMS with ChatGPT and Social Media (Fediverse) features - Clay-Ferguson/quantizr
- Chat Completions: no description found
LLM Perf Enthusiasts AI ▷ #triton (5 messages):
- Flexibility Issues with Triton and CUDA: @mhmdsabry asked which low-level CUDA features Triton lacks flexible access to, and how access to them could improve performance, also requesting links and resources for a detailed answer.
- Triton's Handling of Low-Level Data: @gogators. highlighted that Triton provides limited control over data storage and partitioning at the GPU "block" level, affecting the use of shared memory and registers crucial for algorithms like flash attention. He noted that, despite this, Triton's own management of these resources can still reach optimal performance levels.
- Recommended Reading on Triton's Implementation: For those looking to dive into the specifics, @gogators. recommended the original Triton paper and the GPT-4 details as excellent resources for understanding Triton's implementation.
- Triton's Synchronization Features Need Improvement: Addressing part of @mhmdsabry's query, @andreaskoepf pointed out Triton's weak synchronization features, specifically the lack of robust sync primitives beyond the documented debug_barrier, which is mainly used to synchronize all threads in a block.
Links mentioned:
triton.language.debug_barrier — Triton documentation: no description found
LLM Perf Enthusiasts AI ▷ #cuda (24 messages🔥):
- Vectorized Memory Access Boosts Performance: @andreaskoepf shared insights on improving performance in memory-bound CUDA kernels, suggesting a shift towards reading consecutive memory and using vector loads. He referenced an NVIDIA blog post on the significance of improving bandwidth utilization through vectorization.
- CUDA Made Easier with Numba: @hamelh highlighted a Twitter post by @HaseoX94 on simplifying CUDA programming using Numba, including a tutorial by Jeremy Howard that demystifies CUDA for Python users (a minimal Numba sketch follows after this list).
- Potential CUDA Presentation Teased: @marksaroufim humorously volunteered @555959391833292811 for a talk on February 24, sparking a conversation about presenting coding work, with @zippika expressing humility about their presentation skills while appreciating the opportunity.
- RAM vs. VRAM Requirements Debated: @bazlan asked why some recommend having twice as much RAM as VRAM, leading to a discussion with @marksaroufim and @zippika about the practical benefits of ample RAM for data preprocessing and model manipulation.
- CUDA Refactoring Trials and Tribulations: @artste shared their experience refactoring CUDA code for readability and performance, ultimately finding a "non-float" approach suggested by @555959391833292811 to be the fastest, albeit with minor pixel discrepancies. The journey and a comparison of the various methods are compiled in a GitHub notebook.
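To make the Numba route concrete, here is a minimal kernel in the spirit of that tutorial (a generic sketch, not Jeremy Howard's code): the decorator compiles plain Python into a CUDA kernel, launched with a [blocks, threads] configuration.

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_one(x):
    i = cuda.grid(1)      # absolute thread index across the whole grid
    if i < x.size:        # guard: the grid may be larger than the array
        x[i] += 1.0

x = cuda.to_device(np.zeros(1024, dtype=np.float32))
add_one[4, 256](x)        # 4 blocks x 256 threads = 1024 threads
print(x.copy_to_host()[:4])  # [1. 1. 1. 1.]
```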
Links mentioned:
- Tweet from Somshubra Majumdar (@HaseoX94): I personally dislike C++, and if you do too, you can simply use Numba to do CUDA programming very easily. Here is Jeremy’s tutorial with all the CUDA + torch load_inline() replaced with simple py…
- lecture2/lecture3/cuda_rgb_to_gray_refactor.ipynb at cuda_rgb_to_gray_refactor_notebook · artste/lecture2: lecture 2 - 2024-01-20. Contribute to artste/lecture2 development by creating an account on GitHub.
- CUDA Pro Tip: Increase Performance with Vectorized Memory Access | NVIDIA Technical Blog: This post demonstrates the use of vectorized memory access in CUDA C/C++ to increase bandwidth utilization while decreasing instruction count.
LLM Perf Enthusiasts AI ▷ #torch (1 messages):
andreaskoepf: https://x.com/pytorch/status/1752406904809341165
LLM Perf Enthusiasts AI ▷ #algorithms (2 messages):
- Offer to Run Code on A100/H100 GPUs: @vim410 has offered to run someone's codebase on A100/H100 GPUs to generate data points. However, they cannot provide SSH access.
- Dual-GPU Machine for Testing Available: @jeremyhoward is willing to provide a dual-GPU machine for testing for an unlimited time. Direct messaging him is the way to get this organized.
LLM Perf Enthusiasts AI ▷ #suggestions (1 messages):
vim410: Thanks for sharing, i am one of the person who wrote the article. 🙂
LLM Perf Enthusiasts AI ▷ #jobs (2 messages):
- ML Performance Wizard Guide Unveiled: User @muhtasham shared a link to the ML Performance Wizard, a comprehensive guide or resource, but did not provide further details in their message.
- NVIDIA Seeks CUDA and C++ Talent: @vim410 announced that NVIDIA is hiring CUDA and C++ experts. Interested candidates with intermediate or expert knowledge of CUDA were encouraged to reach out for connections to the right NVIDIA team members.
Links mentioned:
Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It’s the all-in-one workspace for you and your team
LLM Perf Enthusiasts AI ▷ #beginner (8 messages🔥):
- Diving Into CUDA Programming: @noobpeen expressed interest in learning CUDA programming and asked about guides and prerequisites.
- A Treasure Trove of CUDA Resources Unveiled by @apaz: For CUDA learning, @apaz recommended the "Programming Massively Parallel Processors" book, available on eBay or libgen.is. They also shared valuable resources, including free GPU access at lightning.ai, a Twitter thread by Jeremy Howard (@jeremyphoward) on setting up an environment, and the CUDA MODE YouTube channel for CUDA lectures.
- CUDA Setup Query for an RTX 3070 Laptop: @noobpeen sought advice on setting up the CUDA library for an RTX 3070 laptop, asking whether there were any special requirements.
- Conda Preference: Torch Over TensorFlow: In response to @noobpeen's question about using Conda with Torch or TensorFlow, @apaz endorsed Conda with Torch, humorously commenting on TensorFlow's waning relevance.
- Windows CUDA Setup Tip from @lancerts: For CUDA setup on Windows, @lancerts suggested using Visual Studio for its direct integration with CUDA, clarifying that Visual Studio Code should not be used for this purpose.
Links mentioned:
- Lightning AI | Turn ideas into AI, Lightning fast: The all-in-one platform for AI development. Code together. Prototype. Train. Scale. Serve. From your browser - with zero setup. From the creators of PyTorch Lightning.
- CUDA MODE: A CUDA reading group and community https://discord.gg/XsdDHGtk9N Supplementary content here https://github.com/cuda-mode Created by Mark Saroufim and Andreas Köpf
LLM Perf Enthusiasts AI ▷ #pmpp-book (6 messages):
- CUDA Memory Indexing Explained: @andreaskoepf detailed how to calculate a memory index in CUDA, emphasizing the roles of blockDim.x, blockIdx.x, and threadIdx.x: an element's global index within a section of the memory array is blockIdx.x * blockDim.x + threadIdx.x.
- Understanding Through Collaboration: Following @andreaskoepf's explanation, @ashpun expressed gratitude for the clarity on CUDA memory indexing, highlighting the value of community support in resolving technical questions.
- Exploring CUDA Events for Timing: @shindeirou opened a discussion on whether cudaEventSynchronize() is necessary when timing cudaMemcpy operations, given the blocking nature of cudaMemcpy.
- Clarifying CUDA Timing Mechanisms: @_tvi_ responded to @shindeirou with a clarification that synchronization is needed both for the completion of the cudaMemcpy operations and for the recording of the event itself, which might explain unexpected behavior like a measured time of 0.0.
- Importance of Synchronization in CUDA: @vim410 emphasized that all CUDA API calls should be treated as asynchronous by default, underscoring the importance of explicit synchronization when capturing performance metrics (see the sketch below).
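The same pattern is easy to demonstrate from Python with torch.cuda.Event, which wraps the CUDA event API: without synchronizing on the end event before reading the timer, the measurement can be garbage. A minimal sketch:

```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

x = torch.randn(1 << 22, device="cuda")
start.record()
y = x * 2.0            # stand-in for the operation being timed
end.record()
end.synchronize()      # wait until the end event has actually been recorded
print(f"{start.elapsed_time(end):.3f} ms")  # valid only after synchronization
```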
LangChain AI ▷ #announcements (2 messages):
- LangChain Fork Fiasco: @.bagatur reported an incident where forks of the LangChain repository were not being recognized as forks, and corresponding PRs vanished overnight. A GitHub discussion has been opened for tracking the issue.
- Quick Recovery and Action Steps: @.bagatur followed up, announcing that the issue with LangChain forks appears resolved and many of the closed PRs have been reopened. Contributors whose forks still have problems need to reopen their PRs manually, as the team cannot access those.
Links mentioned:
GitHub Incident: Forks not being recognized, PRs automatically closed · langchain-ai/langchain · Discussion #16796: As of Jan 30, 2024 9:30am PST we’re aware that most LangChain forks have stopped being recognized as forks, and the corresponding PRs have automatically been closed. We’re in contact with the …
LangChain AI ▷ #general (35 messages🔥):
- Seeking Advice on Custom Tool Parameters: @georg.ort is looking for assistance with defining required and optional parameters for a custom tool, is open to paying for valuable consultation, and shared a link for communication (one common approach is sketched after this list).
- LangChain Incorporates Handlebars: Handlebars has been experimentally incorporated into LangChain JS as a supported templating language, according to @afirstenberg and confirmed by @jacoblee93.
- Investigating GitHub Forks Issue with LangChain: @.bagatur highlighted an issue where forks of LangChain on GitHub were not being recognized correctly, with PRs closing automatically. The problem seems to have been resolved, with efforts underway to reopen affected PRs.
- Prompt Engineering for Open-Source LLMs: A link shared by @juanpablomesa highlights the nuances of prompt engineering for open-source LLMs like Mistral and Llama compared to closed-source models, derived from Dr. Sharon Zhou’s insights. Full details can be found on juanpml.com.
- Troubleshooting GPT-4 Integration with Python Application: @lucas1809 shares challenges in integrating GPT-4 into a Python application for creating a chatbot, facing errors when attempting to use it outside the v1/completions endpoint. A series of messages details the progression toward understanding the error and seeking a solution (see the endpoint sketch after this list).
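One common approach to @georg.ort’s question, sketched under assumptions: LangChain’s Python `StructuredTool` with a Pydantic `args_schema`, where `Field(...)` marks a parameter required and a default value makes it optional. The tool name and fields below are invented for illustration, not the user’s actual tool.

```python
# Hypothetical sketch: a LangChain custom tool where `query` is required
# (Field(...)) and `max_results` is optional (has a default).
from typing import Optional
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import StructuredTool

class SearchArgs(BaseModel):
    query: str = Field(..., description="Search query (required)")
    max_results: Optional[int] = Field(5, description="Result count (optional)")

def search_docs(query: str, max_results: Optional[int] = 5) -> str:
    return f"top {max_results} results for {query!r}"

tool = StructuredTool.from_function(
    func=search_docs,
    name="search_docs",
    description="Search internal documents.",
    args_schema=SearchArgs,
)
print(tool.run({"query": "multitenancy"}))   # max_results falls back to 5
```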
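On the GPT-4 errors: a frequent cause of failures around `v1/completions` is that GPT-4 is served only through the chat completions endpoint (`v1/chat/completions`). If that is the mismatch @lucas1809 is hitting, a minimal chat-style call looks like this (OpenAI’s v1 Python client; the prompt is illustrative):

```python
# GPT-4 lives behind v1/chat/completions, not the legacy v1/completions
# endpoint. Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful chatbot."},
        {"role": "user", "content": "Say hello to the class."},
    ],
)
print(resp.choices[0].message.content)
```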
Links mentioned:
- The REAL cost of LLM (And How to reduce 78%+ of Cost): I want to give you a step-by-step guide on how to reduce LLM cost by 70%, and unpack why it is costing so much now. Free HubSpot AI For Marketers Course: https:/…
- GitHub - vectara/hallucination-leaderboard: Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
- How to structure your API prompt calls for Open-Source LLMs: Explore how to prompt open-source LLMs like Mistral-7B and Llama-2-7b compared to GPT-3.5 and GPT-4, with practical coding examples.
- GitHub Incident: Forks not being recognized, PRs automatically closed · langchain-ai/langchain · Discussion #16796: As of Jan 30, 2024 9:30am PST we’re aware that most LangChain forks have stopped being recognized as forks, and the corresponding PRs have automatically been closed. We’re in contact with the …
LangChain AI ▷ #langserve (3 messages):
- Seeking Faster Access to LangServe: @rebelsandrobots_97106 is looking for a quicker way to get access to LangServe for hosting an LLM in a college literature course; they are currently on the waiting list and exploring other options for faster access.
- LangServe and Hardware Resource Management: @veryboldbagel clarified that LangServe doesn’t manage hardware resources for LLMs, meaning users need to add an additional layer for this purpose; without it, there’s a risk of server crashes during concurrent LLM usage.
LangChain AI ▷ #share-your-work (5 messages):
- Oranscribe Pre-Launch Tease: @shving90 introduced Oranscribe, a tool designed to enhance writing, flow, and growth, now featured on Product Hunt, with excitement building ahead of its official launch.
- AI Symphony by ColBERT & Langchain: @andysingal shared a Medium article titled “ColBERT & Langchain’s Symphony with RAGatouille”, heralding a revolution in AI and humor interaction through a novel collaboration.
- Launch of SkillForge V1: @robot3yes showcased their weekend project, a SkillForge agent prototype capable of creating skills for other agents; see the YouTube video titled “SkillForge V1”.
- JACoB: A New Dawn for AI Coding Bots: @momentnerd revealed significant progress on their AI coding bot project, now named JACoB (Just Another Coding Bot), which has transitioned from concept to production-ready coding assistant. Excitement surrounds the open-source announcement and a detailed walkthrough of JACoB’s capabilities, with further details at jacb.ai.
- A Flashback to JACoB’s Origins: Following the big reveal of JACoB, @momentnerd referenced a post from June ’23, providing context and continuity for the project’s journey. Unfortunately, the link is missing, leaving readers curious about JACoB’s origins.
Links mentioned:
- SkillForge V1: Agent IX will soon have a skills library. Simple python functions that can be used as tools and components.I’m developing a SkillForge agent that generates s…
- Experience the Future of Coding - Watch AI Coding Bot JACoB Self-Building Its Own Homepage: Introducing JACoB: Just Another Coding Bot. Watch in real-time as JACoB crafts its own homepage via seamless integration between Figma and GitHub. This isn’t…
- JACoB - Just Another Coding Bot: no description found
- OranScribe - Write, Flow, and Grow | Product Hunt: OranScribe is a Content Creation platform that offers a seamless writing experience. From ideation to final output with AI-powered flows.
- ColBERT & Langchain’s Symphony with RAGatouille: Redefining AI Interaction
LangChain AI ▷ #tutorials (3 messages):
- Multi-user RAG Chatbot for Students: @rito3281 delved into the concept of multitenancy to build a multi-user RAG chatbot tailored for students from different departments, ensuring data privacy and security using the LangChain framework and the Qdrant vector database. A detailed guide in their blog post explains how to set up multi-tenancy in Qdrant DB so that student inquiries stay department-specific (a minimal sketch follows this list).
- Empowerment for AI Developers: @lhc1921 highlighted a YouTube video titled “AI Development - The Monthly Dev #37”, a series in which world-class speakers share their insights with the developer community, courtesy of daily.dev.
- Prompt Engineering Insights for Open-Source LLMs: @juanpablomesa emphasized the differences in prompt engineering between open-source LLMs like Mistral-7B-Instruct-v0.1 and Llama-2-7b-chat-hf and closed-source models like GPT-3.5 and GPT-4. Their blog summary underlines the unique prompt formats open-source LLMs require, as detailed by Dr. Sharon Zhou (see the chat-template sketch after this list).
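A minimal sketch of the payload-based multitenancy pattern the post describes, assuming Qdrant’s Python client; the collection name, vector size, and departments are invented for illustration:

```python
# Every vector carries a "department" payload and every search filters on
# it, so one collection can serve many user groups without data leaking.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

client = QdrantClient(":memory:")  # in-process instance for demo purposes
client.create_collection(
    collection_name="course_notes",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="course_notes",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4],
                    payload={"department": "physics", "text": "..."}),
        PointStruct(id=2, vector=[0.4, 0.3, 0.2, 0.1],
                    payload={"department": "history", "text": "..."}),
    ],
)
# A physics student's query only ever sees physics documents:
hits = client.search(
    collection_name="course_notes",
    query_vector=[0.1, 0.2, 0.3, 0.4],
    query_filter=Filter(must=[FieldCondition(
        key="department", match=MatchValue(value="physics"))]),
)
print(hits)
```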
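And a small illustration of the formatting gap between the two model families: closed-source chat APIs take a role-tagged message list, while an open-source instruct model such as Mistral-7B-Instruct-v0.1 expects its own prompt template, which `transformers` can render for you. The rendered string shown in the comment is the tokenizer’s, and may vary by version:

```python
# Render an OpenAI-style message list into Mistral's native [INST] format.
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Summarize RAG in one sentence."}]

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
prompt = tok.apply_chat_template(messages, tokenize=False,
                                 add_generation_prompt=True)
print(prompt)  # roughly: "<s>[INST] Summarize RAG in one sentence. [/INST]"
```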
Links mentioned:
- How to structure your API prompt calls for Open-Source LLMs: Explore how to prompt open-source LLMs like Mistral-7B and Llama-2-7b compared to GPT-3.5 and GPT-4, with practical coding examples.
- AI Development - The Monthly Dev #37: The Monthly Dev brings world-class speakers to empower the developers’ community, once a month. Made with ❤️ by daily.dev.Agenda:We’re excited to announce ou…
- Building a multi-user RAG chatbot in Langchain using Qdrant’s Multite: In this blog, we will create a multi-user RAG chatbot designed for students. This chatbot is available for students from various departments to inquire about topics related to their specific field. Fo…
LlamaIndex ▷ #blog (2 messages):
- LlamaIndex Announces Bounty with @replit: LlamaIndex has partnered with @replit to offer $2,000 in bounties for building open source templates focused on advanced RAG (Retrieval-Augmented Generation). Check out this collaborative opportunity here.
- Exploring RAG with LlamaIndex - A Guest Post by @CobusGreylingZA: The latest guest post discusses the use of agents for handling complex queries through RAG, showcasing multi-agent coordination and chain-of-thought reasoning across numerous documents and featuring re-ranking from @cohere. Discover the insights here.
LlamaIndex ▷ #general (37 messages🔥):
- Embedding Fine-Tuning Inquiry: @balanp inquired whether, for fine-tuning embeddings, both metadata and text from a TextNode should be paired with questions to form a datapoint. The discussion evolved into whether data in both text fields and metadata keys is converted into vector embeddings in LlamaIndex.
- Converting CSV to JSON for Fine-Tuning: @balanp sought guidance on converting a CSV file, with question and relevant-context columns, into a JSON file suitable for fine-tuning embeddings in LlamaIndex. kapa.ai provided a Python snippet for converting the CSV to the required JSON format (a comparable sketch appears after this list).
- Fine-Tuning with Hugging Face Models: The conversation covered the feasibility of fine-tuning an embedding model from Hugging Face within the LlamaIndex environment, including specifying model IDs and passing tokenizer parameters. @balanp was interested in using `intfloat/e5-mistral-7b-instruct` for fine-tuning and in how to handle the `max_length` parameter.
- Query Engine Integration Challenges: @balanp also asked about incorporating a fine-tuned PyTorch and Hugging Face model into LlamaIndex’s `SubQuestionQueryEngine`, seeking a method to employ their fine-tuned embedding model within the engine.
- AWS SageMaker and Local Database Connections: @refik0727 inquired about tutorials or GitHub code examples for using Llama with AWS SageMaker and for building a chatbot with a CSV file or a direct connection to a local database. Other users, like @shinji3046, asked general questions about llama packs and their compatibility with open-source models such as Mistral, while @a3lita reported issues with empty responses when attempting RAG on complex PDFs using a specific llama pack and hosting on Streamlit.
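The kapa.ai snippet itself isn’t reproduced in the summary; a hypothetical stand-in, assuming `question` and `context` columns and the `queries`/`corpus`/`relevant_docs` JSON layout used by LlamaIndex’s `EmbeddingQAFinetuneDataset` (column names and file paths are assumptions):

```python
# Convert a CSV of (question, context) pairs into the JSON layout that
# LlamaIndex's embedding fine-tuning dataset expects.
import csv
import json

queries, corpus, relevant_docs = {}, {}, {}
with open("finetune_pairs.csv", newline="", encoding="utf-8") as f:
    for i, row in enumerate(csv.DictReader(f)):
        qid, did = f"q{i}", f"d{i}"
        queries[qid] = row["question"]
        corpus[did] = row["context"]
        relevant_docs[qid] = [did]      # each question maps to its context

with open("finetune_dataset.json", "w", encoding="utf-8") as f:
    json.dump(
        {"queries": queries, "corpus": corpus, "relevant_docs": relevant_docs},
        f, ensure_ascii=False, indent=2,
    )
```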
LlamaIndex ▷ #ai-discussion (1 messages):
andysingal: hallucination-leaderboard. https://github.com/vectara/hallucination-leaderboard
LAION ▷ #general (17 messages🔥):
- Warhammer 40k AI-generated trailer impresses: @max_voltage shared a fan-made trailer of Imperium Of Man - Warhammer 40k, highlighting its impressive use of AI generative tools. The video, found here, uses a variety of AI tools to strong effect, with the fire and explosions at 0:54 called out as standout moments.
- Insights on AI Video Generative Tools: Discussion notes from @pseudoterminalx and @astropulse highlight how AI-generated content, despite occasional uncanny elements, demonstrates good temporal consistency and potential uses in pitching movies and TV shows. A notable comment mentions that using the same seed for every frame produces a distinctive residual noise, akin to “looking at the world through some warped glass.”
- AI models evoke mixed reactions: The discourse moved to specific AI models, with @pseudoterminalx sharing an image produced by Terminus, prompting reflection on its capabilities and limitations. The post, offering a visual here, underscores how extraordinary results can sometimes highlight deficiencies in training datasets.
- Inquiry about DALL-E 2 - PyTorch vs Stable Diffusion: @homie115 sought insights into how DALL-E 2 - PyTorch compares with Stable Diffusion, questioning improvements and the current standing of AI image-generation tools.
- Technical Requests and Model Discussions in AI Community: Users inquired about practical applications and technical setups, from extracting text with OCR models (@twoabove asking for a lost link) to optimizing WhisperSpeech for streaming audio (@normilkyway requesting setup help), plus discussion of training conditions for CLIP models (@kal2296 questioning the feasibility of models trained with/without image transformations).
Links mentioned:
Imperium Of Man - Warhammer 40k: Imperium Of Man - Warhammer 40k is a fan-made (unofficial) trailer by JustMovies, produced using various AI generative tools. What started as a project a few…
LAION ▷ #research (17 messages🔥):
- Introducing MoE-LLaVA for Efficient LVLMs: @nodja shared a paper on arXiv introducing MoE-tuning and the MoE-LLaVA framework, aimed at improving Large Vision-Language Model (LVLM) efficiency by activating only the top-k experts during deployment. This strategy enables the construction of sparse models with a high number of parameters but roughly constant computational cost (a gating sketch follows this list).
- MoE-LLaVA Demonstrated on Hugging Face: As a follow-up, @nodja also highlighted the MoE-LLaVA model’s implementation on Hugging Face’s platform, inviting the community to explore it directly.
- CodeLlama 70b Challenges with Ethical Precautions: @Ivannius introduced the CodeLlama 70b Instruct version, noting its impressive HumanEval score but also its tendency to moralize unnecessarily. He suggested using specific instructions to bypass the model’s ethical boilerplate for more straightforward code-generation tasks; the model is available on Hugging Face.
- InternLM-XComposer Excels at Captioning: @mkaic praised InternLM-XComposer for delivering the best caption among all open-source Vision-Language Models (VLMs) tested, especially highlighting its ability to notice details like a vent on the ceiling, showcased on Hugging Face.
- MAGBIG: A New Multilingual Text-to-Image Benchmark: @felfri_ shared MAGBIG, a newly proposed benchmark for evaluating multilingual text-to-image models, encouraging the community to use and share it. This dataset aims to advance the development and assessment of models on a broader linguistic scale.
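For intuition on the top-k idea, a toy sketch of expert routing in generic PyTorch (this is not MoE-LLaVA’s actual code; sizes, k, and the expert MLPs are invented):

```python
# Top-k expert routing: the router scores every expert per token, but only
# the k best run, so parameter count scales with num_experts while
# per-token compute stays roughly constant.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # renormalize over the top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):               # only k experts fire per token
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e         # tokens routed to expert e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

print(TopKMoE()(torch.randn(10, 64)).shape)      # -> torch.Size([10, 64])
```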
Links mentioned:
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models: For Large Vision-Language Models (LVLMs), scaling the model can effectively improve performance. However, expanding model parameters significantly increases the training and inferring costs, as all mo…
- InternLM XComposer - a Hugging Face Space by Willow123: no description found
- MoE LLaVA - a Hugging Face Space by LanguageBind: no description found
- codellama/CodeLlama-70b-Instruct-hf · Hugging Face: no description found
- felfri/MAGBIG · Datasets at Hugging Face: no description found
DiscoResearch ▷ #disco_judge (1 messages):
huunguyen: <@213644857309134849> - any luck on the prometheus mistral model for en?
DiscoResearch ▷ #general (21 messages🔥):
- German Orca DPO Dataset Discussions: Debate centered on the existence and preparation of a German Orca DPO dataset. @johannhartmann shared a Hugging Face dataset and mentioned azureml and hermeo tools used for translation, while @_jp1_ hinted at work done on an original dataset, with an intent to open-source it for improving German model training.
- Approaches to Data Augmentation and Translation: @bjoernp discussed a novel data augmentation technique, Web Rephrase Augmented Pre-training (WRAP), proposed by Apple and highlighted in a research paper, which demonstrates significant improvements in pre-training efficiency.
- DiscoLM German 7b and GermanRAG Dataset Release: @rasdani announced the release of DiscoLM German 7b and shared a Hugging Face link to the GermanRAG dataset, used for finetuning the model’s retrieval-augmented generation capabilities. They highlighted this dataset’s usefulness for RAG finetuning with varied contexts and fully formulated answers.
- New Public Dataset for RAG Fine-tuning by Philipmay: @philipmay introduced a new dataset generated with GPT-4 for RAG fine-tuning, featuring 124,961 German triples of context, question, and answer. He mentioned the ongoing addition of “rejected” answers to convert it into a DPO dataset, available on GitHub.
- Code Llama 70B Release and Llama Factory Discussions: Meta’s release of Code Llama 70B and the teasing of Llama 3 were briefly highlighted with a link to Twitter. There was also a conversation about following the generic hyperparameter recommendations from the llama_factory README, which offers no specifics for phi-2.
Links mentioned:
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling: Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such data requires an abundance o…
- Tweet from kache (yacine) (KING OF DING) (@yacineMTB): HOLY FUCKING SHIT ITS REAL ↘️ Quoting AI at Meta (@AIatMeta) Today we’re releasing Code Llama 70B: a new, more performant version of our LLM for code generation — available under the same license a…
- DiscoResearch/germanrag · Datasets at Hugging Face: no description found
- mayflowergmbh/intel_orca_dpo_pairs_de · Datasets at Hugging Face: no description found
- aari1995/ultradistil-intel-orca-dpo-de · Datasets at Hugging Face: no description found
DiscoResearch ▷ #embedding_dev (3 messages):
- Simon Willison dives into ColBERT’s mysteries: @jp1 shared an insightful article by Simon Willison on ColBERT, a model that challenges the standard embedding approach by allowing scalable BERT-based search. Unlike usual embedding models that store a single vector per document, ColBERT stores multiple, enabling a more nuanced retrieval (a scoring sketch follows this list).
- BGE_M3: A Multilingual Marvel Unveiled: @sebastian.bodza introduced BGE_M3, a new multilingual model that combines dense retrieval models with sparse and multi-vector approaches like ColBERT. Its development is detailed on GitHub.
- Pro Tip for BGE-large Users: @sebastian.bodza also offered a key tip for BGE-large users, suggesting the inclusion of the recommended instruction prompt in short2long retrieval queries (short queries against long passages) to significantly enhance performance (see the query-prefix sketch below).
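For intuition on “multiple vectors per document”: ColBERT scores with late interaction, often written as MaxSim, where each query token takes its best match over all document tokens and the per-token maxima are summed. A toy sketch with random stand-ins for the real per-token BERT embeddings:

```python
# Toy MaxSim: one vector per *token* on each side, rather than one vector
# per document as in standard embedding search.
import torch
import torch.nn.functional as F

q = F.normalize(torch.randn(12, 128), dim=-1)    # 12 query token vectors
d = F.normalize(torch.randn(300, 128), dim=-1)   # 300 document token vectors

sim = q @ d.T                                    # (12, 300) similarities
score = sim.max(dim=-1).values.sum()             # best doc token per query token
print(float(score))
```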
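And the BGE-large tip in code form, assuming `sentence-transformers` and the query instruction documented on the BGE v1.5 model card (worth double-checking for your exact model version); note that passages are encoded without the prefix:

```python
# Prepend the documented instruction to short queries only; documents are
# encoded unprefixed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
prefix = "Represent this sentence for searching relevant passages: "

q = model.encode(prefix + "how do I time cudaMemcpy?",
                 normalize_embeddings=True)
docs = model.encode(["Use CUDA events and synchronize before reading them."],
                    normalize_embeddings=True)
print(util.cos_sim(q, docs))
```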
Links mentioned:
- Exploring ColBERT with RAGatouille: I’ve been trying to get my head around ColBERT.
- FlagEmbedding/FlagEmbedding/BGE_M3 at master · FlagOpen/FlagEmbedding: Dense Retrieval and Retrieval-augmented LLMs. Contribute to FlagOpen/FlagEmbedding development by creating an account on GitHub.
DiscoResearch ▷ #discolm_german (1 messages):
- DiscoLM German 7b v1 Drops: User @ustoll sought deployment advice for DiscoLM German 7b v1, a Mistral-based model focused on German-language applications and the successor to the EM German model family. They inquired about low-friction hosting services similar to together.ai or Anyscale for deploying the model.
Links mentioned:
DiscoResearch/DiscoLM_German_7b_v1 · Hugging Face: no description found
Latent Space ▷ #ai-general-chat (21 messages🔥):
- Stable Diffusion Spotlight in VFX Jobs: @swyxio highlighted the integration of Stable Diffusion technologies in VFX job descriptions, linking to a tweet by @venturetwins about a major Netflix-owned VFX studio expanding into AI roles. @coffeebean6887 continued the discussion by sharing a job listing from Eyeline Studios detailing the demand for expertise in generative imaging and machine learning to revolutionize storytelling.
- Latency Challenges with LLM Responses: @austintackaberry expressed frustration over the extra latency incurred when a complex LLM response isn’t wanted, especially when direct links are not surfaced promptly.
- Amusement Over Future Job Requirements: @guardiang jokingly shared apprehension about future job postings demanding years of experience in Stable Diffusion and/or Midjourney, reflecting on the rapidly evolving AI landscape and its impact on employment standards.
- Innovative Paper on LLM Training Efficiency: @swyxio shared a new paper by Quentin Anthony on optimizing hardware utilization for transformer model training, urging a shift in mindset toward viewing models as sequences of GPU kernel calls to mitigate inefficiencies.
- Codeium Hits Series B: @swyxio celebrated Codeium’s advancement to Series B funding with a congratulatory note to the team, including a tweet marking the achievement. @.prem, who is associated with Codeium, acknowledged the milestone, highlighting the excitement around the company’s growth.
Links mentioned:
- Tweet from Justine Moore (@venturetwins): A major VFX studio owned by Netflix is hiring in a bunch of AI roles: Generative imaging, workflow design, model training, data acquisition, and even ML researchers We’re going to be seeing a lo…
- Tweet from Quentin Anthony (@QuentinAnthon15): Getting the most out of your hardware when training transformers requires thinking about your model as a sequence of GPU kernel calls. This mindset, common in HPC, is rare in ML and leads to inefficie…
- Scanline VFX - Research Scientist, Computer Graphics, Computer Vision, and Machine Learning: As a Senior Research Scientist, you will develop new technologies to revolutionize live-action content creation and storytelling. You will conduct applied research in computer vision and computer grap…
Latent Space ▷ #llm-paper-club (2 messages):
- New Pythia Paper Reveals 20% Speed-up Missed by Many: @swyxio shared insights from BlancheMinerva’s tweet about a crucial hardware-aware design choice that can yield a 20% throughput improvement for 2.7B LLMs. The tweak, overlooked for years because GPT-3’s architecture was copied wholesale, is detailed in the paper at arXiv:2401.14489.
- Curated List of Influential AI and NLP Resources: @ivanleomk came across a comprehensive list on Twitter featuring landmark AI and NLP resources, including The Annotated Transformer, The Unreasonable Effectiveness of RNNs, and other key readings and papers for understanding AI models and their formulation. This collection serves as a valuable starting point for those looking to deepen their knowledge in AI and NLP.
Links mentioned:
- Ilya 30u30:
- Tweet from Stella Biderman (@BlancheMinerva): Are you missing a 20% speed-up for your 2.7B LLMs due to copying GPT-3? I was for three years. Find out why and how to design your models in an hardware-aware fashion in my latest paper, closing the …
Alignment Lab AI ▷ #general-chat (3 messages):
- Announcing Lilac Garden: @nikhil_thorat announced Lilac Garden, a new cloud service for accelerated dataset transforms built on Lilac, featuring LLM-powered clustering as the first service. The announcement was made on Twitter.
- OpenOrca Dataset Hosted on Lilac: As part of the Lilac Garden launch, the entire OpenOrca dataset, with embeddings and clusters precomputed, is now hosted on Lilac, available here.
- Exploring OpenOrca Clusters: @nikhil_thorat shared a direct link to explore the clusters within the OpenOrca dataset on Lilac, providing users with a detailed view of how to navigate the dataset clusters.
Alignment Lab AI ▷ #looking-for-workers (1 messages):
- WashU Startup in Search of Founding Engineer: DoubleMint announced a startup collaboration with Washington University in St. Louis that is looking for a founding engineer well-versed in Next.js. With a first Letter of Intent signed for $50,000, they’re eager to scale up and are also interested in skills related to TailwindCSS and Supabase.
LLM Perf Enthusiasts AI ▷ #general (2 messages):
- Short and Sweet Gratitude: User @an1lam expressed their thanks with a simple, “Thanks!”
- Inquiry about Gemini Pro in Production: @res6969 inquired whether anyone has conducted experiments with Gemini Pro in a production environment, seeking insights or results from these experiments.
AI Engineer Foundation ▷ #general (1 messages):
- Open Source Projects Wanted for AI Engineer Foundation: User @hackgoofer called on @everyone to share and recommend open source projects for joining the AI Engineer Foundation. A Guide to Submit Projects was shared for interested parties.
Links mentioned:
Guide to Submit Projects to AI Engineer Foundation: no description found