> AI Discords for 2/13-15/2024. We checked **20** guilds, **312** channels, and **10550** messages for you. Estimated reading time saved (at 200wpm): **909 minutes**. Due to a config bug, we accidentally summarized 2.5 days’ worth of conversations.

If you’re reading this, you’re probably aware of the absolute mayhem unleashed the day after Valentine’s. We covered Gemini 1.5 and Sora on a live ThursdAI podcast, so you can get our takes there, but we have also been tracking the must-see and must-know Sora takes on the Latent Space discord. Of course, we weren’t alone.


This was a rough day to launch anything if you aren’t a frontier model lab.


Table of Contents

[TOC]

PART 1: High level Discord summaries

TheBloke Discord Summary

  • Dungeon Master AI Development Queries: Engineers discussed the creation of a Dungeons and Dragons DM assistant capable of cross-referencing rules, locating stat blocks, and accessing lore, while also producing new world-building content. Models like h2oGPT were suggested for document question answering, with challenges such as understanding tables and cross-references being noted.

  • Power Supply Concerns for High-End GPUs: There’s a debate on the feasibility of running a 3090 and 3060 GPU on a 650W PSU. Engineers stressed the risks involved and the importance of cable management, highlighting the potential power limitations to 250W for the 3090 and 150W for the 3060 to avoid overtaxing the PSU.

  • Cutting-Edge AI Sparks Discussion: The guild examined Google’s Gemini 1.5 and OpenAI’s Sora—impactful technologies with applications ranging from long token context handling to generating detailed minute-long video simulations. There’s anticipation for their use in serious media production and cautious optimism expressed by the community.

  • Complexities of Running Large World Models (LWM): Members shared struggles with the multimodal functionality of LWM, discussing computational resource limits and technical intricacies. There is a shared experience of difficulties in making these complex models operational.

  • GPT-Assisted Coding: The utility of LLMs like GPT and Copilot for coding support was debated. Some members valued these tools for initial code drafting and documentation, while others pointed out limitations such as missing edge cases, suggesting that these models are complementary tools rather than replacements for human expertise.

  • Role-Play Model Optimization Techniques: For role-play and story-writing, different settings were debated for various models like Yi models and Mixtral Instruct. Temperature settings between 0.6 and 0.7 were recommended, with an emphasis on balancing diversity and coherence in outputs.

  • Exploration of Training Effects: Guild members encountered higher training and evaluation losses with Mixtral versus Mistral, with experiments ongoing to determine whether this stems from a bug or from their setup choices. Other discussions covered cleaning datasets with tools like pandas and regex, and running 70B models on an M2 Max, with tips shared on increasing usable RAM.

  • Discord Bot Fine-Tuning: Information and potential scripts to fine-tune models were shared, including "AutoFineTune", a script capable of generating synthetic message pairs for smaller models, discussed as part of an effort to simplify the fine-tuning process.

  • Model Merging Hurdles: An encountered RuntimeError was shared when attempting to merge two MistralCasualML models with differing context sizes, highlighting a tensor size mismatch. Community members were seeking solutions to this and related issues.

  • JSPyBridge Facilitates Cross-Language Engineering: Engineers shared success in integrating Python and JavaScript through JSPyBridge, demonstrating pragmatic examples such as creating new JavaScript classes that interact with Python, adjusting BigDL’s LLM transformer for specific file types, and handling device tensor discrepancies—all critical details for AI engineers looking to interweave diverse technologies.


LM Studio Discord Summary

  • Chat with RTX Generates RAG Excitement: NVIDIA’s "Chat with RTX" demo, which performs retrieval-augmented generation (RAG) locally on NVIDIA 30-series and newer GPUs, was contrasted with LM Studio; Chat with RTX currently supports only the Mistral 7B and Llama 13B models.

  • Gemini's Giant Leap in Context: Conversations are abuzz with Google’s Gemini 1.5 model boasting a 1 million token context window; access remains invite-only, underscoring the gap between proprietary and open-source AI tools.

  • Sora’s Synthetic Cinema: OpenAI’s Sora model, capable of generating videos up to a minute long from text, is on engineers’ radars. With availability initially limited to a select group, its implications for evidence credibility are under scrutiny.

  • Model Support and LM Studio Features in Spotlight: Yi-VL models are pending an update to be compatible with LM Studio due to new llama.cpp requirements. Meanwhile, users discuss LM Studio features ranging from enabling function calling to overcoming model and software restrictions.

  • RAM Bug Uncovered in LM Studio: An acknowledged bug in LM Studio misreports system RAM, misleading users like @pdx_, who saw no change indicated in the software after a hardware upgrade to 64 GB.

  • Cost and Compatibility Guide the Hardware Debate: Discussions around hardware for LLM tasks involve detailed cost comparisons for high-end builds, potential GPU mixing for optimizing performance, and overclocking intricacies.

  • Quantization Breakthroughs and AVX Instructions: A new development in model compression, 1.5-bit quantization, is expected to greatly improve efficiency, allowing large models to operate on reduced hardware. In the meantime, users are advised to use an AVX beta release for CPUs lacking AVX2 support.

  • Humorous Take on AI Work Ethic and Errors: @wolfspyre brought levity to the conversation with a comical inquiry into whether bots need to work and a playful depiction of bots stuck in a repetitive output loop.


OpenAI Discord Summary

  • ChatGPT Remembers: ChatGPT introduced a new memory feature to remember past conversations, with user controls for memory management being tested among Free and Plus users; details are outlined in OpenAI’s blog post.

  • OpenAI Introduces Sora: Sora, a model that generates short videos from text descriptions, was announced, targeting red teamers and creative professionals for initial feedback, as mentioned on OpenAI’s Sora introduction page.

  • Google’s AI Joins the Fray: The AI community compared Google’s Gemini models, priced similarly to OpenAI’s, discussing Google’s strategic positioning and the enhanced capabilities of Gemini Advanced - for more info check Google’s Gemini post.

  • GPT-4’s Learning Curve and System Strain: Reports of service outages and issues with GPT-4 context retention prompted discussions on performance challenges, while users eagerly discussed the implications of the Sora model for creative fields, despite its current inaccessibility.

  • Prompt Engineering Deep Dive: Users delved into strategies for engaging with GPT, optimizing token usage, and crafting prompts for structured outputs like yes/no answers, utilizing resources like the behavioral adjustment tool for prompt refinement and service improvements.

  • Challenges in Image Rotation and GPT Interactions: Frustrations were voiced regarding DALL-E 3’s 50/50 success rate with image orientation and the disappearance of webp files, as well as the importance of balancing grammatical precision with token economy for prompt optimization, as reflected in Discord conversations.


Nous Research AI Discord Summary

  • Introducing QuIP# - State-of-the-art in Quantization: A research paper details QuIP#, a method for post-training quantization of large language models that achieves impressive results with 2-3 bit precision, potentially outperforming existing methods in highly compressed regimes.

  • New Advances and Speculation in AI: Discussions include ZLUDA, a tool to run CUDA on AMD GPUs — though reportedly abandoned — and anticipation around a mysterious new architecture with comparisons made to DeepMind’s RETRO. Meanwhile, speculations humorously suggest future paper naming conventions, such as "optimal 0.5 bit quantization."

  • AI-Assisted Content Creation Blossoms: OpenAI’s announcement of Sora, a text-to-video model, ignited excitement among users, marking a significant step in AI-generated video content. Sharing and dissecting breakthroughs, from model routing analysis to QuIP# and MIQU’s training, showcases the guild’s dedication to technical exploration.

  • Evaluating and Hosting AI Models: Practical recommendations were shared for AI model assessment, such as using the LM Evaluation Harness, and hosting options like Together for Deepseek Coder. For vision language models, Replicate and Alibaba were recommended, despite some rate limit concerns.

  • Collective Cognition Project Hits a Snag: The project faced downtime, linked to new modes in ChatGPT that broke the website, suggesting that maintenance challenges have led to a period of inactivity.


Eleuther Discord Summary

  • Direct Principle Feedback Tackles the Avoidance Issue: EleutherAI introduced a new method, Direct Principle Feedback (DPF), outperforming traditional models and matching GPT-4 for guiding chatbots to avoid unwanted topics, detailed in their recent paper, which can be accessed here.

  • Language Model Harness Troubleshooting: @christianpala provided a suggested fix for issues with the lm-evaluation-harness when adapting it to local models and tokenizers. Users inquired about support for open-book and Chain of Thought (CoT) prompts and were advised on Python version compatibility when using the harness with older versions (a minimal usage sketch follows this list).

  • Exploration of Model Training and Pre-Training Techniques: A member questioned pre-training encoder-decoder parts for seq2seq tasks like machine translation, initiating a discussion on its efficacy. Another flagged potential alignment issues between training data batches and checkpoints in Pythia-deduped for 2.8b models, with another member committing to inspect this concern.

  • Safety, Security, and Inferring Capabilities in LLMs: Researchers discussed the implications of secret collusion via steganography among AI agents and the memorization capabilities of LLMs, highlighting the risks as their abilities evolve. A collaborative effort was suggested to probe the replicability of Pythia findings after training discrepancies were observed in some models.

  • Interpretability Overviews Found Lacking: Users expressed a need for updated overviews of AI interpretability, discussed interpretability in vision transformers and diffusion models, and sought approaches for evaluation techniques applied to propensity evaluations in models.
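
For the harness troubleshooting mentioned above, here is a minimal sketch of a local-model evaluation. It assumes the post-refactor (v0.4+) Python API and uses an example model and task; consult the repository README for the exact interface of your installed version.

```python
# Minimal lm-evaluation-harness sketch (assumes the v0.4+ Python API).
import lm_eval
from lm_eval.models.huggingface import HFLM

# Wrap a local or Hugging Face model; "EleutherAI/pythia-160m" is just an example.
model = HFLM(pretrained="EleutherAI/pythia-160m", batch_size=8)

# Run one benchmark task and print the aggregated metrics.
results = lm_eval.simple_evaluate(model=model, tasks=["lambada_openai"])
print(results["results"])
```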


Mistral Discord Summary

  • Mistral Outperforms on Sturdy Servers: Users underlined that Mistral’s performance is heavily dependent on server prowess and load conditions, noting that even smaller models can hold their own against the likes of GPT-4 when the server is unencumbered.

  • Intern Seeks Finetuning Wizardry: An influx of requests surfaces around finetuning Mistral, with users sharing a plethora of materials such as Jupyter notebooks, Kaggle, and Hugging Face’s AutoTrain, while interns share their daunting tasks, including transforming infrastructures with Kubernetes.

  • Latency Lurks in Mistral API’s Shadows: Reports of high latency issues with Mistral API’s 'completions' endpoint arise, with users being directed to consult Mistral support for remediation.

  • Mistral Mysteries Unveiled: While Mistral’s training data remains shrouded in secrecy, details emerge about Mixtral Instruct, a 6.48B parameter model with I32 and FP16 tensor type support hosted on Hugging Face, boasting over 8,430 recent downloads. A query about the distinctions among various Mistral 8x7B fine-tunes spirals into a discussion about dataset specificity for fine-tuning.

  • Fine-Tuning Frustrations and Successes: Engineers explore fine-tuning Mistral 8x7B using Apple’s MLX, with resources and scripts from GitHub repositories like mlx-examples being circulated for potential guidance, while another repository signals ongoing development for better MLX compatibility.

  • NVIDIA’s Novel Chatbot Chat with RTX: NVIDIA debuts a customizable chatbot, Chat with RTX, powered by RTX 30 Series GPUs, eliciting comparisons with other chatbot solutions and proving to be a topic of fascination within the community.

  • European Internship Quest and PDF Pandemonium: A French librarian scouts for internships alongside discussions on the parsimonious budgets for S2S models, while users battle the woes of PDF data extraction and laud the launch of a new character AI website.

  • GDPR, Chatbots, and Payment Pathways on La Plateforme: Queries about GDPR compliance with Mistral’s APIs lead to sharing of the data processing agreement. Meanwhile, members guide a new subscriber on setting up a ChatGPT-like bot with resources like Mistral’s Python client library and discussions on payment methods, including the absence of PayPal for Mistral, steer towards workarounds.
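
To illustrate the ChatGPT-like bot setup mentioned in the last bullet, here is a minimal sketch assuming the early-2024 (v0.x) mistralai Python client; the API key and model name are placeholders.

```python
# Sketch: a single chat turn with Mistral's Python client (assumes the v0.x mistralai package).
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key="YOUR_MISTRAL_API_KEY")  # placeholder key

response = client.chat(
    model="mistral-small",  # example model name
    messages=[ChatMessage(role="user", content="Summarize GDPR in two sentences.")],
)
print(response.choices[0].message.content)
```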


LAION Discord Summary

  • Efficiency in Image Generation Gets a Boost: @chad_in_the_house has applied lfq 2^17 to ImageNet, focusing on rapid training via the Muse architecture. Envisioning new fine-tuning prospects for vqgans, this stride could mark a leap for image generation processes.

  • Safety vs. Functionality Tradeoff in AI: OpenAI’s commitment to safety is viewed with concern as .undeleted speculates that extreme safety tuning might render models too expensive and impractical for certain applications. This conversation reflects an underlying tension between AI safety and utility.

  • The Quest for Quality Synthetic NSFW Datasets: Echoing struggles within the AI community, @progamergov points out the challenges in procuring high-grade synthetic NSFW content for datasets, and criticizes Civitai’s subpar outputs. This discussion highlights a niche but critical aspect of dataset development in AI.

  • Video-Linguistic Models Eye New Frontiers: RingAttention has been identified as a promising approach for parsing extensive video and book datasets, as touched upon by @spirit_from_germany and @max_voltage. This technique is earmarked for its potential impact on long-sequence training.

  • Exploring OpenAI Sora’s Text-to-Video Paradise: @qwerty_qwer brings attention to OpenAI’s Sora, a transformative text-to-video model flaunting the capacity to conjure richly detailed scenes. Despite its awe-inspiring demo, the closed access nature raises some concerns within the community about its broader adoption and transparency.


HuggingFace Discord Summary

  • Hugging News Unwrapped: The Hugging Face community has been buzzing with diverse updates, including the launch of new APIs and model compatibilities, advancements in community contributions, and innovative application tools. Additionally, the upcoming reading group session is set to cover the "Mamba" paper, which discusses content-based reasoning and the computational inefficiencies of Transformers. (89th edition of Hugging News, 45th edition of Community Highlights, Read "Mamba" paper)

  • Snap Detection Quest & Gemini Pro Insights: Discord users in the general channel discussed various topics including the hunt for real-time finger snap detection in video/audio, difficulties with token issues in Hugging Face Spaces, a user’s blog post gaining traction, debunking myths about stolen data in Mistral and other LLMs, and buzz around Google’s Gemini 1.5’s improved long-context understanding. No specific solutions or papers provided for snap detection. (Google’s Gemini 1.5)

  • Sheet-savvy and Modulated Melodies: Learning new skills is highlighted in the today-im-learning channel, with discussions around the finesse of merging Google Sheets, learning achievements with DoReMi reproduction using FP8 3D parallelism, delving into diffusors and transformers, exploring face-swapping programs, and discussions around custom NER tagging in language models with references to specific datasets. (few-nerd dataset, conll2003 dataset)

  • MoE Discussion Strikes Chord: In the cool-finds channel, members discussed the potential threats posed by vulnerabilities in Mixture of Experts models. Attention was also drawn to a project capable of parsing long text and video data over one million tokens, the merging of online and offline reinforcement learning algorithms, and SPIN, a method enabling language models to mimic human reactions. (Paper on MoE security issue, the largeworldmodel project)

  • RAGs to Riches & Local LLMs: The i-made-this channel showcased member creations and projects such as RAG-based applications with a plethora of text2image prompts, hosting of large language models for free via LocalLlm on Colab, the release of tokviz for visualizing model tokenization, and the introduction of UI interaction models PTA-Text and generative coding models Trinity and Neo. (LocalLlm on GitHub, tokviz documentation, Trinity Space, Neo Space)

  • LangTest Dive and Seed Selection: The reading-group channel hosted conversations on the application of the LangTest library for safe LLMs, arranging presentations on model merging, addressing questions about the Mamba paper, exploring the effects of random seed selection, and discussing works related to seeds and model performance. (LangTest publication, Mamba paper discussion)

  • Shaping Dreams and Cascading Conversation: Diffusion-discussions saw members report success with image generation from text using Stable Cascade, inquire about deploying models on SageMaker, generate images using serverless APIs, and solve problems associated with vanishing gradients during model fine-tuning. (Lykon/dreamshaper-8 discussions)

  • Visionary Queries and PTA-Text Showcase: The computer-vision channel experienced queries and discussion on topics such as gaussian splats, multimodal projects, improving image retrieval systems, transforming hairstyles using generative models, and unveiling a project focused on multimodal UI interactions, PTA-Text. (PTA-Text Space, Model checkpoint)

  • Text and Voice Transformers Talk: Conversations in the NLP channel covered XLM-RoBERTa language extraction, translations into algebraic representations, simulating voices and changing languages with transformers, introducing the PTA-Text project for UI interaction, and discussing its capabilities and current limitations. (XTTS model space)


Perplexity AI Discord Summary

  • Slack to Receive Perplexity Updates: Perplexity is rolling out a new feature called Perplexity Push that will allow topic subscriptions within Slack, aimed to streamline team communications and information sharing.

  • Perplexity AI Unveils New Models and Faces Reliability Issues: Perplexity shared details regarding its pplx-7b-online and pplx-70b-online models for API integrations, while users reported intermittent failures and inconsistencies in API responses. Meanwhile, speculation around an unconfirmed pplx-8x7b model stirred curiosity, but no official information on availability or pricing was given. The Gemini 1.5 AI model by Google was announced, noting its potential one million-token context window.

  • Resourceful Community Engages and Shares Perplexity Content: Users engaged with Perplexity AI’s features including an alternative Alt-D Feed for community collaboration and discussed bookmarking limitations. A GitHub repo integration with structured data and logic patterns was teased by a user, while another shared their success story using Perplexity AI for a DIY hair tutorial.

  • API Pain Points Need Addressing: Various users expressed frustration over the Perplexity AI models delivering unreliable API results, calling attention to inconsistencies and hallucinated content in responses. A user provided a guide to integrate Perplexity AI with LangChain, looking to help others overcome issues with model substitutions in applications.
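
Since Perplexity exposes an OpenAI-compatible endpoint, one common integration path (a sketch, not an official recipe) is to point the standard openai client at api.perplexity.ai; the key below is a placeholder and the model names are the ones cited above.

```python
# Sketch: calling Perplexity's OpenAI-compatible API with the openai Python client.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PPLX_API_KEY",            # placeholder
    base_url="https://api.perplexity.ai",   # Perplexity's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="pplx-7b-online",  # or pplx-70b-online, as mentioned above
    messages=[{"role": "user", "content": "What changed in Gemini 1.5?"}],
)
print(response.choices[0].message.content)
```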


LlamaIndex Discord Summary

  • No-Code Revolution for AI Workflows: A webinar on building no-code RAG (retrieval-augmented generation) is announced for Friday at 9 am PT. The session, led by Henry Heng from FlowiseAI, is set to explore creating LLM-powered workflows using LlamaIndex.TS and Flowise integration, targeting those seeking to bypass coding steps. Register for the informative webinar.

  • DanswerAI Empowered by LlamaIndex: The integration of DanswerAI with LlamaIndex technology promises to enhance workplace tool efficiency. LlamaIndex highlights this collaboration, along with a series of other educational content, including scientific research workflow tutorials and guidelines for building custom agents with LLM. The featured notebook and video tutorial are bridging the gap for AI engineers.

  • Arize-Phoenix Enhancements Incoming: Metadata tagging in tracing user queries is undergoing improvements and expected in the coming week, as confirmed in an update related to Arize-Phoenix. An issue with SimpleDirectoryReader misinterpreting DOCX files has been resolved with the newest llama-index-core version, and a community Discord server has been set up for integration support.

  • Real-Time RAG Optimization Conversations: LlamaIndex users discuss real-time optimization of RAG pipelines through user feedback, suggesting the use of reranking based on scores. Providing actual code snippets, the community offers insights for separating retrieval and synthesis steps for more effective real-time evaluation.

  • Integration Troubles & Solutions Shared: Users share solutions to common integration issues such as excluding metadata from custom QA templates, with suggestions to set exclusion keys before data ingestion. Additionally, resources like Excalidraw for collaborative whiteboarding and Notion for workspace organization are mentioned, along with a diverse range of LlamaIndex documentation and GitHub examples provided for various use cases.
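
As a concrete illustration of the metadata-exclusion tip above, here is a minimal sketch using the post-v0.10 llama-index-core namespace; the document fields are hypothetical.

```python
# Sketch: keep metadata for retrieval, but hide selected keys from the LLM prompt.
from llama_index.core import Document, VectorStoreIndex

doc = Document(
    text="Quarterly revenue grew 12% year over year.",
    metadata={"file_name": "q3_report.docx", "internal_id": "doc-1234"},  # hypothetical fields
)
doc.excluded_llm_metadata_keys = ["internal_id"]    # not shown to the LLM
doc.excluded_embed_metadata_keys = ["internal_id"]  # not embedded either

# Querying uses the globally configured LLM/embedding model (OpenAI by default).
index = VectorStoreIndex.from_documents([doc])
print(index.as_query_engine().query("How did revenue change?"))
```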


LangChain AI Discord Summary

  • Journaling App Integrates Memory with LangChain: LangChain introduces a new journaling app featuring memory capabilities, currently in the early stages with feedback sought. Access a demonstration via Loom video and the app itself at Journal by LangChain.

  • LangSmith’s Leap Forward: The general availability of LangSmith is announced along with a $25M Series A led by Sequoia Capital, a new homepage, brand, and career opportunities. Insights available in their blog post, with features on Forbes and Product Hunt.

  • Pinecone and Langchain Dependency Challenges: Peer dependency conflicts arise between Pinecone v2 and LangChain, with solutions including the use of npm install --legacy-peer-deps or version bumps discussed. Optimization tips for RAG pipelines based on user feedback were exchanged, including manual inspection and parameter adjustment.

  • LangServe Development Discussions Unfold: Topics ranged from overcoming Image Base64 encoding issues in LangChain playground to "connection refused" errors within Kubernetes clusters. Deployment questions concerning Langchain/LangServe apps prompted mentions of using Vercel and Replit for web accessibility.

  • AI Innovation and Knowledge Sharing: A Reverse Job Board at AI Devs Work provides a platform for AI talent recruitment. A guide on creating a goal-setting assistant and a tutorial on building a LangChain.js-powered question-answering CLI with Dewy showcase application building chops. Additionally, the "Multi Document RAG using LangChain codes explained" video tutorial is highlighted, offering education on implementing Multi-Document RAG Agents.


OpenAccess AI Collective (axolotl) Discord Summary

  • Keras As a Universal Model Adaptor: Engineers discussed porting models to Keras for broader hardware support, as it now functions as an independent abstraction layer over frameworks like Torch, TF, and Jax.
  • Checkpoint Fiasco Finds a Fix: A link to a problematic pull request on the HuggingFace repository was shared, which was identified to be causing checkpoint saving errors. This issue was also connected to recent outages experienced by HF.
  • Efficient LLM Hosting Solutions Debated: Cost-effective hosting for large language models was a hot topic, with Together AI, OpenRouter, and services like Baseten being suggested. Additionally, NVIDIA’s RTX-based demo app, Chat With RTX, was brought up as a way to run personalized GPT models on local RTX hardware.
  • Serious Schema Strategies: JSON schema for pairing user and assistant messages was recommended for better dataset structuring, while there was a push for flexibility in role naming within the message schema to avoid influencing model behavior.
  • Real-Time LoRA Adapter Flexibility: The feasibility of adding LoRA adapters to a base model in real-time was confirmed to be possible with the HF framework, presenting a dynamic way to manage PEFT models.
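
A minimal sketch of that hot-swap pattern with Hugging Face PEFT is shown below; the base model and adapter paths are placeholders.

```python
# Sketch: attach and switch LoRA adapters on a loaded base model with PEFT.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # example base model

# Attach a first adapter, then load a second one alongside it (placeholder paths).
model = PeftModel.from_pretrained(base, "path/to/adapter-a", adapter_name="a")
model.load_adapter("path/to/adapter-b", adapter_name="b")

model.set_adapter("b")  # switch the active adapter without reloading the base weights
```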

CUDA MODE Discord Summary

  • Long Live LLMs with Math Skills: Fine-tuning a 7B parameter model on business data could degrade its performance on mathematical questions, as noted by @mertbozkir. For math-intensive tasks, domain-specific methods or models like internlm, metamath, or arithmo might be necessary.

  • GPU Market Volatility Hits Techies: Recent conversations have highlighted the frustration with the fluctuating prices and availability of GPUs like the 3090s, with @joseph_en and @andreaskoepf sharing their experiences of cost spikes and referring to GPUs as "GPU gold."

  • CUDA Compatibility Quest: Multiple users, including @_tvi_, @shikhar_7985, and @btdubbins, discussed the struggles of maintaining different CUDA versions for compatibility with other systems like PyTorch and FAISS. @marksaroufim recommended Conda for managing CUDA versions with PyTorch.

  • Associativity in Algorithms Challenged: There’s skepticism regarding the practicality of function representation, associativity, and classes from @andreaskoepf, @_tvi_, and others, with @euclaise sparking the conversation about using function composition similar to prefix-sum operations.

  • Search for Fun and Education in CUDA: Users, including @euclaise and @marksaroufim, are discussing CUDA educational resources with an emphasis on enjoyment. Suggestions like The Book of Shaders were mentioned, but no particular CUDA book was singled out as being notably fun.

  • Matrix Magic in Memory: Discussions unfolded around the performance impact of keeping vectors in sequential memory for dot products, optimal index orders in loops, and the application of atomicAdd operations in shared memory, without definite consensus on best practices.

  • Lecture Legwork: Queries about the organization of CUDA-related YouTube content, such as Lecture 5, spurred users to suggest solutions like a comprehensive playlist or adding videos directly to Cuda’s official YouTube channel.

  • The TensorFlow Conundrum and PyTorch vs. JAX: Rumors of TensorFlow’s potential discontinuation were brought up briefly by @spacyphus, while @marcom79 initiated a comparison between JAX and the upcoming PyTorch 2.0, with no detailed discussion ensuing on these topics.


LLM Perf Enthusiasts AI Discord Summary

  • Gemini Pro 1.5: Bigger, Longer, Uncut Context: Gemini Pro 1.5 has been a hot topic, with @wenquai highlighting its impressive 1 million token context window, while @thebaghdaddy brought attention to Jeff Dean’s post claiming an even more remarkable ten million token context window, including the ability to process extensive multimodal content. This information surfaced along with discussions surrounding the effectiveness of such large context windows, where skepticism was noted regarding models’ performance with token windows past 50-60k. Jeff Dean’s Twitter post announcing Gemini Pro 1.5’s developer preview captured the attention for its capabilities and forthcoming wider release.

  • Surya OCR Trumps Tesseract: Converting 35k PDFs into data has proven costly for @res6969 because of high vision-transformer processing costs. @robhaisfield chimed in with Surya OCR, a new OCR tool reported to outdo Tesseract across 93 languages, potentially offering a cost-effective solution.

  • Engineering Minds, Grab Your Slice!: The AI community in Singapore is buzzing with a meet-up opportunity posted by @ivanleomk, promising a project hacking session with free pizza at Funan Mall, organized by Gabriel Chua, Jon Jon, & tengfone. For those looking to mingle with minds alike, there’s one slot up for grabs with registration details available online.

  • Rumors and Releases from OpenAI: GPT-5 rumors and the humor they’re generating among enthusiasts was noted by @res6969. On a more concrete note, OpenAI is showcasing innovation, testing ChatGPT’s memory capabilities, as described in their blog post. They also debuted Sora, a text-to-video AI model, now undergoing red team testing to identify potential risks.

  • GPT-4 Enigma: A single message with a cryptic "yeah" from robotums under the #gpt4 channel seems to reflect the terse mystery that often surrounds emerging technologies.


Alignment Lab AI Discord Summary

  • Fine-Tuning Follies: After fine-tuning a 7B parameter language model on business data, users noted a likely degradation in math performance, suggesting the intensity and duration of fine-tuning determine the impact.

  • Optimism Versus Pessimism in ML Stability: The conversation highlighted a dichotomy in reinforcement learning where optimism is essential for exploration, while applying pessimism during inference could lead to more stable sequential decision-making in machine learning systems.

  • Business Instruction Extraction Quest: A user sought guidance on extracting business-related instructions from the teknium/OpenHermes-2.5 Instruction dataset, although no specific methodologies or resources were provided.

  • Trouble in Discord Town: There were concerns raised regarding a user’s potential technical issues with Discord, with suggestions favoring direct messaging as a solution.


Skunkworks AI Discord Summary

  • LLaVA Integration Challenges Go Unanswered: @CodeMan sought advice on configuring LLaVA for use with an SGLang server and SGLang worker, departing from the typical model worker approach. The query remained unanswered, indicating a gap in community knowledge or engagement on this topic.

  • Business Data Quest in OpenHermes Dataset: @sabu7003 is searching for techniques to extract business-related instructions from the teknium/OpenHermes-2.5 Instruction dataset, highlighting a need for targeted data isolation methods in this dataset.

  • Fine-Tuning Finesse for Business Data: @sabu7003 also raised a question about the effectiveness of a 7B parameter LLM in mathematics problems after being fine-tuned solely on business information, a query that went without community input or exploration.

  • Random Seed Learnability Sparks Debate: @stereoplegic and @aspott sparked a discussion on whether random seeds could be learnable parameters within AI models, with @aspott noting the impossibility of obtaining a gradient from a seed and suggesting learning an initialization function as an alternative pathway.


AI Engineer Foundation Discord Summary

  • Weekly Sync-Up Time: The weekly meeting was initiated with an announcement tagging @._z.

  • Hackathon Hosting Huddle: An invitation was extended to co-host an AI developers hackathon by @caramelchameleon, considering the proximity to the Game Developers Conference and inviting both online and onsite participation.

  • Hackathon Experience on the Table: @yikesawjeez indicated interest in the hackathon opportunity, drawing from their background in organizing such events in the Bay Area.

  • Investor Matchmaking Event: @atalovesyou publicized a chance for startup founders to engage with over 30 venture capital firms at an investor matchmaking session; additional slots are available at Founders x VC Event for interested individuals.


Datasette - LLM (@SimonW) Discord Summary

  • Less Compute, Same Power with Gemini 1.5: Google announced its Gemini 1.5 Pro, maintaining the performance of Gemini 1.0 Ultra but with reduced compute needs, featuring a context window capable of handling 1 million tokens.

  • Prompt Engineering vs. Token Capacity: The discussion highlighted the trade-off between prompt engineering and the direct input of relevant data, spurred by the increased token handling capacity of models like Gemini 1.5. As models improve, the skill of prompt engineering could become obsolete if larger contexts can be managed more economically.

  • Token Stretching Not Yet Standard: Though Google has tested models with up to 10 million tokens, the decision to release Gemini 1.5 with a 1 million token context window suggests external constraints such as cost. This could imply that prompt engineering will retain its relevance in efficiently interacting with models in the near future.


PART 2: Detailed by-Channel summaries and links

TheBloke ▷ #general (1263 messages🔥🔥🔥):

  • Contextual Inquiry for DnD Assistant: @zackman634 is looking to create a Dungeons and Dragons (DnD) Dungeon Master (DM) assistant that can cross-reference rules from various books, find stat blocks, and access lore, while also being able to generate new content on demand for world-building. They received suggestions to try models like h2oGPT for document question answering and were advised about the complexities regarding models understanding tables and cross-references.
  • Concerns over PSU Capacity: @kalomaze is worried about running a 3090 and 3060 GPU on a 650w PSU, having power-limited the GPUs to 250w and 150w respectively. Fellow users like @felixsanz and @alphaatlas1 warned about the risks, and the importance of not daisy-chaining the power cables was discussed.
  • Google’s Gemini 1.5 and OpenAI’s Sora Make Waves: Users discuss Google’s recent announcement of Gemini 1.5 with up to 1M token context and OpenAI’s SORA which can generate minute-long videos with detailed simulations. @nextdimension highlighted SORA’s capabilities and how it may soon be utilized in serious applications like films, and @itsme9316 expressed cautious optimism towards such leaps in technology.
  • LWM Multimodal Woes: Multiple users including @itsme9316 and @mrdragonfox share their struggles with trying to get Large World Model (LWM) multimodal functionality to work. Issues were posted about reaching the limits of computational resources, and users find commonality in their inability to operate the model due to technical complexities.
  • GPT for Coding Assistance: Discussion around the utility of using LLMs like GPT and Copilot for coding support, with mixed opinions on efficacy. While @mr_pebble appreciates the help in initial code drafting and documentation, @nextdimension and others note that LLMs tend to miss complex edge cases, indicating that these tools complement rather than replace the need for human insight in software development.



TheBloke ▷ #characters-roleplay-stories (301 messages🔥🔥):

  • Fine-Tuning Model Preferences: @dreamgen and @neriss discussed appropriate settings for role-play/story-writing with different models. @neriss recommended a temperature of 0.7 for Yi models, while suggesting that base Mixtral Instruct may be a superior option if the hardware supports it.

  • Model Training and Temperature Insights: @neriss explained that higher temperatures increase the diversity of model outputs at the cost of coherence. For more creative outputs, a higher temperature is advisable, with Yi models known to run well at lower temperatures, around 0.6.

  • Model Benchmarks and Issues: @weirdconstructor ran the Agnes Test for ERP models, noting that higher temperatures, like the one used by the model @c.gato trained, might not be ideal for preventing repetitive dialogues. It was suggested that models with more RP data might require lower temperatures to minimize loops.

  • Data Cleaning and Analysis Guidance: @mrdragonfox provided advice and resources for cleaning datasets, recommending pandas for tabular data and regex for general cleaning (a minimal sketch follows this list). They also shared a gist link for assistance.

  • M2 Max Performance on Transformers Models: @heyitsyorkie shared their experience using miquliz v2.0 120b q4km and @sssynk discussed the capabilities of their M2 Max for running 70B models effectively, with prompt processing times being a consideration. @timothyallan mentioned a terminal hack to increase RAM usage, allowing larger models to run on an M2 Max.
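
For reference, a minimal cleaning sketch in the spirit of the advice above; the file names and cleaning rules are hypothetical.

```python
# Sketch: basic dataset cleaning with pandas plus regex (hypothetical file and rules).
import re
import pandas as pd

df = pd.read_csv("roleplay_dataset.csv")
df = df.dropna(subset=["text"]).drop_duplicates(subset=["text"])

def clean(text: str) -> str:
    text = re.sub(r"<[^>]+>", "", text)       # strip HTML tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text

df["text"] = df["text"].map(clean)
df.to_csv("roleplay_dataset_clean.csv", index=False)
```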



TheBloke ▷ #training-and-fine-tuning (58 messages🔥🔥):

  • Notebook Upload Anticipation: @starsupernova mentioned they would upload a notebook to Unsloth’s Discord server and ping @1025039473932775485 once done, but no specific bugs were guaranteed to be addressed.
  • Mixtral vs. Mistral Loss Conundrum: @dreamgen and @nruaif discussed observing higher training and eval losses with Mixtral compared to Mistral, initially thinking it’s a bug. Both are experimenting with different setups to address this issue.
  • Fine-tuning Intricacies and Struggles: From @kquant laser-tuning yielding loss of 0.08 points to @haroon30 inquiring about VRAM and RAM requirements for finetuning deepseek models, community members are sharing their fine-tuning challenges and seeking advice.
  • Building a Better LLM for Portuguese: @luishenriquemartins seeks to leverage a large dataset in Portuguese to train an LLM for journalistic applications. The discussion ranged from considering the cost of training with the help of a research institution to the possibility of fine-tuning existing models like mistral or llama.
  • AutoFineTune Script Showcase: @jiha shared a link to a tweet by @yoheinakajima introducing a script named "AutoFineTune", which can generate synthetic message pairs and fine-tune a small model using Together Compute. The GitHub/Replit is provided in the thread linked in the tweet.

Links mentioned:

Tweet from Yohei (@yoheinakajima): Just made a lil’ script to easily fine-tune a small model with synthetically generated data… …calling it "AutoFineTune"! (~110 lines of code) Generates 100+ synthetic message pairs w…


TheBloke ▷ #model-merging (3 messages):

  • Merging Models with Different Context Sizes Results in Error: User @222gate encountered a RuntimeError when trying to merge two MistralCasualML models with different context sizes. The reported error message was related to a tensor size mismatch: Tensor size mismatch for model.layers.22.self_attn.o_proj.weight, sizes: [torch.Size([2560, 2560]), torch.Size([4096, 4096])].
  • Seeking Solutions for Tensor Mismatch: @222gate asked the community if anyone knew a workaround for the tensor size mismatch issue they faced while merging models (a shape-diffing sketch follows this list).
  • Positive Feedback but Undisclosed Solution: @222gate expressed excitement with a message saying "this is awesome" but did not provide details on whether the issue was resolved or the nature of what they found awesome.
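
While no fix was posted, a quick way to see which layers disagree before merging is to diff the two checkpoints’ tensor shapes. The sketch below assumes plain PyTorch .bin checkpoints with placeholder paths; mismatched hidden sizes generally mean the models cannot be merged directly.

```python
# Sketch: compare tensor shapes between two checkpoints before attempting a merge.
import torch

state_a = torch.load("model_a/pytorch_model.bin", map_location="cpu")  # placeholder paths
state_b = torch.load("model_b/pytorch_model.bin", map_location="cpu")

for name in sorted(set(state_a) & set(state_b)):
    if state_a[name].shape != state_b[name].shape:
        print(f"mismatch: {name}: {tuple(state_a[name].shape)} vs {tuple(state_b[name].shape)}")
```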

TheBloke ▷ #coding (6 messages):

  • Integrating Python in JavaScript with JSPyBridge: @spottyluck described a successful experience bridging JavaScript with Python using JSPyBridge, including example code snippets that show how to create a new class in JavaScript that interacts with Python code asynchronously (see the sketch after this list).
  • Interfacing with compression in Node.js: They detailed creating an async function in Node.js compressPrompt that uses Python classes via the bridge to compress prompts for efficient processing.
  • Alterations to BigDL: @spottyluck modified BigDL’s LLM transformer to load q8_0 gguf files and disabled an optimization to prevent prompt mangling, crucial for running LLMLingua. The provided code snippet shows the necessary adjustments and considerations for running the transformer, especially on Windows.
  • Device Tensor Handling in Node.js: Additional guidance was provided on dealing with errors related to tensors not being on the expected device, highlighting the use of model.to() in the Node.js context when interfacing with Python.
  • Compressing prompts in the request handling process: @spottyluck finalized with an explanation of integrating prompt compression through a conditional in the Node.js router.post method, allowing the Python-based compression to be leveraged as if it were a native JavaScript class.
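
The snippets above used the Node.js-to-Python direction; for completeness, here is a minimal sketch of the reverse direction using the project’s javascript pip package, following the example shown in the JSPyBridge README.

```python
# Sketch: calling a Node.js module from Python via JSPyBridge (pip install javascript).
from javascript import require

chalk = require("chalk")              # npm packages are fetched on first use
print("Hello", chalk.red("world!"))   # the JS call runs in a bridged Node.js process
```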

Links mentioned:

GitHub - extremeheat/JSPyBridge: 🌉 Bridge to interoperate Node.js and Python.


LM Studio ▷ #💬-general (410 messages🔥🔥🔥):

  • Intrigue Over "Chat with RTX": NVIDIA’s "Chat with RTX" has sparked interest with its built-in retrieval-augmented generation (RAG) feature on Nvidia 30+ series GPUs; unlike LM Studio, it performs RAG out of the box but currently supports only the Mistral 7b and Llama 13b models (@heyitsyorkie).

  • Curiosity About Gemini 1.5’s Massive Context Window: There’s a buzz about Google’s Gemini 1.5 model claiming to support a 1 million token context window, though it’s invite-only and not publicly available (@hypocritipus and @rugg0064).

  • Exploring Sora’s Capabilities: OpenAI’s new Sora model for generating videos from text prompts up to a minute long has caught attention as it becomes available to red teamers and creative professionals; however, there’s concern about the impact on evidence credibility (@joelthebuilder and @rugg0064).

  • Optimizing LMStudio Experience: Users inquire about various features within LMStudio, such as enabling function calling (@vbwyrde), adjusting thread usage (@rekt.gg), and configuring advanced inference parameters (@jackiezhou0601).

  • Lament Over Model and Software Restrictions: Dialogue touched on the limitations imposed by different AI systems, where models like GPT-4 remain proprietary and can’t run locally, which disappoints users like @stoic_king and @securityguruguy. There’s also a discussion about the impracticality of large context sizes and potential privacy concerns (@pwrreset and @hypocritipus).

Links mentioned:

Squibler pricing (https://www.squibler.io/pricing): no description found


LM Studio ▷ #🤖-models-discussion-chat (125 messages🔥🔥):

  • Yi-VL Models Await Update: @heyitsyorkie mentioned that Yi-VL models are currently unsupported due to the need for an update to the llama.cpp version used by LM Studio. @jedd1 inquired about a list or heuristic for unsupported GGUFs, to which @heyitsyorkie replied that issues generally arise with new small models requiring updates to llama.cpp.

  • PDF & Book Upload for Assistants Under Discussion: @edu0835 inquired about the possibility of creating an assistant in LM Studio that allows for PDF or book uploads, like the Huggingface chat assistant or GPTs with entire books in PDF form, specifically referencing medical texts for disease responses.

  • Comparing Coding Models: Discussions about the comparative performance and peculiarities of Deepseek Coder Ins 33b and Codellama Instruct 70b were had by users @kujila and @heyitsyorkie, with the former stating a preference for Deepseek due to its serious approach compared to Codellama’s "whimsical" responses.

  • Exploration of Image Generation: @joelthebuilder and @heyitsyorkie shared their experiences and recommended tools for diving into image generation, mentioning Stable Cascade, automatic1111, and comfyui as notable options to check out.

  • Quest for Reliable RAG Solutions for Windows: @666siegfried666 sought information about Retrieval-Augmented Generation (RAG) options for Windows, with @wildcat_aurora and @kujila suggesting to look at H2oGPT, lollms, or AGiXT which can be used with LM Studio local server for RAG capabilities.

  • File Downloading and Model Conversion Challenges: There was discussion about difficulties with downloading and converting large model files, such as issues with the model downloader in LM Studio and the need for manual file merging when certain quantizations are missing, as shared by @666siegfried666, @fabguy, and @n0w1sm.



LM Studio ▷ #🧠-feedback (3 messages):

  • RAM Upgrade Confusion: @pdx_ shared that after upgrading their system to 64 GB of RAM, LM Studio still indicated they had 16 GB. @yagilb acknowledged this as a known bug and reassured that it would be fixed in the next update, clarifying that the issue is purely informational and loading models should work if the VRAM is sufficient.
  • Gratitude for Quick Response: @pdx_ expressed gratitude with a simple "ok thanks 🙂" following the prompt support received regarding the RAM misreport issue.

LM Studio ▷ #🎛-hardware-discussion (200 messages🔥🔥):

  • AMD vs Nvidia for LLM Performance: @goldensun3ds expressed curiosity about combining an RTX 4060 Ti and an RX 7600 XT for potentially better performance. @666siegfried666 remarked on ROCm’s youth and potential for future improvements, indicating some skepticism about mixing GPUs.
  • High-End Build Cost Comparisons: @nink1 broke down the costs of a high-end build comparable to a Mac studio, while others like @heyitsyorkie and @jedd1 debated the pros and cons of different configurations, including gaming and Linux compatibility.
  • Threadripper & RAM Overclocking: Multiple users discussed the utilization of AMD’s Threadripper for various workloads. @666siegfried666 delved into the details of RAM overclocking, suggesting careful manual tuning to avoid system instability.
  • Exploring VRAM Upgrades for AI Workloads: Users like @rugg0064 and @goldensun3ds discussed the possibility and practicality of modding GPUs such as the RTX 2080 Ti to increase VRAM to 22GB, suggesting potential for AI applications but questioning the economic viability compared to buying new GPUs with more VRAM.
  • Laptop GPU Selection Issues in LM Studio: @radion8267 sought assistance for configuring LM Studio to use a dedicated GPU rather than the default APU, noting performance issues. @heyitsyorkie mentioned a known detection bug and suggested potential alternatives.



LM Studio ▷ #🧪-beta-releases-chat (19 messages🔥):

  • Quantum Leap in Model Compression: @drawless111 shared excitement about 1.5 bit quantization being worked on and posted a GitHub pull request as evidence of this development. They also mentioned impressive benchmarks for a 70-billion parameter model (70B) that hinted at substantial advancements in quantization efficiency.

  • Quant Size Anticipated to be a Gamechanger: @heyitsyorkie and @drawless111 discussed the potential impact of 1.5 bit quant sizes, with @heyitsyorkie expressing curiosity about performance quality when compared to other quantization methods. @drawless111 responded with optimism, highlighting that these new quants - particularly IQ2 and IQ3 - are outperforming previous models and could soon replace them.

  • Models Running on Slim Hardware: Both @drawless111 and @heyitsyorkie discussed the implications of new quant sizes, like IQ1, allowing for large 70B models to run on machines with only 16 GB of VRAM, addressing a previous message about encountering 5 IQ1 models on Hugging Face which ballooned to 10 shortly.

  • Performance Details of Quant Models: @drawless111 provided detailed comparisons of different quantized models, discussing their sizes (such as IQ2_XXS at 2 GB) and performance. The post-compression fine-tuning was noted as a factor that could affect the performance of these compressed models.

  • Troubleshooting for LM Studio AppImage: After a user @w_sky mentioned the LM_Studio-0.2.14-beta-1.AppImage for Linux was crashing, @heyitsyorkie inquired whether the CPU supported AVX2 instructions, suggesting a potential cause for the crash.



LM Studio ▷ #langchain (1 messages):

.ben.com: markdown does have line breaks; end your line with two spaces before the carriage return.


LM Studio ▷ #avx-beta (7 messages):

  • AVX Version Woes for LM Studio: User @rafalsebastian faces disappointment upon learning that their processor does not support AVX2 instructions needed for LM Studio. They inquire if it’s possible to run LM Studio on a CPU with only AVX support.
  • Windows Rescue for Older CPUs: @heyitsyorkie responds with a solution, directing to download version 0.2.10 AVX beta release for Windows from LM Studio’s beta releases, although they recommend upgrading to a CPU with AVX2 instructions for optimal performance.
  • Salvation for Workstations: @rafalsebastian expresses gratitude as their older workstation is saved from being scrapped thanks to the AVX beta release.
  • Linux Users Left Waiting: Despite @rafalsebastian’s interest in a Linux version of the AVX beta, @heyitsyorkie confirms that no Linux version is available and there likely won’t be one for some time.
  • Reluctant to Experiment: @rafalsebastian shares that they have another workstation with Xeon CPUs that support AVX2, but they hesitate to use their primary work machine for experimentation with LM Studio.

Links mentioned:

LM Studio Beta Releases: no description found


LM Studio ▷ #crew-ai (1 messages):

  • Comedic Inquiry on Bot’s Efficiency: @wolfspyre quipped about bot functionality with a light-hearted question: do they have to work? followed by a smiling emoji symbolizing a grin.
  • A Case of the Repetitive Bot Syndrome: @wolfspyre humorously portrayed a scenario of a bot outputting the same text repetitively, complete with a playful exaggeration of the repetition and comic sound effects. The concern was also raised about potential repetition errors involving task distribution among workers.

OpenAI ▷ #annnouncements (2 messages):

  • ChatGPT gets a Memory Upgrade: @abdubs announced that ChatGPT is being tested for its new memory feature, which will allow it to remember past conversations. Users can control this feature by telling ChatGPT to remember or forget information, and it’s being rolled out to a select group of Free and Plus users. For full details, users can read more on OpenAI’s blog post.

  • Meet Sora, the Text-to-Video Model: @abdubs introduced Sora, OpenAI’s first model that generates up to 60-second videos from text descriptions, which can include complex scenes and characters showing emotions. Currently available to red teamers and creative professionals for feedback, more information is available at OpenAI’s Sora introduction page.



OpenAI ▷ #ai-discussions (338 messages🔥🔥):

  • Diverse Uses for GPT-4: In a series of messages, participants, including @feltsteam0, discussed the different ways people use GPT-4, with some using it to simplify complex topics and others concerned about potential for increased laziness among users.

  • Google vs OpenAI AI Models: @kevinlk questioned Google’s strategy of pricing its Gemini model similarly to OpenAI’s models. @lumirix mentioned the perks of Gemini Advanced, and several users compared the performance of OpenAI models with Google’s newly released iterations.

  • Concerns About GPT’s Performance: Users, such as @pigondrugs and @drinkoblog.weebly.com, expressed their issues with the latest updates to GPT models, specifically pointing out difficulties in retaining context and maintaining coherent long-form communication.

  • New Player in Town - Abacus.AI’s Smaug-72B: A few users, including @cassofthenight and others, reacted to the announcement of Abacus.AI’s latest model outperforming OpenAI’s, stirring a discussion on the competition in the AI arena.

  • Sora - OpenAI’s Next Leap in AI: Discussions blossomed around OpenAI’s text-to-video model called Sora, with users like @dooz, @johnnyrobert, and @infidelis speculating on its potential impact on creative industries and the limitations of current AI in filmmaking.



OpenAI ▷ #gpt-4-discussions (131 messages🔥🔥):

  • GPT’s Vision for Video Understanding Tutorial: @flokyhuan shared a link to OpenAI’s notebook which outlines how to use GPT-4’s visual capabilities for video understanding, even though GPT-4 cannot directly process videos (a frame-sampling sketch follows this list).
  • Fine-Tuning Image Recognition Clarification: Users @flokyhuan and @solbus discussed that fine-tuning OpenAI language models is currently text-only, and the model does not support fine-tuning for image recognition tasks.
  • Service Outages Bring Timely Troubles: Several users including @cmt283, @james18btdoomer, @snowzer, and @lumirix reported and discussed various error messages and interruptions in service when using GPT-4, indicating a potential widespread system issue.
  • GPT-4 Access and Latency Woes: User @3top1a encountered frequent errors during custom GPT prompts, wondering about the limits to GPT’s knowledge and the feasibility of processing large text files.
  • Intrigue Around Sora: A discussion led by @antnation, @wccats11, and @doperabbitwojak highlighted excitement for Sora, OpenAI’s text-to-video model, which is in development and currently unavailable to users.
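
The notebook’s core trick is to sample frames and send them as images, since the model cannot ingest video directly. A condensed sketch of that approach follows; the file name, frame stride, and model name are illustrative.

```python
# Sketch: video "understanding" via frame sampling + GPT-4 with vision.
import base64
import cv2  # pip install opencv-python
from openai import OpenAI

client = OpenAI()
video = cv2.VideoCapture("clip.mp4")  # hypothetical file
frames = []
while True:
    ok, frame = video.read()
    if not ok:
        break
    _, buf = cv2.imencode(".jpg", frame)
    frames.append(base64.b64encode(buf).decode("utf-8"))
video.release()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # vision-capable model name at the time
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "Describe what happens in this video."}]
        + [{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
           for f in frames[::60]],  # sample roughly every 60th frame to stay within limits
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```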



OpenAI ▷ #prompt-engineering (86 messages🔥🔥):

  • Notice Me, Chatbot: @beanz_and_rice made several attempts to engage with the chatbot, expressing feeling unnoticed, until @toror playfully acknowledged the situation.
  • Horizontally Rotated Woes: @kv1383 expressed frustration with images rotating incorrectly and disappearing webp files, to which @darthgustav. replied explaining potential GPT model limitations with orientation.
  • Too Big Prompt Dilemma: @rdcdt queried about simplifying a 4k character long prompt, and was directed by @bambooshoots to the behavioral adjustment tool for help (g-6qn4yGrBR-directive-gpt-llm-behavioral-adjustment-tool).
  • Seeking a Yes/No Only Response: @loictonneau sought a way to craft prompts that elicit only "yes" or "no" responses from GPT, and @darthgustav. provided a structured output template to facilitate this (a minimal sketch of the general approach follows this list).
  • Token Optimization Techniques: @realspacekangaroo discussed strategies for minimizing token usage in prompts, while @eskcanta and @darthgustav. suggested focusing on clear, efficient language and the potential risks and benefits of using intentionally poor grammar for cost-saving.
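
One simple way to enforce a yes/no-only reply (a sketch of the general idea, not the specific template shared in the channel) is a strict system instruction combined with a hard token cap.

```python
# Sketch: constraining a chat model to single-word yes/no answers.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Answer with exactly one word: Yes or No. No punctuation or explanation."},
        {"role": "user", "content": "Is the Pacific larger than the Atlantic?"},
    ],
    max_tokens=1,   # hard cap so the reply cannot run long
    temperature=0,  # deterministic output
)
print(response.choices[0].message.content)
```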



OpenAI ▷ #api-discussions (86 messages🔥🔥):

  • Technical Inquiry on Image Rotation and File Persistence: @kv1383 expressed frustration that an image ended up being rotated horizontally instead of vertically, and also mentioned a dislike for webp files as they seem to disappear after some time. To this, @darthgustav. responded that DALL-E 3 does not truly understand orientation and offers a 50/50 chance of getting it right.

  • Streamlining Interaction with OpenAI: Several users, including @loictonneau, @rdcdt and @beanz_and_rice, engaged in conversations about using ChatGPT and creating prompts, with @loictonneau seeking help to create a yes/no prompt and @rdcdt asking for prompt simplification advice. Assistance and resources were provided by @darthgustav. and @bambooshoots.

  • Prompt Grammar Debate: A debate over the use of proper grammar in prompts was sparked by @realspacekangaroo, arguing that imprecise grammar can save tokens, which is economically beneficial for large-scale use. @eskcanta cautioned against this practice for clarity’s sake and to avoid unforeseen model updates affecting prompt interpretation.

  • Concerns on Model Contamination and Behavior Tuning: @stealth2077 in a series of messages expressed concerns about editing model outputs and the potential for contamination of context. @eskcanta suggested positive reinforcement and guidance as a strategy to avoid context issues and provided examples of their methodology for training the model on specific tasks.

  • Explorations of Text Classification Using GPT: @ben.30 and @romera5032 discussed their experiences using GPT for text classification within their companies. @ben.30 encountered difficulties with GPT classifying 'boat skip' and sought advice on forcing the model to adhere to a knowledge base, with the conversation moving to a direct exchange after connecting on the platform.



Nous Research AI ▷ #ctx-length-research (5 messages):

  • Magic.dev mentioned by 4biddden: User @4biddden just shared the word magic.dev without further context or explanation.

  • Gummybee highlights a new paper: @giftedgummybee shared an arXiv paper asserting a new approach to language models by incorporating video sequences to enhance understanding of the physical world, overcoming challenges of memory constraints and computational complexity.

  • Speculation on the nature of a new architecture: @hexani mused that a certain undisclosed new architecture could simply be akin to DeepMind’s RETRO under a different name.

  • Debate around mysterious architecture continues: Following up, @hexani invited others to guess what the new architecture might be, expressing curiosity and anticipation about its possible features.

  • Predicting the architecture’s identity: @atkinsman surmised that the unrevealed architecture could likely employ an approach similar to RETRO or self-extend, rather than being completely novel, speculating that recent releases by competitors may have influenced its development.

Links mentioned:

World Model on Million-Length Video And Language With RingAttention: Current language models fall short in understanding aspects of the world not easily described in words, and struggle with complex, long-form tasks. Video sequences offer valuable temporal information …


Nous Research AI ā–· #off-topic (5 messages):

  • Zuckerberg Shifts Perception in AI and VR: User @nonameusr shared a link (Zuck’s AI and VR take) hinting at a perception shift where Mark Zuckerberg is seen transitioning from the villain to the savior in AI and VR.

  • Skepticism About VR Passthrough Quality: @teknium responded, agreeing with the opinions in a linked post except for the claim of superior passthrough, and noted that the passthrough on their Quest 3 was terrible.

  • The Mysterious Rock-Cat Raises Eyebrows: User @error.pdf shared a gif from Tenor (Rock-Cat’s Eyebrow Raise) that humorously combines a cat with Dwayne ā€œThe Rockā€ Johnson’s iconic eyebrow raise.

Links mentioned:

Rock Cat Eyebrow Cat GIF - Rock cat Eyebrow cat Meme - Discover & Share GIFs: Click to view the GIF


  • CUDA for AMD GPUs?: @leontello introduced ZLUDA, a tool that allows unmodified CUDA applications to run on AMD GPUs with near-native performance. However, @adjectiveallison clarified that ZLUDA is essentially abandoned, with updates only expected for workloads of personal interest to the developer.

  • Wavelet Space Attention enhancing Transformers: An arXiv paper shared by @euclaise discusses improving long sequence learning capabilities in Transformers through the implementation of Wavelet Space Attention (WavSpA).

  • New Local AI Assistants Merge: @sanjay920 posted a GitHub link to Rubra, a project merging openhermes and neuralchat aimed at simplifying the creation of AI Assistants and Large Language Models. This announcement was met with enthusiasm by @teknium, while @gabriel_syme humorously played down the idea of it being truly local.

  • Impressive Context Size for LLM: @if_a and others discussed the introduction of Gemini 1.5 Pro by Google, highlighting its 10M token context length and efficient MoE architecture, marking it as possibly a significant upgrade from previous models.

  • Multilingual Generative Model with Instructions in 101 Languages: @.benxh shared a Hugging Face link to Aya 101, a model that reportedly outperforms both mT0 and BLOOMZ, featuring capabilities for instructions in 101 languages and trained on a vast dataset including xP3x and other collections.

Nous Research AI ā–· #general (536 messagesšŸ”„šŸ”„šŸ”„):

  • QuIP# - A Leap in Post-Training Quantization: User @stellaathena shared a research paper discussing QuIP#, a method for post-training quantization of large language models achieving state-of-the-art results with 2-3 bit precision. They suggest that this approach outperforms previous methods, especially in highly compressed regimes.

  • Intriguing Rumors: Users conversed about potential future advancements like ā€œoptimal 0.5 bit quantizationā€ (@nruaif), with humorous speculation from @if_a about the next naming convention for quantization papers.

  • New Frontier in Model Routing Analysis: User @teknium shares a routing analysis study based on Mixtral 8x7B model from @MistralAI using POS tags instead of document context, suggesting a new research direction in model understanding.

  • reViSiTing the Hand Issues: The ongoing conversation about ā€œhandsā€ reflects the community’s persistent challenge with detailed image generation, with users like @giftedgummybee digging into the technical details to highlight improvements and benchmarks.

  • OpenAI’s Sora Video Generation: Members of the chat, including @otisai, @bstdev, and @leontello, were abuzz with excitement after OpenAI’s announcement of Sora, a text-to-video model that marks a significant advancement in AI-generated video content. The chat reflects the impact this technology could have across AI communities and associated industries.

Nous Research AI ā–· #ask-about-llms (53 messagesšŸ”„):

  • Searching for GPT-4 Alternatives: @natefyi_30842 inquired about more affordable coding models as an alternative to GPT-4. @teknium suggested Deepseek Coder, and upon asking where to find it hosted, @teknium mentioned perhaps on Together.

  • SFT vs. Continued Pretraining Clarified: @natefyi_30842 sought clarification on the difference between SFT (supervised fine-tuning) and continued pretraining, with @teknium confirming that continued pretraining generally uses a raw corpus without instruction focus.

  • MIQU’s Training Unveiled: @teknium explained that MIQU was continued pretrained from the llama-2 70b model and then instruction-tuned (SFT’d), with only its final form being made accessible.

  • AI Benchmarking Made Easy: @nerdabhay asked for resources to test a trained model, and @teknium recommended the LM Evaluation Harness, while @atkinsman shared a link to a Google Colab for automatic evaluation setup by llm-autoeval.

  • API Options for Vision Language Models Explored: @vikas.p inquired about the best vision language models available via an API with decent rate limits and pricing. Multiple suggestions were made, including GPT-4V which scales with total API spend, and @leontello noted the existence of Qwen-VL and LLaVA models while @orabazes recommended checking Replicate for hosting these models, with mention of Alibaba hosting Qwen-VL albeit with low rate limits.

Nous Research AI ā–· #collective-cognition (3 messages):

  • Project Faces Downtime: @adjectiveallison encountered an issue when attempting to access the site, questioning if the project is still active.
  • Modes Break the Machine: @teknium confirmed that new modes in ChatGPT broke the website, and the maintaining team could not sustain it, resulting in the project’s current inactivity.

Eleuther ā–· #announcements (1 messages):

  • Solving the Pink Elephant Problem: @canadagoose1 highlighted a new EleutherAI paper that addresses the challenge of making chatbots avoid certain topics, known as the Pink Elephant Problem. The paper introduces Direct Principle Feedback (DPF), a technique that outperforms traditional models and is on par with GPT-4, and can be found here.

  • DPF for Customizable Chatbot Control: The announcement shared insights into the Direct Principle Feedback (DPF) method, which allows fine-grained control over language models by avoiding the need to rerank responses, making it a promising approach for reinforcement learning from AI feedback (RLAIF) applications.

  • Read More on Twitter: Additional information and discussions on the Pink Elephant Problem and the newly published paper can be followed on a Twitter thread posted by @synth_labs, inviting further exploration of the research here.

Links mentioned:

  • Suppressing Pink Elephants with Direct Principle Feedback: Existing methods for controlling language models, such as RLHF and Constitutional AI, involve determining which LLM behaviors are desirable and training them into a language model. However, in many ca…
  • Tweet from Open Synth Lab (@synth_labs): PINK ELEPHANTS! 🐘 Now, don’t think about it. Chatbots also find this supremely difficult. Ask one of the most popular open source models NOT to talk about pink elephants, and it will fail 34% of the…

Eleuther ā–· #general (228 messagesšŸ”„šŸ”„):

  • XLMR Language Detection Curiosity: @_michaelsh asked how to extract the language from the XLM-RoBERTa model, as mentioned in a Hugging Face post, curious about the method of language determination.

  • reka.ai Model Speculations: @rallio. wondered if the reka.ai model could be a T5-style model since the founder was the UL2 model guy at Google and mentioned its 20-billion-parameter scale. @stellaathena responded that the size of a model doesn’t necessarily correlate with its style, and emphasized that practical considerations are as important as technical motivations.

  • Cloud Resource Recommendations for NLP: @pxxxl inquired about the best cloud resources for training an NLP Classification model, receiving suggestions for GCP, Colab, Runpod, and vast.ai, the latter needing caution if unfamiliar with pitfalls as per @ad8e.

  • Inquiries About Custom Adapters on Mamba: @vidava discussed challenges and sought guidelines surrounding the creation of semicustom LLM models with their own fine-tuning adapters for models like Mamba. They expressed interest in obtaining resources to conduct further experiments and engaged in a detailed dialogue about potential solutions, including torch parameterizations and dynamically modifying class methods.

  • Gemini 1.5 – A Leap in Multi-Modal AI: @karatsubabutslower shared a Twitter link highlighting Google’s Gemini 1.5, prompting @fessus to ponder upon its implications for robotics, with @clock.work_ and @karatsubabutslower discussing the real-time data stream processing that robotics require, beyond the capabilities of models showcased in demos.

Links mentioned:

  • Memory and new controls for ChatGPT: We’re testing the ability for ChatGPT to remember things you discuss to make future chats more helpful. You’re in control of ChatGPT’s memory.
  • Our next-generation model: Gemini 1.5: Gemini 1.5 delivers dramatically enhanced performance, with a breakthrough in long\u002Dcontext understanding across modalities.
  • lora_example.py: lora_example.py. GitHub Gist: instantly share code, notes, and snippets.
  • BridgeAI Programme - Brief - Digital Catapult FutureScope: Digital Catapult is launching an accelerator programme; Innovate UK BridgeAI, that seeks to stimulate the adoption of artificial intelligence and machine learning technologies in agriculture, creative…
  • Tweet from Open Synth Lab (@synth_labs): PINK ELEPHANTS! 🐘 Now, don’t think about it. Chatbots also find this supremely difficult. Ask one of the most popular open source models NOT to talk about pink elephants, and it will fail 34% of the…

Eleuther ā–· #research (218 messagesšŸ”„šŸ”„):

  • Suspicions Around MoE Scaling Law Paper: @kyo_takano raised concerns about the MoE scaling law paper. They questioned the unusually perfect loss predictor and consistent parameters achieved by the authors, suggesting an almost perfectly fitted model that generalizes even in a higher-compute regime is highly unlikely.

  • Discussing Encoder-Decoder Pre-Training: @loubb began a conversation on whether it would be beneficial to pre-train parts of an encoder-decoder model, specifically the decoder, for seq2seq tasks like machine translation. The user proposed pre-training the decoder on unsupervised data before fine-tuning on seq2seq tasks, emphasizing the usefulness of learned text representations prior to fine-tuning.

  • LLM Security and Adversarial Compromises: A new paper, mentioned by @ai_waifu, discussed the emergence of secret collusion among communicating AI agents, detailing how steganography might be used to conceal unauthorized information sharing. This highlights the security and privacy concerns arising as the capabilities of LLMs grow.

  • Research on Memorization in LLMs: Several users, including @avi.ai, @0x_paws, and @pizza_joe., shared papers addressing the memorization capabilities of large language models (LLMs), exploring both the use of copyrighted content to train LLMs and adversarial efforts to extract information from models.

  • Non-Determinism in GPT-4 and MoE Models: Extensive discussion occurred regarding the non-determinism noticed in outputs from GPT-4, even when a seed was used. Users like @catboy_slim_ and @carsonpoole debated whether the non-determinism stemmed from MoE implementation, batch effects, or different backend model behaviors.

Eleuther ā–· #interpretability-general (8 messagesšŸ”„):

  • Seeking Interpretability Overviews: @jaimerv requested recommendations for an updated overview of interpretability approaches, referencing a paper on Representation Engineering.

  • Saliency in Vision and Transformers: @aiobhinn offered insights on different lines of research in interpretability, mentioning saliency map approaches in vision tasks and attention maps or information flow studies in transformer models.

  • Clarifying Research Focus: Responding to @aiobhinn’s query, @jaimerv clarified that their research focuses on evaluations using interpretability techniques, specifically propensity evaluations such as honesty and power-seeking.

  • Diffusion Models Interpretability: @rbz99_27250 inquired about methods to evaluate or interpret diffusion models, noting a lack of research on the UNET aspect as compared to the CLIP side of problems within diffusion models.


Eleuther ā–· #lm-thunderdome (17 messagesšŸ”„):

  • Harnessing Trouble with Local Models: @christianpala is encountering issues when trying to use the lm-evaluation-harness with local models and tokenizer, particularly around calculating logprobs and sorting items that are being returned incorrectly by the tokenizer.

  • Suggested Fix for lm-evaluation-harness: @christianpala suggested a fix for the mentioned issue by changing self.end_of_text_token_id = self.tokenizer.eos_token to self.end_of_text_token_id = self.tokenizer.eos_token_id, but indicated that integrating the tokenizer as an argument isn’t directly supported by the harness (see the sketch after this list).

  • Evaluating Math in Language Models: @kamilla7693 inquired about how non-vision models handle SAT or GRE math tests’ graph and plot questions. @baber_ and @stellaathena noted that benchmarks like MATH use LaTeX to represent graphics, whereas some questions simply reference non-existent images.

  • Enquiring about Open-Book and COT Support in Harness: @uanu. asked if the lm-evaluation-harness supports open-book tasks or Chain of Thought (COT) prompts, with @hailey_schoelkopf confirming COT support but no current capabilities for search augmented tasks.

  • Issues with Python Version and Harness Cloning: @madison_33844 faced an error regarding Python version compatibility when using the lm-evaluation-harness and received advice from @pminervini to try updating Python and use a specific older version of the harness (b281b09) for replicability with the OpenLLM leaderboard.
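
To make the eos_token fix above concrete: the two tokenizer attributes return different types, which is why the harness wrapper needs the integer id. A quick check with a transformers tokenizer (model name chosen for illustration):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

print(tok.eos_token)     # "</s>" -- the string form of the end-of-text token
print(tok.eos_token_id)  # 2      -- the integer id that logprob bookkeeping expects

# Hence the suggested patch in the harness wrapper:
# self.end_of_text_token_id = self.tokenizer.eos_token_id   # not .eos_token
```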

Eleuther ā–· #gpt-neox-dev (4 messages):

  • Potential Misalignment in Pythia-Deduped for 2.8b: @pietrolesci flagged a possible issue with 2.8b models in the Pythia-deduped suite regarding alignment between training data batches and checkpoints. They observed that the batch loss isn’t decreasing as expected post-training for 2.8b, unlike other model versions.
  • Schoelkopf to the Rescue: Upon noticing the issue reported by @pietrolesci, @hailey_schoelkopf acknowledged the concern and promised to follow up on the matter.
  • Call for Collaborative Scrutiny: @stellaathena expressed excitement over the potential to demonstrate the replicability of Pythia via a blog post or workshop paper, highlighting the opportunity for a community-driven verification project.
  • Grateful for Support and Suggestions: @pietrolesci thanked @hailey_schoelkopf for looking into the 2.8b alignment issue and appreciated @stellaathena for proposing a post-ACL deadline project to delve into the findings.

Mistral ā–· #general (282 messagesšŸ”„šŸ”„):

  • Mistral Performance Discussion: Users discuss the dependency of Mistral performance on both hardware and server load. @i_am_dom emphasizes that even a much larger model like GPT-4 can outpace a 7B model if the serving hardware is robust and not under load.

  • Mistral Learning Curve for Interns: @nana.wav is seeking guidance on using Mistral after downloading it, with an intention to finetune the model. Assistance is offered, including suggestions to look up resources like Jupyter notebooks, Kaggle, and Hugging Face for beginners.

  • Users Share Internship Struggles: @frosty04212 and others share tales of overwhelming tasks during internships, including migrating entire stacks to Kubernetes and dealing with workplace expectations for (almost) free work.

  • Latency Issues with Mistral API: @justinmann. and @ginterhauser report high latency when using Mistral API endpoints like api.mistral.ai/v1/chat/completions, and are advised to contact support at Mistral for assistance with scaling issues.

  • Inquiries on Model Specifications and Troubleshooting: @drnicefellow asks about the token count Mistral is trained on, while @nana.wav seeks help with execution errors, receiving advice to check updates and ensure correct installations. @sapphics discusses challenges with Mistral embed and receives directions to the documentation for clarification.

Mistral ā–· #models (38 messagesšŸ”„):

  • Praise for DSPy’s Prompt Flow: @mrdragonfox shared a positive example of why DSPy is powerful, highlighting its efficiency by using the LLM as a ā€œdeviceā€ rather than a ā€œchatā€ interface.
  • Debate on Model’s Production Viability: @mrdragonfox criticized LangChain for its complex dependencies, suggesting it’s impractical for production use, while @rolandtannous mentioned the occurrence of production releases with others holding back due to potential system crashes. Further, @rabdullin discussed industry variances in adopting these models, and shared an NVIDIA demo app for personalized chatbots.
  • Intrigue Around Mistral-7B’s Training Data: Users @kushagra_67246 and @gamerboi0129 inquired about the datasets involved in training Mistral-7B, while @tom_lrd and @mrdragonfox conveyed the secretive nature of such datasets.
  • Clarification on Mistral’s Open-Sourced Checkpoint: @nofreewill42 sought information on the availability of a raw open-sourced checkpoint following raw internet text pretraining, without finetuning, referring to mistralai/Mistral-7B-v0.1.
  • Guide to Chaining LLM Responses: @brendawin queried about integrating an API as a prompt in app development, with @mrdragonfox providing guidance on chaining LLMs and handling logic externally, and shared a link to Mistral’s guides.

Mistral ā–· #deployment (7 messages):

  • Model Mix-Up Alert: @casper_ai noted that thebloke’s model is corrupted. They provided a link to a working version of Mixtral Instruct that is AWQ quantized: Mixtral Instruct - AWQ.
  • Alternative Mixtral Repository Recommended: In a follow-up, @casper_ai recommends using their Mixtral Instruct AWQ repository as the repository from TheBloke is currently not functioning.
  • Model Details Shared: The working version of Mixtral Instruct has 6.48B parameters and supports I32 and FP16 tensor types. It’s received 8,430 downloads in the last month.
  • Cryptomotion Seeks Help: New joiner @cryptomotion asked for links to the authoritative documentation.
  • Official Documentation Provided: @mrdragonfox responded with the official Mistral AI documentation and details on how to use the API.

Mistral ā–· #finetuning (10 messagesšŸ”„):

  • Seeking Guidance on MLX and Mistral: @hammer_mt asked for a tutorial on fine-tuning Mistral 8x7B using Apple’s MLX, similar to a detailed guide available for local LLM fine-tuning on a Mac. Here’s the guide in question.
  • Slim Chances on 8x7B Fine-Tuning with MLX: @mrdragonfox expressed skepticism about the feasibility of fine-tuning Mistral 8x7B using MLX, hinting at potential technical challenges.
  • Potential MLX Fine-Tuning Resources Shared: @sublimatorniq suggested looking into an MLX example repository which could possibly help with the process.
  • Development in Progress for MLX: @cogbuji indicated ongoing development efforts for MLX compatibility and provided a link to a resource for creating moe models using MLX. Visit this GitHub repo for scripts and info.
  • Clarifying Mistral 8x7B Variants: @notphysarum enquired about the differences between different Mistral 8x7B fine-tunes, leading @hammer_mt to suggest that the variants are likely fine-tuned on specific familiar datasets.

Mistral ā–· #showcase (5 messages):

  • NVIDIA RTX Powers Personal Chatbots: @ethux shared a link about Chat with RTX, NVIDIA’s new offering allowing users to personalize a chatbot using an NVIDIA GeForce RTX 30 Series GPU or higher. It includes a tech demo currently available for free download.
  • User Inquiry about NVIDIA’s Chatbot Technology: @sublimatorniq asked about the performance of Chat with RTX, but @ethux responded that they have not used it yet, comparing it to Lmstudio in terms of functionality.
  • The Leading German Chatbot on Mistral: @johannhartmann introduced Wiedervereinigung-7b-dpo, the top-performing German chatbot on Mistral benchmarks, available on Hugging Face. The model is a merge of four German Mistral fine-tunes and includes dpo-training for improved result quality.

Mistral ā–· #random (19 messagesšŸ”„):

  • Internship Quest in France: @maeelk, a French librarian, is reaching out to find an internship opportunity for a student studying psychology and AI. They’ve shared a link to the master’s program and are asking whether Mistral or any French-based company would be willing to offer an internship.

  • Budget Limits for AI Projects: @akshay_1 discusses a client’s underwhelming budget of $1,000 to build an S2S model with a persona using an audio dataset – a budget @ethux and @mrdragonfox find far too low for any significant work.

  • The Trouble with PDFs in Data Science: Converting PDFs containing LaTeX to text for an LLM connection is a subject of discussion, and @mrdragonfox shares a blog post from Unstructured detailing the process and challenges of extracting data from PDFs.

  • Launch of a New Character AI Website: @ppprevost announces the creation of a character.ai like website using Langchain, Next.js, and the Mistral API. They invite members to try it and provide feedback, and share a YouTube video showcasing the site.

Mistral ā–· #la-plateforme (113 messagesšŸ”„šŸ”„):

  • Seeking GDPR Compliance Information: @.hutek was inquiring about compliance details related to Mistral’s APIs for client projects in France. @dawn.dusk provided a link to Mistral’s data processing agreement (Data Processing Agreement) which outlines how Mistral AI processes personal data under GDPR.

  • ChatGPT-like Testing with Mistral API: @_jackisjack subscribed to the Mistral API and asked for guidance on setting up a simple ChatGPT-like dialogue without customization or development. @fersingb suggested using Mistral’s Python client library and specifically the chatbot_with_streaming.py example after setting the API key (a minimal streaming sketch follows this list).

  • Streamlined Path to Chatbot Testing Discussed: @mrdragonfox and @fersingb guided @_jackisjack through setting up a simple testing environment for a ChatGPT-like dialogue with Mistral and recommended using the open-source UI from ETHUX Chat.

  • Payments and Access Concerns: @notphysarum asked whether PayPal was an option for payment on Mistral as they lacked a credit card. @lerela responded that PayPal was not available, and the conversation shifted to potential platforms that support PayPal and provide access to Mistral’s APIs.

  • Language Capabilities and Performance: During discussions, @mrdragonfox mentioned that Mistral’s API is trained in French as well as English and linked a user interface that allows testing the models (ETHUX Chat). They further commented on Mistral’s performance, estimating that Mistral medium falls between GPT-3.5 and GPT-4 in terms of capabilities.
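
For anyone who wants the shortest possible path to the ChatGPT-like test described above, a minimal streaming sketch with the Mistral Python client looks roughly like this. It is not the chatbot_with_streaming.py example itself; class and method names reflect the early-2024 0.x client and may have changed since.

```python
import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

# Stream a single-turn reply; wrap this in an input() loop for a full chat session.
for chunk in client.chat_stream(
    model="mistral-small",
    messages=[ChatMessage(role="user", content="Bonjour! Explique-moi Mistral en une phrase.")],
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
```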

LAION ā–· #general (427 messagesšŸ”„šŸ”„šŸ”„):

  • LAION Explores Efficient Image Gen: @chad_in_the_house experimented with image generation by applying lfq 2^17 on ImageNet training and using the Muse architecture for further development. They consider lfq architectures swift to train and suggest they can be fine-tuned from existing vqgans.

  • Model Performance Real Talk: Discussing the Stable Cascade GitHub repository and its impact, @pseudoterminalx and others express skepticism around its performance and potential issues, such as face distortion and large VRAM requirements for inference.

  • OpenAI’s Sora Surprises the Community: The latest update, Sora from OpenAI, a text-to-video model capable of producing minute-long videos, amazed many users. This includes a demonstration of its ability to simulate complex scenes and is expected to open up a wave of new creative opportunities.

  • Workshop Call for Low-Resource Languages at ICML 2024: @sttruong invites interested parties to contribute to a workshop focused on low-resource languages. The topics cover data processing, LLM training, and social impacts, with a proposal deadline of February 15th.

  • Concerns about Sustainable Training Practices: Amidst praise for Sora, @pseudoterminalx raises ethical questions about the reliance on Kenyan labor for content moderation and annotation, emphasizing a potential shadow over advancements in AI capabilities.

LAION ā–· #research (41 messagesšŸ”„):

  • Concern over OpenAI Model Degradation: @.undeleted expressed worries that OpenAI’s safety tuning could degrade model quality to the point of being impractical for tasks. They commented, ā€œā€¦become unreasonably expensive…already happened.ā€

  • Synthetic NSFW Content Shortage: @progamergov mentioned the struggle to find high-quality synthetic NSFW content for datasets, criticizing Civitai for its messy outputs.

  • Anime AI Development Stagnation Observed: @drhead argued that the anime community’s reliance on the NovelAI leaked model hindered progress, contrasting this with the furry community, who, due to a lack of analogous leaks, have advanced further in their respective model development.

  • RingAttention Enables Video-Linguistic Models: @spirit_from_germany and @max_voltage discussed the potential of models using RingAttention for parsing large datasets, such as the combined video and book data, noting the technique’s influence on long-sequence training.

  • Sora, OpenAI’s Text-to-Video Model: @qwerty_qwer shared a link introducing OpenAI’s text-to-video model, Sora, which generates detailed scenes and movements based on provided text prompts. Discussion about its early release and the seeking of feedback was mentioned, along with skepticism from @twoabove regarding the closed nature of the model.

HuggingFace ā–· #announcements (2 messages):

  • Hugging News and Updates: The 89th edition of Hugging News introduces a variety of developments including the Message API with OpenAI compatibility, community efforts for building open datasets, and new releases such as Datatrove, Gradio 4.18.0, Remove Background Web, Nanotron, and updates to Hugging Face Competitions and Accelerate. Additionally, the introduction of LoRA Studio, 2FA for the HF Hub, and a task page for Mask Generation are highlighted. Read on Twitter.

  • Exciting Community Contributions: The 45th edition of Community Highlights features prompt-to-automation demos, a specialized model for judging multiagent conversations called Prometheus, and the first version of the tokviz library for visualizing tokenization patterns. Innovations also include text-to-image and text-to-animation demos, art generation through Kandinsky-API, and datasets to train art generation models similar to midjourney / DALL-E-3. Check Prometheus.

  • Creative Community Spaces Unveiled: Users continue to shine with unique spaces like a monocular depth estimation tool that converts RGB to depth, a quiz creator space named quizmona, and an on-device LLM for mobile machine learning dubbed Olmo-on-device. These creative tools expand the applications of AI in various fields and make them accessible to a broader audience.

  • Educational Opportunities and Tooling: A partnership with Codecademy offers a free AI course on transformers, a blog post introduces SegMoE for model merging on text-to-image models, and Accelerate showcases faster loading of pre-trained PyTorch models. These resources aid users in learning about AI technologies and optimizing their implementation.

  • Upcoming Reading Group: A reading group discussion is scheduled to cover the paper ā€œMamba: Content-Based Reasoning for Foundation Models,ā€ focusing on addressing the computational inefficiency of Transformers on long sequences. This indicates a community interest in advancing the understanding of foundational models and architectural improvements. View the paper.

HuggingFace ā–· #general (227 messagesšŸ”„šŸ”„):

  • Real-time snap detection inquiry: User @butchbangher inquired about a model or program to detect finger snaps in real-time video and audio. Having tried MediaPipe without finding included support for this gesture, they were searching for guidance on how to approach temporal detection.
  • HF Spaces token issues: Users @hari4626 and @thatonecoder20 discussed a missing HF_Token field, necessary for running spaces, which might require manual inclusion in settings.
  • Blog post reaches audience: @not_lain celebrated that their blog post about custom architectures with Hugging Face has reached 240 readers, sharing a link to the post and a snippet of code for baseline model creation.
  • AI Career Hopes: @00face discussed the difficulty in debunking a misconception about Mistral and all LLMs containing stolen data and was looking for white papers or hard data to refute such claims.
  • Introducing Gemini 1.5 Pro: In the field of generative models, users @pierrunoyt, @danfosing, and @skyward2989 discussed Google’s announcement of Gemini 1.5, noting its enhanced performance and breakthroughs in long-context understanding.

HuggingFace ā–· #today-im-learning (11 messagesšŸ”„):

  • Merging Sheets with Caution: @lunarflu discussed the challenges of merging two Google Sheets, emphasizing the need to avoid duplicate records and maintain unique keys. They highlighted the importance of creating distinct records to prevent data issues.
  • A Melody of Learning: @neuralink expressed their progress in learning about DoReMi reproduction and training with FP8 3D parallelism, achieving a remarkable 99% and 32% respectively.
  • End-to-End Learning Spree: @sardarkhan_ engaged in a deep dive into diffusers and transformers before switching gears back to rigorous coursework preparation.
  • Face Swapping Exploration: @virtual_josh shared their experience exploring different programs for deep-faking videos and asked for recommendations on services for swapping faces in videos.
  • Custom Labels in NER: @jakemorrison inquired about the flexibility of ner_tags labels in token classification, sparking a discussion where @cubietom pointed to custom label usage with references to the CoNLL2003 and Few-NERD datasets.

HuggingFace ā–· #cool-finds (12 messagesšŸ”„):

  • MoE Models Under Threat: @osanseviero highlighted a paper revealing vulnerabilities in Mixture of Experts (MoE) models that could allow attackers to influence the output of other users’ queries within the same batch. The paper was discussed, including potential mitigation strategies on HuggingFace and further personal insights were shared in a blog post.

  • Concerns Over Potential MoE Risks: @meatfucker pointed out that the threat from the MoE vulnerability is not immediate, but could pose problems in the future if left unaddressed. The user also mentioned the potential of incidental negative impacts on output quality in systems using large batches.

  • Million-Length Video and Language Processing: @not_lain shared excitement about a new DeepMind project, which includes open-source 7B models capable of deciphering long text and video data over one million tokens. More information and resources are available through the largeworldmodel project and arXiv abstract.

  • Online and Offline RL Blended: @poudelbibek brought attention to a paper discussing Online Decision Transformers (ODT), a novel reinforcement learning algorithm that unifies offline pretraining with online finetuning. The paper can be found on arXiv.

  • Introducing SPIN for Realistic Model Reactions: @andysingal posted about SPIN, a new method enabling large language models (LLMs) to produce responses indistinguishable from human ones, enhancing self-play capabilities without the need for higher-level annotators. The method details can be checked out on GitHub.

Links mentioned:

  • [@osanseviero on Hugging Face: ā€œMixture of experts: beware šŸ›”ļøāš”ļø New paper by DeepMind: Buffer Overflow inā€¦ā€](https://huggingface.co/posts/osanseviero/980907000007376): no description found


HuggingFace ā–· #i-made-this (25 messagesšŸ”„):

  • RAG Application Ready to Share: @osiworx created a RAG-based application using nearly 1 million text2image prompts and inquires about the possibility of running it on HuggingFace, seeking advice on datastore management.

  • Free Hosting for LLMs on Colab: @typoilu introduced LocalLlm, a solution for hosting large language models for free on Colab or locally, inviting the community to try it and provide feedback on the early version of the repository. LocalLlm on GitHub.

  • Visualize Tokenization Patterns with tokviz: @deeeps.ig announced the first release of tokviz on PyPI, a library for visualizing how various language models from the Hugging Face library tokenize text, and shared the documentation.

  • PTA-Text to Locate UI Text: @calmdown.manu showcased the PTA-Text model, designed to process UI screenshots and click commands, and shared a demo and the model checkpoint.

  • Trinity and Neo Models Now Available: @tonic_1 highlighted the introduction of Trinity from Rabbit, a coding model believed to be a deepseek branch, which is now available on the Hugging Face Spaces; mentioned Neo as a partner to Trinity capable of fitting 33B parameters on an A10G. Trinity on HuggingFace, Neo on HuggingFace.

HuggingFace ā–· #reading-group (40 messagesšŸ”„):

  • LangTest Paper Published: @prikfy announced the publication of their LangTest paper in the Software Impacts journal; LangTest is a library for testing LLM and NLP models that includes a method to augment training datasets based on test outcomes. The paper can be accessed here, and a GitHub repository and website for LangTest were highlighted by @ryzxl.

  • Model Merging Presentation on the Horizon: @prateeky2806 offered to present ideas on model merging in the upcoming reading group session on March 1st. @lunarflu suggested that the presentation include diagrams and potentially a demonstration using a notebook or Gradio.

  • Mamba Paper Inquiry Answered: Questions regarding the Mamba paper were addressed with an arXiv link provided by @chad_in_the_house, and @ericauld mentioned discussing variations of the work and entry points for new variations.

  • Secrets of Seed Selection Explored: @stereoplegic inquired about papers where random seeds are learnable parameters, evoking a discussion on gradient-based optimization and data augmentation policies with references to the AutoAugment paper by @chad_in_the_house.

  • Search for Seed-Related Works: Dialogue on the impact of random seed selection on model performance was initiated after @stereoplegic did not find much existing literature. They detailed their approach involving the use of random seeds in model initializations, with @chad_in_the_house providing references and engaging in discussion on the potential of the concept.

HuggingFace ā–· #diffusion-discussions (17 messagesšŸ”„):

  • Turning Text to Images: @isidentical reported achieving a 50% success rate on generating images from arbitrary words using Stable Cascade, after figuring out a good prompting strategy.
  • Stable Diffusion Discussions: @chad_in_the_house mentioned that the HuggingFace diffusion chat might be better discussed in a different channel, but acknowledged that text generation with models like Stable Cascade is quite effective.
  • SageMaker Setup Snag: @nayeem0094 encountered a problem where the disk space was insufficient for the expected model file size while setting up a HuggingFaceModel on SageMaker and asked for assistance.
  • Serverless API Generation Query: @vrushti24 inquired about generating multiple images using a serverless API with the Lykon/dreamshaper-8 model, which currently generates only one image from text.
  • Vanishing Gradient Puzzle: @maxpappa reached out to discuss experiences of vanishing gradient issues while fine-tuning models or using DiffusionDPO pipelines, later clarifying that they are using fp32 instead of fp16 in response to @pseudoterminalx.

HuggingFace ā–· #computer-vision (8 messagesšŸ”„):

  • Gaussian Splat Expertise Suggestion: @johko990 recommended that someone looking for help with gaussian splats should ask in another channel where experts on the topic are likely active.
  • Multimodal Queries and Collaboration Offer: @joee2711 is working on a multimodal project involving Q-formers and MLP connectors; queries about their differences and similarity to adapters. The user is also seeking collaboration.
  • Seeking Image Retrieval System Improvements: @femiloye is developing an image retrieval system based on custom DeiT transformers trained with reid loss and seeks advice for enhancing retrieval accuracy beyond just model embeddings.
  • Hairstyle Transformation Research Assistance: @abrahamowodunni requested resources for changing hairstyles with a generative vision model, which @lunarflu suggested might relate to another user’s fashion demo project.
  • New Project Spotlight - PTA-Text Model: @calmdown.manu shared a project about a lightweight multimodal model, PTA-Text, designed for UI interaction using screenshots and text commands, inviting feedback and highlighting current limitations related to training data and functionality.

HuggingFace ā–· #NLP (6 messages):

  • Seeking Language Extraction from XLM-R: @_michaelsh is looking for guidance on how to extract the language from an XLM-RoBERTa model, linking to the HuggingFace documentation, but has not received a response yet.

  • Curiosity for Algebraic Translation: @_david_valente_ inquires about any existing work that translates natural language into algebraic representations like LEAN, but no answers have been provided in the discussion.

  • Voice Simulation and Language Change with Transformers: @mentrass asked about simulating their voice and changing the language with transformers. @mahimairaja recommended XTTS, a real-time voice cloning tool, and provided a link to the model that supports 17 languages and is used in Coqui Studio and API.

  • Introducing a Text-Only Click Model: @calmdown.manu shared a project named PTA-Text, which is a text-only click model designed for UI interactions. They provided both a demo and a model checkpoint, noting that it’s designed for 1920x1080 screenshots and is still in the prototype stage.

HuggingFace ā–· #diffusion-discussions (17 messagesšŸ”„):

  • Successful Text Generation with Stable Cascade: @isidentical reported achieving a 50% success rate on arbitrary word text generation using a good prompting strategy with Stable Cascade, mentioning this in the context of the model’s performance in README examples.
  • Inference Engine Mention by Huggingface: @chad_in_the_house briefly noted that Huggingface has made an inference engine for large language models, though the specific link was not provided.
  • Deploying Models on SageMaker: @nayeem0094 faced issues deploying a HuggingFace Model on SageMaker due to insufficient disk space, with an error message indicating a lack of available space for the expected file size (3892.53 MB).
  • Serverless API Query for Dreamshaper-8: @vrushti24 inquired about the possibility of generating multiple images from a single text prompt using a serverless API for the Lykon/dreamshaper-8 model, asking for advice within the HuggingFace community.
  • Vanishing Gradient Issue in Fine-Tuning: @maxpappa sought advice for an issue with vanishing gradients when fine-tuning a model or using DPO with the DiffusionDPO pipeline, clarified later that he was using fp32 training, not fp16.

Perplexity AI ā–· #announcements (1 messages):

  • Perplexity Push Spices up Slack with Subscriptions: User @ok.alex announced the upcoming feature Perplexity Push, allowing users to subscribe to topics and receive updates directly in Slack channels. This feature promises to enhance team discussions and keep everyone in the loop.

Perplexity AI ā–· #general (256 messagesšŸ”„šŸ”„):

  • Referral Program and Coupon Usage Explained: User @mares1317 provided details on how to apply a coupon to a Perplexity account, referencing an FAQ on coupons and discounts and instructing users to go to perplexity.ai/pro to redeem coupons.

  • Perplexity API Integration and pplx Models Highlighted: @mares1317 shared a link explaining the new pplx-7b-online and pplx-70b-online models, emphasizing their help in delivering up-to-date and factual responses via API and Perplexity Labs.

  • Discussion and Speculation on pplx-8x7b Model: Discussions took place around the nature and capabilities of the pplx-8x7b model. While there was no definitive documentation provided, users like @akumaenjeru and @jake speculated that it’s likely related to existing models such as mixtral-8x7b-instruct or a fine-tune of the Mixtral model.

  • Perplexity Service Availability Concerns: Multiple users like @diego.tech, @lucassmith56_38679, and @luke_____________ reported issues with Perplexity’s service availability, citing timeout errors and problems with model responses.

  • Gemini 1.5 Announcement Catches Attention: @luke_____________ highlighted blog post updates about Google’s AI technology, Gemini 1.5, discussing its potential and anticipated features like a one million-token context window and faster release cycles compared to past models.

Perplexity AI ā–· #sharing (22 messagesšŸ”„):

  • Perplexity AI Search Spotlight: User @idesign12 shared a Perplexity AI search link, seemingly to showcase the search capabilities of the platform, but no specific details were provided.
  • Introducing Alt-D Feed for Community Curation: @ok.alex shared a link to Alt-D Feed, an alternative feed/newsletter for community collaboration. They encouraged likes and shares for those interested.
  • Bookmarking on Perplexity Discussed: @jaybob32 inquired about bookmarking collections, to which @ok.alex replied that bookmarking in-browser is the current solution as collections aren’t saved to user libraries unless they are contributors. However, suggestions for improvements are being considered.
  • Perplexity AI and GitHub Repo Announced: @_kokomos described how they are integrating structured data and logic patterns with Perplexity AI, while also providing a link to their non-live GitHub repository which will become public in the future.
  • DIY Tutorial Triumph: User @duplex0150 found success in layering their hair by following a tutorial from Perplexity AI and shared their positive experience, with future plans to cut at angles.

Perplexity AI ā–· #pplx-api (60 messagesšŸ”„šŸ”„):

  • API Intermittent Failures Discussed: @myadmingushwork_52332 reported that both pplx-7b-online and pplx-70b-online were returning random and absurd replies, providing a snippet of the problematic output. No specific resolution was mentioned in the subsequent conversation.
  • Inconsistencies in API Responses Troublesome: @ia7df expressed concerns about the inconsistency in API responses compared to perplexity.ai and sought developer assistance. @icelavaman clarified that perplexity.ai and pplx-api are different and require different prompts; however, issues of consistency across prompts were not resolved.
  • LangChain and Perplexity Compatibility Clarified: @icelavaman shared a helpful guide by Mochan for using Perplexity AI with LangChain, addressing @ponomoly_dev’s issues with substituting pplx-7b-chat for gpt-3.5-turbo (a rough sketch follows this list).
  • New Model Availability Uncertain: @paul16307 inquired about the availability and pricing of PPLX-8x7B on the API, and while users like @brknclock1215 indicated it might already be functional, @icelavaman stated there is no ETA for official release or pricing information.
  • Discrepancy in API Model Performance Observed: @xlhu_69745 reiterates that they’re getting random results from pplx-70b-online and recognizes that the model sometimes hallucinates, citing examples where the provided links do not exist.
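
As a rough illustration of the LangChain route discussed above: Perplexity’s pplx-api exposes an OpenAI-compatible endpoint, so one common pattern is to point LangChain’s OpenAI chat class at that base URL. This is one way to do it, not necessarily the linked guide’s exact approach; the model name comes from the discussion, everything else is an assumption.

```python
import os

from langchain_openai import ChatOpenAI

# Point the OpenAI-compatible client at Perplexity's endpoint.
llm = ChatOpenAI(
    model="pplx-7b-chat",
    api_key=os.environ["PPLX_API_KEY"],
    base_url="https://api.perplexity.ai",
)

print(llm.invoke("Summarize what the pplx-online models are good at.").content)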

LlamaIndex ā–· #announcements (1 messages):

  • No-Code RAG Building with FlowiseAI: @jerryjliu0 announces an upcoming webinar on how to build no-code RAG with Henry Heng from FlowiseAI. The event is scheduled for Friday at 9 am PT, aiming to teach users about leveraging the LlamaIndex.TS + Flowise integration to develop LLM-powered workflows without coding. Register for the webinar here.

Links mentioned:

LlamaIndex Webinar: Build No-Code RAG Ā· Zoom Ā· Luma: Flowise is one of the leading no-code tools for building LLM-powered workflows. Instead of learning how to code in a framework / programming language, users can drag and drop the components…


LlamaIndex ā–· #blog (7 messages):

  • DanswerAI Integrates LlamaIndex: @llama_index acknowledged the integration of DanswerAI, a ChatGPT tool enhancing efficiency across workplace tools, backed by LlamaIndex technology. Check out their full announcement here.
  • No-Code RAG Webinar: @FlowiseAI, known for building no-code LLM workflows, joins @llama_index for a webinar featuring @henryhengzj. They will discuss LlamaIndex.TS and Flowise integration.
  • Scientific Research Workflow Tutorial: A new notebook by @quantoceanli details constructing an agent to perform scientific research, including fetching abstracts from ArXiv. The workflow aims to simplify the process for researchers, shared by LlamaIndex.
  • Tutorial on Building Custom Agentic Workflows: LlamaIndex released a video tutorial to empower AI engineers in creating their own agents from scratch, demonstrating that it is not just for AI researchers.
  • Technology Showcase for ADU Planning GenAI App: Celebrating their hackathon’s first-place winner, ADU Planner, LlamaIndex highlighted the AI app’s multifaceted capabilities, from parsing ADU local regulations to floor plan suggestions, available here.

LlamaIndex ā–· #general (285 messagesšŸ”„šŸ”„):

  • Arize-Phoenix Tracing Feature Update Coming Soon: User @richard1861 inquired about tracing user queries with Arize-Phoenix. @cheesyfishes confirmed that tagging traces with metadata is in progress and should be ready in the next week or so.

  • Custom Metadata Tagging in Queries: User @akash_18327 sought advice on excluding metadata from the context in their custom QA template. @cheesyfishes suggested setting excluded metadata keys before data ingestion with document.excluded_llm_metadata_keys = ["field1", ...].

  • Trouble Reading DOCX in v0.10: @.mai_ reported issues with SimpleDirectoryReader interpreting DOCX files as encoded data. The issue was resolved after updating to the latest llama-index-core version alongside llama-index.

  • Real-time RAG Pipeline Optimization: @barrahh asked about optimizing the RAG pipeline using user feedback or ratings. @lemuffinman mentioned the possibility of using reranking based on score, while @cheesyfishes provided a code example to separate the retrieval and synthesis steps for real-time evaluation (a minimal sketch follows this list).

  • Discord Server for Peer Support: @ryanrib14 shared a Discord invite link to join a community server aimed at helping with integration issues related to LlamaIndex and Azure AI search vector bank and sharing experiences.
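
A minimal sketch of the retrieval/synthesis split mentioned above, assuming llama-index v0.10-style imports (this is not @cheesyfishes’ exact snippet, and the data path and query are placeholders):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, get_response_synthesizer

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

query = "What does the report conclude about Q3?"

# Step 1: retrieval only -- nodes can be logged, scored, or re-ranked on user feedback here.
retriever = index.as_retriever(similarity_top_k=4)
nodes = retriever.retrieve(query)

# Step 2: synthesis over the (possibly re-ranked) nodes.
synthesizer = get_response_synthesizer()
response = synthesizer.synthesize(query, nodes=nodes)
print(response)
```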

LangChain AI ā–· #announcements (10 messagesšŸ”„):

  • LangChain Introduces a Journaling App with Memory: @hwchase17 announced an early incarnation of a journaling app that uses LangChain’s memory module, intended to remember user information for future interactions. The application is in a very early stage and feedback is welcome; see the app in action via Loom video and try it out at Journal by LangChain.

  • User Feedback on Login Methods: @rpall_67097 suggested incorporating social logins like Google, GitHub, or Twitter for the Journal app, mentioning the barrier that traditional email/password sign-ups may create for potential users.

  • LangSmith Now Generally Available & Fundraised Series A: @hwchase17 shared the news of LangSmith’s general availability, a $25M fundraise led by Sequoia Capital, and introduced their redesigned homepage and brand with excitement. Learn more on their blog post, access LangSmith directly here, read about their journey on Forbes, and find out more about working with them at LangChain Careers.

  • Query About LangSmith Pricing: @rajib2189 expressed enthusiasm about the announcement of LangSmith but noted issues accessing the pricing page.

  • LangSmith Featured on Product Hunt: @hwchase17 mentioned that LangSmith is now live on Product Hunt, showcasing its features for developing and monitoring LLM applications. Find it on Product Hunt: LangSmith General Availability.

LangChain AI ā–· #general (64 messagesšŸ”„šŸ”„):

  • Database Integration for Personalized Chats: @batmansalt suggested that personalized chat histories for customers can be stored in a database, and loaded into the chatbot prompt when a new chat session begins, enhancing the interaction with the chatbot.
  • Peer Dependency Conflicts with Pinecone and Langchain: @segmentationfault. encountered issues upgrading Pinecone to v2 due to langchain dependency conflicts. @jacoblee93, a full-time maintainer of Langchain, provided assistance, including the recommendation to use npm install --legacy-peer-deps or to bump langchain to the latest version.
  • Langchain User Seeks RAG Optimization Tips: @barrahh inquired about optimizing RAG pipelines based on user feedback or ratings. @batmansalt proposed manual inspection of results to refine parameters like chunk size and number of retrieved texts, and mentioned using high ratings for future fine-tuning of the model.
  • Collaboration Opportunities in the Langchain Community: @kiddu expressed interest in joining AI or backend projects, and @aminerwy invited collaborations on an AI-powered public transit planning assistant, especially to improve the backend RAG conversational chatbot.
  • Trouble with Streaming in ConversationChain: @hndrxx_25149_81926 mentioned issues with streaming capability within ConversationChain, and queried the community for possible workarounds.

Links mentioned:

Pinecone | šŸ¦œļøšŸ”— Langchain: You can use Pinecone vectorstores with LangChain.


LangChain AI ā–· #langserve (64 messagesšŸ”„šŸ”„):

  • Image Base64 Bloating Browser: @dachsteinhustler found that including base64-encoded images in the LangChain playground caused browser crashes due to the lengthy intermediate steps. A rewrite of the app using a RunnableLambda avoided displaying the strings.

  • K8s Connection Refused Troubles: @ezelanza. experienced a ā€œconnection refusedā€ error when trying to invoke an OpenAI API within a Kubernetes cluster. The issue was discussed extensively with @veryboldbagel providing guidance on security concerns regarding accidentally posted OpenAI API keys and offering structure fixes for CURL requests and the use of APIHandlers.

  • Trouble With LangServe Routes: @ezelanza. sought help debugging issues with LangServe routes. @veryboldbagel advised on the correct curl request structure and suggested checking the request pattern using browser developer tools (see the example request after this list).

  • LangServe Chat History Challenge: @lfglopes queried about implementing a chat history feature within LangServe that interacts with a SQL database, sparking a discussion on managing conversation threads through separate endpoints or internally generated UUIDs guided by @veryboldbagel.

  • Deployment Query for Langchain/LangServe App: @aminerwy was looking for advice on deploying a Langchain/LangServe app to be accessible from the web. The community mentioned Vercel and Replit as potential platforms for deployment.
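
For reference, the request shape that tripped people up above follows LangServe’s standard /invoke convention: a JSON body with the runnable’s input under an "input" key. A hypothetical example against a locally served route (the route name and input schema are assumptions):

```python
import requests

# Hypothetical LangServe deployment; "/my-chain" and the input schema are assumptions.
resp = requests.post(
    "http://localhost:8000/my-chain/invoke",
    json={"input": {"question": "What can this chain do?"}},
)
resp.raise_for_status()
print(resp.json()["output"])  # LangServe wraps the runnable's result under "output"
```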

LangChain AI ā–· #share-your-work (4 messages):

  • Guidance for Goal-Setting Assistant: @avfranco offered a step-by-step approach for creating a goal-setting assistant, suggesting that establishing a vision, breaking down features, selecting core components like architecture and user interface, and continuous experimentation are key steps to success.

  • Curiosity About Action Plan Tools vs. LangGraph: @jay0304. inquired whether the Action plan and tools are an alternative to LangGraph, or if they can be utilized concurrently.

  • Reverse Job Board for AI Experts: @sumodd introduced a new Reverse Job Board for individuals seeking AI-related roles, featuring a free platform where recruiters can discover potential candidates listing various skills and experiences.

  • LangChain Meets Dewy: @kerinin shared a tutorial on building a question-answering CLI with Dewy, an open-source knowledge base, and LangChain.js, illustrating how developers can incorporate large language model functionalities into their applications.

LangChain AI ā–· #tutorials (1 messages):

  • Learn to Implement Multi-Document RAG: @mehulgupta7991 shared a YouTube tutorial titled ā€œMulti Document RAG using LangChain codes explained,ā€ which guides viewers through the implementation of Multi-Document RAG using Agents with custom tools for chatting with different external files. The tutorial is also part of a book launched by the user.

Links mentioned:

Multi Document RAG using LangChain codes explained: This tutorial explains how to use multiple diverse files with a single RAG agent for querying your data. This tutorial is a part of my newly launched book ā€œL…


OpenAccess AI Collective (axolotl) ā–· #general (69 messagesšŸ”„šŸ”„):

  • Exploring Keras for Porting Models: @yamashi proposed porting models to Keras to extend hardware support. They highlighted that Keras, now standalone as of version 3, serves as an abstraction layer over frameworks such as Torch, TF, and JAX.
  • Checkpoint Saving Snafus Identified: After facing an error during checkpoint saving, @dreamgen shared a link to a pull request on the HuggingFace repository that caused issues, which were discussed in relation to recent outages experienced by HF.
  • Quest for Affordable LLM Hosting: @le_mess inquired about the cheapest endpoint service for hosting a model like Mixtral via an API. Contributions from @dreamgen, @noobmaster29, and others outlined various options, including together.ai, OpenRouter, and Baseten.
  • NVIDIA Showcases Chatbot Frontend: @dangfutures shared a link to NVIDIA’s RTX-based demo app called Chat With RTX, allowing personalized GPT models to run on local RTX hardware. A discussion surrounding its practicality and bugs ensued, with alternatives like using the engine with Chainlit proposed by @nruaif and @dangfutures.
  • CohereForAI’s Aya Model Serialization: Users @noobmaster29, @nanobitz, and @dreamgen discussed the newly released Aya model by CohereForAI, capable of instructions in 101 languages, and considered its expected performance in comparison to its predecessors.

OpenAccess AI Collective (axolotl) ā–· #axolotl-dev (20 messagesšŸ”„):

  • Collaborating Minds for Schema Design: @faldore and others have agreed that a JSON schema detailing user and assistant message pairs, with optional system, tools, and source messages, is ideal for dataset formatting. @faldore illustrated how this schema enforces user and assistant message pairing and that the last response is always from the assistant (an illustrative record follows this list).

  • Flexibility in Role Naming: @c.gato raised the idea of renaming the ā€œuserā€ and ā€œassistantā€ roles in the message schema. They noted the influence of using the term ā€œassistantā€ on model behavior and shared an anecdote about having to replace ā€œassistantā€ with ā€œsecretaryā€ in an RP model to avoid self-referential AI behavior.

  • Multi-User Chat Complications: Upon further discussion of the proposed schema, @c.gato inquired about its applicability to multi-user chat scenarios. @dreamgen suggested the core schema should be unopinionated and open to extensions to cater to diverse tasks like RP or story-writing.

  • Support for Non-Linear Learning in AI: @suikamelon shared a research paper on the potential benefits of integrating structured cognitive learning methodologies into LLM instruction tuning and asked about disabling random shuffling in favor of curriculum learning. @c.gato expressed interest in exploring the sorting of training examples by length.

  • Contemporary Discussion on Instruction-Tuned LLMs: @suikamelon discussed a novel approach to instruction tuning inspired by curriculum learning, indicating models may perform better when complex instructions are fine-tuned last. They expressed skepticism but acknowledged the potential utility of a more structured and less randomized approach to fine-tuning.
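For context on the schema discussion above, here is one possible shape such a record could take, together with a pairing check. This is illustrative only and not @faldore’s actual proposal; field names are placeholders.

```python
# Illustrative record shape: optional system/tools/source fields plus an
# alternating user/assistant message list that must end with the assistant.
import json

record = {
    "system": "You are a helpful assistant.",  # optional
    "tools": [],                                # optional
    "source": "synthetic-v1",                  # optional
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4."},
    ],
}

def check_pairing(rec: dict) -> None:
    """Enforce strict user/assistant alternation ending on the assistant."""
    roles = [m["role"] for m in rec["messages"]]
    assert roles, "empty conversation"
    assert roles[-1] == "assistant", "last message must come from the assistant"
    for i, role in enumerate(roles):
        expected = "user" if i % 2 == 0 else "assistant"
        assert role == expected, f"message {i} should be from the {expected}"

check_pairing(record)
print(json.dumps(record, indent=2))
```

Renaming the roles (e.g. ā€œassistantā€ to ā€œsecretaryā€, as @c.gato described) or supporting multi-user chat would only require relaxing the alternation check, which is the kind of extension @dreamgen argued the core schema should leave open.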


OpenAccess AI Collective (axolotl) ā–· #general-help (11 messagesšŸ”„):

  • Real-time LoRA Adapter updates: @wizmak sought advice on adding fine-tuned LoRA adapters to a base model in real time, without needing to merge and restart. @nanobitz confirmed it’s possible with HF, implying you can load and unload the PEFT adapters dynamically (a rough sketch follows this list).

  • SGLang and LLaVA Worker Requirements?: @CodeMan questioned if both SGLang and LLaVA workers are necessary for particular functionality, but the context or responses to the query were not provided.

  • In Search of DeepSpeed Config for Model Parallelism: @mihai4256 asked for a working DeepSpeed Zero 3 config for model parallelism, noting the challenge of finding one despite their expectation that it should be readily available.

  • Mixture of Experts Training Resources: @emperor inquired about the best repository to train different mixture of experts architectures from scratch, later suggesting Megablocks as a possible solution, though no direct affirmations or alternatives were provided.
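As a rough sketch of the dynamic-adapter pattern @nanobitz alluded to, the peft library can attach, load, and switch named LoRA adapters on a live model without merging. The base model name is real but the adapter paths below are placeholders.

```python
# Swap LoRA adapters at runtime without merging or restarting.
# Assumes transformers + peft; adapter paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Attach the first adapter under a name.
model = PeftModel.from_pretrained(base, "path/to/adapter-a", adapter_name="a")

# Later, pull in a freshly fine-tuned adapter and switch to it on the fly.
model.load_adapter("path/to/adapter-b", adapter_name="b")
model.set_adapter("b")

inputs = tokenizer("Hello", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```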


OpenAccess AI Collective (axolotl) ā–· #runpod-help (15 messagesšŸ”„):

  • RunPod Image Works on Vast.AI: @dreamgen shared a public service announcement that the Axolotl RunPod image can be used on Vast.AI without any issues and works straight out of the box.
  • Opting for Vast.AI over RunPod: @dreamgen and @dangfutures noted that Vast.AI might offer cheaper GPUs than RunPod, especially when H100 SXM GPUs are rarely available on RunPod’s community cloud.
  • Ease of Setup between Services: @dangfutures commented that although Vast.AI may be preferred for GPU pricing, RunPod offers a simpler setup process. @dreamgen said the setup felt similar on both and highlighted that /workspace/axolotl isn’t empty on Vast.
  • Data Transfer Queries Resolved: Users asked about transferring data from services like Google Storage to Vast. @dreamgen recommended using scp, and @nanobitz mentioned you can SSH into a Docker container provided by Vast, offering considerable flexibility.
  • Troubleshooting RunPod GPU Issues: @c.gato indicated frustration with 4090 RunPods, stating that they appear to have driver issues, mentioning crashes due to lack of AMP support, and also had difficulty getting the axolotl docker to work on Vast.

CUDA MODE ā–· #general (9 messagesšŸ”„):

  • LLMs Struggle with Math Post Business-Finetuning: @mertbozkir suggested that a 7B parameter model fine-tuned only on business data would give poor answers to math questions if it lacks domain-specific methods like forward/backward reasoning. They mentioned alternatives such as InternLM, MetaMath, and Arithmo, which are tuned for such tasks.
  • Price Surge for 3090 GPUs: @joseph_en lamented the price increase of 3090 GPUs, noting they bought theirs for less back in July and August and implying a significant upward cost trend.
  • GPUs’ Unpredictable Value: @andreaskoepf humorously referred to GPUs as ā€œGPU goldā€ in light of recent price fluctuations and @joseph_en’s experience.
  • Resource Page Update Reminder: @andreaskoepf acknowledged the need to update their resource-stream page with the recently posted links, showcasing an effort to keep shared resources organized.

CUDA MODE ā–· #cuda (12 messagesšŸ”„):

  • Microsoft corners chip market: @andreaskoepf humorously suggested that Microsoft has bought the full production capacity of chips, affecting the market, and joked about antitrust agencies being unable to keep up with Sam Altman’s pace, insinuating a dystopian future where traditional efforts can’t combat Altman’s ā€œnano-bot and virus army.ā€
  • GPU Troubles with PyTorch: @_tvi_ shared frustrations about working with PyTorch on a Radeon VII and Ryzen APU, citing issues with video RAM allocation and kernel crashes when large memory chunks are allocated.
  • CUDA Compatibility Anecdotes: @shikhar_7985 sought advice on managing different CUDA versions for various projects, while @btdubbins discussed the need to remain pinned to CUDA 11 for compatibility with FAISS, before considering an update to CUDA 12.
  • PyTorch Inline Loading Bug: @eporat reported a problem with PyTorch’s load_inline not creating .so files, which was resolved by changing the optimization flag; @marksaroufim suggested a workaround for a known recompilation issue by adding a semicolon to the code, as discussed here (a minimal load_inline sketch follows this list).
  • Conda for CUDA Version Management: In response to a CUDA version management query, @marksaroufim recommended using Conda when working with PyTorch, as found on PyTorch’s official site.
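For readers unfamiliar with load_inline, here is a minimal sketch, not @eporat’s actual kernel. Compiled extensions are cached by source hash, so any trivial source edit, even a stray semicolon, forces a rebuild, which is the workaround mentioned above. It needs a working C++ toolchain, and the flag shown is only an example.

```python
# Minimal torch.utils.cpp_extension.load_inline example (requires a C++ compiler).
import torch
from torch.utils.cpp_extension import load_inline

cpp_source = """
torch::Tensor add_one(torch::Tensor x) {
    return x + 1;
}
"""

module = load_inline(
    name="add_one_ext",
    cpp_sources=[cpp_source],
    functions=["add_one"],     # auto-generates the Python binding
    extra_cflags=["-O2"],      # illustrative optimization flag
    verbose=True,
)

print(module.add_one(torch.zeros(3)))  # tensor([1., 1., 1.])
```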


CUDA MODE ā–· #algorithms (9 messagesšŸ”„):

  • Exploring the Bounds of Function Composition: @euclaise shared insights on using prefix-sum-like scans to compute complex recurrences, such as y[t] = max(y[t-1], x[t]), noting the approach’s generality due to the associativity of function composition. Further details and discussions can be found in their tweets (a small worked example follows this list).

  • Skepticism on Function Representation: @andreaskoepf questioned the practical limitations in representing functions using the method @euclaise discussed, expressing curiosity about the performance-acceptable class of such representations.

  • Practical Challenges with Associativity of Functions: @_tvi_ pointed out the computational difficulty in applying function associativity for more complex functions, suggesting the approach’s utility may be limited to functions that are ā€œeasy to represent and fast to apply.ā€

  • In Search of Knowledge on Function Classes: @telepath8401 inquired about resources for understanding ā€œeasily representableā€ functions and their classes, indicating a desire to learn more about the subject.

  • Collaboration Invitation for RingAttention Kernel Project: @andreaskoepf extended an invitation for collaboration on the RingAttention kernel project within the cuda-mode Discord, offering to help organize GPU resources and coordinate efforts despite not being able to fully devote themselves as a developer.
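To ground the scan discussion, here is a small worked example of the recurrence y[t] = max(y[t-1], x[t]). Because max is associative, the same result can be computed either by a left-to-right pass or by a divide-and-conquer prefix scan, which is what makes a parallel (log-depth) evaluation possible; the toy tree version below only relies on associativity.

```python
# y[t] = max(y[t-1], x[t]) is a scan over an associative operator, so it can be
# evaluated sequentially or as a parallel prefix scan.
from itertools import accumulate

x = [3, 1, 4, 1, 5, 9, 2, 6]

# Sequential evaluation of the recurrence.
y = list(accumulate(x, max))
print(y)  # [3, 3, 4, 4, 5, 9, 9, 9]

def tree_prefix(xs, op):
    """Naive divide-and-conquer prefix scan; correct for any associative op."""
    if len(xs) == 1:
        return xs[:]
    mid = len(xs) // 2
    left = tree_prefix(xs[:mid], op)
    right = tree_prefix(xs[mid:], op)
    carry = left[-1]
    return left + [op(carry, r) for r in right]

assert tree_prefix(x, max) == y
```

The practical caveat raised by @_tvi_ still applies: this trick only pays off when the composed functions stay cheap to represent and apply, which max does but many recurrences do not.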


CUDA MODE ā–· #beginner (9 messagesšŸ”„):

  • User Reveals Their GPU: @cs_os_05101 mentioned simply, ā€œI have 4060 Ti.ā€
  • Looking for Engaging CUDA Literature: @euclaise inquired about CUDA books that are fun to read; however, no specific book titles were recommended in the conversation.
  • Shaders as an Entry Point: @marksaroufim suggested The Book of Shaders by Patricio Gonzalez Vivo and Jen Lowe as a fun and progressive guide to Fragment Shaders.
  • euclaise Familiar with Shaders, Not CUDA: Despite the suggestion, @euclaise clarified they are already familiar with shader programming but not directly with CUDA or compute shaders.
  • Seeking Fun in Programming Massively Parallel Processors (PMPP): Although not characterizing it as fun, @marksaroufim mentioned PMPP (Programming Massively Parallel Processors) as the best resource they’ve found related to CUDA, while @euclaise expressed a willingness to try it, suggesting that research is the most fun for them.

Links mentioned:

The Book of Shaders: Gentle step-by-step guide through the abstract and complex universe of Fragment Shaders.


CUDA MODE ā–· #pmpp-book (7 messages):

  • Matrix Transposition May Not Boost Performance: @andreaskoepf brought up whether keeping both vectors of a dot product in sequential memory makes a significant performance difference and suggested the idea of an alternating memory layout. @jeremyhoward responded, noting from his experience that transposing the matrix for tile creation did not yield any performance improvements (a CPU-side illustration of the memory-layout question follows this list).

  • Exploring In-Loop Index Order Optimization: @eporat mentioned that, instead of transposing in place, changing the order of indices in the inner loop might be a viable optimization. However, @andreaskoepf seemed unsure about the improvement, since the data would still be read transposed either way.

  • Altering For-Loop Variables Slows Down Performance: @eporat tested changes in a CUDA kernel, only to find that changing the order of loop variables made the function even slower. They shared a modified function with an atomicAdd operation, but it failed to work efficiently with shared memory.
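As a CPU-side analogy for the memory-layout question (not a substitute for profiling the CUDA kernel itself), dotting contiguous vectors is generally faster than dotting strided ones: in C (row-major) order, rows of a matrix are sequential in memory while columns are strided, and materializing the transpose as a contiguous copy is exactly the trade-off being debated. The exact gap depends on the BLAS build and cache sizes, so treat the numbers as directional.

```python
# Contiguous vs. strided dot products in NumPy (rough illustration only).
import numpy as np
import timeit

n = 4096
A = np.random.rand(n, n)          # C-contiguous: rows are sequential in memory
v = np.random.rand(n)
A_t = np.ascontiguousarray(A.T)   # columns of A, now stored sequentially

col_strided = timeit.timeit(lambda: sum(float(A[:, j] @ v) for j in range(64)), number=5)
col_contig  = timeit.timeit(lambda: sum(float(A_t[j] @ v) for j in range(64)), number=5)
print(f"strided columns: {col_strided:.4f}s  contiguous copies: {col_contig:.4f}s")
```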


CUDA MODE ā–· #youtube-recordings (4 messages):

  • Lecture 5 Inquiry and Discovery: @filippob82 inquired about the availability of Lecture 5 on the Cuda YouTube channel. @reluctantly_normalized replied with a link to the lecture on Jeremy Howard’s channel: Going Further with CUDA for Python Programmers.
  • Suggestion for Cuda’s Channel: @reluctantly_normalized suggested to have a reference or a copy of the lecture on Cuda’s official YouTube channel for easier access.
  • Playlist Creation Idea: @filippob82 proposed creating a YouTube playlist as a possible solution to organize the lectures.

Links mentioned:

Going Further with CUDA for Python Programmers: This technical talk by Jeremy Howard explores advanced programming techniques for maximizing performance when using CUDA with Python. The focus is on optimiz…


CUDA MODE ā–· #jax (2 messages):

  • Uncertain Future for TensorFlow?: User @spacyphus inquired about the potential discontinuation of TensorFlow, but there was no further discussion or information provided to confirm or deny this possibility.
  • JAX vs. PyTorch Debate Sparked: User @marcom79 asked another member for their opinion on JAX versus PyTorch, suggesting that PyTorch 2.0 might be feature-equivalent to JAX after its recent updates. The conversation did not progress beyond the initial query.

LLM Perf Enthusiasts AI ā–· #general (10 messagesšŸ”„):

  • Gemini Pro 1.5 Excites with Massive Context Window: @wenquai expressed excitement for Gemini Pro 1.5, which boasts a 1 million token context window and the ability to process long videos. They find the capacity to handle such extensive content impressive.
  • Skepticism Around Large Context Windows: @thebaghdaddy expressed skepticism about the effectiveness of large context windows, like the 250k ones, citing that models tend to perform worse after 50-60k tokens. They referenced testing on Claude which showed content in the middle of large context windows being neglected.
  • Curiosity Based on Google’s Claims: @wenquai acknowledged this skepticism but mentioned relying on Google’s reports for optimism, also revealing efforts to gain access through a Google cloud rep.
  • Gemini’s Claimed Ten Million Token Context: @thebaghdaddy corrected an earlier figure, stating Gemini Pro 1.5 claims an even more astonishing ten million token context window, referencing a post by Jeff Dean.
  • Jeff Dean Highlights Gemini 1.5 Pro Innovations: A detailed post by Jeff Dean shared by @thebaghdaddy unveils Gemini 1.5 Pro, highlighting its 10 million token context length and ability to handle vast multimodal inputs. Dean’s Twitter post includes links to a main blog post, technical report, and various interaction videos, along with announcements of a limited developer preview and an upcoming broader model release with pricing tiers.

Links mentioned:

Tweet from Jeff Dean (@šŸ”) (@JeffDean): Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pr…


LLM Perf Enthusiasts AI ā–· #gpt4 (1 messages):

robotums: yeah


LLM Perf Enthusiasts AI ā–· #offtopic (15 messagesšŸ”„):

  • Massive Dataset Processing Costs: @res6969 shared that their dataset comprises 35k PDFs with around 40 pages each, resulting in substantial processing costs, particularly due to the use of a vision transformer.
  • Vision Transformers vs GPT-4V: In the cost breakdown, @res6969 clarified that the vision transformer represents a significant portion of the cost, despite initially considering whether the cost was mostly due to GPT-4V.
  • Announcing Surya OCR: @robhaisfield highlighted a new OCR tool called Surya OCR, which outperforms Tesseract in text recognition for 93 languages according to a tweet from @VikParuchuri.
  • Seeking Cost-Effective Alternatives: @robhaisfield and @res6969 discussed the possibility of finding a more efficient method than the vision transformer for classifying sections in PDFs, with the vision transformer costing $10 per 1,000 pages (a rough total-cost estimate follows this list).
  • Innovative Solutions on the Horizon: @robhaisfield suggested the use of GPT-4V or Llava for identifying charts or figures in a PDF as a potential cost-saving measure, which @res6969 acknowledged could indeed work and contemplated doing the math to compare costs.
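Putting the quoted figures together gives a rough sense of the bill, assuming every page goes through the vision transformer at the stated rate; the real cost would differ if only some pages need it.

```python
# Back-of-envelope cost from the figures quoted above.
num_pdfs = 35_000
pages_per_pdf = 40
cost_per_1000_pages = 10.0

total_pages = num_pdfs * pages_per_pdf                    # 1,400,000 pages
total_cost = total_pages / 1000 * cost_per_1000_pages
print(f"{total_pages:,} pages -> ~${total_cost:,.0f}")    # ~$14,000
```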

Links mentioned:

Tweet from Vik Paruchuri (@VikParuchuri): Announcing surya OCR - text recognition in 93 languages. It outperforms tesseract in almost all languages, often by large margins. Find it here - https://github.com/VikParuchuri/surya .


LLM Perf Enthusiasts AI ā–· #irl (1 messages):

  • AI Wednesdays with Free Pizza: @ivanleomk invites AI enthusiasts to gather at Funan Mall, Singapore, next Wednesday for a project hacking session complete with free pizza. The event is hosted by Gabriel Chua, Jon Jon, & tengfone, with details and registration available here. Only one spot is left and registration requires host approval.

Links mentioned:

AI Wednesdays Ā· Luma: Let’s hang out and build! šŸ› ļø šŸ”„ šŸ“ Location: Near Funan Mall (Exact location will be provided to registered attendees) ā° Doors open at 5.30pm, and feel free to join any time. šŸ• Pizza, šŸ“¶ā€¦


LLM Perf Enthusiasts AI ā–· #openai (8 messagesšŸ”„):

  • GPT-5 Rumors Abound: @res6969 humorously noted the extent of rumors about GPT-5 might be overhyped.
  • Laughter Among Enthusiasts: Emojis from @res6969 and @potrock indicate amusement, possibly in response to ongoing discussions or the hype around GPT-5.
  • OpenAI Tests ChatGPT with Memory: @potrock shared OpenAI’s blog post on a new ChatGPT feature that tests memory across conversations, allowing users to request the AI to remember or forget certain pieces of information.
  • Skepticism Over OpenAI’s Recent Updates: @thebaghdaddy expressed a critical view that OpenAI might be using strategy leaks as a distraction from less popular feature releases in the past months.
  • Announcing OpenAI’s Sora: @res6969 linked to OpenAI’s introduction of Sora, a text-to-video AI model that generates minute-long videos and is now being tested by red teamers and creative professionals to assess potential risks and gather feedback on its use.


Alignment Lab AI ā–· #ai-and-ml-discussion (4 messages):

  • Fine-Tuning LLMs on Business Data Affects Math Performance: @sabu7003 inquired about the potential performance of a large language model (LLM), specifically a 7B parameter model, on math questions after fine-tuning on business data only. @rusch suggested that such fine-tuning would gradually degrade the model’s math capabilities, with the degree of degradation being proportional to the intensity and duration of the fine-tuning process.
  • Optimism vs. Pessimism in ML Decision Making: @rrenaud shared an insight on the role of optimism in exploration/exploitation during reinforcement learning (RL), and how, conversely, pessimism during inference can prevent machine learning (ML) systems from deviating too far from the training distribution and help maintain stability in sequential decision-making.

Alignment Lab AI ā–· #general-chat (1 messages):

  • Seeking Business-Specific Instructions: @sabu7003 is looking for methods to extract only business-related instructions from the teknium/OpenHermes-2.5 Instruction dataset. They have not indicated any methods attempted or any links to the dataset.
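A naive starting point for this kind of filtering (not a method suggested in the channel) is a keyword pass over the dataset. The sketch below assumes the dataset loads via Hugging Face datasets and exposes a ShareGPT-style ā€œconversationsā€ list of {"from", "value"} dicts; the field names and keyword list are assumptions to adjust against the actual schema.

```python
# Crude keyword filter for business-related examples in OpenHermes-2.5.
# Schema (a "conversations" field of {"from", "value"} dicts) is assumed.
from datasets import load_dataset

BUSINESS_KEYWORDS = {
    "revenue", "marketing", "invoice", "profit", "customer",
    "startup", "budget", "sales", "accounting", "supply chain",
}

def is_business_example(example: dict) -> bool:
    text = " ".join(
        turn.get("value", "") for turn in example.get("conversations", [])
    ).lower()
    return any(keyword in text for keyword in BUSINESS_KEYWORDS)

ds = load_dataset("teknium/OpenHermes-2.5", split="train")
business_ds = ds.filter(is_business_example)
print(f"kept {len(business_ds)} of {len(ds)} examples")
```

A keyword list will miss paraphrases and catch false positives, so a follow-up pass with an embedding classifier or an LLM judge would likely be needed for a clean business-only subset.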

Alignment Lab AI ā–· #oo (2 messages):

  • Silence from a Discord User: @joshxt expressed concern over not hearing from a user, hinting that the user’s Discord might be broken. @atlasunified suggested that direct messaging (DM) him is the best course of action.

Alignment Lab AI ā–· #qa (1 messages):

daydream.nation: o sh


Skunkworks AI ā–· #general (1 messages):

  • LLaVA Setup Inquiry: User @CodeMan is seeking advice on integrating LLaVA with an SGLang server and SGLang worker, as opposed to the standard model worker setup. No responses or further discussion followed.

Skunkworks AI ā–· #datasets (1 messages):

  • Seeking Business-Specific Instructions: @sabu7003 inquired about methods to filter out business-related instructions from the teknium/OpenHermes-2.5 Instruction dataset. They are looking for guidance on how to isolate business-specific data.

Skunkworks AI ā–· #finetuning (1 messages):

  • Inquiring Minds Want to Know: User @sabu7003 questioned the ability of a 7B parameter LLM to answer math questions after being fine-tuned on business data alone, pondering if and how the performance on math would differ from business queries. There was no response or further discussion provided in the channel messages.

Skunkworks AI ā–· #papers (4 messages):

  • Can Random Seeds be Learnable?: @stereoplegic inquired about the possibility of random seeds being learnable as scalar parameters in AI models.
  • Learning Random Seeds - A Technical Impossibility?: @aspott asserted that learning a random seed isn’t feasible since one can’t get a gradient on a random seed.
  • Exploring Seed Loss and Initialization Functions: Despite the challenge, @stereoplegic suggested evaluating the loss of forward passes through parameters initialized from candidate seeds, while @aspott proposed learning an initialization function instead.

AI Engineer Foundation ā–· #events (7 messages):

  • Weekly Meeting Kickoff: @._z announced the start of the weekly meeting with a jovial ā€œDĆ©jĆ  vuā€ sentiment.
  • Absentee Alert: @juanreds informed that they couldn’t attend the weekly meeting.
  • Hackathon Co-hosting Opportunity: @caramelchameleon inquired about interest in co-hosting an AI developers hackathon before the Game Developers Conference, open to both online and onsite participation in San Francisco.
  • Hackathon Organizer Steps In: @yikesawjeez expressed interest, mentioning their experience in organizing hackathons related to events in the Bay Area.
  • Exclusive Founders x VC Event Slots Open: @atalovesyou shared an opportunity for startup founders to join an investor matchmaking session with limited additional spots available at Founders x VC Event, featuring 30+ venture capital firms and extensive networking opportunities.

Links mentioned:

Founder x Investor Matchmaking Ā· Luma: LIMITED SPOTS REMAINING. We have received interest from over 600+ Pre-Seed, Seed, Series A+ Founders. We are at capacity but opened a few more slots for founders on a ticket…


Datasette - LLM (@SimonW) ā–· #ai (2 messages):

  • Google Unveils Gemini 1.5 Pro: @tariqali shared a Google Developers blog post announcing the private preview of Google’s Gemini 1.5 Pro, which reportedly has the same performance as Gemini 1.0 Ultra but uses less compute. It also referenced the model’s ability to handle a context window of 1 million tokens.

  • Expanding the Context Window: In discussing the importance of prompt engineering with larger context windows, @tariqali speculated that the need for prompt engineering could decrease as the ability to input more relevant data directly increases. They considered the possibility that the cheaper compute might outweigh the efforts of prompt engineering, potentially rendering it an antiquated skill.

  • Gemini’s Potential Constraints: Following up, @tariqali highlighted a line from the Google Blog indicating that Google has successfully tested models with up to 10 million tokens, but chose to release Gemini 1.5 with a 1 million token context window instead. They inferred that there might be significant constraints, such as cost, preventing the release of models with larger context windows, suggesting that prompt engineering may still be valuable in the short term.

Links mentioned: