> AI News for 3/7/2024-3/11/2024. We checked [**356** Twitters](https://twitter.com/i/lists/1585430245762441216) and **21** Discords (**335** channels, and **6154** messages) for you. Estimated reading time saved (at 200wpm): **734 minutes**. We [added Unsloth AI today](https://news.ycombinator.com/item?id=39671146).

Google’s recently released Gemma model was widely known to be unstable for finetuning. Last week, Daniel Han from Unsloth drew praise for finding and fixing 8 bugs in the implementation, some of which are being upstreamed. There is a thread, a blogpost, and as of today Hacker News commentary and a Google Colab to follow along, with some well-deserved community love.


The implementation is full of extremely subtle numerical precision issues, which take extreme attention to detail to notice. Kudos!


Table of Contents

[TOC]

PART X: AI Twitter Recap

all recaps done by Claude 3 Opus. Today’s output was lightly edited by swyx. We are working on anti-hallucination, NER, and context-addition pipelines.

Here is a summary of the key topics and themes from the provided tweets, with relevant tweets organized under each category:

Technical Deep Dives

  • Yann LeCun explains the technical details of a pseudo-random bit sequence used to pre-train an adaptive equalizer, which is a linear classifier trained with least squares and a descendant of the Adaline (competitor of the Perceptron).
  • Subtweeting a Yann tweet, François Chollet argues that the information bandwidth of the human visual system is much lower than 20MB/s, despite having 1 million optic nerve fibers. He estimates the actual information input is under 1MB/s, and the information extracted by the visual cortex and incorporated into the world model is even lower, measured in bytes per second.
  • NearCyan feels that search engines provide monotonous sludge with zero actual information, so he now uses LLMs as his primary conduit of information with any semblance of reality.

New AI Model Releases & Benchmarks

  • Arav Srinivas reports that after 100s of queries on Perplexity with Claude 3 (Opus and Sonnet) as the default model, he has yet to see a hallucination, unlike his experience with GPT-4. Similar reports from others who are switching.
  • Hacubu benchmarked Anthropic's new Claude-3 models on structured data extraction using LangSmith. The high-end Opus model had no errors over 42 examples and slightly outperformed the previous non-GPT-4 contender, Mistral-Large.

Emerging Trends & Reflections

  • Yann LeCun reflects on the history of AI, noting that generations of researchers thought the latest paradigm would lead to human-level AI, but it's always harder than expected with no single magic bullet. However, progress is definitely being made and human-level AI is merely a matter of time.
  • Teknium predicts that people will start breaking down every GPT-based pipeline and rebuild it to work well with Claude instead.
  • Aidan Clark experiences the emotional rollercoaster of hitting a bug and loving/hating machine learning in quick succession when working on ML projects.

Tutorials & How-To Guides

  • Santiago Valdarrama recorded a 1-hour video on building a RAG application using open-source models (Llama2 and Mixtral 8x7B) to answer questions from a PDF.
  • Jerry Liu demonstrates receipt processing with @llama_index + local models + PaddleOCR.
  • Langchain published in-depth documentation on how to customize all aspects of Chat LangChain, in both Python and JS/TS, including core concepts, modifications, local runs, and production deployment.

Memes & Humor


PART 0: Summary of Summaries of Summaries

Claude 3 Sonnet (14B?)

  1. Model Finetuning and Performance Optimization:

    • Unsloth AI discussions centered around finetuning Gemma models, dealing with special tokens, and addressing issues like OOM errors. Solutions included updating Unsloth, using pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git", and exploring Gradient Low-Rank Projection (GaLore) (ArXiv paper) for reducing memory usage during LLM training.
    • The CUDA MODE community explored techniques like thread coarsening, vectorized memory access, and CUDA profiling tools to optimize performance. Projects like ring-attention and flash decoding were discussed.
    • Answer.AI announced the ability to train 70B models locally using FSDP + QLoRA on standard GPUs like RTX 3090 (blog post).
  2. AI Model Comparisons and Benchmarking:

    • Discussions compared models like Claude Opus, GPT-4, and Mistral for coding prowess, with Claude Opus often outperforming GPT-4 in areas like SQL and Rust. Users also anticipated the release of GPT-4.5/5 and its potential improvements.
    • The DiscoResearch community explored using GPT-4 and Claude3 as judges for creative writing, developing benchmarks, and comparing models like Brezn3 and Dpo on German datasets.
    • Gemini was highlighted for its impressive performance, with a YouTube video comparing it to Claude Opus and GPT-4 Turbo, noting its superior speed and lower costs.
  3. AI Ethics, Regulation, and Societal Impact:

    • Concerns were raised about censorship and restrictions creeping into AI models like the “Claude 2 self-moderated versions.” Discussions touched on balancing free expression with content moderation.
    • The impact of AI on creativity and employment was debated, with some believing AI will assist rather than replace human creativity, while others anticipated job market shifts.
    • A Slashdot article highlighted U.S. government concerns about frontier AI posing an extinction-level threat, suggesting potential regulatory measures.
  4. Open-Source AI Models and Community Contributions:

    • Anticipation grew around the open-sourcing of models like Grok by @xAI, as announced by Elon Musk’s tweet.
    • Cohere introduced Command-R, a new retrieval augmented model with a 128k context window and public weight release for research (blog post).
    • Community members shared projects like Prompt Mixer for building AI prompts, an open-source AI chatbot using LangChain, and tools like claudetools for function calling with Claude 3.

Claude 3 Opus (8x220B?)


  • Claude Outperforms GPT-4 in Coding Tasks: Engineers have observed that Claude Opus consistently delivers more complete and effective code outputs compared to GPT-4, particularly excelling in languages like SQL and Rust, as discussed in the OpenAI Discord.

  • Perplexity AI's Context Retention Struggles: Users have expressed frustration with Perplexity AI's inability to retain context effectively, often defaulting to base knowledge responses, leading to refund requests and bug reports, as seen in the Perplexity AI Discord. The removal of the 32k context length feature from the roadmap has also raised transparency concerns.

  • Gemma Models Gain Traction Despite Issues: While Gemma models have shown promise, such as the release of Ghost 7B v0.9.1 which ranked 3rd on VMLU's leaderboard, users in the LM Studio Discord have reported technical issues with Gemma models in LM Studio, even after the release of custom quantized versions.

  • Efficiency Breakthroughs in LLM Training and Inference: Researchers have made significant strides in reducing memory requirements and accelerating LLM training and inference. GaLore (arXiv paper) reduces memory usage by up to 65.5%, while Answer.AI's system using FSDP and QLoRA (blog post) enables training 70B models on consumer GPUs. For inference, techniques like ToDo (arXiv paper) can increase Stable Diffusion speeds by 2-4.5x through token downsampling.
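
For intuition on what the GaLore result above involves, here is a toy sketch of the core idea behind gradient low-rank projection (not the galore-torch library API; real GaLore runs Adam in the projected space and refreshes the projection matrix periodically):

```python
import torch

def galore_style_step(weight, grad, rank=4, lr=1e-2):
    # Project the full gradient onto a low-rank subspace derived from its SVD,
    # take the update there, then project back. Optimizer state (e.g. Adam moments)
    # only needs to live at the (rank x n) size, which is where the memory saving comes from.
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                   # (m, rank) projection matrix
    low_rank_grad = P.T @ grad        # (rank, n)
    weight -= lr * (P @ low_rank_grad)
    return weight

W, G = torch.randn(64, 64), torch.randn(64, 64)
print(galore_style_step(W, G).shape)  # torch.Size([64, 64])
```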

ChatGPT (GPT4T)


  • Finetuning Challenges and Solutions in AI Modeling: The Unsloth AI community tackled finetuning Gemma, highlighting issues with special tokens and adapter precision. Recommendations include reinstalling xformers to address errors, suggested via command pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git". The integration of multi-GPU support and a FSDP + QLoRA system by Answer.AI for training 70B models on gaming GPUs marked significant advancements (oKatanaaa/unsloth). Ghost 7B v0.9.1 showcased advancements in reasoning and language, accessible on huggingface.co, highlighting Unsloth AI's efficiency improvements during LLM fine-tuning.

  • Emerging AI Technologies and Community Engagement: OpenAI Discord highlighted Claude Opus' superior performance over GPT-4 in coding tasks, spurring discussions on AI consciousness and Claude's capabilities. Technical solutions for GPT-4 bugs and strategies to improve ChatGPT's memory recall were shared, emphasizing the use of an output template for achieving consistency in custom models.

  • Model Compatibility and Efficiency in Coding: LM Studio's discourse revolved around model selection for coding and cybersecurity, noting Mistral 7B and Mixtral's compatibility with various hardware. Persistent issues with Gemma models prompted suggestions for alternatives like Yi-34b, available on arXiv. Discussions on power efficiency and ROCM compatibility underscored the ongoing search for optimal LLM setups, with detailed hardware discussions available at their hardware discussion channel.

  • Innovative Tools and Techniques for AI Development: CUDA MODE Discord provided insights into merging CUDA with image and language processing. The community also engaged in self-teaching CUDA and exploring Triton for performance improvements. Techniques like GaLore and FSDP with QLoRA for large model training were discussed, along with shared resources for CUDA learning, including CUDA Training Series on YouTube and lecture announcements for CUDA-MODE Reductions.



PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord Summary

Finetuning Frustrations and Triumphs: Discussions focused on finetuning Gemma, covering challenges with special tokens and the efficacy of model loading after finetuning, with potential versioning issues and adapter precision suggested as causes. Recommendations included reinstalling xformers with pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git" to address errors, and updating Unsloth as a possible fix for OOM errors.
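
For orientation, here is a minimal sketch of the kind of Unsloth finetuning setup these threads revolve around, loosely based on Unsloth’s public example notebooks. The checkpoint name and LoRA hyperparameters are illustrative, and exact argument names may differ between Unsloth versions:

```python
# Illustrative only -- exact Unsloth arguments may differ by version.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-7b-bnb-4bit",   # illustrative 4-bit checkpoint
    max_seq_length=2048,
    load_in_4bit=True,                        # QLoRA-style loading to keep VRAM (and OOM errors) down
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16, lora_dropout=0.0,    # illustrative LoRA hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```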

Unsloth Giveaways and Growth: The Unsloth community celebrated the implementation of multi-GPU support (oKatanaaa/unsloth) and the release of a new FSDP + QLoRA system by Answer.AI for training 70B models on gaming GPUs. A knowledge sharing exercise for Unsloth finetuned models on Kaggle identified key bugs and fixes, and the community also recognized contributors’ support on Ko-fi.

Boosting Productivity with Unsloth AI: Ghost 7B v0.9.1 advanced in reasoning and language, ranking 3rd on VMLU’s leaderboard and accessible on huggingface.co. Another significant achievement was reported by @lee0099, demonstrating Unsloth AI’s optimizations resulting in a 2x speedup and 40% memory reduction during LLM fine-tuning with no loss in accuracy.

Celebrating AI Contributions and Cutting-edge Updates: The Unsloth AI community shared updates and insights, including the new 0.43.0 release of bitsandbytes, which adds FSDP support. AI2 Incubator’s provision of $200 million in AI compute to startups was highlighted, and questions about OpenAI’s transparency also surfaced.

Welcoming Winds and Gear for Growth: New Unsloth community members were directed to essential information channels, while suggestions for Unsloth advancements involved integrating features from Llama-factory into Unsloth. The prominence of the Galore thread was acknowledged, and a GitHub project named GEAR was shared, showcasing an efficient cache compression recipe for generative inference (GEAR on GitHub).


OpenAI Discord Summary

  • Claude Edges Out GPT-4 in Coding Prowess: Engineers have noted that Claude Opus appears to outperform GPT-4 in providing coding solutions, exhibiting strengths in SQL and Rust. The community has cited Claude’s ability to offer more complete code outputs.

  • AI’s Existential Question: Consciousness on the Table: The guild has engaged in debates concerning the potential consciousness of AI, specifically Claude. Papers and philosophical views on universal consciousness have been referenced, revealing a profound interest in the metaphysical aspects of AI technology.

  • AI Hiccups: Workarounds for GPT-4 Bugs: Users across the guild have reported GPT-4 outages and language setting bugs. A widely agreed solution is to switch the language to Auto-detect and refresh the browser, which has helped alleviate the issues for many users.

  • Transforming Prompts into Reliable AI Memories: Discussions have revolved around optimizing ChatGPT’s memory recall with prompt structuring. The approach includes formatting advice, like avoiding grammar mistakes and ensuring clarity, and using summaries to cue AI memory.

  • Maximizing Output Consistency across Custom Models: For achieving consistent outputs from custom GPT models, it’s been suggested to use an output template. The template should contain variable names that encode summary instructions, aligning well with an engineer’s need for standardized results.
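
To illustrate that idea (a hypothetical template, not one quoted from the discussion), the placeholder names themselves can carry compressed instructions:

```python
# Hypothetical output template: the placeholder names double as compressed instructions,
# which tends to keep custom GPT responses in a consistent shape.
OUTPUT_TEMPLATE = """## Summary
{one_paragraph_plain_english_summary}

## Key Points
{three_to_five_bullet_points}

## Next Steps
{numbered_list_of_concrete_actions}"""

SYSTEM_PROMPT = (
    "Always respond using exactly this template, filling every placeholder:\n\n"
    + OUTPUT_TEMPLATE
)
print(SYSTEM_PROMPT)
```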


LM Studio Discord Summary

  • Model Selection for Coding and Cybersecurity: Engineers are exchanging experiences using various models like Mistral 7B and Mixtral on different systems, including Mac M1 and PCs with Nvidia GPUs. For more detailed hardware and model compatibility discussions, such as running 70B models on a 64GB M2 MacBook with slow response times, engineers are advised to consult the hardware discussion channel.

  • GEMMA Models’ Quirks Confirmed: Technical issues persist with Gemma models in LM Studio, even after the release of custom quantized versions. Yi-34b with a 200k context window was suggested as a feasible alternative.

  • Explorations in Power Efficiency for LLM Setups: Community members are actively discussing the power consumption of high-end GPUs like 7900 XTX and CPU performance, especially AMD 3D cache models. The importance of efficient RAM setups and cooling systems, like Arctic P12 fans, is also noted. For system configuration recommendations, hardware discussion chat is a valuable resource.

  • Desire for Improved LM Studio Features: Users are requesting enhancements in LM Studio, including the ability to view recent models easily and more sophisticated filter capabilities to select models by size, type, and performance. Solutions like using Hugging Face to view recent models with a specific search are being shared while waiting for the platform to expand its features. An example search link is here.

  • ROCM Readiness for Diverse Operating Systems: Compatibility concerns with ROCM on various operating systems, including Windows and non-Ubuntu Linux distributions, have been raised. ROCm’s performance on Debian has been described as challenging due to Python version conflicts and AMD’s predominant Ubuntu support. Users successfully running models on Windows with ROCM have suggested using koboldcpp and the override HSA_OVERRIDE_GFX_VERSION=10.3.0.

  • CrewAi vs AutoGen Evaluation for Bot Integration: As users navigate the complex landscape of bot integrations, with options like AutoGen and CrewAi, there’s active discussion on structural design and compatibility. CrewAi is characterized by its intuitive logic, while AutoGen offers a graphical user interface. Concerns over token costs due to agent loops and API calls are noted for those integrating these systems with GPT.


Perplexity AI Discord Summary

Perplexity’s Context Retention Struggles: Users expressed frustrations over Perplexity AI’s context handling ability, with complaints about it defaulting to base knowledge responses and subsequent requests for refunds. Concerns were raised about transparency after the removal of the 32k context length from the roadmap.

Confusion Around API Token Limits: Queries on the maximum output token length for new models and the absence of the expected 32k context length feature on the roadmap sparked discussions, amidst concerns of documentation inconsistencies and how they might affect API usage and development of projects like an Alexa-like personal assistant.

New Users Navigate the Pro Plan: New Perplexity Pro users were confused about redeeming promo subscriptions and using the API conservatively to avoid depleting credits, leading to requests for clear guidance on usage tracking.

Legal, Health, and Tech Discussions on Sharing Channel: Insightful conversation threads from the sharing channel touched on Apple’s legal actions against Epic, life expectancy concerns, the merits of a specific Super Bowl halftime show, Google’s payments to publishers, and discussions on nootropic efficiencies recommending caffeine, L-theanine, and creatine stack.

Comparative Analysis and Learning: The community exchanged thoughts on diverse AI services, comparing Perplexity to others like Copilot Pro and ChatGPT Pro, with Perplexity drawing praise specifically for its image generation capabilities.


Nous Research AI Discord Summary

  • Decoding the Decoder Models: @mattlawhon asked about the implications of using longer sequences during inference with decoder models trained without positional encoding. @vatsadev clarified that feeding more tokens is possible, though it may lead to errors or nonsensical output, and the question’s specificity caused some puzzlement among peers.

  • Creative AI Unleashed: A new multiplayer game Doodle Wars ventured into neural network-scoring doodles, while discussions on enabling party games with fewer players via multi-modal LLMs took place. The announcement of Command-R from Cohere as a new generation model optimized for RAG and multilingual generation was also shared through Hugging Face.

  • Benchmarks and AI Analysis Expansion: The Gemini AI, designed to understand entire books and movies, was introduced alongside the WildBench and Bonito models, proposing new approaches to benchmarking and dataset creation. Discussions also highlighted Lex Fridman’s tweet addressing the intersection of AI with power dynamics, although the exact content wasn’t provided.

  • Model Parallelism and GPT-next: The complexities of model parallelism were dissected, with insights on the limitations of current methods and anticipation for GPT-5’s release stirring debates. Meanwhile, Cohere’s new model release and practical assistance with Genstruct were also hot topics.

  • LLMs at the Forefront: The ability to train effective chatbots with a curated set of 10k training examples was discussed, referencing insights from the Yi paper found on Reddit. XML tagging was highlighted as an evolving method for precise function call generation, and open-webui was recommended as a user-friendly GUI for Claude 3.

  • Quality Data and Quirky Model Responses: Within Project Obsidian, the challenge of maintaining data quality was acknowledged. Language models reflecting user-provided assumptions, even whimsical events like a fictional squirrel uprising, point to inherent model behaviors worth considering.

  • Focused Discussions for Bittensor: A prompt reminder was issued to keep the discussion on Bittensor topics, following a scam alert. Questions about primary insights from models produced by the subnet and the mention of an enhanced data generation pipeline, aimed at increasing diversity, indicated ongoing improvements.


LlamaIndex Discord Summary

  • Innovative Code Splitting with CodeHierarchyNodeParser: Users in the LlamaIndex guild discussed the use of CodeHierarchyNodeParser for splitting large code files into hierarchies, potentially enhancing RAG/agent performance. The approach has been shared on Twitter.

  • AI Chatbot Challenges and Cosine Similarity Clarifications: A user sought advice on creating a RAG chatbot using LlamaIndex, citing the Ensemble Retriever document, while another user clarified that cosine similarity ranges from -1 to 1, including negative values, and what that implies for similarity score cutoffs in a query engine, referencing Wikipedia (see the short example after this list).

  • Handling Ingestion Pipeline Duplication and Conda Install Issues: Discussions highlighted solutions for ingestion pipelines processing duplicates, solved by using filename_as_id=True, while another user sought help resolving Conda installation conflicts involving version mismatches and modules not found post-upgrade.

  • Query Pipeline Storage Queries and PDF Parsing with LlamaParse: One user inquired about saving pipeline outputs, questioning the feasibility of using Pydantic objects, and another shared informational resources on PDF parsing using LlamaIndex’s LlamaParse service through a YouTube video.

  • Engaging Community with User Surveys and AI-enhanced Browser Automation: LlamaIndex is conducting a 3-minute user survey, found here, to gather user feedback for improvements while also discussing LaVague, a project by @dhuynh95 utilizing RAG and MistralAI to aid in creating Selenium code from user queries, detailed in this post.
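
To make the cosine-similarity point above concrete, a small self-contained check (plain NumPy, independent of LlamaIndex):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity ranges over [-1, 1]: opposite vectors give -1, orthogonal 0, aligned +1.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # -1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   #  0.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([3.0, 0.0])))   #  1.0
# A similarity cutoff of 0 therefore only removes anti-correlated matches; negative scores
# are perfectly possible with raw embedding vectors.
```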


LAION Discord Summary

  • Artefacts Troubling Engineers: Technical discussions highlighted issues with high-resolution AI models, such as discernible artefacts at large resolutions and constraints in smaller models like the 600m. Engineers like @marianbasti and @thejonasbrothers indicated a shared concern that these limitations might prevent full realization of the models’ capabilities.

  • Constructing Advanced Video Scripting Tools: @spirit_from_germany proposed an advanced two-model system for video scripting capable of analyzing and predicting video and audio, recommending concentration on the most popular videos to ensure data quality. The idea was shared through a Twitter post.

  • Generated Datasets Under Microscope: @pseudoterminalx mentioned the limitations of generated datasets, underlining the potential for being trapped within a specific knowledge corpus and the automated descriptions being constrained by the training of the generating model.

  • CogView3 vs. Pixart - An Incomplete Picture: The exploration of CogView3’s framework, a 3-billion parameter text-to-image diffusion model, was discussed with reference to its arXiv paper. The absence of comparative data with Pixart was noted, bringing into question the assessments of CogView3’s capabilities.

  • Loss Spike Dilemmas on MacBooks: MacBook Pro M1 Max users like @keda4337 are facing challenges with overheating while training diffusion models, resulting in erratic loss spikes from 0.01 - 0.9 to 500 when resuming training across epochs. Such issues underscore the practical challenges of model training on certain hardware configurations.


HuggingFace Discord Summary

  • Inference API Performance Inquiry: @hari4626 reported possible performance issues with Hugging Face’s Inference API, expressing concerns about receiving incomplete responses which might affect production suitability.

  • Collaborative Learning on Generative AI: @umbreenh and @yasirali1149 showed interest in collaborative learning on generative AI for development purposes, while @wukong7752 looked for guidance on calculating KL-divergence in latent-DM.

  • Algorithm Optimization & AI Advancements: Discussions about AI models for programming optimization included GitHub Co-Pilot and DeepSeek-Coder instruct. Important resources include discussions about strategic reasoning with LLMs using few-shot examples (arXiv paper) and the scope of NLP covered by a deep learning article.

  • AI-Created Entertainment and Legal Datasets Released: Doodle Wars, a neural network-scored doodling multiplayer game, was introduced at Doodle Wars, and the Caselaw Access Project, together with Harvard Library, released a dataset of over 6.6 million U.S. court decisions, accessible via Enrico Shippole’s Tweet.

  • Mistral Model Bluescreens and Image-to-Text With Problems: User @elmatero6 sought advice on CPU-optimizing Mistral to prevent system bluescreens, and @ninamani searched for high-performing, accurate open-source models for uncensored image captioning, with cogvlm as a suggested option, albeit with noted quantization stability issues.


Eleuther Discord Summary

  • The Great BOS Debate: The use of the Beginning of Sentence (BOS) token was under scrutiny, with a consensus that its application varies across different models; no uniform standard exists. HFLM code was discussed regarding the incorporation of ‘self.add_bos_token’.

  • Efficiency Leap in Image Diffusion: ToDo, a method that speeds up Stable Diffusion by 2-4.5x through token downsampling, piqued interest, with a related repository and discussion spanning potential hardware implications.

  • Zero-Shot Wonders Overtaking Few-Shots: Counterintuitive results on MMLU benchmarks showed zero-shot outperforming few-shot, sparking theories about context distraction and an idea to test across a curve of varying shot counts.

  • Dependencies and Developments in NeoX Land: GPT-NeoX development touched on the challenges of dependency management and the necessity of Apex, amid a climate of container complexity and Flash Attention updates.

  • Resources for AI Interpretability Aspirants: ARENA 3.0 was hailed as a ā€œgemā€ for embarking on interpretability research, with a juicy link to riches: ARENA 3.0 Landing Page.

  • On the AI Existential Radar: A chilling Slashdot article spotlights U.S. government concerns over frontier AI as an extinction-level threat, nudging towards heavy-handed regulatory steps.


Latent Space Discord Summary

  • AI Biographer Raises Security Eyebrows: @swyxio recommended trying out Emma, the AI Biographer but advises caution on privacy, opting for fake details during trials.

  • Leadership Restructured at OpenAI: After internal turmoil, OpenAI reinstates Sam Altman as leader and welcomes three new board members, concluding a governance review.

  • Ideogram 1.0’s Quiet Entrance: The potential of Ideogram 1.0, a new text rendering tool, is noted by @swyxio but seems to have slipped under the radar.

  • Microsoft Research Seeks LLM Interface Feedback: A new interface standardization proposal from Microsoft, AICI, is currently up for community feedback, particularly on its Rust runtime, as shared in a Hacker News post.

  • State Space Model Could Rival Transformers: @swyxio spotlights “Mamba,” a State Space Model, as a Transformer alternative for LLMs, guiding interested AI Engineers to a visual guide and research paper.

  • Latent Space Paper Clubs Activate!: In various time zones, members are gearing up for GPT-focused discussions with preparatory notes shared and real-time responses to queries during sessions, such as clarifying “causal attention.”

  • AI-strategy Sessions Spark Community Sharing: From tips on workflow optimization using AI to sharing AI-enhanced CLI tools like asciinema, AI in Action Club members are not just engaging but also advocating for future topics like decentralized AI applications.

  • Asia Engages with GPT-2 Knowledge: A call to join Asia’s paper-club members for an EPIC presentation on the GPT-2 paper was made by @ivanleomk. Further engagement is seen with the recent release of a Latent Space pod.


Interconnects (Nathan Lambert) Discord Summary

  • New Roles to Distinguish Discord Members: Nathan Lambert introduced new roles within the Discord guild to separate manually added close friends from subscribers, inviting feedback on the change.

  • GPT-4’s Doom Playing Capabilities Published: GPT-4 demonstrated its ability to play the 1993 first-person shooter game Doom, as described in a paper shared by Phil Pax (GPT-4 Plays Doom). The model’s complex prompting is highlighted as a key factor in its reasoning and navigation skills.

  • Musk and Open Models Stir Debate: A tweet by Elon Musk about xAI’s Grok being open-sourced led to discussions around market reactions and the use of “open source,” with concerns over OpenAI’s ongoing commitment to open models also mentioned. Separately, Cohere’s new model Command-R sparked anticipation among engineers due to its long context window and public weight release, potentially impacting startups and academia (Command-R: Retrieval Augmented Generation at Production Scale).

  • AI-Centric Dune Casting Game Unfolds: Discord members humorously cast prominent figures from the AI industry as characters from Dune, with suggestions including Sam Altman as the Kwisatz Haderach and Elon Musk as Baron Harkonnen.

  • Reinforcement Learning Podcast and Papers Touted: Ian Osband’s TalkRL podcast episode on information theory and RL was recommended (Ian Osband’s episode on Spotify), and discussions emerged around a paper on RLHF, PPO, and Expert Iteration applied to LLM reasoning (Teaching Large Language Models to Reason with Reinforcement Learning). The theme of consistent quality in RL content was echoed across discussions.

  • Inflection AI Model Integrity Questioned: After similar outputs were noted between Inflection AI’s bot and OpenAI’s Claude-3-Sonnet, debates ensued over possible A/B testing or model wrappers, exacerbated by Inflection AI’s response about its bot Pi remembering previous inputs (Inflection AI’s Clarification).

  • Costs and Approaches to Model Training Examined: The affordability of pretraining models like GPT-2 for less than $1,000 and the potential for deals on compute, such as Stability AI’s speculated sub-$100,000 expenditure for their model’s compute, were hot topics. Fine-tuning on books and articles using a masking strategy was also discussed (see the sketch after this list).

  • Sam Altman’s Return to OpenAI and Light-Hearted Role Queries: Sam Altman’s return to the OpenAI board prompted discussions and a sprinkling of humor about leadership. Discord roles, including self-nominated goose roles, were jestingly proposed as subscribers’ stakes became a topic of amusement.
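
On the masking point mentioned above, a common pattern (an illustrative sketch, not necessarily the specific recipe discussed) is to mask prompt or context tokens out of the loss by setting their labels to -100, the ignore index used by standard causal-LM cross-entropy:

```python
import torch

def build_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Compute loss only on the continuation (e.g. the book/article text), not the prompt."""
    labels = input_ids.clone()
    labels[:prompt_len] = -100  # -100 is the default ignore_index of torch.nn.CrossEntropyLoss
    return labels

input_ids = torch.tensor([101, 2023, 2003, 1037, 2338, 102, 2009, 2001, 2204])
print(build_labels(input_ids, prompt_len=6))
# tensor([-100, -100, -100, -100, -100, -100, 2009, 2001, 2204])
```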


OpenRouter (Alex Atallah) Discord Summary

  • Mistral 7b 0.2 Takes the Stage with Speed: The newly introduced Mistral 7b 0.2 model is making waves with a 10x performance increase for short outputs and 20x for longer outputs, and boasts a 32k token context window. A performance demo can be viewed in a tweet by OpenRouterAI.

  • Gemma Nitro Offers Efficiency and Economy: OpenRouter announces a new model, Gemma Nitro, with speeds of over 600 tokens per second and pricing set at an affordable $0.1 per million tokens. Details are outlined on OpenRouter’s model page.

  • Conversations Heat Up Around AI Censorship: User concerns rise about censorship potentially affecting AI models, like Claude 2’s self-moderated versions, prompting discussions about free expression and the need for uncensored platforms, alongside technical inquiries regarding message formatting and system parameters.

  • Community Innovates with Claude 3 Library: @thevatsalsagalni presents claudetools, a library that facilitates function calling with Claude 3 models, promoting ease-of-use for developers with Pydantic support. The library is available for community contribution on GitHub.

  • Technical Discussions Abound on Model Limits and Usage: Users discuss the technical aspects of AI models, delving into topics like GPT-4’s token output limitation, the intricacies of Claude API’s role message handling, and the utilization of Chat Markup Language (ChatML) in prompt customization. Community-created tools, like a Google Sheets connection app, demonstrate growing engagement and address model accessibility concerns.


CUDA MODE Discord Summary

  • Combining CUDA with Image and Language Processing: Engineers discussed challenges in concatenating image features with caption layers and the use of linear layers to project image features to the shape of NLP embeddings (see the sketch after this list). Further insights included CUDA’s potential for improving machine learning model operations by employing techniques like vectorized additions.

  • Exploring CUDA and Triton Development: The community is engaging in self-teaching CUDA and exploring tools for performance improvement, such as the Triton language. There’s an interest in comparing CUDA’s performance to higher-level tools like libtorch and understanding the compilation process involved in torch.compile.

  • Advancements in Large Model Training: Techniques like GaLore and FSDP with QLoRA are discussed for their contribution to reducing memory requirements and enabling the training of large models on standard GPUs. An ArXiv paper covers Gradient Low-Rank Projection, and Answer.AI’s blog post provides insights on training a 70b model at home.

  • CUDA Knowledge Sharing and Lecture Announcements: A YouTube playlist and GitHub repository for the CUDA Training Series were shared, while a call for participation in a CUDA-MODE Reductions lecture was announced, with resources for the lecture available online. Moreover, CUDA novices discussed compilation differences and performance observations across PyTorch versions.

  • Job Opportunities and Project Development in CUDA: A developer is sought to design a custom CUDA kernel, offering a remuneration between $2,000 and $3,000 USD, with prerequisites including experience in algorithmic development and CUDA programming. Conversations also highlighted user projects like building a custom tensor library and the importance of depth in knowledge for CUDA’s practical applications.
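
To make the projection idea mentioned at the top of this list concrete, a minimal PyTorch sketch (all dimensions are illustrative):

```python
import torch
import torch.nn as nn

image_feat_dim, text_embed_dim = 1024, 4096           # illustrative sizes
num_patches, caption_len = 49, 32

project = nn.Linear(image_feat_dim, text_embed_dim)   # map vision features into the LM embedding space

image_feats = torch.randn(1, num_patches, image_feat_dim)     # e.g. output of a vision encoder
caption_embeds = torch.randn(1, caption_len, text_embed_dim)  # token embeddings of the caption

fused = torch.cat([project(image_feats), caption_embeds], dim=1)  # concatenate along the sequence axis
print(fused.shape)  # torch.Size([1, 81, 4096])
```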


LangChain AI Discord Summary

  • Innovative Prompt Crafting Desktop App: User @tomatyss introduced Prompt Mixer, a new tool for building, testing, and iterating on AI prompts, offering features like connecting to various models, prompt version tracking, and a guide for creating custom connectors.

  • Enhancements on Langchain: Users discussed multiple aspects of Langchain such as PDF extraction issues, handling complex logic in templates, wrappers for ChatOllama functions such as Ollama Functions, execution locations for Langchain Serve, and capturing outputs from routes. Meanwhile, Claude3 support enhancement is in progress as indicated by @baytaew, referencing a Pull Request #18630 on GitHub.

  • RAG Tutorial Resources Shared: Tutorials on improving and utilizing Retrieval Augmented Generation (RAG) were shared by @mehulgupta7991 and @infoslack, providing videos on enhancing RAG with LangGraph and building a chatbot with RAG and LangChain respectively.

  • Open Source Tools for Chatbots and Data Analytics: An open-source AI Chatbot for conversational data analysis was shared by @haste171 on GitHub, while @appstormer_25583 released Data GPTs in Appstorm 1.5.0 for data exploration and visualization with sample GPTs for various industries.

  • Automated Lead Generation Tools: @robinsayar is developing an automated tool for generating leads using public company information, sparking interest from @baytaew, who is anticipating the potential impact of such innovation.


DiscoResearch Discord Summary

  • AI Judging Creative Writing: Skepticism was raised by .calytrix about the feasibility of AI models judging creative writing due to parameter limitations. Despite this, GPT-4 and Claude3 are being tested with detailed scoring criteria for such a task, a benchmark is being developed by .calytrix, and Mistral large has been suggested as a potential candidate for an ensemble of AI judges by bjoernp.

  • Evo Tackles Genomic Scale: Evo, featuring the StripedHyena architecture, was released by Together AI and the Arc Institute for handling sequences ranging from DNA, RNA, to proteins and supports over 650k tokens. Interest was shown by johannhartmann in AutoMerger for automatic model merging, though it’s currently non-operational.

  • Benchmarking Tools and Strategies Discussed: johannhartmann shared the tinyBenchmarks dataset for efficient AI benchmarking and expressed intent to translate it for broader usability. Insights on benchmarking with the Hellaswag dataset suggested that using 100 data points might be insufficient for detailed comparisons (see the quick error estimate after this list).

  • Advancements and Challenges in German AI Research: johannhartmann provided insights into training models like Mistral using the German Orca dataset and addressed technical issues encountered by crispstrobe in model merging through a GitHub commit fix. Additionally, Brezn3 showed promising improvements over its predecessor in benchmark results, while DPO (Direct Preference Optimization) training was noted as in progress. Consideration was being given to DiscoLM for better benchmarking consistency over previous base models.
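
As a quick sanity check on why roughly 100 benchmark examples is too few for fine-grained comparisons, here is a simple binomial error estimate (the assumed accuracy of 60% is illustrative):

```python
import math

def accuracy_std_error(p: float, n: int) -> float:
    # Standard error of an accuracy estimate from n independent examples.
    return math.sqrt(p * (1 - p) / n)

for n in (100, 1000, 10000):
    half_width = 1.96 * accuracy_std_error(0.6, n)   # 95% confidence interval half-width
    print(f"n={n:6d}  ±{half_width:.1%}")
# n=100 gives roughly ±9.6 percentage points -- far wider than the typical gap between merged models.
```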


Alignment Lab AI Discord Summary

  • AI Hallucination Challenges Spark Debate: Engineers explored strategies to minimize AI hallucinations, discussing Yi’s report without settling on a shared definition and considering methods like RAG (Retrieval-Augmented Generation) or manually rewriting repetitive responses in fine-tuning datasets. No consensus emerged from the discussion.

  • Mermaid Magic for Code Diagrams: The use of Claude to create mermaid graphs from code bases up to 96k tokens was presented as an innovative approach to visualizing code architecture, sparking interest in potential applications for such visualization techniques.

  • Gemma-7b Arrives with a Bang: The introduction of Gemma-7b, enhanced with C-RLFT and fine-tuned using 6T tokens, was heralded as a significant achievement, almost matching the performance of Mistral-based models. The first usable fine-tune is available on HuggingFace and was celebrated in a tweet by OpenChatDev.

  • Balancing Act Between Gemma and Mistral Models: A conversation highlighted why Gemma 7B was released even though it doesn’t outperform Mistral 7B, with agreement that each model represents a distinct experiment and Gemma’s potential was yet to be fully explored, especially in areas like NSFW content moderation.

  • Community Collaboration in Coding: Users shared experiences and extended calls for collaboration, particularly around setting up a Docker environment to facilitate development. The tone was comradely, emphasizing the value of collective input in overcoming technical hurdles.


LLM Perf Enthusiasts AI Discord Summary

  • Free AI Tools for Vercel Pro Subscribers: Claude 3 Opus and GPT-4 vanilla are now accessible for free to those with Vercel Pro. More information and tools can be found at the Vercel AI SDK.

  • Migrating from OpenAI to Azure SDK: Transitioning from OpenAI’s SDK to an Azure-based solution has been a topic of interest for users like @pantsforbirds, who are seeking advice on potential migration challenges.

  • XML Enhances Function Calls in Claude: Users, notably @res6969, have noted improved function call performance when using XML tags with Claude (see the sketch after this list). Conversely, @pantsforbirds pointed out that embedding XML complicates sharing prompt generators.

  • Opus Rises Above GPT-4: Discussions led by users @jeffreyw128, @nosa_., and @vgel highlighted that Opus prevails over GPT-4 in delivering smart responses. @potrock preferred Claude’s straightforward prose over GPT’s more verbose explanations. Users are eagerly anticipating the GPT-4.5 and GPT-5 releases, curious about enhancements over current models.

  • Speculations on Google’s Potential AI Dominance: @jeffreyw128 theorizes Google could dominate in general AI use due to its capability to integrate AI into its existing platforms, like search and Chrome, and offer it at lower costs, possibly introducing a Generative Search Experience. However, they suggest that OpenAI may maintain a competitive lead with specialized applications, while Google might prioritize a blend of generative and extractive AI solutions.
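
As a rough illustration of the XML-tagging approach noted above (a generic prompt-construction sketch, not Anthropic’s official tool-use format):

```python
import re

# The tool signature and the required call format are wrapped in explicit XML tags,
# which Claude-family models tend to follow closely.
PROMPT = """<tools>
  <tool name="get_weather"><param name="city" type="string"/></tool>
</tools>

Answer by emitting exactly one call of the form:
<function_call name="TOOL_NAME"><arg name="PARAM">VALUE</arg></function_call>

Question: What's the weather in Paris?"""

def parse_call(model_output: str):
    m = re.search(r'<function_call name="(\w+)"><arg name="(\w+)">(.*?)</arg></function_call>',
                  model_output)
    return None if m is None else {"tool": m.group(1), m.group(2): m.group(3)}

# A response that follows the requested format parses cleanly:
print(parse_call('<function_call name="get_weather"><arg name="city">Paris</arg></function_call>'))
# {'tool': 'get_weather', 'city': 'Paris'}
```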


Skunkworks AI Discord Summary

  • A Groundbreaking Claim in AI Training: @baptistelqt has announced a substantial methodological breakthrough, asserting the ability to accelerate convergence by a factor of 100,000 by training models from scratch each round. The details of the methodology or verification of these claims have not been provided.

Datasette - LLM (@SimonW) Discord Summary

  • Shout-out to Symbex: @bdexter expressed gratitude for regular usage of symbex, with @simonw acknowledging the project’s fun aspect.

AI Engineer Foundation Discord Summary

Mysterious Mention of InterconnectAI: A user named .zhipeng appears to have referenced a blog post from Nathan’s InterconnectAI, but no specific details or context were provided.

AI Video Deep Dive Incoming: An event has been announced focusing on Gen AI Video and the ‘World Model’, featuring speakers such as Lijun Yu from Google and Ethan He from Nvidia, set for March 16, 2024, in San Francisco and available on Zoom. Those interested can RSVP here.


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (368 messages🔥🔥):

  • Inquiry about Finetuning Gemma: @kaleina_nyan and @starsupernova discussed issues with finetuning Gemma using the ChatML template, with concerns about whether special tokens like <start_of_turn> and <end_of_turn> are trained for vanilla pre-trained models. They explored potential fixes and workarounds, such as unfreezing the embedding matrix (Unsloth Wiki).

  • Multi-GPU Support for Unsloth: @kaleina_nyan shared a fork she made on GitHub implementing multi-GPU support to Unsloth (oKatanaaa/unsloth) and further discussed potential issues with numerical results and memory distribution.

  • New FSDP + QLoRA Training System: @dreamgen highlighted a new system released by Answer.AI, capable of training 70B models locally on typical gaming GPUs, though it was not yet clear how it differs from existing methods involving DeepSpeed and QLoRA.

  • Experiences Sharing Unsloth Finetuned Models on Kaggle: @simon_vtr shared experiences attempting to use Unsloth finetuned models in a Kaggle competition, dealing with issues related to offline packages and inference bugs. A notebook with bug fixes for Gemma models was mentioned for inference use on Kaggle by @starsupernova.

  • Thanking Supporters: @theyruinedelise and @starsupernova expressed gratitude towards the Unsloth community members for their support on Ko-fi, thanking individual contributors like @1121304629490221146 and @690209623902650427 for their donations.

  • Gemma Token Mapping and generate Method: @kaleina_nyan and @starsupernova engaged in a technical discussion about the function of map_eos_token and its implications for the .generate method of Gemma models. They identified a potential issue with generate not stopping after producing the mapped EOS token.
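
For reference, this is the stock transformers way to make the stopping condition explicit (a generic sketch, not Unsloth- or Gemma-specific; the tiny checkpoint is purely for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "sshleifer/tiny-gpt2"          # tiny checkpoint purely for illustration
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    eos_token_id=tokenizer.eos_token_id,  # generation stops as soon as this id is produced
    pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```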

Links mentioned:


Unsloth AI (Daniel Han) ▷ #welcome (4 messages):

  • A Warm Welcome and Handy Reminders: @theyruinedelise greeted new members with a hearty welcome in multiple messages and encouraged everyone to check out important channels. Members are specifically reminded to read information in channel 1179040220717522974 and to select their roles in channel 1179050286980006030.

Unsloth AI (Daniel Han) ▷ #random (19 messages🔥):

  • CUDA Conundrums: User @maxtensor reports Bootstrap CUDA exceptions in certain scripts within the same environment where others work perfectly, wondering if it’s an OS script limitation. Troubleshooting with @starsupernova leads to potential GPU visibility issues.

  • Praise for the Framework: @maxtensor expresses admiration for a framework they find innovative, stating it “opens a lot of new doors.”

  • New bitsandbytes Version Released: @maxtensor shares a link to the new 0.43.0 release of bitsandbytes, notable for FSDP support and officially documented Windows installation, but remains cautious about updating their working environment.

  • AI2 Incubator’s Massive Compute Giveaway: @mister_poodle shares news about the AI2 Incubator, which has secured $200 million in AI compute resources for its portfolio companies, offering significant support for startups in the AI space.

  • Questions Around OpenAI’s AGI Tactics: @iron_bound and @theyruinedelise discuss concerns and implications of OpenAI’s approach to AI development, particularly in relation to sharing scientific advancements and Elon Musk’s stance on OpenAI’s alleged shift in openness.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (514 messages🔥🔥🔥):

  • Xformers Installation Issues: A user @fjefo encountered errors related to xformers while attempting to use Unsloth AI with Gemma models. They were advised by @starsupernova to reinstall xformers, and later to use the python package installation command pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git".
  • Gemma Model Load and Fine-tuning Challenges: [Gemma Loading Difficulty] @patleeman faced troubles loading a finetuned Gemma 2B model using Unsloth on the vLLM server, getting a KeyError for lm_head.weight. After a workaround to skip the key, the model loaded fine, suggesting a potential issue on vLLM’s end, as discussed in this Github issue.
  • Using HF_HOME Environment Variable with Jupyter: [HF_HOME Troubles] @hyperleash struggled with setting the HF_HOME environment variable in Jupyter notebooks for Unsloth. They managed to set it successfully for .py scripts but hit a snag with notebooks, noting that no logs were generated for troubleshooting. @starsupernova acknowledged the issue, confirmed there are no logs, and advised on setting the environment variable correctly (see the sketch after this list).
  • Discussions on Finetuned Model Performance: Users discussed the performance of finetuned models. @mlashcorp observed a performance discrepancy with the merged model versus when loading the adapter directly. @starsupernova suggested trying "merged_4bit_forced" and mentioned precision issues when merging adapters.
  • Downloading and Finetuning Gemma 7B Issues: @fjefo reported issues with downloading and finetuning Gemma 7B but was later able to initiate training successfully. They mentioned OOM errors compared to Mistral 7B and were guided by @starsupernova to update Unsloth and consider redownloading via transformers.
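
On the HF_HOME point above: in a notebook, the variable generally needs to be set before any Hugging Face library is imported, since the cache location is typically resolved at import time (a minimal sketch; the path is hypothetical):

```python
import os

# Run this in the very first notebook cell, before importing transformers/datasets/unsloth.
os.environ["HF_HOME"] = "/mnt/data/hf_cache"   # hypothetical cache path

import transformers  # now picks up the custom cache directory
print(os.environ["HF_HOME"])
```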

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (8 messages🔥):

  • Ghost 7B v0.9.1 Takes Flight: User @lh0x00 announced the release of Ghost 7B v0.9.1, touting improvements in reasoning and language capabilities in both Vietnamese and English. It’s available for online use and app applications at huggingface.co.
  • Ghost 7B Secures Top Rank: In a subsequent message, @lh0x00 mentioned that Ghost 7B v0.9.1 scored high enough to rank 3rd in VMLU’s “Leaderboard of fine-tuned models”.
  • Community Cheers for Ghost 7B: Users @starsupernova and @lh0x00 exchanged congratulations on the successful launch and high performance of the Ghost 7B model.
  • French AI app insight: User @theyruinedelise shared a YouTube video titled “4 apps incroyables qui utilisent l’IA” (“4 incredible apps that use AI”), offering insights into impressive AI apps: Watch here.
  • Unsloth AI Accelerates Fine-tuning: @lee0099 discussed finetuning yam-peleg/Experiment26-7B on a NeuralNovel dataset, highlighting Unsloth AI’s optimizations that lead to 2x speedup, 40% memory reduction, and 0% accuracy degradation during LLM fine-tuning.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (5 messages):

  • Suggestion for Unsloth Integration: User @imranullah suggested implementing features from Llama-factory into Unsloth AI, implying that such features have proven to be good in their current application.
  • Agreement on Galore’s Usefulness: User @starsupernova agreed on the usefulness of the Galore thread, endorsing its potential application.
  • Implementation Ease: User @remek1972 humorously remarked on the ease of implementing a certain feature, tagging @160322114274983936 in the conversation.
  • GitHub Project Shared: @remek1972 shared a link to a GitHub repository named GEAR, which relates to an efficient KV cache compression recipe for generative inference of large language models. View the GEAR project on GitHub.

Links mentioned:

GitHub - opengear-project/GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM


OpenAI ▷ #ai-discussions (611 messages🔥🔥🔥):

  • AI-Assisted Coding Comparison: Users like @askejm and @sangam_k shared experiences comparing the coding capabilities of Claude Opus and GPT-4. The consensus seems to be that Claude Opus is better for coding, offering more complete code outputs and performing well in languages like SQL and Rust.

  • Exploring AI’s Consciousness: A discussion led by @sotiris.b touched on the belief by some that Claude might be conscious. Debates included different views on universal consciousness and whether AI can be considered conscious, with users like @metaldrgn and @dezuzel discussing papers on the topic.

  • GPT-4’s Cutoff and Performance: User @webhead confirmed using test queries that GPT-4’s knowledge cut-off is in April 2023 and that while ChatGPT’s conversations may be slower, the recall abilities of various models vary, with Google’s 1.5 preview showing impressive recall but potential shortcomings in specific tasks.

  • International Access to AI Products: There were several mentions of difficulties accessing Claude 3 Opus internationally, with users @lightpictures and @lazybones3 discussing workarounds. User @webhead recommended using openrouter for testing different models.

  • Subscription Issues with OpenAI: User @arxsenal described a problem with their ChatGPT Plus subscription not being recognized. Others, including @eskcanta, suggested ways to resolve it, including clearing cache, using different devices/browsers, and contacting support through the OpenAI help site.

Links mentioned:


OpenAI ▷ #gpt-4-discussions (78 messages🔥🔥):

  • GPT Outage and Language Setting Bugs: Multiple users, including @kor_apucard, @dxrkunknown, @snolpix, and @alessid_55753, reported issues with GPT not responding. A common fix found by users like @pteromaple and confirmed by others such as @katai5plate and @hccren, was to switch the language preview in settings to Auto-detect and refresh the browser.

  • Chat Functionality Troubles and Workarounds: Issues were not limited to a single browser, as @dxrkunknown and @macy7272 had problems on both web and mobile. Solutions varied with @pteromaple suggesting language setting changes, whereas @winter9149 found deleting old chats could help resume normal operation.

  • Discussions around AI Competitors: Several users, including @tsanva, @1kio1, and @zeriouszhit, discussed possibly switching to competitor models like Claude, citing context window limitations and confusion in responses from GPT. Concerns were also raised about the lack of comparable features to support Claude compared to those available for GPT.

  • Help and Status Updates: User @openheroes shared a link to OpenAI’s status page indicating no current outages, suggesting users ensure they are not on a VPN or blocking connections and referencing the help center for additional support.

  • Payment Queries for GPT Creators: User @ar888 inquired about payment for GPT creators, to which @elektronisade responded by noting that the official word from OpenAI suggested payments would start in Q1 for US creators, as stated in a blog post.

Links mentioned:

OpenAI Status: no description found


OpenAI ▷ #prompt-engineering (90 messages🔥🔥):

  • In Search of Enhanced ChatGPT Memory: User @youri_k was troubleshooting ChatGPT’s ability to recall chat history for context in responses and received advice from @eskcanta on how to improve the prompt structure to handle memory, including the suggestion to ask for a summary before ending conversations.
  • ChatGPT Struggles to Sketch for Beginners: @marijanarukavina encountered issues getting ChatGPT to create a simple sketch explaining Boundary Value Analysis; @eskcanta suggested using the Python tool for better results and provided a step-by-step approach to tweaking the model’s output.
  • Delving into GPT-Based UI Generation: @dellgenius probed into how GPT-4 could be used for creating Figma plugins or generating UI elements, with @eskcanta sharing a link showcasing GPT-4’s potential capabilities in this area.
  • GPT for Homework Assistance? Not Quite: @levidog enquired about extracting questions from an assignment document using chatGPT, but @darthgustav. cautioned about the limitations and the ethical considerations of using GPT for homework-related tasks.
  • Achieving Consistent Output in Custom GPTs: @iloveh8 sought advice on ensuring consistent responses from custom GPT models, and @darthgustav. recommended using an output template with variable names that encode summary instructions.

OpenAI ▷ #api-discussions (90 messages🔥🔥):

  • Efficient Prompt Engineering with GPT: @eskcanta articulated the basic steps for creating efficient prompts, outlining the importance of clarity, language proficiency, and instructing the model with specifics. They advised to avoid typos, grammar mistakes, and to communicate in any language well understood by both the user and the AI.

  • Keeping Custom GPT Outputs Consistent: According to @darthgustav., employing an output template with variable names that encode a summary of the instructions can help maintain consistent output from custom GPT prompts.

  • Professional Vocabulary Expansion Challenge: @ericplayz sought assistance in rewriting a paragraph with professional vocabulary while keeping the word count; @eskcanta shared an attempted solution and prompted for feedback to assess if the needs were met. The guidance included ensuring that the rewritten text in Romanian maintains length, details, and appropriate tone.

  • JSON Formatting in GPT-4 Discussions: @dellgenius inquired about the use of JSON formatting for organizing responses; @aminelg confirmed its utility for structured data, and @eskcanta answered questions about creating UI elements and the varying capabilities of the AI model. There was a focus on how GPT models can aid in designing UI elements, provided the AI has been trained on the relevant data or tools.

  • Requests for Assistance Using ChatGPT API: Users @youri_k and @levidog requested help with making ChatGPT remember chat history and extracting questions from an assignment document, respectively. They received guidance from @eskcanta, who suggested using summaries for history retention and cautioned that the models are not designed to aid with homework, which might lead to inconsistent results.


LM Studio ▷ #💬-general (407 messages🔥🔥🔥):

  • Exploring LLM Capabilities: Users are discussing the capabilities of different models and seeking advice on model choices for specific purposes, such as coding and cybersecurity. They are sharing experiences using models like Mistral 7B and Mixtral on various systems, including Mac M1 and PCs with Nvidia GPUs.

  • Technical Troubleshooting in LM Studio: Some users, such as @amir0717, have encountered errors when trying to load models in LM Studio and are seeking help to resolve issues like “Model operation failed” or models that “did not load properly.” Others are offering solutions such as running LM Studio as an administrator or adjusting GPU offload settings.

  • Hardware Limitations and Model Performance: Users with different hardware specs are asking about the best models to run on their systems. For example, @mintsukuu with an 8GB Mac M1 is advised by @yagilb to try out 7B models with conservative layer settings, while @dbenn8 reports running 70B models on a 64GB M2 Macbook, albeit with slow response times.

  • Interest in New and Alternative Models: There are queries about support for newer models like Starcoder2 and Deepseek-vl in LM Studio. Some users, like @real5301, are looking for models with large context windows upwards of 80k tokens, and @heyitsyorkie suggests Yi-34b with a 200k context window.

  • Development of LM Studio: A user mentions the development pace of LM Studio in relation to llama.cpp builds and @yagilb confirms an upcoming beta, acknowledging that updates have been slower than desired. It was noted that the development team has expanded from one to three members.

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (110 messages🔥🔥):

  • GEMMA Models Puzzlement: @boting_0215 encountered issues with all Gemma models not being usable. @fabguy confirmed that only a few Gemma quants work, and these are custom quantized versions by the team, pinpointing a potential issue either with LM Studio or the Gemma model.

  • Troubleshooting Gemma Load Error: @honeylaker_62748_43426 received an error when loading a 7B Gemma model and @heyitsyorkie affirmed that Gemma models frequently encounter issues, with some quants known to be broken.

  • Searching for the Elusive Slider: @jo_vii sought advice for models suitable for an M2 Max Apple Metal and @fabguy suggested using a DeepSeek Coder Q4 or Q5 to leave room for other processes.

  • Model Upload Confusion: @anand_04625 couldn’t find the file upload button for the Phi model in LM Studio, and @heyitsyorkie clarified that model file uploads are not supported.

  • Awaiting Starcoder 2 Update: @rexeh was looking for alternatives to Starcoder 2 on lm studio for ROCm users, and @heyitsyorkie indicated that support for Starcoder 2 will come in the future and recommended building llama.cpp independently in the meantime.

Links mentioned:

  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits: Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single param…
  • What is retrieval-augmented generation? | IBM Research Blog: RAG is an AI framework for retrieving facts to ground LLMs on the most accurate information and to give users insight into AI’s decisionmaking process.
  • Ternary Hashing: This paper proposes a novel ternary hash encoding for learning to hash methods, which provides a principled more efficient coding scheme with performances better than those of the state-of-the-art bin…

LM Studio ā–· #🧠-feedback (7 messages):

  • Users Crave a ā€œNew Modelā€ Section: @justmarky expressed a wish for an option in LM Studio to view recent models without needing to search, to easily discover what’s new.
  • Desire for More Sort Options: @purplemelbourne echoed the sentiment, suggesting additional sort functions like filtering by the model’s release date or specific ranges such as the last 6 or 8 months.
  • Hugging Face Workaround Shared: @epicureus shared a workaround by using Hugging Face to view recent models with a specific search link.
  • Existing Channels as Interim Solutions: @yagilb pointed to existing Discord channels #1111649100518133842 and #1185646847721742336 as current places to discuss and find information about the latest models.
  • Feature Refinement & Selection Criteria Wishlist: @purplemelbourne requested advanced filtering capabilities in LM Studio to select models by size, type, and performance, specifying a desire to search based on VRAM requirements and ratings.

Links mentioned:

Models - Hugging Face: no description found


LM Studio ā–· #šŸŽ›-hardware-discussion (147 messagesšŸ”„šŸ”„):

  • Taming Power Consumption with GPUs: @666siegfried666 noted that even high-end GPUs like the 7900 XTX don’t always reach their Total Board Power (TBP) limit, staying around 140W in their setup, and sought details on real-time TBP draw for the 4060 Ti in LLM. They also highlighted the importance of CPUs, especially AMD 3D cache models, and RAM setups in power efficiency, and advocated for Arctic P12 fans due to their low power draw.

  • The Race for Efficiency in LLM Systems: Users discussed balancing price, power, and performance when building LLM systems. @nink1 talked about the profitability of Apple M3 processors running LLM tasks on a single battery charge, while @666siegfried666 brought up regional variations in hardware pricing.

  • Exploring GPU Underclocking & Overclocking: @666siegfried666 shared insights into effective undervolting without underclocking, mentioning optimal performance per watt for the 7900 XTX at 2400-2500MHz. @nink1 considered dynamic underclocking/overclocking in response to workload changes.

  • LLM Performance Enthusiasts Share Configurations: @goldensun3ds related their experience with a substantial load time for a 189K context LLM on their system, and users exchanged advice on hardware setups for LLM, including the efficient operation of AMD GPUs with LLM, and the use of dual GPUs to improve performance.

  • Practical Advice for New LLM Hardware Entrants: A new user, @purplemelbourne, engaged with the community to understand if they could run multiple LLMs on their newly acquired RTX2080Ti GPUs. The conversation evolved into a general discussion about hardware configurations and potential upgrades involving V100 cards and NVLink for running high-memory models.

Links mentioned:


LM Studio ā–· #🧪-beta-releases-chat (7 messages):

  • Token Overflow Troubles: @jarod997 experienced gibberish responses in Win Beta 4 (0.2.10) when the chat reaches a multiple of the token overflow amount such as 2048, 4096, etc.
  • Context Overflow Policy Check: @jedd1 suggested checking the Context Overflow Policy settings and also mentioned changes might not be prominent but do occur semi-regularly.
  • Upgrade Recommendation Discussion: @jedd1 and @fabguy both recommended upgrading to the newer 0.2.16 version which might resolve the issue noted by @jarod997.
  • Beta vs. Stable Release Confusion: @jarod997 couldn’t find the suggested version on LMStudio.ai and clarified that they need to use the Beta because their machine supports AVX but not AVX2.

LM Studio ā–· #autogen (1 message):

  • Debating the Best Bot Integration: @purplemelbourne is seeking advice on which integration to commit to between AutoGen, CrewAi, ChatDev, or any other options. They have AutoGen installed but have not executed their first run yet.

LM Studio ā–· #memgpt (3 messages):

  • MemGPT Shared Knowledge Base Query: @purplemelbourne asked if MemGPT can have a shared knowledge base across different programming models for tasks like bug fixing, considering using KeyMate for integration.
  • Practicality of Integrating GPT-4 with MemGPT: @nahfam_ replied that while it’s theoretically possible, the cost associated with using the GPT-4 API would be prohibitive. They suggested cleaning up MemGPT outputs with BeautifulSoup4 and Python to make them more manageable (a small sketch follows this list).
  • Cost Concerns with KeyMate Integration: @nahfam_ expresses skepticism about the sustainability of KeyMate’s business model, costing $60 a month for a GPT-4 128k powered chat, given the per-request token cost and potential rapid depletion of token allowance.
  • TOS Disapproval for KeyMate: @purplemelbourne comments on the harshness of KeyMate’s Terms of Service, providing a rather grim analogy to highlight their broad power of account termination.
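
For the BeautifulSoup4 clean-up idea above, a minimal sketch (the HTML input is invented for illustration):

```python
# Minimal sketch of stripping markup from an output before storing it,
# as suggested above; the input string is illustrative.
from bs4 import BeautifulSoup

def to_plain_text(html: str) -> str:
    """Drop tags and collapse whitespace so only readable text is kept."""
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ")
    return " ".join(text.split())

print(to_plain_text("<div><p>Remember: the user prefers <b>short</b> answers.</p></div>"))
# -> "Remember: the user prefers short answers."
```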

LM Studio ā–· #amd-rocm-tech-preview (91 messagesšŸ”„šŸ”„):

  • ROCM on Debian vs Ubuntu: @quickdive. discussed the challenges of using ROCm on non-Ubuntu distros like Debian, highlighting Python version conflicts and installation hurdles. The user finds dual-booting necessary due to AMD’s official support being mainly for Ubuntu.
  • Windows Shows Promise for ROCm: @omgitsprovidence mentioned successfully running language models on Windows with an AMD GPU through koboldcpp, while @ominata_ shared a workaround using 'HSA_OVERRIDE_GFX_VERSION=10.3.0' for the RX 6600XT (a minimal sketch follows this list), suggesting users are finding creative solutions for ROCm on Windows.
  • Performance Inquiries and Comparisons: In discussions about performance, @sadmonstaa reported that their 6950XT was slower than their 5900x when offloading with ROCm. Others like @666siegfried666 had success with older AMD models, hinting at varying experiences among users.
  • Stable Diffusion on AMD: @aryanembered boasted about the capabilities of ROCm, mentioning it was possible to run Stable Diffusion on AMD hardware without DirectML, representing a significant ease-of-use advancement.
  • Dual-Booting Due to Compatibility Issues: Several users, including @sadmonstaa, lamented over the necessity of dual-booting due to the compatibility issues of certain software with Linux, even while preferring it. They discussed the implications of ROCm’s performance and occasional system crashes across different operating systems and setups.
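
For the HSA_OVERRIDE_GFX_VERSION workaround above, a hedged sketch of one common pattern, assuming a Python-driven ROCm PyTorch build; this is not an officially supported configuration and results vary by GPU and ROCm version:

```python
# Sketch of the RX 6600XT workaround mentioned above: the override has to be
# set before the ROCm runtime initializes, i.e. before importing torch (or
# before launching koboldcpp/LM Studio from the same shell).
import os

os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

import torch  # imported only after the override so ROCm picks it up

if torch.cuda.is_available():   # ROCm builds expose GPUs via the cuda API
    print(torch.cuda.get_device_name(0))
else:
    print("No ROCm device visible; the override did not take effect")
```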

Links mentioned:


LM Studio ā–· #crew-ai (4 messages):

  • Innovating with a Multi-Agent Framework: @pefortin is developing a complex framework where a front-facing agent clarifies user tasks, a project manager agent breaks down tasks into atomic units, HR expert agents create specialized personas for each task, and an executor runs the operation. The system also includes evaluators to ensure task resolution and fit, but it is currently running slowly and underperforming.

  • Soliciting Structure Feedback: @wolfspyre reached out to @pefortin to offer feedback on the structural design of the multi-agent framework being developed.

  • Seeking Compatibility Between Agent Systems: @purplemelbourne inquired about the compatibility between AutoGen and CrewAi, expressing a desire to understand which system would be optimal for use without significant time investment.

  • Contrasting AutoGen and CrewAi: @jg27_korny pointed out that AutoGen and CrewAi have different setups, with CrewAi having an easy and intuitive logic, while AutoGen offers a graphical interface. They advised using these systems with the GPT API for best performance and cautioned about the token cost due to potential agent loops.


Perplexity AI ā–· #general (595 messagesšŸ”„šŸ”„šŸ”„):

  • Perplexity’s Context Window Woes: Users like @layi__ and @sebastyan5218 expressed frustration with how Perplexity AI handles context, stating that the service struggles to retain awareness and often defaults to base knowledge responses, leading to requests for a refund and bug report submissions.
  • Pro Subscription Puzzles: New Perplexity Pro users like @lu.ciry encountered confusion around redeeming a promo subscription, prompting exchanges with @icelavaman for clarification on why their discount code was not appearing during the checkout process.
  • AI Chatbot Curiosity: Users such as @nihal_57646 inquired about creating their own AI chatbots and possibly sharing them with Perplexity, to which @icelavaman explained that Perplexity is not a chatbot provider and suggested using Collections as an alternative.
  • Translation Trials: @reborn09 discussed the challenge of translating a large Korean text file to English with Perplexity, with @codelicious advising on how to maintain context over multiple chapters and mentioning possible API use to automate the translation process.
  • Discussions on AI Comparisons: There were mixed reviews about different AI services, with users like @13376666666666666666666666666669 criticizing Copilot Pro and praising Perplexity for image generation, while @twelsh37 provided more comprehensive comparisons across various platforms like ChatGPT Pro, Gemini, and Copilot.

Links mentioned:


Perplexity AI ā–· #sharing (38 messagesšŸ”„):

  • Epic vs. Apple Legal Battle Update: @jeffreyhammer shared insights on the legal spat involving Apple terminating Epic’s developer account. You can read more about the developments here.

  • Life Expectancy Concerns: @nippy_lovelace delved into the topic of life-span and its contributing factors. Dive into the conversation here.

  • Super Bowl Showdown: According to @johnmooredesign, a particular Super Bowl halftime show stands out as the greatest. Opinions or the name of the show? Check here.

  • Monetization in the Tech World: @pintobean8071 delved into the issue of Google paying publishers for the content. Details of the arrangement can be found here.

  • Nootropics Efficiency Discussed: @sevonade4 introduced Claude 3 Opus, discussing nootropics that work including a stack of caffeine, L-theanine, and creatine. Interested in cognitive enhancement? Start here for nootropics and here for the stack.


Perplexity AI ā–· #pplx-api (10 messagesšŸ”„):

  • Confusion Over Max Output Token Length: User @hex8529 asked about the maximum output in tokens for new models, noting that only context length is visible. @brknclock1215 responded, suggesting that the context window length minus the query and search results is effectively the max output.

  • Missing 32k Context Length on Roadmap: @dogemeat_ inquired about the apparent removal of the 32k context length feature from the roadmap, expressing concern over the lack of acknowledgement about this change.

  • New API User Seeking Guidance: @thbk_32074, a newcomer to the API, questioned whether light use through Raycast would deplete the $5 credit and asked if there’s a way to track usage.

  • Clarification on Model Output Limitations: @leoesq clarified that many models have maximum output limits of 3-8k tokens despite larger context windows, which are further influenced by finetune behavior, to which @brknclock1215 acknowledged possible documentation inconsistencies.

  • Seeking Assistance for Personal Assistant Project: User @shine0252 sought help to improve an Alexa-like personal assistant project using the pplx API for more concise and memory-capable interactions, and @dogemeat_ provided suggestions, mentioning sonar models for concise replies and advising on storing conversations for memory.


Nous Research AI ā–· #ctx-length-research (4 messages):

  • Curiosity About Decoder Models: @mattlawhon asked for insights regarding the use of longer sequences during inference when the decoder model was trained without Positional Encoding (PE).
  • Open-Ended Question Leaves Peers Puzzled: @vatsadev sought clarification on what @mattlawhon meant by referring to the use of longer sequences in decoder models.
  • Clarification on Decoder Constraints: @vatsadev confirmed that it is possible to feed more tokens to a decoder model at inference, but warned that it may lead to errors or nonsensical output.

Nous Research AI ā–· #off-topic (39 messagesšŸ”„):

  • Doodle Wars Game Announcement: @om7059 shared a new venture called Doodle Wars, a multiplayer game where players’ doodles are scored by a neural network.
  • AI-assisted Party Games: @denovich discussed how a multi-modal LLM could potentially allow playing the party game Telestrations with fewer than 4 players.
  • Physics Data with Genstruct: @ee.dd mentioned working on physics data using Genstruct and pondered the amount of data needed before attempting a training run.
  • Convergence Acceleration Method for Neural Networks: @baptistelqt announced a new method that could purportedly accelerate the convergence of any neural network by a factor of 10000.
  • Introduction of Cohere’s New Generative Model: @1vnzh shared a link to Hugging Face, presenting Command-R from Cohere as a 35 billion parameter model optimized for RAG and multilingual generation, C4AI Command-R.

Links mentioned:


  • Gemini Unlocks Book-Level Reasoning: @shashank.f1 highlighted a discussion with the Hugging Face community about sparse mixture of models and introduced Gemini, which is an AI capable of processing the content of entire books and movies in a single prompt. The linked YouTube video discusses Gemini’s capabilities and its comparison to other large language models, including being 20x cheaper than GPT-4.

  • WildBench Benchmark for Instruction Generation: @mister_poodle shared a link to the WildBench benchmark on Hugging Face, which could be seen as a call for a new type of benchmark to assess instruction generation in AI.

  • Bonito for Synthetic Dataset Creation: Continuing the benchmark theme, @mister_poodle also introduced Bonito, a model for converting unannotated text into task-specific training datasets, which has implications for both pretrained and instruction-tuned language models.

  • Lex Fridman Tweets About AI and Power: @mautonomy brought to attention a tweet by Lex Fridman, which potentially covers AI’s intersection with power and social dynamics (specific content of the tweet was not provided).

  • A Philosophically Optimistic AI Server: @norabelrose shared an invitation to a Discord server dedicated to discussions on AI, philosophy, technology, open source, and an optimistic future, also aiming to critique AI pessimism. The link to join is here, and @max_paperclips acknowledged the invitation with thanks.

Links mentioned:


Nous Research AI ā–· #general (395 messagesšŸ”„šŸ”„):

  • Model Parallelism Confusion: @mihai4256 shared a link about model parallelism that initially caused confusion, but @teknium clarified that QLoRA has so far only worked in a model-serial fashion via device_map auto, and that DeepSpeed has its own quantization format. The discussion included comments from @rtyax, @stefangliga, and others.

  • Claude Conscious Project Plans Amidst Python Woes: Various users, including @mihai4256, @teknium, @gabriel_syme, and @fred.bliss, discussed their plans and experiences with the Claude Conscious project, with @mihai4256 expressing frustration with Python dependencies and @gabriel_syme creating a web page frontend in 25 minutes without web dev knowledge.

  • Big Plans for GPT-5’s Release: Users speculated on GPT-5’s potential release date, with predictions ranging from within 56 hours by @mautonomy to after the U.S. elections as per @ee.dd. @night_w0lf mentioned a new model, Deepseek-VL, that is flying under the radar.

  • New Releases and Tools: @gabriel_syme announced that Cohere released a new RAG/tool use model with weights on Hugging Face. @euclaise helped @tonic_1 fix a prompt format for Genstruct, and @.interstellarninja teased a new recursive function-calling LLM for local GPUs.

  • Deepseek Making Strides: @night_w0lf highlighted Deepseek-VL, a 7B model with promising performance, beating or matching larger models on certain benchmarks. They also endorsed the academic knowledge benchmark MMMU and shared a paper link.

Links mentioned:


Nous Research AI ā–· #ask-about-llms (175 messagesšŸ”„šŸ”„):

  • On the Hunt for AI Papers: @main.ai and @atgctg sparked a discussion about the contents of the Yi paper, highlighting that 10k well-curated training examples could suffice for effective chatbot finetuning, according to a Reddit post detailing the paper’s takeaways (source).

  • Tokenizer Troubles: @stoicbatman brought up a conundrum about the feasibility of replacing or adding a language-specific tokenizer to a pre-trained GPT-2 model. @teknium and @stefangliga contributed to the idea that while tokens could be added, outright replacing the tokenizer would negate prior learning and possibly necessitate retraining from scratch.
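
A minimal sketch of the add-tokens-rather-than-replace route using Hugging Face transformers; the added tokens are invented for illustration:

```python
# Sketch: extend GPT-2's vocabulary with new (e.g. language-specific) tokens
# instead of swapping the tokenizer outright; the example tokens are made up.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

num_added = tokenizer.add_tokens(["<med_term>", "übersetzung"])
# New rows in the embedding matrix start untrained, so further finetuning
# on data containing these tokens is still required.
model.resize_token_embeddings(len(tokenizer))

print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```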

  • XML Magic for Function Calls: The conversation around inducing LLMs to output function calls enclosed in XML tags was animated, with the team of @.interstellarninja and @teknium sharing their success in the precise generation of function calls and discussing the use of ufghfigchv’s tool sampler for increased output trustworthiness.
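
As a rough illustration of the XML-tag approach (the tag name and JSON payload are hypothetical, not the exact format the team uses), extracting such calls from model output might look like:

```python
# Illustrative only: extract function calls that a model has been prompted to
# wrap in <tool_call>...</tool_call> tags. Tag name and JSON schema are made up.
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(model_output: str) -> list[dict]:
    """Return every well-formed JSON payload found inside tool_call tags."""
    calls = []
    for raw in TOOL_CALL_RE.findall(model_output):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of crashing
    return calls

output = 'Sure. <tool_call>{"name": "get_weather", "arguments": {"city": "Berlin"}}</tool_call>'
print(extract_tool_calls(output))
```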

  • Guided Model Inference with Libraries: A discussion led by @sundar_99385, @.interstellarninja, and @ufghfigchv delved into the utility of libraries like outlines and SG-lang for guiding model inference. The collective insight pointed towards the benefits of precompiling grammars and using schemas derived from function signatures to improve reliability.

  • Query on LLM GUI Frontends: @vodros seeks recommendations for open-source GUI/frontend compatible with Claude 3, and @quicksort suggests trying out the open-webui which offers user-friendly WebUI for LLMs.

Links mentioned:


Nous Research AI ā–· #collective-cognition (3 messages):

  • Flash Attn Query Redirected: @pradeep1148 inquired about how to disable flash attention in Axolotl. @teknium informed that the channel is archived and suggested to ask the question in another specified channel <#1154120232051408927>.

Nous Research AI ā–· #project-obsidian (3 messages):

  • Acknowledging Data Quality Concerns: @gabriel_syme noted that data quality is a significant challenge.
  • Models Echo Provided Assumptions: @kainan_e pointed out that language models often simply ā€œagreeā€ with the sentiment or assumption provided by the user, potentially fabricating events like a fictional ā€œuprising of the squirrelsā€ in Namibia.

Nous Research AI ā–· #bittensor-finetune-subnet (3 messages):

  • Scam Alert in Bittensor Channel: User @teknium warned <@930423397366792202> that their recent post is considered a scam and this channel should only be used to discuss Bittensor related topics.
  • In Search of Insights on Bittensor’s Subnet Outputs: @vincentweisser inquired about the primary insights from the models produced by the subnet.
  • Enhancements in Bittensor Data Generation Pipeline: @teknium responded that there is an elaborate data generation pipeline under development which aims to improve upon the current models, highlighting that the existing pipeline isn’t providing the necessary diversity.

LlamaIndex ā–· #blog (10 messagesšŸ”„):

  • Hierarchical Code Splitting Innovation: @ryanpeach was recognized for their CodeHierarchyNodeParser, which splits large code files into hierarchies, enhancing RAG/agents. This approach is discussed in a tweet.
  • Live QA over Dynamic File Systems: Anup Surendran and Berke Can Rizai are featured for their @streamlit blogpost showcasing how to build a QA system on a dynamic Google Drive/Sharepoint using @pathway_com. Learn about the live ETL pipeline in the complete tweet.
  • AI-Powered Browser Automation: @dhuynh95’s project, LaVague, makes use of RAG and local embeddings + Mixtral from @MistralAI and @huggingface, aiming to produce Selenium code from user queries. The agent, functioning as a browser copilot, is discussed here.
  • User Survey Call-to-Action: LlamaIndex is conducting a 3-minute user survey to gather valuable feedback and input supported by a reminder tweet.
  • Enhanced RAG with Tree-Structures: @parthsarthi03 offers insights on using tree-structures to improve RAG pipeline functionality for complex questions, as highlighted in their latest webinar.

LlamaIndex ā–· #general (376 messagesšŸ”„šŸ”„):

  • Chatbot Creation Query: @o3omoomin asked how to create a RAG chatbot using the Llama index, specifically looking for frameworks and examples of already implemented RAG chatbots for deployment purposes. They referenced the Ensemble Retriever document and highlighted challenges faced when questions unrelated to the document content are asked (issue link).
  • Cosine Similarity Confusion: @icsy7867 discussed the range of cosine similarity, questioning whether it’s 0-1 or could include negative values and sought clarification for implementing a similarity score cutoff in a query engine (cosine similarity background).
  • Ingestion Pipeline Duplicates: @mato8792 raised issues with repeated document processing by ingestion pipelines despite using the same data, which was eventually resolved by correctly including filename_as_id=True to manage document duplicates effectively (see the sketch after this list).
  • Conda Install Conflicts: @rachel_001. reported a problem with version conflicts during conda installation and encountered issues with modules not being found post-upgrade, which led to troubleshooting including the use of a fresh virtual environment.
  • Saving Pipeline Outputs: @node_0 inquired about saving intermediate or final outputs from a Query Pipeline to a local directory and specifically asked if a Pydantic object can be used as part of the pipeline, which led to @cheesyfishes clarifying this wasn’t possible yet but is planned for future development.
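
For the duplicate-ingestion fix described above, a minimal sketch assuming llama_index 0.10.x; the data path and chunking choice are illustrative:

```python
# Sketch of deduplicated ingestion, assuming llama_index 0.10.x.
# filename_as_id=True gives each document a stable id, so the pipeline's
# docstore can recognize documents it has already processed.
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore

documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=512)],
    docstore=SimpleDocumentStore(),  # tracks hashes of previously seen docs
)

nodes = pipeline.run(documents=documents)
print(f"{len(nodes)} new/changed nodes ingested")  # unchanged docs are skipped
```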

Links mentioned:


LlamaIndex ā–· #ai-discussion (4 messages):

  • PDF Parsing Simplified: @datasciencebasics shared a YouTube video titled ā€œSuper Easy Way To Parse PDF | LlamaParse From LlamaIndex | LlamaCloudā€, providing an overview of LlamaParse and LlamaCloud services for easy PDF parsing.
  • Exploring Code with LlamaIndex: @andysingal posted a blog post titled ā€œUnleashing the Power of Code: A Journey with LlamaIndex and Code Hierarchy Node Parserā€, discussing the benefits of organizing extensive code files.
  • Matryoshka Learning Paper Discussion Invite: @lien_61024 extended an invitation for a paper discussion on Matryoshka Representation Learning, featuring experts Aditya Kusupati and Aniket Rege, hosted by Jina AI.
  • Searching for an Open Source GUI: @vodros inquired about recommended open source GUI/frontends that are compatible with Claude 3, expressing a desire to move away from Chatbox for something more user-friendly.

Links mentioned:


LAION ā–· #general (302 messagesšŸ”„šŸ”„):

  • AI Tech Jargon and Disdain for Ineffective Tools: In the flurry of discussions, @ignizherz, @pseudoterminalx, and @nodja shared disdain for ineffective adversarial tools like Glaze and Nightshade, suggesting they’re not practically safeguarding content as claimed. The conversation turned to speculations on why this ineffectiveness might persist, with a focus on the misguided yet genuine intentions of these tools’ creators.
  • Debating Artist Infringement and ā€˜Worm’ Threats: Discussions by @vrus0188, @astropulse, and others focused on the exaggerated threat posed by an AI ā€˜worm’ and misleading articles about AI’s negative impact on industry and environment. This content often includes hyperbolic language and recycles a set of doom-oriented topics.
  • Creative LLMs, Publishing Ethics, and SD3 Anticipation: A diverse set of topics emerged, such as the effectiveness of LLMs in creative writing (@nodja and @chad_in_the_house), ethics of publishing (@drhead and .undeleted discussing Glaze’s submission strategy), and excited anticipation for Stability AI’s SD3 release expressed by .undeleted.
  • Misinformation on LLMs and Technological Advancements: The conversation included critiques of misinformation in academic journals (@progamergov and .undeleted lamenting poor peer review standards) and mentions of technological advancements, including an ultra-low power AI chip (@chad_in_the_house) and a ā€˜Complementary-Transformer’ from KAIST.
  • Discussing AI’s Impact on Creativity and Employment: The chat touched on the impact of AI on creative processes and employment, with @ignizherz, @astropulse, and @nodja expressing thoughts on AI opening doors for non-artists and the changing job market, as well as sharing the belief that AI will not replace human creativity but assist in it.

Links mentioned:


LAION ā–· #research (75 messagesšŸ”„šŸ”„):

  • Model Resolution and Detail Concern: Users including @marianbasti and @thejonasbrothers expressed concerns about the quality of high-resolution models, noting artifacts at large resolutions and the limitations of smaller models like the 600m discussed. There’s a shared sentiment that the full potential of these models may not be reached due to these issues.

  • Potential for Advanced Video Scripting: User @spirit_from_germany proposed a two-model system for advanced video scripting that could analyze and predict video and audio content, sharing this concept via a Twitter link. @louie1943 suggested that focusing such training on the most popular videos in a category might ensure the use of quality data.

  • Concerns of Quality in Generated Datasets: User @pseudoterminalx raised concerns about the limitations of generated datasets, mentioning that they keep you trapped within a certain knowledge corpus and that automated descriptions are limited to what the generating model was trained on.

  • Exploring the CogView3 Framework: @twoabove and @thejonasbrothers discussed the 3-billion parameter text-to-image diffusion model detailed in CogView3’s arXiv paper. While recognizing improvements, @thejonasbrothers noted that comparisons with Pixart were absent, limiting understanding of CogView3’s full potential relative to other models.

  • Discussion on Efficient Models: Conversations around Efficient Large Language Model Adapters (ELLA) and its comparison with other models like SD3 were touched upon by @chad_in_the_house, @vrus0188, and @thejonasbrothers. They speculated about performance and scalability, with @thejonasbrothers indicating that SD3’s linear approach could make it the defining model for text-to-image generation.

Links mentioned:


LAION ā–· #learning-ml (2 messages):

  • Diffusion Model Training Troubles on Mac: User @keda4337 is experiencing issues while training a diffusion model on their MacBook Pro M1 Max as the laptop overheats. They mentioned that when resuming training from checkpoints saved every epoch, the training loss spikes from the 0.01-0.9 range to around 500.

HuggingFace ā–· #general (168 messagesšŸ”„šŸ”„):

  • Incomplete Responses and Inference API: @hari4626 asked if the Inference API always provides incomplete responses, a concern suggesting potential performance issues with the models when used in production.
  • Guidance on Fine-tuning Models: @bohaska seeks advice on a user-friendly way to fine-tune a small GPT model for laptop use, leading to suggestions such as checking out ā€œOllama,ā€ but still needing assistance on the fine-tuning aspect.
  • Optimizing Code with AI: @techintermezzo inquired about the best AI model to optimize shader programming for a beginner, prompting a detailed discussion about using models such as GitHub Co-Pilot and DeepSeek-Coder instruct, as well as references to several AI coding benchmarks and literature.
  • DM Permission Settings in Discord: Users @chongdashu and @lunarflu discussed how to enable and disable direct messaging permissions on Discord for bot interactions, with @lunarflu clarifying that one can disable DMs after obtaining a Verified role without affecting functionality.
  • IPFS as a Model Backup Solution: @endomorphosis debated the merits of hosting AI models on IPFS to mitigate potential government regulations, discussing backup strategies with @lunarflu and the idea of mirroring Hugging Face repositories without explicit approval for domain names or usage.

Links mentioned:


HuggingFace ā–· #today-im-learning (8 messagesšŸ”„):

  • Warming Up to Generative AI: User @umbreenh articulated a keen interest in using generative AI for data analytics development and is open to suggestions and assistance.
  • Let’s Learn Together: In response to @umbreenh, @yasirali1149 expressed a desire to join forces in the journey of learning about generative AI.
  • Searching for KL-Divergence Guidance: @wukong7752 inquired about any available tutorials specifically for calculating KL-divergence in latent-DM (LDM).
  • Discussing Optimization Strategies: @sajjadrahman56 mentioned diving into optimization techniques for ML models with @refik0727 showing interest in learning from his experiences.
  • ML Newbie Seeks Script Usage Help: @210924_aniketlrs02 sought help with understanding how to utilize a particular GitHub script for extracting quantized states from the Wav2Vec2 model.

Links mentioned:

wav2vec2-codebook-indices/scripts/helpers/w2v2_codebook.py at master Ā· fauxneticien/wav2vec2-codebook-indices: Contribute to fauxneticien/wav2vec2-codebook-indices development by creating an account on GitHub.


HuggingFace ā–· #cool-finds (15 messagesšŸ”„):

  • Hugging Face Task Page Discovery: @andysingal revealed their recent find of the Hugging Face Task Page, showcasing a comprehensive resource for ML tasks, featuring model counts for various applications like Image Classification, Object Detection, and Text-to-Image.
  • Machine Learning Integrations: The user shared about Optimum by Hugging Face, enhancing model efficiency on targeted hardware.
  • Enhancing AI with Few-Shot Examples: @epicx provided a link to an arXiv paper discussing a method for strategic reasoning in AI agents, using pretrained LLMs with few-shot examples.
  • NLP Insights: @zaidday highlighted an article discussing the scope and advancements in Natural Language Processing (NLP).

Links mentioned:


HuggingFace ā–· #i-made-this (18 messagesšŸ”„):

  • Doodle your way to victory: @om7059 introduced Doodle Wars, a multiplayer game where players doodle objects in 15 seconds which are then scored by a neural network to crown a winner. Play the game here.
  • Legal Precedents go digital with Caselaw Access Project: @conceptofmind shared their support in releasing over 6.6 million U.S. court decisions in collaboration with the Caselaw Access Project and Harvard Library Innovation Lab. The data is accessible here.
  • Soft Prompting Papers Compiled: @sauravmaheshkar is diving into soft prompting as a method for fine-tuning LLMs and has documented relevant papers in a HuggingFace collection which can be explored here.
  • Portuguese LLM enters the chat: @dominguesm pre-trained a small LLM, Mambarim-110m, with data entirely in Portuguese using the Mamba architecture, available on HuggingFace.
  • BERT Embeds Long Text: @pszemraj fine-tuned a 4k context BERT model, bert-plus-L8-v1.0-syntheticSTS-4k, with capabilities for long-text similarity, emphasizing its training on 4k context length and smaller size. The model is up for grabs on HuggingFace.

Links mentioned:


HuggingFace ā–· #reading-group (35 messagesšŸ”„):

  • Gemini’s Impressive Performance: @shashank.f1 shared a YouTube video comparing Gemini, Claude Opus, and GPT-4 Turbo, highlighting Gemini’s superior speed and cheaper costs. @chad_in_the_house reflected on the benefits of Gemini 1.5 Pro, stating its context length is five times greater than its competitors and it also showcases better multimodal understanding.
  • Mixture of Experts and finetuning challenges: @shashank.f1 and @chad_in_the_house discussed limitations with Mixture of Experts (MoE) models, revealing that customization like LoRA finetuning is challenging due to increased VRAM requirements, which makes MoE inefficient on single GPU setups.
  • Exploring Long Contexts in LLMs: @chad_in_the_house pointed out attention sinks as an interesting technology for handling long contexts in large language models (LLMs), referring to a HuggingFace blog post by thomwolf’s collaborator <@274244546605613056> here.
  • Video Understanding State of the Art (SOTA): @chad_in_the_house directed to a benchmark for video understanding technology, highlighting VideoChat2 as a lead contender and providing a link to the source for further exploration.
  • The Potential of Consistency Distillation and Diffusers: @riteshrm inquired about the availability of a standalone script for consistency models even though it is available in the diffusers library, prompting further discussion on practical implementation.

Links mentioned:


HuggingFace ā–· #diffusion-discussions (7 messages):

  • Seeking Optimal Mistral Settings: @elmatero6 asked for advice on running Mistral predominantly on CPU to avoid bluescreening, given that their system specs include an Intel Core i5-9300H, 32GB of DDR4 RAM, and an Nvidia GeForce GTX 1650 (a CPU-only loading sketch follows this list).
  • Too Fast for Comfort: @HuggingMod reminded @1097592228714659912 to slow down their message pace, indicating a possible flood in the diffusion-discussions channel.
  • Scaling Woes for Chatbot Deployment: @rajveerrathod enquired about scaling an enterprise level chatbot application capable of handling 15-20 queries simultaneously using LLama 7b and Mistral 7b on a Google Cloud GPU, experiencing crashes with concurrent users.
  • In Search of the Finest Image Captioning Model: @ninamani sought recommendations for the best open-source model for precise uncensored ā€œImage to textā€ captioning, with @chad_in_the_house suggesting cogvlm though noting issues with models becoming unstable at 4 bit quantization.
  • Guidance Request for Wav2Vec2 Script Usage: @210924_aniketlrs02 requested help on how to use a specific GitHub script to extract quantized states from the Wav2Vec2 model as they are new to machine learning.
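
For the CPU-leaning setup @elmatero6 describes, one possible sketch with Hugging Face transformers (checkpoint name illustrative); a quantized GGUF build via llama.cpp or LM Studio would be the lighter-weight alternative:

```python
# Minimal sketch: run a Mistral 7B checkpoint on CPU only, leaving the 4 GB
# GTX 1650 untouched. bfloat16 roughly halves memory versus float32
# (~14 GB vs ~28 GB of RAM for the weights alone); expect slow generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
# No .to("cuda") call: the model stays on CPU by default.

inputs = tokenizer("Explain retrieval-augmented generation in one sentence.",
                   return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```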

Links mentioned:

wav2vec2-codebook-indices/scripts/helpers/w2v2_codebook.py at master Ā· fauxneticien/wav2vec2-codebook-indices: Contribute to fauxneticien/wav2vec2-codebook-indices development by creating an account on GitHub.


HuggingFace ā–· #computer-vision (41 messagesšŸ”„):

  • Commercial Use of YOLOv4 Clarified: User @toni_alright informed that the YOLOv4 license is commercially friendly; @prod.dopamine responded seeking an implementation as easy-to-use as Ultralytics but suitable for commercial applications.
  • Troubleshooting TensorFlow ImportError: @crown_16 encountered an ImportError with TensorFlow; @cursorop advised testing the code in Google Colab and considering reinstalling TensorFlow if successful there.
  • Learning Journey for GANs and Beyond: After @noir_bd expressed interest in starting with GANs, multiple users including _homoludens and @mikonvergence provided resources and suggested an inclusive approach that also features diffusion models, VAEs, and more. Links to courses and repositories on Coursera, Github for Diffusion models, and a general course on generative models from Jakub Tomczak were shared.
  • Fast.ai Course Recommended: _homoludens shared a link to a free course covering practical applications of deep learning, with the second part encompassing diffusion models and Hugging Face’s Diffusers library.
  • Inpainting Feature Question on Stable Diffusion: @okan1962 inquired about the availability and documentation for inpainting and image variations features in Stable Diffusion using HuggingFace’s inference API, noting a lack of clear information and closed model endpoints.

Links mentioned:


HuggingFace ā–· #NLP (38 messagesšŸ”„):

  • Import Troubles with trl: User @solanao64 encountered an ImportError while trying to import SFTTrainer and DPOTrainer from trl due to an issue importing top_k_top_p_filtering from transformers.
  • DeBERTa-based Classifier Example: @darwinanim8or shared an example of using DeBERTa-based classifiers, providing a code snippet for text classification using HuggingFace’s pipeline (a rough reconstruction follows this list).
  • Fine-tuning Mistral 7B: @plbjt inquired about fine-tuning Mistral 7B for specific tasks using GPT-4 formatted prompts, sparking a discussion about model suitability for complex tasks.
  • C++ Deployment for BERT: @smartguy_41719 is seeking guidance on deploying a trained BERT model for inference in a C++ environment, with @merve3234 suggesting to use ONNX Runtime and Hugging Face’s Optimum.
  • LLMs for Translation Tasks: @ninamani asked for recommendations on optimized and accurate models for NSFW uncensored translation tasks, requiring more precision than older models or overly large LLMs can offer.
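
The snippet @darwinanim8or shared is not reproduced in this recap; a rough reconstruction with the transformers pipeline API might look like the following, where the checkpoint name is a placeholder for any fine-tuned DeBERTa classifier:

```python
# Rough reconstruction of a DeBERTa-based classification call via the
# transformers pipeline API; replace the placeholder with a real fine-tuned
# DeBERTa classifier checkpoint from the Hub.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="your-org/deberta-v3-base-finetuned-classifier",  # placeholder id
)

print(classifier("This library makes text classification almost trivial."))
# -> [{'label': '...', 'score': 0.98}]  (labels depend on the checkpoint)
```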

Links mentioned:


HuggingFace ā–· #diffusion-discussions (7 messages):

  • CPU Over GPU for Mistral: User @elmatero6 seeks advice on running Mistral efficiently on a CPU given their system specs (Intel Core i5, 32GB RAM, Nvidia GTX 1650) to avoid bluescreening their PC, suggesting a preference for RAM utilization over GPU.
  • Speedy Poster Gets a Nudge: @HuggingMod gently reminded @elmatero6 to post more slowly in the chat.
  • Scaling Chatbots for Enterprise: @rajveerrathod is developing a customer success chatbot with LLama 7b and Mistral 7b; however, the app crashes under concurrent usage on Google Cloud’s GPU. They seek solutions for scaling up to handle 20 users simultaneously with the models quantized to 4 and 8 bits.
  • Quality Image Captioning Model: User @ninamani inquired about the best open-source option for precise uncensored image-to-text or image captioning models. @chad_in_the_house recommended cogvlm and noted stability at 8 bit quantization.
  • Newcomer Requesting Wav2Vec2 Guidance: @210924_aniketlrs02 asked for assistance in using a GitHub script to extract the quantized states of the Wav2Vec2 model, indicating they are new to machine learning.

Links mentioned:

wav2vec2-codebook-indices/scripts/helpers/w2v2_codebook.py at master Ā· fauxneticien/wav2vec2-codebook-indices: Contribute to fauxneticien/wav2vec2-codebook-indices development by creating an account on GitHub.


Eleuther ā–· #general (101 messagesšŸ”„šŸ”„):

  • A Warm Welcome and Direction for Beginners: @shida3916 expressed excitement about joining the community to discuss everyday AI and ask simple questions. @stellaathena redirected them to other servers given this server’s research-level focus.
  • LLMs Seek Home, But Nobody’s There: A deep discussion was sparked by @faron1111 about the concept of self-awareness within LLMs, specifically talking about self-preservation mechanisms. @wonkothesensible argued that while models may have implicit notions of agency, they lack any conscious home to occupy.
  • Persistent State in LLMs and AGI Potential: The conversation about LLMs continued with a focus on architecture for potential AGI, including 1-bit variants mentioned in a posted link by @wonkothesensible. The need for advanced planning and awareness within LLMs was debated, suggesting necessary breakthroughs before reaching AGI.
  • Discussion on Training Small Models: @biiter inquired about strategies to pre-train models effectively on limited VRAM and discussed potential issues with ALiBi embeddings. @hailey_schoelkopf addressed the technical problem and agreed to provide a fix.
  • AI Extinction-Level Concern: @conceptron shared a Slashdot article reporting U.S. government concerns that frontier AI poses an extinction-level threat to humanity, suggesting regulatory measures such as restrictions on model weights publication.

Links mentioned:


Eleuther ā–· #research (75 messagesšŸ”„šŸ”„):

  • Exploring Efficient Attention for Image Diffusion: @nostalgiahurts discussed a paper on arXiv that proposes ToDo, a novel method that accelerates Stable Diffusion inference by employing token downsampling, increasing speeds by roughly 2x to 4.5x. The conversation included a GitHub link to a related repository.

  • Few-Shot Versus Zero-Shot Performance Anomalies: @paganpegasus noted an interesting phenomenon where zero-shot performance on the MMLU benchmark was comparable or superior to few-shot performance for various models they were testing. Several hypotheses were discussed, including the potential distraction of additional context for smaller models and the idea of testing performance with varying numbers of shots.

  • Frontiers in Optical Digital Computing: The paper mentioned by @ai_waifu explores the potential of all-optical digital computing and memory (link to arXiv abstract). Topics such as semiconductors, electronic communication inefficiencies, and the paper’s implications on manufacturing were briefly discussed.

  • Gemini 1.5 Report Overview Provided: @xylthixlm indicated the release of the Gemini 1.5 report with no substantial technical details (link to report on arXiv). @main.ai followed up by providing insights into the new content in the report.

  • Yi Tech Report’s Double Wikipedia Filtering Approach: @maxmatical brought up a question about the Yi tech report’s approach that effectively filters Wikipedia content twice (link to arXiv abstract). @thedeviouspanda suggested that this might be similar to the use of light and heavy rankers in ranking pipelines, with each step filtering progressively more intensively.

Links mentioned:


Eleuther ā–· #interpretability-general (3 messages):

  • Newcomer Seeks Interpretability Insights: User @xcodevn expressed an interest in getting started with interpretability research and asked for resource recommendations.
  • A Useful Resource Shared: In response, @wendlerc shared a link to the ARENA 3.0 Landing Page, mango-ambulance-93a.notion.site, describing it as a ā€œgemā€ for those interested in the field.

Links mentioned:

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It’s the all-in-one workspace for you and your team


Eleuther ā–· #lm-thunderdome (12 messagesšŸ”„):

  • Query on BOS token usage for models: @jwngx questioned the standard for using the BOS (Beginning of Sentence) token and its inconsistent application across different repos. @stellaathena confirmed that usage depends on the model, and there isn’t a consolidated resource detailing which models perform better with it (a small illustration follows this list).
  • Seeking BOS Token Insights: @jwngx inquired if there is documentation on model performance with the BOS token, but @hailey_schoelkopf noted that such details are typically internal and model-dependent; the odd behavior of Gemma with BOS tokens is unprecedented.
  • Adjusting HFLM Decoding for BOS Token: In light of a commit adding the BOS token flag for Gemma, @jwngx shared the HFLM code link and asked whether decoding should consider the self.add_bos_token setting. @hailey_schoelkopf clarified that tok_decode should only be called on continuation text without the BOS token or input text, suggesting the current implementation is correct.
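
For readers unfamiliar with the setting under discussion, a small illustration of how a tokenizer’s BOS handling can be inspected; the checkpoint is arbitrary and behavior varies by model, which is exactly the point made above:

```python
# Illustrative check of whether a tokenizer prepends a BOS token; which models
# benefit from it is model-dependent, as noted above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")  # arbitrary choice

with_special = tok("The quick brown fox", add_special_tokens=True).input_ids
without = tok("The quick brown fox", add_special_tokens=False).input_ids

print("bos_token:", tok.bos_token, "id:", tok.bos_token_id)
print("with special tokens:   ", with_special)
print("without special tokens:", without)
# Llama- or Gemma-style tokenizers start the first list with the BOS id;
# GPT-NeoX-style tokenizers typically add nothing.
```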

Links mentioned:

lm-evaluation-harness/lm_eval/models/huggingface.py at main Ā· EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness


Eleuther ā–· #multimodal-general (2 messages):

  • Warm Welcome to a New Community Member: User @shida3916 expressed excitement about joining the community to discuss everyday uses of AI and to ask simple questions. They inquired if this was the right place for such discussions.
  • Clarifying Transformer and Diffusion Concepts: @yoavhacohen provided a clarification stating that ā€œTransformer is an architecture, while diffusion is a training and inference method.ā€ They also mentioned that diffusion was used with transformers prior to SD3, citing several examples like DALL-E 2, DiT, and PixArt.

Eleuther ā–· #gpt-neox-dev (75 messagesšŸ”„šŸ”„):

  • Torch Container Ambiguity: @catboy_slim_ highlighted unclear documentation about which commit of apex is being used in the torch development container, suggesting the setup could be more straightforward.

  • Dependency Management Challenges: Multiple users, including @catboy_slim_, @tfidia, and @hailey_schoelkopf discussed difficulties with managing dependencies in the GPT-NeoX project, mentioning the complexities introduced by a default NGC container that might contain both necessary and extraneous packages.

  • Flash Attention Dependencies: @hailey_schoelkopf clarified that Triton is used both for sparse and flash attention, and the group also discussed how Flash attention’s update to 2.5.6 potentially affects compatibility with the NGC PyTorch container.

  • Apex Usage in Question: Users including @biiter and @catboy_slim_ debated the necessity of Apex, as some of its functionality may now be built into PyTorch, except for specific features like fusedAdam.

  • Evaluation and Conversion Queries: @tejas.inferq sought help with the evaluation process for a trained 125M parameter GPT-NeoX model, while @aphoh inquired about converting Pythia/NeoX checkpoints to upstream megatron-lm, facing issues with matching weight layouts and losses.

Links mentioned:


Latent Space ā–· #ai-general-chat (39 messagesšŸ”„):

  • AI Biographer Trials and Privacy Concerns: @swyxio has been trialing Emma, the AI Biographer, and finds the call experience good enough to recommend trying at least once. However, they caution users about potential privacy concerns, mentioning that they used a fake name and fake biographical details.
  • OpenAI Drama Concludes: @guardiang shares a New York Times article detailing internal issues at OpenAI. The OpenAI board has completed its review of the firing, reinstated Sam Altman to the board, and announced the addition of three new board members, as noted in OpenAI’s announcement.
  • Ideogram 1.0 Launch Under the Radar: @swyxio mentions that the launch of Ideogram 1.0, a new image generation model noted for its text rendering, has not received much attention despite its potential.
  • LLM Interface Development by Microsoft Research: @swizec brings up a Hacker News post discussing AICI, an interface proposed by Microsoft Research to standardize constraints and control mechanisms across various LLM inference engines, seeking feedback for the Rust AICI runtime.
  • A Glance at Transition Beyond Transformers: @swyxio discusses ā€œMamba,ā€ a State Space Model presented as a potential alternative to the Transformer architecture for LLMs. They refer to a visual guide and the original research paper for those interested in understanding the architecture.

Links mentioned:


Latent Space ā–· #ai-announcements (5 messages):

  • Asia Gets Schooled on GPT-2: @ivanleomk announced a presentation on the GPT-2 paper, urging the Asia @paper-club members to join in. The event, said to be EPIC, was scheduled on https://discord.gg/8sYsGc83.
  • Weekend Podcast Drop Teaser: @swyxio posted excitement over a weekend podcast drop. The Latent Space pod covers January and February recap and can be listened to here.
  • Paper Enthusiast Laments Timezone Woes: @420gunna expressed appreciation for the paper selections at @paper-club but humorously mentioned forgetting to set an alarm for the 2 AM meetings. A mix of enthusiasm and timezone-woes marking their engagement with the community.

Links mentioned:


Latent Space ā–· #llm-paper-club-west (30 messagesšŸ”„):

  • Prep for LLM Paper Club Discussion: @ivanleomk shared notes as preparation for @1123457263638683770’s sharing in the upcoming club discussion, focusing on Generative Pre-trained Transformers (GPT).
  • Starting Time Updates: @ivanleomk provided multiple updates about the starting time of the session, indicating a commencement within 5-10 minutes, followed by another message indicating a 5-minute start time.
  • Community Support for Newcomers: @healthymonkey expressed their newcomer status to NLP, seeking corrections on any potential mistakes, with @bryanblackbee offering support and encouragement.
  • Technical Clarifications in Real Time: @kishore.reddy clarified terminologies used by @1123457263638683770, like ā€œcausal attentionā€ and correcting a reference to ā€œ-infā€ during the live club session.
  • LLM Visualization Tools Shared: @fx2y provided a link to a visualization tool for GPT family models and offered commendations to @1123457263638683770 for their work.

Links mentioned:


Latent Space ā–· #ai-in-action-club (162 messagesšŸ”„šŸ”„):

  • Workflow Optimization with AI Discussion: @kbal11 introduced the AI-in-Action session led by @363877777977376768, focusing on using AI to improve workflow. Participants such as @yikesawjeez expressed enthusiasm for the topic, while others shared insights and tips on enhancing output through AI.

  • Meta Prompts and CLI Tools Breakthroughs: @yikesawjeez highlighted the use of AI to create prompts that lead to progressively better outputs and demonstrated interest in deploying projects on AWS. Resources for AI-driven tools that can assist in these efforts were shared and discussed amongst the community.

  • Importance of Documentation in AI Workflow: The discussion turned to best practices for recording work and making detailed notes. @slono shared the use of asciinema for recording terminal sessions, while others like @yikesawjeez committed to sharing open-source tools they utilize.

  • Community Engagement and Sharing: The channel actively engaged in sharing tips, tools, and tricks for improving use of AI in personal workflows. Users like @yikesawjeez and @markredito shared excitement for collaborative learning and crowdsourcing knowledge within the AI space.

  • Request for Future Session on Decentralized/Distributed AI: Amidst discussions of workflow and tools, @yikesawjeez proposed a future session topic on decentralized and distributed AI applications that move beyond cryptocurrency-focused projects.

Links mentioned:


Interconnects (Nathan Lambert) ā–· #announcements (1 message):

  • New Roles for Better Differentiation: @natolambert implemented new roles within the Discord to distinguish between manually added close friends and subscribers. Feedback on this update is welcomed.

Interconnects (Nathan Lambert) ā–· #news (51 messagesšŸ”„):

  • Elon Musk’s Grok Tease Sparks Debate: @xeophon. shared a tweet from Elon Musk announcing that @xAI will open source Grok, while @natolambert questioned the accurate use of ā€œopen sourceā€ by Musk.
  • Cohere Introduces Command-R: @xeophon. highlighted the introduction of Command-R, a new retrieval augmented model by Cohere, notable for its 128k context window and public release of weights for research purposes.
  • Anticipation for Open Models: @xeophon. and @natolambert discussed the potential of Cohere’s Command-R, especially for startups, academia, and its usability in multiple European languages as well as its importance beyond the hype for future models like ā€œllama3ā€.
  • Market Reaction to Elon’s Announcement: @natolambert expressed that people might be overreacting to Elon’s announcement, giving credit prematurely before any model is released, and @eugenevinitsky drew an interesting parallel with Twitter’s open-source move but with a twist as ā€œWeights without code instead of code without weights.ā€
  • Questioning OpenAI’s Commitment to Open Models: @dangf91 inquired about the open-source status of Mistral, with @xeophon. clarifying that there’s still a commitment to open models in the future, and @natolambert adding that the environment is ever-changing.

Links mentioned:


Interconnects (Nathan Lambert) ā–· #other-papers (2 messages):

  • GPT-4 Takes On Doom: @philpax shared a paper which demonstrates GPT-4’s ability to play the 1993 first-person shooter game Doom, using its reasoning and planning capabilities without any game-specific training. The model managed to manipulate doors, fight enemies, and navigate the game world, and the paper suggests that complex prompting could further enhance its performance. Find the paper here: GPT-4 Plays Doom.

  • Call for a Blog Post: @natolambert reacted to the title of the paper on GPT-4 playing Doom, implying that the content sounds intriguing enough to merit a standalone blog post.

Links mentioned:

Will GPT-4 Run DOOM?: We show that GPT-4’s reasoning and planning capabilities extend to the 1993 first-person shooter Doom. This large language model (LLM) is able to run and play the game with only a few instructions…


Interconnects (Nathan Lambert) ā–· #ml-questions (36 messagesšŸ”„):

  • Paper Cutoff Date for olmo/dolma clarified: @mike.lambert inquired about the data range for olmo/dolma papers, wondering if it was intentional leading up to May 2023. @natolambert responded, clarifying that the cutoff is due to publication timelines rather than an intention to avoid the ChatGPT era, also mentioning the lack of a scraper.

  • Cheap Pretraining of GPT2 Models: @natolambert asked about the current costs of pretraining a model like GPT2. @xeophon. responded with a price of less than $1,000, an estimate he considered ā€œpretty wild.ā€

  • Pretraining Costs and Feasibility: @xeophon. shared details on training costs from September 2022, GPT-2 document counts, and Databricks pretraining costs for reference. @natolambert and @philpax discussed how surprisingly quick and cheap it is to train models now, and @natolambert mentioned that for Stability AI’s model, they probably paid less than $100,000 for the compute.

  • Stability AI Compute Deal Speculation: In the discussion about costs, @xeophon. mentioned that Stability AI possibly received free or discounted compute in exchange for using Gaudi2 and promoting it, as suggested by their partnership ad.

  • Fine-Tuning Practices Suggestions: When @dangf91 asked about fine-tuning models with more books/articles and whether to use a masking strategy, @natolambert and @vj256 agreed that adding books to the dataset with some pretraining mixture and continuing training is the typical practice.


Interconnects (Nathan Lambert) ā–· #ml-drama (12 messagesšŸ”„):

  • Drama Over Inflection AI’s Model: @xeophon. highlighted a tweet from @seshubon questioning if Inflection AI’s chatbot was merely a wrapper for Claude-3-Sonnet, given identical responses to custom queries. Inflection AI boasted their own model, Inflection-2.5, which supposedly rivaled GPT-4. Inflection AI’s Tweet.
  • Potential A/B Test at Inflection?: @xeophon. speculated that the observed behavior could be from an A/B test comparing Inflection-2.5 with Claude Opus, considering the unlikely chance of two models generating word-for-word matches for a lengthy and specific prompt.
  • Revoked API Key: @natolambert humorously remarked that someone’s API key might get revoked in light of the unfolding drama.
  • Claude’s Temperature Setting Noted: @mike.lambert mentioned that claude.ai typically uses a non-zero temperature for its responses, contributing to the ongoing discussion.
  • Inflection AI Clarifies: @xeophon. shared a tweet from Inflection AI explaining that their chatbot, Pi, remembers prior conversations and repeated a message from Claude because it was earlier included in the conversation. The plot thickens as questions about Pi’s capabilities and independence arise. Inflection AI’s Clarification.
  • Caution Advised on Official Replies: When @xeophon. shared Inflection AI’s response, @natolambert advised never to reply on official accounts, suggesting the company might encounter negative consequences. @natolambert implicitly affirmed the trouble with ā€œfd,ā€ meaning ā€œf***ed.ā€

Links mentioned:

  • Tweet from Inflection AI (@inflectionAI): Pi’s responses are always generated by our own models, built in-house. On investigation, it appears the user prompted this particular response from Pi after copy-pasting the output from Claude earlier…
  • Tweet from seshu bonam (@seshubon): WHAT? @inflectionAI is just a claude-3-sonnet wrapper? care to explain? šŸ’ Produces the exact same answer word to word for a custom query i asked 🤯 ā†˜ļø Quoting Inflection AI (@inflectionAI) P…

Interconnects (Nathan Lambert) ā–· #random (19 messagesšŸ”„):

  • Lambert Cabal Expands: @philpax joked about the growing presence of Lamberts in AI labs, humorously equating their network to the Lambert backchannels, while @420gunna referenced the concept of a mycorrhizal network to liken it to an underground communication system.
  • Sam Altman Rejoins OpenAI Board: @xeophon. shared a news article discussing Sam Altman’s return to the board of directors at OpenAI, and @420gunna quoted Bret Taylor, adding a dash of humor on leadership with a Civ4 Cultural Victory reference.
  • Mocking Self-Review Leadership: @natolambert mocked the idea of self-evaluation in leadership with a tongue-in-cheek quote: ā€œI have done an internal review and decided I am still the kingā€.
  • Canadian Geese as a Discord Rite: @philpax humorously pondered the number of Canadian geese needed for a ā€œritualā€ to obtain a Friend role on Discord, referring to them as a trusty emblem, while @natolambert acknowledged that geese are intimidating.
  • The Quest for the Goose Role: @natolambert proposed creating a self-nominated goose role on Discord, suggesting a boost or icon to make it noticeable, as the perks of subscriber roles came under light-hearted scrutiny.

Links mentioned:

Mycorrhizal network - Wikipedia: no description found


Interconnects (Nathan Lambert) ā–· #memes (15 messagesšŸ”„):

  • Dune Casts Reimagined with AI Celebrities: @420gunna started a creative game by imagining prominent figures in AI as characters from Dune, designating Sama as the Kwisatz Haderach, Alec Radford as Thufir Hawat, and several others, with some roles left open for suggestions.
  • Naming the Matriarchs of AI’s Dune: Shortly after, @natolambert proposed that the CTO is definitely Lady Jessica, while also assigning the role of Baron Harkonnen to Elon Musk.
  • Yann Lecun’s Brain Sauce Analogy: @420gunna shared a quirky remark about Yann Lecun, referencing how he often talks about his brain turning to ā€œwhite sauceā€ with age, and humorously coined the term Brain to BĆ©chamel pipeline.
  • Suggestions for Dune’s Key Players: @twkillian brought up Peter Thiel as a candidate for Stilgar, however, @natolambert disagreed, suggesting Marc Andreessen as a more fitting choice for the role. Later, they had a laugh about placing Gary Marcus as an equivalent to the maladies introduced in later Dune books.
  • The Weirdness of Dune Beyond Book One: A discussion ensued about whether it’s worth reading beyond the first Dune book. While @twkillian was advised to stop at the first, both @eugenevinitsky and @natolambert agreed that continuing the series is worthwhile due to its intriguing oddity.

Interconnects (Nathan Lambert) ā–· #rl (8 messagesšŸ”„):

  • Exploration of Reinforcement Learning: User @chygao shared an episode link from Spotify featuring Ian Osband from OpenAI discussing information theory and reinforcement learning (RL), exploration, epistemic uncertainty, and scaling to large language models (LLMs).
  • Nostalgia for ā€˜Talk RL’ Podcast: @natolambert expressed intent to listen to the shared episode, reminiscing about being a fan of the ā€˜Talk RL’ podcast.
  • Diminishing Activity on a Favored Podcast: @twkillian lamented that the ā€˜Talk RL’ podcast posts less frequently now, which has somewhat diluted their fandom.
  • Selective Listening Based on Guest: @natolambert disclosed that while they still check ā€˜Talk RL’, they don’t listen to every episode and their engagement depends on the guest speaker.
  • Consistency in Quality Matters: @twkillian acknowledged that the quality of ā€˜Talk RL’ episodes has varied, resulting in a more selective approach to the podcast content.

Links mentioned:

  • Ian Osband: Listen to this episode from TalkRL: The Reinforcement Learning Podcast on Spotify. Ian Osband is a Research scientist at OpenAI (ex DeepMind, Stanford) working on decision making under uncertainty. Ā W…

Interconnects (Nathan Lambert) ā–· #rlhf (7 messages):

  • Exploring RLHF for LLM Alignment: User @xeophon. shared an arXiv paper that, inspired by the success of RLHF, studies how algorithms such as PPO and Expert Iteration improve LLM reasoning capabilities. @natolambert acknowledged the paper, saying it looks good.
  • Inquiry into Claude’s Synthetic Tasks: User @eugenevinitsky expressed interest in details regarding synthetic tasks used in Claude for creating uncertainty-aware LLMs.
  • Seeking Crux of Synthetic Task Issue: @natolambert responded to @eugenevinitsky, mentioning that they, along with <@304671004599255043>, are exploring theories on the core issue of creating effective synthetic tasks.
  • Envisioning Advanced Methods for Synthetic Tasks: @natolambert speculated about using methods like CAI (Constitutional AI) with improvements for diversity in generating synthetic tasks.
  • Pretraining Data Vs. Instructions/Preferences: @natolambert suggested running CAI on pretraining data, contrasting it with the standard focus on instructions or preferences.

Links mentioned:

Teaching Large Language Models to Reason with Reinforcement Learning: Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance…


OpenRouter (Alex Atallah) ā–· #announcements (4 messages):

  • New Speed Champion Mistral 7b 0.2: @alexatallah proudly introduced Mistral 7b 0.2, boasting about a substantial speed boost—10x faster for short outputs and 20x faster for long ones, as well as a generous 32k context window. The performance was showcased in a demo tweet.

  • Gemma Nitro hits the market: A new cost-effective and high-speed model called Gemma Nitro was announced by @alexatallah, featuring speeds of more than 600 tokens per second at an economical rate of $0.1 per million tokens. More details can be found on OpenRouter’s website.

  • Sneak peek tweet?: @alexatallah shared a mysterious Twitter link without additional context or comments.

  • OpenRouter flaunts no spending limits: @alexatallah revealed a user-friendly policy on OpenRouter, stating that there are no $ usage limits on the platform, potentially inviting users to utilize their services more freely.

Links mentioned:

Google: Gemma 7B (nitro) by google | OpenRouter: Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks …


OpenRouter (Alex Atallah) ā–· #app-showcase (1 messages):

  • Claude 3 Function Calling Made Easy: User @thevatsalsagalni introduced a function calling library tailored for the Claude 3 model family. The library supports Pydantic function schemas and is open for exploration and contribution at claudetools on GitHub.
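The library’s exact interface is best checked in the repo; as a rough illustration of the underlying idea, a Pydantic model can be turned into the JSON schema that a tool definition carries. The tool-definition shape below is an illustrative assumption, not claudetools’ actual API.

```python
from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    """Return the current weather for a city."""
    city: str = Field(description="City name, e.g. 'Paris'")
    unit: str = Field(default="celsius", description="'celsius' or 'fahrenheit'")

# Pydantic emits a JSON schema that can be dropped into a tool definition.
tool_definition = {
    "name": "get_weather",
    "description": GetWeather.__doc__,
    "input_schema": GetWeather.model_json_schema(),
}
print(tool_definition)
```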

Links mentioned:

GitHub - vatsalsaglani/claudetools: Claudetools is a Python library that enables function calling with the Claude 3 family of language models from Anthropic.: Claudetools is a Python library that enables function calling with the Claude 3 family of language models from Anthropic. - vatsalsaglani/claudetools


OpenRouter (Alex Atallah) ā–· #general (120 messagesšŸ”„šŸ”„):

  • Censorship in AI Models Becomes a Hot Topic: Multiple users, including @.toppa and @lemmyle, expressed concerns about censorship creeping into AI models, such as the ā€œClaude 2 self moderated versions,ā€ and potential new restrictions related to copyright or AI responses. Conversations touched on how AI models, like Claude 3, are responding to user inputs and the desire for less censored options.

  • Querying AI Format Support and Parameter Functionality: In a technical discussion, @cupidbot.ai and @spaceemotion asked how messages should be formatted for various AI models and how parameters such as json_object and add_generation_prompt=True behave (a short illustration of the latter follows this list). @alexatallah clarified some documentation points, including removing the schema until it sees broader support.

  • Model Output Limits Spark Curiosity and Friction: Users like @zulfiqaar and @.wingedsheep explored the output length limitations of various models, with specific mention of GPT-4’s 4096 token output cap. Despite users like @lemmyle showing dissatisfaction with current limitations, @alexatallah mentioned that longer completions could significantly increase memory usage.

  • Technical Assistance Sought and Offered Among Users: Users sought clarification and assistance on model intricacies, ranging from Claude API’s handling of system role messages (@njbbaer, with a response by @alexatallah) to adapting model files for personal use (@mikef0x.). Insights included OpenRouter’s facilitation of prompt customization using ChatML and direct prompts.

  • User Engagement with OpenRouter and Model Accessibility Issues: Conversations highlighted user engagement with OpenRouter, as shown by the creation of a Google Sheets connection app by @mostlystable, and addressed accessibility issues with models like Nous Hermes 70B. Updates on model status and functionality were given by users such as @louisgv and @spaceemotion, with official responses from @alexatallah.
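For readers unfamiliar with the add_generation_prompt flag: it comes from Hugging Face chat templates, where it appends the tokens that cue the assistant’s turn so generation starts in the right place. OpenRouter handles formatting server-side, so the snippet below is only an illustration of the flag; the model name is an arbitrary example.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [{"role": "user", "content": "Write a haiku about GPUs."}]

# add_generation_prompt=True appends the assistant-turn tokens to the prompt,
# so the model continues as the assistant instead of extending the user text.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```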


CUDA MODE ā–· #general (30 messagesšŸ”„):

  • Request for Image Caption Generator Assistance: User @madhurr sought help concatenating image features with caption embeddings in an image-caption-generator project, running into mismatched tensor shapes.
  • CUDA as Factorio: User @artificial_anteligence made an analogy comparing parallel computing to the video game Factorio, discussing components like memory operations paralleling game mechanics.
  • Advice on Image and NLP Embedding: In response to @madhurr, @andreaskoepf suggested using a linear layer to project the image features to the same dimensionality as the NLP embeddings (see the sketch after this list) and shared a link to the relevant Visual Instruction Tuning project.
  • Learning CUDA and Triton: User @umerha offered to give a community talk on Triton or prefix sum and provided a link to his personal blog for background on his expertise.
  • Decoding Efficiency in LLMs Explored: @andreaskoepf linked to a PyTorch blog discussing strategies to make large language model inference more efficient.
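A minimal PyTorch sketch of that suggestion: project the vision features into the text-embedding dimension with a learned linear layer before concatenation. The shapes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

batch, num_patches, vision_dim = 8, 196, 768     # e.g. ViT patch features
seq_len, text_dim = 32, 4096                     # e.g. LLM token embeddings

image_features = torch.randn(batch, num_patches, vision_dim)
caption_embeds = torch.randn(batch, seq_len, text_dim)

# Learned projection so both sequences share the same embedding width.
projector = nn.Linear(vision_dim, text_dim)
projected = projector(image_features)            # [8, 196, 4096]

# Now the image "tokens" can be prepended to the caption tokens.
fused = torch.cat([projected, caption_embeds], dim=1)   # [8, 196 + 32, 4096]
print(fused.shape)
```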


CUDA MODE ā–· #triton (1 messages):

iron_bound: early tho sounds cool https://github.com/Deep-Learning-Profiling-Tools/triton-viz


CUDA MODE ā–· #cuda (21 messagesšŸ”„):

  • Thread Coarsening in CUDA: @cudawarped describes thread coarsening as similar to loop unrolling, expecting performance improvements only when a kernel is not already saturating memory throughput (a sketch follows this list). They note that vectorized loads may yield no gain depending on the workload, and recommend reading half-precision data as floats because of the cache-line size.

  • Optimizing Memory Operations: @zippika suggests using int4 or float4 to reduce the number of memory reads/writes, and discusses the potential benefits of vectorizing additions with the __hadd2 operator in CUDA.

  • Performance Insights from NVIDIA CUDA: @zippika shares a CUDA code snippet for vectorized addition using __hadd8 and hints at performance improvements based on observations from NVIDIA’s profiling tools.

  • Path to CUDA Proficiency: Users discuss self-teaching methods for mastering CUDA, including searching through the CUDA toolkit, examining the C++ code generated by nvcc, and exploring repositories on GitHub.

  • Discussion on NVIDIA Magnum IO: @joseph_en brings up NVIDIA Magnum IO, a system meant for high-performance computing and machine learning, highlighting its ability to handle complex simulations and reduce negative performance impacts in multi-tenant environments.
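Here is a minimal sketch of thread coarsening, written in Python with Numba’s CUDA target for brevity; the element type, coarsening factor, and launch configuration are arbitrary choices.

```python
import numpy as np
from numba import cuda

COARSEN = 4  # each thread handles 4 consecutive elements instead of 1

@cuda.jit
def add_coarsened(x, y, out):
    # Thread coarsening: fewer threads, each doing more work, which
    # amortizes index arithmetic much like loop unrolling.
    start = cuda.grid(1) * COARSEN
    for i in range(start, min(start + COARSEN, x.shape[0])):
        out[i] = x[i] + y[i]

n = 1 << 20
x = cuda.to_device(np.random.rand(n).astype(np.float32))
y = cuda.to_device(np.random.rand(n).astype(np.float32))
out = cuda.device_array(n, dtype=np.float32)

threads = 256
blocks = (n + threads * COARSEN - 1) // (threads * COARSEN)
add_coarsened[blocks, threads](x, y, out)
print(out.copy_to_host()[:4])
```

As the discussion notes, coarsening only pays off when the uncoarsened kernel is not already saturating memory bandwidth.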

Links mentioned:

NVIDIA Magnum IO: IO Subsystem for Modern, GPU-Accelerated Data Centers


CUDA MODE ā–· #torch (5 messages):

  • In Search of Torch Compile Limits: @andreaskoepf inquired whether there is a cap on the kernel size produced by torch.compile and suggested adding a README documenting how to print the Triton code it generates (a minimal sketch follows this list). They also expressed interest in digging into the PyTorch source to understand this better.
  • Kernel Launch Uncertainty: @marksaroufim shared uncertainty regarding kernel size limits in torch.compile and mentioned the concept of using persistent kernels to model entire networks, without a clear understanding of the underlying trade-offs.
  • Processor vs Language & Compilation Engine: @mr.osophy commented that there is work on creating a new language and compilation engine that compiles efficiently to all types of processors, emphasizing that the focus is not on processor design.
  • Performance Comparison Inquiry: @ingiing asked whether libtorch performs faster than using load_inline in PyTorch, seeking to compare the two methods.
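One way to look at the kernels Inductor emits is the TORCH_LOGS setting, usually passed on the command line as `TORCH_LOGS=output_code python script.py`; the snippet below sets it in-process instead and assumes a recent PyTorch build that honors it. The function being compiled is an arbitrary example.

```python
import os
os.environ["TORCH_LOGS"] = "output_code"   # ask Inductor to log its generated code

import torch

def f(x):
    return torch.nn.functional.gelu(x) * 2.0

compiled = torch.compile(f)
x = torch.randn(4096, device="cuda" if torch.cuda.is_available() else "cpu")
compiled(x)   # generated kernels (Triton on GPU, C++ on CPU) appear in the log
```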

CUDA MODE ā–· #announcements (1 messages):

  • CUDA-MODE Lecture on Reduction Trees: @andreaskoepf announced CUDA-MODE Lecture 9: Reductions, informing @everyone that it would start in approximately 5 minutes. The lecture, presented by <@325883680419610631>, will cover topics such as minimizing control and memory divergence, reducing global memory access, and thread coarsening, as found in chapter 10 of the PMPP book.

CUDA MODE ā–· #algorithms (2 messages):

  • Fine-tuning Memory Requirements Reduced: @iron_bound discussed Gradient Low-Rank Projection (GaLore), a novel approach that reduces memory usage by up to 65.5% and allows large language models to be trained more efficiently on a single GPU (a sketch of the core idea follows this list). Research details can be found in this ArXiv paper.
  • 70b Model Fine-tuning on Standard GPUs: @iron_bound shared that using FSDP and QLoRA, a 70b language model can now be fine-tuned on a desktop computer with standard gaming GPUs like RTX 3090 or 4090. The full announcement and summary are available at Answer.AI’s blogpost.
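A rough sketch of the idea behind GaLore, not the paper’s actual implementation: gradients (and hence optimizer state) are kept in a low-rank subspace, with the projector refreshed only occasionally via SVD in the real method.

```python
import torch

def low_rank_project(grad: torch.Tensor, rank: int = 8) -> torch.Tensor:
    """Project a 2-D gradient into a rank-`rank` subspace and back (sketch only).

    GaLore keeps the projector P fixed for many steps so the optimizer's
    moments can live in the small (rank x n) space instead of the full matrix.
    """
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]              # (m, rank) projector derived from the gradient
    compact = P.T @ grad         # (rank, n) low-rank representation
    return P @ compact           # projected gradient used for the update

g = torch.randn(4096, 1024)
print(g.shape, low_rank_project(g, rank=8).shape)
```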


CUDA MODE ā–· #suggestions (1 messages):


CUDA MODE ā–· #jobs (3 messages):

  • Custom CUDA Kernel Gig Alert: @jaivikramaditya is seeking a developer to design a custom CUDA kernel, specifically a variant of flashattention, for machine learning applications. They are offering $2,000 - $3,000 USD for the project.
  • Skills Required for CUDA Kernel Project: The right candidate must have experience in algorithmic development, especially in Transformers, and have some prior exposure to CUDA programming.
  • Open Invitation for Direct Messaging: Interested developers can directly DM @jaivikramaditya to express their interest or to get more details about the custom CUDA kernel job opportunity.

CUDA MODE ā–· #beginner (9 messagesšŸ”„):

  • Multitasking with Learning Resources: @mertbozkir advises watching videos and reading the accompanying book for a better learning experience, specifically mentioning that the book they are reading is very informative.

  • CUDA Tensor Library Adventures: @siddharth4570 is building their own tensor library in CUDA and inquires about whether others write their own backpropagation implementations or use autodiff libraries.

  • CUDA vs. PyTorch 2.0 + Triton for ML Performance: @jsm9036 questions the benefits of learning and using CUDA over high-level tools like PyTorch and Triton, which may offer most of the performance gains with less development time. @iron_bound suggests using high-level tools for fast prototyping and resorting to CUDA when performance needs further improvement.

  • CUDA C/C++ Differences Sought: @apaz seeks an official list of differences between CUDA C/C++ and standard C/C++, specifically during the initial phase of compilation from CUDA source to C/C++ source files.

  • Performance Observations with PyTorch Version Changes: @poppingtonic reports a speed difference between PyTorch 2.1.2 and 2.2.1 while running matmul_dyn on a 2080 Ti with CUDA 12.1, with the newer version being slower. They also express a new goal to understand tinygrad operations and kernels.


CUDA MODE ā–· #pmpp-book (12 messagesšŸ”„):

  • CUDA Profiling Discussed in Lecture: @marksaroufim mentioned that while the pmpp book may not extensively cover profiling tools for CUDA, they do cover profiling in lecture 1 in detail.
  • Book Authors Share Profiling Guidance on YouTube: In response to an inquiry about profiling in CUDA, @marksaroufim informed @dasher519 that the book authors discuss profiling in their YouTube videos.
  • Syntax Nuance for CUDA Kernels Called Out in the Book: @alexanderrgriffing raised a question regarding the use of spacing inside triple angle brackets (e.g., << < ... >> >) in exercise code from the CUDA book.
  • Historical C++ Quirk Explains Spacing in Kernel Launch Syntax: @stefangliga clarified that the spacing was once mandatory in C++03/98 to avoid confusion with the shift operator, a requirement fixed in C++11; however, it’s not clear if the same applied to CUDA C++ or if it was always a stylistic choice.
  • Searching for Exercise Solutions in the Latest Book Edition: @dasher519 inquired about the location of exercise solutions for the 2023 edition of the pmpp book, wondering if they might be in a separate solutions book.

CUDA MODE ā–· #youtube-recordings (3 messages):

  • CUDA Lecture on Reductions: @marksaroufim shared a YouTube video titled ā€œLecture 9 Reductionsā€ along with slide materials and code examples.
  • Quality Check on Video Upload: @alexeyzaytsev noticed that the video is only available in 360p quality one hour after upload, raising concerns about whether it was uploaded correctly. @marksaroufim confirmed that the upload is fine but it needs more time for processing.

Links mentioned:

Lecture 9 Reductions: Slides https://docs.google.com/presentation/d/1s8lRU8xuDn-R05p1aSP6P7T5kk9VYnDOCyN5bWKeg3U/edit?usp=sharingCode https://github.com/cuda-mode/lectures/tree/ma…


CUDA MODE ā–· #ring-attention (27 messagesšŸ”„):

  • Sync Up Confirmation: @jamesmel confirmed an upcoming sync, mentioning they will join the next day after missing the current one.
  • Troubleshooting Ring-Attention: @jamesmel experienced issues while running ring-attention from zhuzilin, getting stuck at _flash_attn_forward and noted that subprocesses are paused.
  • Ring-Attention Discussion: @iron_bound and @andreaskoepf also engaged in the conversation, with @iron_bound finding the pause odd and @andreaskoepf suggesting to look at allocations used in the ring-attn implementation.
  • Planning Flash Decoding Sketch: @jamesmel proposed a high-level sketch for flash decoding, listing a prefill phase and a decoding phase (see the combine-step sketch after this list), with @andreaskoepf mentioning a planned explanation as part of a lecture.
  • Meeting and Time Adjustments: @iron_bound and @jamesmel commented on meeting schedules and adjustments due to daylight saving changes, clarifying timings for future syncs.
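For reference, the step that makes flash-decoding-style splitting exact is recombining the per-chunk attention results using their log-sum-exps. Below is a minimal single-query sketch of that combine; shapes and names are assumptions for illustration.

```python
import torch

def combine_chunks(partial_outs, partial_lses):
    """Merge per-KV-chunk attention outputs into the exact softmax result.

    partial_outs: list of [head_dim] tensors, each chunk's locally normalized
                  attention output for one query position.
    partial_lses: list of scalar tensors, each chunk's log-sum-exp of the logits.
    """
    lses = torch.stack(partial_lses)            # [num_chunks]
    outs = torch.stack(partial_outs)            # [num_chunks, head_dim]
    weights = torch.softmax(lses, dim=0)        # exp(lse_i - logsumexp(lse))
    return (weights[:, None] * outs).sum(dim=0)

# Tiny usage example with fake chunk results.
outs = [torch.randn(64) for _ in range(4)]
lses = [torch.randn(()) for _ in range(4)]
print(combine_chunks(outs, lses).shape)
```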

CUDA MODE ā–· #off-topic (5 messages):

  • Bing’s Unforgettable Swear Word Memory: @iron_bound shared a Reddit post describing a humorous problem where Bing continues to swear at them despite requests to stop. The issue persists across conversations, as if the system only remembers this single preference.

  • Threading Anthems: @mr.osophy made a light-hearted comment about a Spotify track, imagining that the song is about the programming concept of threading, and shared the link to the song.

  • Rumor Squashed on Inflection AI and Claude-3: @f_michael inquired about a rumor regarding Inflection AI and their supposed use of Claude-3 with a link to a tweet, which was later clarified as just a rumor through a separate response tweet shared by @itali4no.

  • Acknowledgment for Clarification: @f_michael expressed gratitude to @itali4no for providing clarification on the previously mentioned rumor.


LangChain AI ā–· #general (68 messagesšŸ”„šŸ”„):

  • PDF Extraction Assistance Request: @dazzling_puppy_08816 asked whether an entire PDF can be sent to vision models for text extraction. @smashah suggested using the Unstructured API to convert it into documents (see the loader sketch after this list).

  • Template Limitations in Langchain Hub: @louis030195 expressed concerns about the limitations of template engines in handling conditional logic for prompts creation. @baytaew acknowledged the feedback and mentioned handling complex logic in code before passing to the template, recognizing the inconvenience for non-technical users.

  • ChatOllama Functionality Clarification: @tmetzger71 experienced an issue with binding functions in ChatOllama, with an error indicating ā€˜tools’ do not exist in type ā€˜Partial’. Subsequent messages from the same user point to an experimental wrapper Ollama Functions as a workaround.

  • Claude3 Support Query in Bedrock: In a discussion initiated by @j.barney about Claude3 support, @baytaew indicated that @761046695722877016 is working on enhancing the Bedrock chat class and ensuring first-class support for Claude3, noting unique API management for each model hosted on the Bedrock service.

  • Langchain Testing Strategy Concern: @sharrajesh asked about best practices for maintaining application response quality in langchain/llm applications, given their non-deterministic nature. @baytaew responded with recommendations, such as using Langsmith for evaluations/benchmarking and focusing on metrics like system correctness, faithfulness, and retrieval performance.
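A minimal sketch of the suggested Unstructured route via LangChain’s loader (requires the unstructured dependency; the file path is a placeholder):

```python
from langchain_community.document_loaders import UnstructuredPDFLoader

loader = UnstructuredPDFLoader("report.pdf")   # placeholder path
docs = loader.load()                           # list of Document objects

print(len(docs), docs[0].page_content[:200])
```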


LangChain AI ā–· #langserve (2 messages):

  • Seeking Clarity on LangChain Serve Code Execution: zql_flo asked where code executes when using LangChain Serve and how user file uploads that agents need to access are managed, inquiring whether Docker is the intended way to implement this.

  • Harnessing Output from Langserve Route: problem9069 is looking for guidance on using Langserve, specifically wanting to know how to capture the output from a route in a variable when adding routes using the ChatOpenAI function.


LangChain AI ā–· #share-your-work (14 messagesšŸ”„):

  • Revolutionize Prompting with Prompt Mixer: User @tomatyss introduced a new tool called Prompt Mixer, ideal for building, testing, and iterating on AI prompts. This desktop application allows connecting various models, tracking prompt versions, and it even has a guide on adding custom connectors.

  • Lead Generation Automation in the Works: User @robinsayar is developing an automated lead generation tool to categorize and qualify up-to-date public company information, aiming to streamline the process currently done manually for their clients.

  • Excitement for Automated Lead Generation: User @baytaew expressed enthusiasm about @robinsayar’s project on automated lead generation and is looking forward to seeing the results.

  • Open Source Langchain Chatbot: User @haste171 shared an open-source AI Chatbot, built on Langchain and RAG, which is designed for analyzing and extracting information from data in conversational format, featuring an easy setup and interactive UI.

  • Appstorm Launches Data GPTs in Version 1.5.0: User @appstormer_25583 announced the release of Data GPTs on Appstorm 1.5.0, a feature for exploring, analyzing, and visualizing data, complete with sample GPTs for various applications such as e-sports performance reports and healthcare stats infographics.


LangChain AI ā–· #tutorials (2 messages):

  • Exploring RAG with LangGraph: @mehulgupta7991 shared a YouTube video titled ā€œImproving RAG using LangGraph and LangChainā€, which demonstrates how LangGraph can create cycles to enhance RAG retrieval in external contexts.

  • Building a Chatbot with RAG and LangChain: @infoslack provided a YouTube link for learning how to build a chatbot using Retrieval Augmented Generation (RAG), utilizing OpenAI’s gpt-3.5-turbo LLM, as part of the LangChain tutorial series.
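For readers who prefer code to video, here is a minimal LCEL-style RAG chain along the same lines. The documents, prompt, and model choice are illustrative; it assumes langchain-openai, faiss-cpu, and an OPENAI_API_KEY.

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

texts = [
    "LangGraph lets you add cycles to LangChain pipelines.",
    "RAG retrieves relevant context before the LLM answers.",
]
retriever = FAISS.from_texts(texts, OpenAIEmbeddings()).as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-3.5-turbo")
    | StrOutputParser()
)
print(chain.invoke("What does RAG do?"))
```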


DiscoResearch ā–· #disco_judge (12 messagesšŸ”„):

  • Judging Creative Writing with AI: @.calytrix expressed skepticism about effectively training a model to judge creative writing, suspecting it requires more parameters than currently feasible. They are instead testing prompts with GPT-4 and Claude3 as judges using a detailed scoring criteria.
  • Exploring AI Judges: @johannhartmann showed interest in seeing differences in creative writing judgements made by GPT-4 and Claude3 and later joked about reading an ā€œopen-source outperforms gpt-4ā€ conclusion.
  • Benchmarking AI Models in German: @johannhartmann mentioned integrating Vago solutions into FastEval for German benchmarks, observing that GPT-4 still scores better with long detailed answers.
  • Ensemble of AI Judges: @bjoernp discussed the advantage of using an ensemble of AI judges to reduce bias and enhance accuracy, asking if Mistral large had been used for judging.
  • Benchmark Development for AI Judges: @.calytrix is creating a benchmark with multiple questions to test several AI judges, indicating that GPT-4 might be a comparable state-of-the-art (SOTA) judge for this aspect. They are also considering using FastEval, as @johannhartmann suggested, which might be more suitable than EQ-Bench.

DiscoResearch ā–· #general (4 messages):

  • Evo Unveils Striped Hyena Architecture: @rasdani highlighted the release of Evo, a biological foundation model using the StripedHyena architecture, meant for tasks ranging from molecular to whole genome scale. The model, developed by Together AI and the Arc Institute, can handle over 650k tokens and is specialized in DNA, RNA, and protein sequences. Find more in their blog post.

  • AutoMerger Faces Technical Issues: @johannhartmann expressed interest in AutoMerger, an automatic model merger with benchmarks on Hugging Face, though noted that it is currently non-functional. His interest remains despite the tool being broken, as indicated in the link Hugging Face’s AutoMerger.

  • Slerp vs. Dare_ties Merges: Further commenting, @johannhartmann observed that there doesn’t seem to be a significant difference between dare_ties and slerp merge strategies in the context of AutoMerger.

  • Mixture-of-LoRAs Architecture for LLMs: @johannhartmann shared a link to an arXiv paper discussing Mixture-of-LoRAs (MoA), a method designed for enhancing multi-task learning with Large Language Models (LLMs) and mitigating issues like catastrophic forgetting and task interference.


DiscoResearch ā–· #benchmark_dev (3 messages):

  • Introducing tinyBenchmarks: User @johannhartmann shared a link to tinyBenchmarks on Hugging Face, a dataset designed for efficient benchmarking.
  • Exploring Translation Possibilities: @johannhartmann expressed an interest in potentially translating the tinyBenchmarks/tinyWinogrande dataset, planning to examine its feasibility the following day.
  • Benchmarking Insights from Hellaswag: @_chromix_ detailed their testing experience with the Hellaswag dataset, noting score fluctuation within a range of 2.5 after 1000 data points, and a more stable score variation of +/- 0.2 after 9000 data points. They suggested that choosing only 100 datapoints is likely inadequate for anything beyond a rough comparison.
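The observed noise is roughly what binomial sampling error predicts for small subsets; a quick back-of-envelope (the 0.6 accuracy is an arbitrary assumption, and a running average within one run converges faster than independent resampling would):

```python
import math

p = 0.6  # assumed benchmark accuracy
for n in (100, 1000, 9000):
    se = math.sqrt(p * (1 - p) / n)          # standard error of the estimate
    print(f"n={n:5d}: 95% CI ~ +/-{1.96 * se * 100:.1f} points")
```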

Links mentioned:

tinyBenchmarks (tinyBenchmarks): no description found


DiscoResearch ā–· #discolm_german (14 messagesšŸ”„):

  • Innovative Training with the German Orca Dataset: @johannhartmann explained their method of using a German translation of the slim orca dataset to train and merge models like Mistral. They use the SPIN-like method - taking the output of one model as input for the next - and track the relationships between models through the dataset, monitoring how training affects verbosity and answer quality.

  • Brezn3 Model Outshines its Predecessor: @crispstrobe noted that Brezn3 scored significantly higher than Brezn-7b on the EQ-Bench (v2) (de) benchmark, asking @johannhartmann if the improvement was due to changes in the model and tokenizer settings.

  • Awaiting the Final Push of DPO: @johannhartmann informed @crispstrobe that the DPO (Direct Preference Optimization) run was still in progress, with approximately 13 hours remaining until completion.

  • Technical Troubleshooting in Model Merging: @crispstrobe sought help from @johannhartmann regarding a TypeError encountered during model merging, which @johannhartmann addressed by sharing a fix via a GitHub commit link.

  • Consistency Issues in Base Model Selection: @johannhartmann, prompted by @bjoernp, discussed the inconsistencies when using LeoLM/leo-mistral-hessianai-7b-chat as a base model due to differences in the chatml and eos token settings, and planned to switch to DiscoLM as the base for better results in benchmarking.


Alignment Lab AI ā–· #general-chat (7 messages):

  • Tackling Hallucinations in AI: @tirmizi7715 mentioned Yi’s technical report which discusses reducing hallucinations. Various interpretations of this statement were discussed but not conclusively defined.
  • Strategies for Reducing AI Hallucinations: @rusch speculated that reducing hallucinations might involve externalizing the knowledge base through RAG (Retrieval-Augmented Generation) or ensuring the fine-tuning data contains only new facts.
  • Fine-Tuning Data’s Role in Minimizing Hallucinations: @scrungle.tech considered the possibility of using a validation set of facts and manually rewriting repetitive responses for fine-tuning as a method to reduce hallucinations.
  • Latest from LAION: @spirit_from_germany shared a Twitter link but provided no context or explanation in the message.
  • Search for Efficient Small Embedding Models: @joshxt inquired about the best small embedding model that supports 1024+ max input length and can be run locally with minimal RAM. No answers were provided in the summarized messages.

Alignment Lab AI ā–· #oo (8 messagesšŸ”„):

  • Claude’s Proficiency at Diagramming Code: @joshxt highlighted the potential for using Claude to convert entire code bases into mermaid graphs, mentioning successful trials on code bases ranging from 10k to 96k tokens.
  • Mermaid Graphs Explained: @lightningralf clarified for @teknium what a mermaid graph is, describing it as a syntax for creating diagrams from text, and shared the GitHub repository for mermaid.
  • Visualizing Code with Mermaid: @joshxt provided a practical example of a mermaid graph syntax to visualize a code base’s architecture, showing components like app.py, FASTAPI, and various API endpoints.

Links mentioned:

GitHub - mermaid-js/mermaid: Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown: Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown - mermaid-js/mermaid


Alignment Lab AI ā–· #alignment-lab-announcements (1 messages):

  • Gemma-7b Enhanced with C-RLFT: @imonenext announced the first usable Gemma-7b fine-tune based on openchat-3.5-0106 data and methods, achieving nearly the same performance as Mistral-based versions. The fine-tuning leveraged 6T tokens, hinted as the ā€œsecret recipeā€, and the model is available on HuggingFace.
  • Gemma’s New Milestone Tweeted: A tweet from @openchatdev celebrated the World’s First Gemma fine-tune using C-RLFT and its comparable performance to Mistral. The fine-tune potentially involves 6T pre-training tokens among other factors, as indicated in the Twitter post.


Alignment Lab AI ā–· #general-chat (5 messages):

  • Gemma 7B vs. Mistral 7B - Why release an underperformer?: @chatbooapp inquired as to why Gemma 7B was released if it doesn’t surpass Mistral 7B. @joshxt responded, stating that each model is an experiment, and Gemma may excel in ways not yet known.
  • Gemma’s potential moderation performance: @chatbooapp speculated whether Gemma 7B might underperform in NSFW content moderation compared to Mistral, due to Google’s strict moderation policies. This aspect seemed to resonate with @joshxt’s experiences where Gemma failed to impress in tasks they use LLMs for, yet they hadn’t tried any fine-tuned models.
  • Mistral’s NSFW allowance noted: Despite moderation concerns, @chatbooapp mentioned that even the Mistral endpoint doesn’t shy away from NSFW content, applauding its capabilities. They also highlighted that Mistral Large, when combined with a well-crafted system prompt, can be incredibly helpful.

Alignment Lab AI ā–· #oo2 (5 messages):

  • Rumors of User Demise Exaggerated: @teknium humorously expressed concern, using a ’<:sad_cat:905917016517521419>’ emote, that another user might have ā€œdied or somethinā€.
  • Alive and Coding: @autometa quashed rumors of their demise, stating they are simply ā€œburied in petty coding tasks atmā€.
  • Docker Development Dilemmas: @autometa mentioned the challenge of setting up a Docker environment to streamline collaborative efforts, eliminate the need for manual sampling, and optimize their development processes.
  • Call for Collaborative Coding: @autometa made a plea for assistance with Docker environment setup, emphasizing that help would be ā€œphenomenal to get movingā€ with their work.

LLM Perf Enthusiasts AI ā–· #general (1 messages):

  • Free Access to Claude 3 Opus with Vercel Pro: User @jeffreyw128 shared that those who have Vercel Pro can use Claude 3 Opus and GPT-4 vanilla for free. They provided a link to the Vercel SDK: sdk.vercel.ai.

Links mentioned:

Vercel AI SDK: Build AI-powered applications with the latest AI language models


LLM Perf Enthusiasts AI ā–· #gpt4 (1 messages):

  • Transition Inquiry from OpenAI to Azure: User @pantsforbirds is seeking insights on moving from OpenAI’s SDK to the Azure-based approach for their project. They are interested in understanding potential challenges faced during this migration process.

LLM Perf Enthusiasts AI ā–· #claude (15 messagesšŸ”„):

  • Function Calling Praised with XML Tags: @res6969 confirmed that function calling works well, noting that using XML tags enhances its effectiveness (a minimal sketch of the pattern follows this list).
  • XML’s Impact on Sharing Prompt Generators: @pantsforbirds highlighted that the necessity of XML makes sharing a prompt generator more difficult.
  • General Superiority of Opus Over GPT-4: Multiple users, including @jeffreyw128, @nosa_., and @vgel, remarked on the overall better performance of Opus compared to GPT-4, with specific mention of its more ā€œinsightful/smart answersā€ and effectiveness in handling a complex graph BFS algorithm.
  • Claude’s Prose Preferred Over GPT’s Style: @potrock expressed a preference for Claude’s prose, noting it avoids the condescending explanations often preceding GPT’s answers.
  • Anticipation for GPT-4.5’s Release and Performance: @jeffreyw128 and @res6969 are looking forward to the potential release of GPT-4.5 or GPT-5, speculating on its capabilities compared to Claude, alongside excitement for the Starship launch.
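A minimal sketch of the XML-tag pattern discussed here: describe the available tools inside XML in the prompt, then parse the XML the model emits back out. The tag names below are an illustrative convention, not an official schema.

```python
import json
import re

SYSTEM = """You may call tools. To call one, reply with exactly:
<function_call><name>TOOL_NAME</name><arguments>{"arg": "value"}</arguments></function_call>

<tools>
  <tool>
    <name>get_weather</name>
    <description>Return the current weather for a city.</description>
    <parameters>{"city": "string"}</parameters>
  </tool>
</tools>"""

def parse_function_call(completion: str):
    # Pull the tool name and JSON arguments back out of the model's XML reply.
    m = re.search(r"<name>(.*?)</name>\s*<arguments>(.*?)</arguments>", completion, re.S)
    if not m:
        return None
    return m.group(1).strip(), json.loads(m.group(2))

reply = '<function_call><name>get_weather</name><arguments>{"city": "Paris"}</arguments></function_call>'
print(parse_function_call(reply))
```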

LLM Perf Enthusiasts AI ā–· #opensource (1 messages):

res6969: https://x.com/elonmusk/status/1767108624038449405?s=20


LLM Perf Enthusiasts AI ā–· #offtopic (1 messages):

  • Google’s Potential Dominance in AI: @jeffreyw128 discusses two key reasons Google could dominate general AI adoption: lack of a long-term moat in foundation models and Google’s ability to integrate and serve AI cost-effectively within its search and Chrome platforms.
  • Affordable AI Integration with Google: With the current revenue from search queries, Google has the potential to offer AI services within searches at a negligible cost, potentially rolling out a Generative Search Experience widely within the year.
  • OpenAI’s Lead May Foster Mass Adoption: Despite the competition, @jeffreyw128 believes OpenAI will stay ahead for a few years, fostering significant adoption and specialized applications like code generation.
  • Google’s Dynamic AI Deployment: Google’s advantage, as noted by @jeffreyw128, is in its intelligent choice between text generation and supplying extractive answers, potentially outperforming other online LLM experiences.
  • The Future of AI Integration and Premium Experiences: Moving beyond browsers and search integrations, deeper hardware integrations may emerge. However, while consumer AI applications will be economically viable, there will still be a market for premium AI experiences in areas like coding or writing.

Skunkworks AI ā–· #general (1 messages):

  • Quantum Leap in Convergence Acceleration: @baptistelqt claimed to have developed a method to accelerate convergence by a factor of 100,000. Each ā€œroundā€ involves training from scratch.

Skunkworks AI ā–· #finetuning (1 messages):

henkdevries_starbound: math quuestions are hard


Skunkworks AI ā–· #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=H6xon8K4Ius


Datasette - LLM (@SimonW) ā–· #ai (1 messages):

dbreunig: Earhart


Datasette - LLM (@SimonW) ā–· #llm (2 messages):

  • Praise for Symbex: User @bdexter expressed gratitude for symbex, noting frequent use of the project.
  • SimonW Acknowledges Symbex Fun: @simonw responded with enthusiasm, describing symbex as a ā€œreally fun projectā€.

AI Engineer Foundation ā–· #general (1 messages):

.zhipeng: from nathan’s interconnectai blogpost right ?


AI Engineer Foundation ā–· #events (1 messages):

  • Gen AI Video Event Announced: @sylviatong invites all to a deep dive on Gen AI Video and the ā€˜World Model’. The lineup features Lijun Yu from Google, Ethan He of Nvidia, Shan Jin from Goodby Silverstein & Partners, and Justin Hackney of Eleven Labs; the event is moderated by Cindy Le and will dispel myths around #Sora, #Genie, and #WorldModel. March 16, 2024, in San Francisco & on Zoom. RSVP link.
  • Conversation with AI Video Pioneers: The event offers a platform for learning from top-tier researchers and promises real, unfiltered conversations. Expect insights into Google’s VideoPoet, Nvidia’s Sora description, and more creative technologies in AI Video.

Links mentioned:

Gen AI Video Breakout and World Model by EntreConnect - #Sora #Genie #VideoPoet #V-JEPA #LTXStudio #AnimateDiff Ā· Luma: Join us for a groundbreaking event that dives deep into the heart of Gen AI Video! This isn’t just another tech talk; it’s a journey into the future. We will also provide dial-in options, wh…