> AI News for 5/2/2024-5/3/2024. We checked 7 subreddits and [**373** Twitters](https://twitter.com/i/lists/1585430245762441216) and **28** Discords (**418** channels, and **5847** messages) for you. Estimated reading time saved (at 200wpm): **642 minutes**.

It’s been a quiet week for AI news. The highlight is a fun new Kaggle challenge:

You’ll work with a dataset from the Chatbot Arena, containing conversations and user preferences across various LLMs. By developing a model that accurately predicts human preferences, you’ll contribute to improving chatbot performance and alignment with user expectations. The training dataset includes over 55,000 real-world user and LLM conversations and user preferences, with personally identifiable information removed. Your solution submission will be tested on a hidden test set of 25,000 samples.

The competition will run until August 5th, with a total prize pool of $100,000: a $25,000 prize for 1st place, $20,000 prizes for 2nd through 4th place, and a $15,000 prize for 5th place.


Table of Contents

[TOC]


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

LLM Model Releases and Benchmarks

  • Llama 3 Models: @DrJimFan announced DrEureka, an LLM agent that writes code to train robot skills in simulation and enables zero-shot transfer to the real world. @GroqInc’s Llama 3 70B model is breaking performance records at $0.65/1M input and $0.9/1M output tokens. @bindureddy notes Llama 3 models from Groq are leading while OpenAI focuses on hyping GPT-5.
  • Benchmarking LLMs: @DrJimFan suggests 3 types of LLM evaluations that matter: privately held test sets with publicly reported scores by trusted 3rd parties like @scale_AI, public comparative benchmarks like @lmsysorg’s Chatbot Arena, and privately curated internal benchmarks for each company’s use cases. @percyliang notes some models perform poorly with certain prompts on GSM8K benchmark.
  • Open Source Evaluator LLMs: @seungonekim introduces Prometheus 2, open source evaluator LLMs that closely mirror human and GPT-4 judgments and support direct assessment and pairwise ranking formats. They outperform proprietary LMs like GPT-4 and Claude 3 Opus on building LM judges.

Datasets and Benchmarking

  • GSM1K Dataset: @percyliang discussed how models are sensitive to prompts on the new GSM1K dataset, needing sampling and majority voting to reduce noise. Some perform poorly with extra hints.
  • WildChat1M ChatGPT Logs: @_akhaliq shared the WildChat dataset from AI2 with over 1M ChatGPT interaction logs in the wild. It has 2.5M turns, diverse prompts, many languages, and toxic examples.
  • Kaggle Human Preference Prediction: @lmsysorg announced a $100K Kaggle competition to predict user preferences between LLM responses in their Chatbot Arena, based on a new dataset with 55K user/LLM conversations.
  • Contamination Database: @clefourrier noted a new open database to track contamination of models and datasets to help select “safe” artifacts for model creation.

Techniques for Efficient LLM Training and Inference

  • LoRA for Parameter Efficient Fine-Tuning: @mobicham assesses the viability of training and serving LLMs fine-tuned with quantized low-rank adapters (LoRA) across 10 base models and 31 tasks. 4-bit LoRA models outperform base models by 34 points and GPT-4 by 10 points on average. The LoRAX inference server enables deploying multiple LoRA models on a single GPU (a minimal LoRA setup is sketched after this list).
  • Efficient Model Alignment with NeMo-Aligner: @NVIDIA introduces NeMo-Aligner, a scalable toolkit for efficient LLM alignment techniques like RLHF, DPO, SteerLM, SPIN. It scales to hundreds of GPUs for training large models.
  • Factuality-Aware Alignment to Reduce Hallucination: @mobicham proposes factuality-aware SFT and RL alignment to guide LLMs to output more factual responses. Training LLMs on new knowledge or unfamiliar texts can encourage hallucination.
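
For readers who want to see the LoRA setup in practice, here is a minimal sketch using the Hugging Face PEFT library; the base model and hyperparameters are illustrative choices, not the ones from the cited evaluation:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a base model (illustrative choice) and attach low-rank adapters;
# only the small adapter matrices train, the base weights stay frozen.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
lora = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # adapter scaling
    target_modules=["q_proj", "v_proj"],   # attach to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()         # typically well under 1% of weights
```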

Multimodal and Long-Range LLMs

  • Multimodal LLM for Automated Audio Description: @mobicham introduces an automated audio description pipeline using multimodal instruction-following capacities of GPT-4V. It produces ADs compliant with natural language production standards while maintaining contextual consistency.
  • Extending LLM Context Windows: @rohanpaul_ai reports extending Llama-3-8B’s context 10-fold to 80K tokens overnight using only 3.5K synthetic QA pairs. The resulting model excels at long-context tasks like book QA and summarization, rivaling GPT-4.
  • Consistent Long-Range Video Generation: @mobicham proposes StoryDiffusion framework for consistent long-range image/video generation from text. It introduces Consistent Self-Attention and Semantic Motion Predictor to maintain consistency across generated frames.

Emerging Architectures and Training Paradigms

  • Kolmogorov-Arnold Networks as MLP Alternative: @rohanpaul_ai reports Kolmogorov-Arnold Networks (KANs) as a novel alternative to MLPs. KANs use learnable activation functions on edges and replace weights with learnable splines. They achieve higher accuracy with fewer parameters and avoid the curse of dimensionality (a simplified layer is sketched after this list).
  • Apple’s On-Device LLMs and AI-Enabled Browser: @rohanpaul_ai notes Apple introducing OpenELM, a family of small on-device LLMs, and an AI-enabled Safari browser at WWDC. On-device LLMs enable free inference without API calls.
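
To make the edge-activation idea concrete, here is a simplified sketch of a KAN-style layer. The paper parameterizes edges with B-splines on an adaptive grid; this sketch substitutes a fixed RBF basis to stay short:

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Each input-output edge applies its own learnable 1-D function,
    parameterized here by coefficients over a fixed RBF basis (the paper
    uses B-splines on an adaptive grid; RBFs keep the sketch short)."""
    def __init__(self, in_dim, out_dim, n_basis=8, lo=-2.0, hi=2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(lo, hi, n_basis))
        self.width = (hi - lo) / n_basis
        # one coefficient vector per edge: (out_dim, in_dim, n_basis)
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x):  # x: (batch, in_dim)
        # basis responses for every input feature: (batch, in_dim, n_basis)
        phi = torch.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # evaluate each edge function, then sum contributions per output unit
        return torch.einsum("bik,oik->bo", phi, self.coef)

layer = KANLayer(4, 3)
print(layer(torch.randn(16, 4)).shape)  # torch.Size([16, 3])
```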

Miscellaneous

  • WildChat1M ChatGPT Interaction Dataset: @mobicham introduces WildChat1M, a dataset of 1M user-ChatGPT conversations with over 2.5M interaction turns. It offers diverse prompts, multiple languages, and captures various use cases and user behaviors across regions.
  • Open Source Libraries for ML Deployment: @dl_weekly shares a curated list of open source libraries to deploy, monitor, version and scale machine learning models in production.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Model Releases and Updates

AI Applications and Demos

AI Societal Impact and Concerns

AI Research and Benchmarking


AI Discord Recap

A summary of Summaries of Summaries

1. Large Language Model (LLM) Advancements and Challenges

2. AI Model Fine-tuning and Optimization Strategies

3. Open Source AI Frameworks and Libraries

4. AI Hardware Acceleration and Optimization

5. Misc

  • LM Studio Introduces CLI Tool and Addresses Bugs: LM Studio launched lms, a new CLI tool to manage local LLMs, starting/stopping servers, and debugging. It requires LM Studio 0.2.22+ and is open source on GitHub. The latest update also fixed a bug causing entire context to be included in model responses. Users explored running LM Studio headlessly and embedding it in scalable server solutions.

  • Quantization Challenges and Context Expansion in LLMs: Quantization’s impact on LLaMA 3 performance was a hot topic, with a Reddit discussion and research paper suggesting significant quality loss. Meanwhile, LLama-3 8B achieved over 1040k context length with Crusoe Energy’s compute (the usual RoPE-scaling recipe is sketched below), and AI21 Labs expressed interest in much larger context windows for Jamba-Instruct.
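
Context extensions like this typically work by raising RoPE’s base frequency (theta) so positional encodings stretch across the longer window, then fine-tuning briefly on long samples. A hedged sketch with Hugging Face Transformers follows; the model id, scaling factor, and window size are illustrative, not the actual recipe:

```python
from transformers import AutoConfig, AutoModelForCausalLM

base = "meta-llama/Meta-Llama-3-8B"            # illustrative base checkpoint
config = AutoConfig.from_pretrained(base)
config.rope_theta = config.rope_theta * 8       # stretch positional frequencies
config.max_position_embeddings = 262144         # advertise the longer window
model = AutoModelForCausalLM.from_pretrained(base, config=config)
# A short fine-tune on long samples (e.g., synthetic long-context QA pairs)
# would follow, to teach the model to actually use the extended range.
```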


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

New Frontier in LLM Finetuning: Community members discussed near-full finetuning possibilities with Unsloth, exploring the potential of setting all parameters except layernorms to trainable. Meanwhile, Unsloth is focused on fixing llama.cpp and GGUF conversion issues, particularly around quantization and loading checkpoint shards, and sentiment analysis enthusiasts received tips on formatting vast databases for LLM compatibility.

Experimental Full Finetuning Tactics and Dataset Structuring: Unofficial strategies to enable full finetuning on Unsloth were shared, demonstrating improved losses relative to standard Hugging Face implementations. Discussions also delved into ideal dataset structuring for optimization, suggesting strategies for handling multiple “rejected” responses.

Phi 3 Executes in Browser, But Llama 3 Discord Absent: A tweet here demonstrated running Phi 3 in a web browser, while a member clarified that no dedicated Discord channel exists for Llama 3. Meanwhile, incorporating new roles in Llama 3 sparked debate, with type=code being a suggested alternative for tool_call.

Adapting Llama 3 With Self-Discovery and Triton’s TK-GEMM: One ingenious user applied techniques from the Self-Discovery paper to enhance the reasoning capabilities of ChatGPT. Moreover, a PyTorch blog post highlighted Triton’s FP8 GEMM to accelerate Llama 3 on NVIDIA H100 GPUs, promising optimization insights.

Quantization Quandary and Finetuning Finesse: Issues emerged when converting Llama 3 to GGUF, impacting fine-tuning data integrity, and similar problems arose when melding Lora with GGUF models. However, a pathway to understanding finetuning and model management is becoming clearer, with established community members suggesting the use of Unsloth’s Colab notebooks for guidance.


Stability.ai (Stable Diffusion) Discord

  • Slash Commands Get Ghosted: Engineers observed the mysterious disappearance of the /faq command from the Discord commands, triggering jokes that its absence was only noticed once it was gone.

  • Graphical Debate: Nvidia vs. AMD: A hot topic was the choice between Nvidia’s 4080 and 3090 GPUs versus AMD’s 7900xtx, with discussions centered around VRAM capacity and the merits of waiting for Nvidia’s forthcoming 5000 series for future resilience.

  • Conversion Curiosity with RTX 4080: Queries were raised about the time efficiency of an RTX 4080 in converting videos into anime-style using AI, with members seeking performance benchmarks for such tasks.

  • GPU Loyalties Split the Room: Members heatedly debated the advantages of Nvidia GPUs over AMD for AI applications, with a few advocates for AMD drawing from their positive experiences, despite Nvidia’s touted new Blackwell architecture.

  • Enhancement Enigmas: Text and Image Upscaling: Various methods for AI-assisted text addition and image upscaling were shared, including DaVinci Resolve for text overlays, upscaling workflows in ComfyUI, and Harrlogos XL for custom text generation in Stable Diffusion.


CUDA MODE Discord

Gradient Adornments in Conversations: Discord members discussed advanced gradient techniques within PyTorch, where create_graph=True is employed for finer gradient details and Hessian-vector products. Techniques to estimate the Hessian’s diagonal were mentioned, leveraging randomness for the estimations.

Triton Trials and Triumphs: Engineers faced challenges with IncompatibleTypeErrorImpl in Triton, but found solace in a tl.cast function fix after stumbling upon a gather function issue. Kernel debugging with PyTorch in PyCharm also proved problematic, even when setting TRITON_INTERPRET to "1".

Patching it Up with tinygrad: Members shared a multi-GPU support patch for tinygrad, endorsing Nvidia’s open drivers. A GitHub conundrum surfaced about the right way to install custom PyTorch and CUDA extensions, seeking clarity through examples in the PyTorch AO library’s setup process.

Catalyzing Community Contributions: The Effort project on GitHub received accolades for its impactful structure, while GreenBitAI’s toolkit was introduced as an ML framework enhancing PyTorch. It includes innovative gradient calculation methods and a potentially useful gemv kernel for inference spotlighted in bitblas.

torch woes and wins: PyTorch developers debated build strategies and optimizations, from build times for linear algebra components to kernel performance. Padding the vocabulary size so that performance benchmarks against PyTorch compare like with like was deliberated, revealing the nuanced considerations needed for fair measurement.

A Taste of LLM Innards: The llm.c project reached new efficiencies with 167K tokens/second using CUDA optimization techniques. Key discussions on CUDA streams, fused classifiers, and the strategic use of atomics with scratch buffers highlighted the dense technical camaraderie.

Open Source Intel: It was briefly mentioned that Intel is now added to the PyTorch website, indicating a potential integration or support update.


LM Studio Discord

CLI Joins the LM Studio Toolbox: LM Studio has launched its new CLI tool, lms, designed to simplify the management of local LLMs, including loading and unloading models and starting or stopping servers. The CLI tool is available for the latest LM Studio 0.2.22 and beyond, and users are encouraged to contribute to its open source GitHub repository.

Llama’s Conversion Complication: Collaboration in the LM Studio guild led to the successful resolution of several integration issues with llama.cpp, utilizing scripts such as convert-hf-to-gguf. Some users faced FileNotFoundError that was fixed by redownloading necessary files via huggingface-cli, with the community assisting in addressing conversion execution problems.

Model Performance and Oddities: Discussion in the models channel revealed endeavors to enhance story writing with Goliath 120B Longlora models and experiments to assess recall capabilities of models like LLAMA 3 on extensive texts. A curiosity emerged about ChatQA 1.5 showcasing unexpected response templates, whereas a bug in the latest LM Studio 0.2.22 prompted a new update for corrected behavior.

ROCm’s Growing Pains and Triumphs: Members explored the capabilities of the latest LM Studio 0.2.22 ROCm Preview, with some testing the upper limits of RAM and context sizes and others addressing issues with embedding models. The introduction of lms CLI for AMD ROCm’s preview and Linux support triggered spirited discussions about the tool’s potential, bolstered by efforts in headless mode execution and dockerization.

Server-Client Connect Unlocked: Tips and fixes for configurations were shared, including a handy way to repopulate default configs, resolving access to LM Studio through WSL by using correct IP addresses, and enabling seamless communication between Windows and WSL environments for the app without additional complexity.


Perplexity AI Discord

  • Beta Test Battalion Assembles: The recruitment drive for Pages beta testers has successfully concluded, with the team voicing their appreciation and directing attention to upcoming updates on development progress.

  • Perplexing Browser Predicaments and Payment Puzzles: Technical issues were flagged with Perplexity not functioning on Safari and Brave browsers, while a user’s query about an unwanted subscription charge was directed to Perplexity’s support email for resolution. Enhancements for voice command functionality and clarity on usage limits for models like Gemini 1.5 Pro and GPT-4 Turbo were hot topics, alongside excitement for emerging AI technology advancements.

  • Share Wisely and Prosper: Reminders were sent to ensure threads are made shareable before linking on Discord, encompassing a range of interests from lunar queries to musical AI discoveries. Concerns over printer privacy and an exploration of AI-generated content underlined the guild’s diverse focus areas.

  • AI API Adventures and Accuracies: Discussions centered on making effective use of the Sonar Large model through precise prompts and prompt optimization techniques. Variable results with the API highlighted the need for tweaking settings like frequency_penalty, temperature, and top_p to enhance response quality, with guidance pointing towards transitioning to the latest Sonar models for improved accuracy. A request sketch showing these knobs follows.
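
Since Perplexity’s API is OpenAI-compatible, tuning those settings looks roughly like the following sketch; the model id and parameter values are illustrative:

```python
from openai import OpenAI

client = OpenAI(api_key="pplx-...", base_url="https://api.perplexity.ai")
resp = client.chat.completions.create(
    model="llama-3-sonar-large-32k-online",   # illustrative Sonar model id
    messages=[{"role": "user", "content": "Summarize today's AI news."}],
    temperature=0.2,           # lower values give more deterministic output
    top_p=0.9,                 # nucleus sampling cutoff
    frequency_penalty=1.0,     # discourage verbatim repetition
)
print(resp.choices[0].message.content)
```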


Nous Research AI Discord

Hermes 2 Pro Hops into the Fray: The recently released Hermes 2 Pro integrated with LLaMA weights is making waves with its advanced QA, Function Calling, and JSON Mode capabilities. It’s garnering attention for exceptional inference speeds on mobile devices and has support material on GitHub and Hugging Face.

ChatML Equation S-Bahn: Tweaks to enable ChatML, such as token replacement strategies and altering EOS symbols, are being dissected by members, though details on the modifications are sparse.

World-sim Codex: A lively discussion around world-sim pointed out recent updates and shifts, such as the introduction of the Iron Age, and shared resources on consciousness and AI with links to YouTube talks.

Dataset Seekers Unite: Members queried about free generic datasets suitable for finetuning LLMs prior to initiating mining sequences, prompting shared interest but limited response in channels marked #bittensor-finetune-subnet and #rag-dataset.

LLama Crafting Corner: Troubleshooting around llamacpp led to suggestions of using ollama to sidestep handling C directly and to employ techniques like quantization and pruning for ideal CPU-run LLM scenarios. The conversations also explored the intriguing concept of moral non-commutativity in retrocausality and the psychological impacts therein.


Modular (Mojo 🔥) Discord

Bringing Mojo to the Command Line: The prism CLI toolkit for Mojo has been augmented with new features such as persistent flags, hooks, and flag groups. Updates are showcased on the project’s GitHub page.

Test Driven Mojo Development: mojo-pytest, the plugin for testing in Mojo, now supports the new version 24.3. An issue to improve debuggability is tracked at Issue #9 on GitHub.

NuMojo Outpaces Rivals: The NuMojo project, aiming to enhance Mojo’s standard library tensor functionality, has been updated for Mojo version 24.3 and shown to perform better than NumPy and Numba in benchmarks. Check out NuMojo’s progress on GitHub.

Adventures in Learning Mojo: For those curious to integrate Mojo into workflows, a new “Let’s mojo build -D your own -D version=1 app” tutorial is available. It’s designed to illustrate Mojo’s capabilities through a series of workflows and can be found on GitHub.

Nightly Releases Keeping Mojo Fresh: Mojo’s development strides forward with more frequent nightly releases—eventually daily—aligning with infrastructure improvements. Nightly changelogs, like the introduction of __source_location() and improved docstring flexibility, can be perused at the Modular Docs Changelog.

Maxing Out on MAX Extensibility: MAX 24.3 introduces the brand new MAX Engine Extensibility API which aims to perfect PyTorch, ONNX, and Mojo model integrations. Detailed information on performance and hardware optimization is provided in the MAX Graph APIs.


OpenAI Discord

AI Job Market Roulette: The community engaged in a humorous debate about the fleeting nature of high-paying jobs in AI, with quips about the potential profitability of unconventional career paths like AI CEO or even a dentist.

Speculation Station for GPT-5 Ticket Prices: There’s chatter on the potential pricing strategy for GPT-5, with the group divided on whether OpenAI would opt for regional pricing models or stick with a single price point for all.

Deja Vu for GPT-3 Devotees and Chat Rooms: Members expressed nostalgia over GPT-3 and Codex, despite the buzz around GPT-4, and raised questions about the absence of voice chat rooms for real-time discussion, citing moderation concerns.

Response Time Riddle with GPT-4: Talks about GPT-4’s response times being slower than GPT-3.5’s, with mentions of GPT-4 Turbo facing significant latency, indicating that engineers are keeping a close eye on performance metrics.

Cutting Through the Clutter in AI Research: Discussions emphasized the distinction between publicly available research papers and the unrealistic expectation of OpenAI releasing fully trained proprietary models, due to their computational demands and proprietary elements.


HuggingFace Discord

Code Whispering with Moondream and FluentlyXL: Community contributions showcase Moondream 2 for batch processing and FluentlyXL v4, as well as Portuguese translations of HF’s Audio course and a new MPI Codes repository for MPI development. An intelligence boost for LangChain and FinBERT’s financial sentiment tuning were also discussed.

Babel Fish’s Extended Family: The multilingual sphere expands with BLOOM supporting 55 languages and research on improving LLMs, exemplified by a curated list and the RARR approach for automatic attributions in text generation. Members are also keen on deploying models with Ray and assessing quality metrics for refined prompts.

Diffusion Model Mixology: In diffusion discussions, the community explores techniques for merging pipelines and partial diffusion methods, with a notable partial diffusion pull request for SD 1.5 found on GitHub. Overall, the topic of efficient and innovative model merging strategies garners attention.

Model Fine-Tuning Finesse: Best practices for fine-tuning models, like only adjusting classifier weights and customizing training loops, are debated, with a detailed guide on HuggingFace’s Transformers and Keras. Members also discuss visual confirmations of models like Fluently-XL-v4 outperforming others on Instagram.

Seeking AI Mentors and Conversationalists: The community expresses a need for parquet converter-bots and more structured ways for members to provide peer support, like a possible #cv-study-group, while sharing knowledge and links for upskilling, such as a YouTube video on fine-tuning AI models and an exploration of graph ML’s impact on LLMs.


LlamaIndex Discord

  • RAG Stack Up: The LlamaIndex community shared resources on creating efficient data stacks and RAG pipelines with a focus on boosting query precision. @tchutch94 and @seldo contributed to a detailed tutorial, which can be read here, while the OpenAI assistant API v2 was praised for its effectiveness but flagged for high costs per query.

  • Airbnb’s Listing Leap: Harshad Suryawanshi unveiled a guide for a RAG application capable of filtering Airbnb listings using natural language, leveraging MistralAI’s Mixtral 8x7b tools. Detailed documentation and a repository guide have been provided here.

  • Introspective Agents Intro: New introspective features in LlamaIndex 10.34 were highlighted, promising self-reflective agents capable of iterative response improvements and future Huggingface integration. Concerns were raised regarding content sensitivity, advising caution with the implementation detailed here.

  • Pandas in Finance, MongoDB Mysteries, and More: There’s ongoing dialogue on leveraging the Pandas Query Engine for financial applications (a minimal call is sketched after this list), fine-tuning MongoDB for LlamaIndex querying, rectifying llamacpp deadlocks, and employing Trulens for observability. One member signaled memory usage spikes with LlamaIndex, indicating an urgent need for memory management optimization.

  • Challenges and Code: The guild witnessed requests for technical advice, from setting up financial analysis applications to addressing potential deadlocks in parallel requests with llamacpp. There’s an active pursuit of alternative methods for specific MongoDB operations and guidance on memory issues with LlamaIndex, with additional links provided for community learning and support.
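
As a rough illustration of the Pandas Query Engine flow, here is a hedged sketch; note the import path has moved between LlamaIndex versions (shown as in the 0.10-era llama-index-experimental package), and the engine relies on a configured LLM to translate questions into pandas code:

```python
import pandas as pd
from llama_index.experimental.query_engine import PandasQueryEngine

# Toy financial frame; the engine asks the configured LLM to write pandas code.
df = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "GOOG"],
    "revenue_bn": [383.3, 211.9, 307.4],
})
engine = PandasQueryEngine(df=df, verbose=True)
print(engine.query("Which ticker has the highest revenue?"))
```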


Latent Space Discord

  • Suno Sings New Melodies: An AI-in-action-club member sparked interest in Suno’s music-generation ability, asking whether it can compose entire tracks independently, with focus on its audio tokenization technique.
  • Mamba Conversations Go Deep: In llm-paper-club-west, enthusiasts are delving into Mamba’s inner workings with a Notion deep dive (A Mamba Deep Dive) and debating its selective recall and sensitivity to overfitting.
  • Audio Innovation at Its Finest: Discussions in AI-in-action-club revolved around processing and generating audio with autoencoders and latent diffusion, citing concern for harmonic distortion and referencing a blog about the snake activation function which might mitigate this issue.
  • Unlocking Gemini’s Potential: A user in ai-general-chat sought tools compatible with Gemini 1.5, yet expressed preference for Opus or Cursor due to better performance with long contexts.
  • SQLite Searches in New Dimensions: A mention in ai-general-chat of a new vector search extension for SQLite, namely sqlite-vec, indicates a stride in improving vector search functionalities within databases.

Eleuther Discord

LLMs Translating Before Answering: Engineers debate Large Language Models (LLMs) processing multilingual inputs by potentially converting them to English first, referencing “Understanding Language Models by Fine-grained Language Identification”. An important nuance for those looking to optimize multilingual LLM systems.

Lost Research Directions Evoke Nostalgia: A reflective exchange on understudied ML fields, such as adversarial robustness and domain-specific modeling, lamented due to the industry’s overshadowing allure. Notably poignant for the career paths of researchers in the field.

Leakage Looms Over Benchmarks: Concerns in benchmark dataset leakage for LLMs stir conversation, emphasizing the challenges in gauging leaks and rectifying them. Two papers, one on leakage detection and another proposing new methods like fresh benchmark questions, fuel the discussion.

English as a Pivot in LLMs Proves Generative: Replications on llama models suggest that using English as a pivot language is a sound strategy, adding weight to the approach for those developing multilingual LLMs and working on cross-model generalizability.

Language Models Dream of Chess Mastery: A DeepMind paper showed a transformer trained solely on a dataset of 10 million chess games reaching high-level play without hand-crafted heuristics, domain-specific enhancements, or explicit search algorithms, indicating that training at scale alone can reach competitive levels of play without the machinery traditional chess engines rely on.


OpenAccess AI Collective (axolotl) Discord

  • LLama-3 8B Stretches its Legs: LLama-3 8B successfully extended its context length to over 1040k, crucially supported by Crusoe Energy’s compute, incorporating an adjusted RoPE theta for advanced long-context handling in large language models.

  • Optimization Achieved in Axolotl Repo: A significant improvement has been contributed via a PR that solves a bottleneck in the orpo trainer by enabling it to utilize multiple workers for data preprocessing, as detailed at GitHub PR #1583, which could enhance speed across various training configurations like DPO, SFT, and CPO.

  • Prompt Design Evolves and llama.cpp Runs Rings Around Inference: Prompt fine-tuning insights emerged, revealing that including ChatML tokens within system prompts improves tokenization, while a llama.cpp upgrade resulted in a 30% increase in Hermes 2 Pro Llama 3 8B inference speed on 8GB RAM Android devices.

  • Conversion Complexities with llama.cpp: Troubles converting SafeTensors to GGUF were voiced, highlighting the limitations of llama.cpp’s script, which lacks the breadth of conversion options such as q4k. Solutions were explored, with a script for conversion provided, yet the quest for expanded output types persists.

  • DeepSpeed Stage 3 Flashes Past VRAM Limitations: ZeRO-3 optimizations do not impact model quality but require careful integration, potentially harmonizing with Flash Attention for fine-tuning pursuits. When applied correctly, these technologies can raise training speeds and enable larger batch sizes without complex parallelism, with experience shared on Axolotl’s GitHub and corroborated by DeepSpeed documentation. A minimal ZeRO-3 config sketch follows this list.
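
For orientation, here is a minimal ZeRO-3 configuration as it might be passed to the Hugging Face Trainer; the values are illustrative, and Axolotl ships its own DeepSpeed configs:

```python
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 3,                                    # shard params, grads, optimizer state
        "overlap_comm": True,                          # overlap all-gathers with compute
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",          # let the Trainer fill these in
    "gradient_accumulation_steps": "auto",
}
args = TrainingArguments(output_dir="out", bf16=True, deepspeed=ds_config)
```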


OpenInterpreter Discord

Documentation Dilemma Resolved: Access to instructions for Ollama, Jan.ai, and Llamafile is improved with a direct link to the Open Interpreter local installation guide, emphasizing dolphin-mixtral configurations to streamline the setup process.

Performance Enhancements for Whisper RKNN: A notable 250% performance surge is achieved for Whisper RKNN on Rockchip RK3588 SBCs, as shared in rbrisita’s GitHub branch, and there’s anticipation of upcoming LLM RKNN feature integrations.

AI Vtubing Enters Open Source Arena: The AI Vtuber community benefits from a pair of new resources: an AI Vtuber starter kit on GitHub, and an offline-ready, API-free Vtuber repository, with a live proof-of-concept showcased on YouTube.

Interactivity Extended to Mobile: Insight into hosting Open Interpreter on servers for broader access and setting up mobile-friendly, local models was shared, linking to specific Android device setup and running Open Interpreter locally.

Sound Choices in Speaker Selection: A discerning approach is underway to select the optimal speaker for an unnamed electronics project, promising future insights based on the integration and validation results.


OpenRouter (Alex Atallah) Discord

OpenRouter Battles Traffic Surge: OpenRouter grappled with higher-than-normal errors due to a traffic spike, with scaling efforts in progress to mitigate intermittent connectivity issues.

Money Moves: A proposal to integrate WeChat Pay and Alipay via Stripe was discussed, with the community aware of it requiring additional paperwork; meanwhile, suggestions to develop an app for smoother transactions using Google payment services were also floated.

Model Size Matters: The AI community showed keen interest in next-generation language models like LLaMA-3, with anticipation for potential releases by entities like Soliloquy, while recognizing the limitations tied to proprietary models.

Fine-Tuning Finesse: Engineers debated the risk of model dumbing post-fine-tuning without instruct datasets, agreeing that blending old and new data might safeguard against catastrophic forgetting.

Gemini Pro Troubleshooting: Technical solutions were shared for problems encountered with Gemini Pro messages, such as starting prompts with an “assistant” role to facilitate better interactions.


AI Stack Devs (Yoko Li) Discord

StoryDiffusion Crafted by Angry Penguin: StoryDiffusion sparks interest, engaging members with AI storytelling potential, following a link shared by angry.penguin.

AI Town Troubles and Tools: Disruptions from empty messages and strings of numbers in ai-town-discuss highlight tokenizer concerns; meanwhile, resources like @TheoMediaAI’s AI simulation exploration and @cocktailpeanut’s sqlite replay web app for AI Town catch attention.

Node Woes in Backend Development: Incorrect Node version causes stumbling blocks in local deployment of convex-local-backend; workaround involves switching to Node v18. A community-sourced issue was logged regarding a TypeError with .ts extension during setup.

Raspberry Pi Channel Piqued Interest: An expression of deep contemplation and a member’s acknowledgment reveal that the ai-raspberry-pi channel meets certain members’ specialized interests in AI development on small-scale hardware.

Cocktail Peanut Receives Undefined Kudos: A mysterious member praises cocktail peanut amid discussions but leaves the community guessing the work or breakthrough being referenced.


LAION Discord

  • SoundStream Hits a Sour Note: An AI engineer faced implementation issues with Google’s SoundStream, but others recommended a concrete solution—a GitHub repository that could offer guidance.

  • Sharing is Caring in the Art AI Space: A newcomer who completed a Stable Diffusion Udemy course is willing to share it with peers, aiming to forge connections and further hone their skills in AI-generated art.

  • AI Community Gets Playful with Investments: In a lighter moment, AI enthusiasts joked about their investment strategies, humorously preferring services that would either significantly multiply or halve their money.

  • Quest for Prompt Adherence in Model Training: Discourse revealed skepticism regarding the effectiveness of using both T5 text encoder and CLIP in improving prompt adherence in model training, sparking a mix of surprise and theories about the role of CLIP dropout.

  • Back to Basics for Bigger Isn’t Always Better: Within the StableDiffusion space, the focus is migrating from building larger models to enhancing architecture and training methodologies on smaller models due to hardware limitations. This highlights the importance of nuanced training with CLIP to sidestep embedded biases and constraints.

  • Dataset Debate Rages On: A heated chat about dataset choices showed a preference for real-world datasets like MNIST, CIFAR, or ImageNet over synthetic ones to better showcase interpretability in models.

  • Interpretability or Applicability?: Skeptics in the conversation debated whether methods developed for interpretability also effectively translate into solving real-world challenges, adding a layer of practicality to the discussion.

  • A Mysterious New Arrival: StoryDiffusion appeared on the scene courtesy of a guild member, albeit with no further explanation, leaving the engineers to scratch their heads about its use or importance.


LangChain AI Discord

Hackathon Alert: Build AI Products in 54 Hours for Cash: The BeeLoud hackathon, scheduled for May 10-12, invites participants to create AI innovations within 54 hours, with a prize pool of up to $25,000. For more details, see Build - BeeLoud.

LangChain and RAG Empower Email Crafting: LangChain’s LangGraph Agents now leverage Retrieval-Augmented Generation (RAG) to enhance AI-assisted email drafting, promising both efficiency and quality improvements, as detailed in a Medium article.

Java Devs, Meet LangChain: A newly available langchain4j Java port of LangChain has been announced, broadening the scope for integrating AI applications across different platforms and languages. Interested engineers can explore langchain4j on GitHub.

Dragonfly Boosts LangChain’s Performance: By integrating the Dragonfly in-memory data store with LangChain, developers can expect improved chatbot performance and context management which is explained with examples in their latest blog post.

Langserve Decoded: Clarification was provided on the langserve feedback endpoint: an “OK” response merely indicates that the feedback was received; the server may still reject it afterwards if it deems the submission unauthenticated or invalid.


Interconnects (Nathan Lambert) Discord

  • Leaked Model Mayhem: A leaked model, possibly from GDM and featuring an oddly specific quantization, was discussed with references to a tweet and mysterious 4chan postings hinting at a breach.
  • Prometheus 2 Rises: A new language model, Prometheus 2, introduced in a research paper, claims superior evaluation abilities over GPT-4, sparking conversations about its efficacy and utility.
  • Competition Heats Up with Big Prize Pools: LMSYS launched a $100K human preference prediction competition, as mentioned in a tweet, leveraging conversations from popular language models like GPT-4 and Mistral.
  • The PPO-REINFORCE Connection: An exploration suggesting that Proximal Policy Optimization (PPO) can reduce to the REINFORCE algorithm under certain conditions (with a single inner epoch per batch, the importance ratio stays at 1 and the clipped objective’s gradient matches REINFORCE’s) spurred ongoing discussion, with a resource shared from OpenAI’s Spinning Up documentation.
  • The Undisclosed Value of Value Functions: Debates about why value functions aren’t typically released post-RLHF training led to recognizing the potential wealth of insights they hold for reinforcement learning, despite them not being a standard share-out in the community.

Cohere Discord

PDF Search System Unearthed: A member proposed a search system for large PDF documents, discussing strategies including document summarization via LLMs, embedding generation for semantic search, and LLM-based key-information indexing (a minimal embed-and-rank sketch follows).
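
A compact sketch of the embed-and-rank step using the Cohere SDK; the chunking, model id, and query are illustrative:

```python
import numpy as np
import cohere

co = cohere.Client("YOUR_API_KEY")
chunks = ["...page 1 text...", "...page 2 text..."]   # pre-extracted PDF chunks

# Embed document chunks once, then embed each query and rank by cosine similarity.
doc_vecs = np.array(co.embed(texts=chunks, model="embed-english-v3.0",
                             input_type="search_document").embeddings)
q_vec = np.array(co.embed(texts=["termination clause"], model="embed-english-v3.0",
                          input_type="search_query").embeddings[0])
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
print(chunks[int(scores.argmax())])   # best-matching chunk
```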

Llama Tokenization Mysteries Revealed: Queries arose regarding the necessity of a beginning-of-string token (<BOS_TOKEN>) when using the llama-cpp-python library with Command R+, with observations that it is automatically included during tokenization (see the sketch below).
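
A small sketch showing the default BOS behavior in llama-cpp-python; the model path is hypothetical:

```python
from llama_cpp import Llama

llm = Llama(model_path="c4ai-command-r-plus-Q4_K_M.gguf")  # hypothetical local file
# tokenize() prepends BOS by default; pass add_bos=False to compare.
with_bos = llm.tokenize(b"Hello world", add_bos=True)
without_bos = llm.tokenize(b"Hello world", add_bos=False)
print(with_bos[:1], without_bos[:1])  # first id differs when BOS is auto-inserted
```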

RAG Access with Cohere Confirmed: A user’s question about the feasibility of using a free Cohere API key for RAG was answered in the affirmative: the free key works, albeit with rate limitations.

C4AI Command R+ Gets Quantized: Technical conversation unfolded around the C4AI Command R+ model, with a focus on its quantized variant, and varying system requirements for local implementation.

Code Interpreter SDK Takes the Stage: An announcement regarding the launch of the Code Interpreter SDK surfaced, alongside a discussion about its distinction in the context of pre-existing technologies.


Mozilla AI Discord

  • llamafile Takes the Leap to Systemd: Engineers have shared a systemd script for deploying llamafile on Rocky Linux 9, which includes detailed execution commands and the configuration of necessary arguments like server port and model path.
  • Server Mode Gets a URL Facelift: Responding to a request for specifying a base URL in server mode, an issue was raised on GitHub for proxy support in llamafile, which would facilitate serving it under a subdirectory through Nginx.
  • Ein, Zwei, Whisper!: The community showed interest in the distil-whisper-large-v3-german model, with discussions on its application in a speech-to-text, LLM processing, and text-to-speech pipeline that could culminate in a detailed blog post.
  • Vector Space Mysteries: A discrepancy in embedding directions between llamafile and llama.cpp was highlighted; a low cosine similarity points to an issue described on GitHub, which was reproduced with the available Python scripts.
  • Chatty Files and Code: To converse with documents and code through llamafile, members recommended curl calls against its built-in API, with reference to example scripts found in the llama.cpp chat script repository (a minimal Python equivalent is sketched below).
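
A minimal Python equivalent of those curl calls, assuming a llamafile server running locally on its default port with the OpenAI-style endpoint:

```python
import json
import urllib.request

def ask(prompt, url="http://localhost:8080/v1/chat/completions"):
    # llamafile's built-in server exposes an OpenAI-style chat endpoint.
    body = json.dumps({
        "model": "local",  # the local server does not validate this field
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(ask("Summarize this repo's README in two sentences."))
```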

tinygrad (George Hotz) Discord

  • Tinygrad Makes Strides and Welcomes New Contributors: Tinygrad has reportedly made significant progress recently, and a member celebrated their first commit to the project, marking a personal milestone.

  • Blobfile’s Role in Llama.py Explained: Users clarified that blobfile is crucial for the load_tiktoken_bpe function in examples/llama.py, enhancing understanding among peers.

  • Troubleshooting Tinygrad’s Forward Pass: One engineer faced challenges with the forward pass compute graph, which was addressed by forcing execution with out.item() or out.realize() (a sketch follows this list) and by installing missing libraries to fix a NameError.

  • Resolving Graph Visualization Issues in Tinygrad: Installation errors with networkx and pydot were resolved by installing pydot and graphviz, respectively, after which a member recommended the documentation be updated to help others avoid the sh: dot: command not found error.

  • Community Collaboration Drives Documentation Improvement: The resolution of the ‘dot command’ issue via graphviz installation highlights the collaborative spirit of the community, prompting a practical suggestion to update the project’s documentation to aid future users.
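
As background for the .item()/.realize() fix above: tinygrad records computations lazily and only executes them on realization. A tiny hedged sketch:

```python
from tinygrad.tensor import Tensor

a = Tensor([1.0, 2.0, 3.0])
b = Tensor([4.0, 5.0, 6.0])
out = (a * b).sum()   # lazily records the computation; nothing has run yet
print(out.item())     # .item() (or .realize()) schedules and executes -> 32.0
```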


AI21 Labs (Jamba) Discord

Jamba-Instruct Is Live: AI21 Labs has launched Jamba-Instruct, a sophisticated instruction-tuned hybrid SSM-Transformer model, designed to enhance commercial application performance. The company highlights the model’s capabilities in a recent Twitter announcement and a detailed blog post.

AI21 Labs Welcomes Feedback for Jamba-Instruct: AI21 Labs is inviting industry feedback for Jamba-Instruct and indicates their openness to discuss custom requirements, including context windows exceeding the initial 256K limit.

Reading Up on Jamba-Instruct: Engineers interested in the Jamba-Instruct model can gain a deeper understanding by reading the official blog post, which talks about its deployment for reliable commercial use and quality benchmarks.

Higher Context Windows on the Horizon: An AI21 Labs staff member has expressed their interest in exploring significantly larger context windows for Jamba-Instruct and has invited users to collaborate on this potential expansion to meet specific use scenarios.


Alignment Lab AI Discord

  • Quick Alert: Fast Compute Grants: AI enthusiasts and engineers, take note: a tweet by @PrimeIntellect announces the availability of fast compute grants for those in need. Check out the details in their Fast Compute Grants Tweet.

DiscoResearch Discord

  • Quantization Woes for LLaMA 3: A conversation on the guild revolved around the impact of quantization on LLaMA models, with a Discord member citing a Reddit discussion and research paper that discuss the performance hit when applying low-bit quantization to LLaMA 3.
  • Chinchilla Law Ignored, Performance Suffers: The guild also explored why heavy quantization hits Meta’s LLaMA so hard: trained far beyond chinchilla-optimal on 15T tokens, the model packs more information into each parameter, so low-bit quantization discards more of it. Models trained this way may suffer even more pronounced degradation as precision is reduced.

Skunkworks AI Discord

  • Skunkworks AI Projects Hook Up With Fast Compute Grants: Ambitious Skunkworks projects can potentially receive fast compute grants, as divulged in a Twitter announcement by a guild member. Interested engineers should explore this opportunity for support on cutting-edge initiatives.

Datasette - LLM (@SimonW) Discord

  • AI to Tidy Up Local Model Piles: An individual highlighted the need for an LLM (large language model) to address the issue of managing and cleaning 7B local models scattered across various directories due to the multitude of apps and libraries. Frustration was aired over the organization or lack thereof, suggesting a potential area for tool or algorithm development.

The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (734 messages🔥🔥🔥):

  • Community Explores Full Finetuning with Unsloth: Members initiated a detailed discussion on whether full-parameter finetuning is feasible with Unsloth. Despite initial claims that only LoRA (a parameter-efficient training method) is supported, some discovered that setting all parameters except layernorms to trainable enabled a form of near-full finetuning (a sketch follows this list).
  • Optimization for GGUF Files: The Unsloth team announced they are working on fixing issues with llama.cpp and GGUF conversions, responding to community members’ difficulties with quantization and loading checkpoint shards.
  • Sentiment Analysis Model Guidance Sought: A member seeking help in creating a sentiment analysis model based on a large database of country-scale reviews received guidance on converting various document types to a proper format for use with LLMs.
  • Assistance Offered for Dataset Formatting and ORPO: Members discussed ways to structure datasets for preference optimization using Unsloth, including strategies for multiple “rejected” responses. The community provided insights and possible solutions to help navigate the process.
  • Unofficial Full Finetuning Tactics Shared: While official support for full finetuning isn’t provided in Unsloth, community members experimented with enabling it by adjusting model parameters manually. A positive note was that losses seemed to improve and memory benefits were still evident compared to Hugging Face implementations.
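
A hedged sketch of the near-full-finetuning trick described above, i.e., unfreezing everything except layernorm parameters; the checkpoint name is illustrative:

```python
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; the trick is the requires_grad loop, not the model.
model = AutoModelForCausalLM.from_pretrained("unsloth/llama-3-8b")
trainable = 0
for name, param in model.named_parameters():
    param.requires_grad = "layernorm" not in name.lower()  # freeze only layernorms
    trainable += param.numel() if param.requires_grad else 0
print(f"trainable params: {trainable:,}")
```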

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (20 messages🔥):

  • Phi 3 in Your Browser: A tweet shows someone running Phi 3 inside a web browser, specifically highlighted with “lmao”. The tweet can be found here.
  • LLAMA 3 Discord Channel Nonexistent: A question was raised about the existence of a LLAMA 3 Discord channel, to which a member replied that such a channel does not exist.
  • Crafting New Roles in LLAMA 3: A question regarding the possibility of adding new roles to LLAMA 3 was raised, linking to a GitHub repository. The response suggested a simple replacement using type=code instead of tool_call.
  • Self-Discovery Paper Techniques Applied: A user found it useful to force ChatGPT to memorize 39 reasoning modules from the Self-Discovery paper, recommending its application to complex reasoning tasks. The paper is accessible here.
  • Triton’s Acceleration of LLAMA 3: A blog post from PyTorch showcases TK-GEMM, a tool using Triton FP8 GEMM that optimizes LLAMA 3 on NVIDIA H100 GPUs. The blog, including performance comparisons and technical details, can be viewed here.

Unsloth AI (Daniel Han) ▷ #help (580 messages🔥🔥🔥):

  • GGUF Conversion Issues with Llama 3 Identified: A member highlighted a critical issue where Llama 3 loses fine-tuning data during conversion to GGUF format. The problem seems inherent in GGUF regardless of precision, as tested in FP16 and Q8; discussions with Unsloth and suggestions from the community have yet to resolve it.

  • LoRA Adapter Merging Problems: Attempting to merge LoRA adapters with GGUF models resulted in the fine-tuning being partially lost. Despite suggestions to use separate LoRA adapters with GGUF models, the outcomes did not meet expectations, and results worsened when GGUF and LoRA were combined.

  • Inference and Finetuning Solutions for Llama 3 Shared: Users shared their finetuning strategies using the original INSTRUCT model with Llama 3 and appending eos_token after instructions. It was noted that when posting to /completion one needs to pass all chat tokens, which some users may have missed, and that starting servers with Llama 3 requires setting --override-kv for the tokenizer.

  • Possible Issues with Llama.cpp for Llama 3: Members suspect there may be compatibility issues between llama.cpp and the newly released Llama 3, given the similarity to problems outlined in llama.cpp’s issues section.

  • Seeking Help and Following Roadmaps: New users are seeking step-by-step help for finetuning models like Gemma and Llama. More experienced community members pointed to Unsloth’s notebooks for Llama and Gemma, and suggested searching for AI/ML courses and tutorials on platforms like YouTube.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (162 messages🔥🔥):

  • Channel Collaboration Conundrum: A member inquired about creating a channel for collaboration and coding together, specifically for users interested in finding partners to work with overnight or during the weekend. The idea was compared to EleutherAI’s community project channel, and there were suggestions to re-purpose or replace an existing channel in favor of a new community projects channel to foster collaboration.

  • The Hurdles of Specialization: A conversation emerged about the feasibility of specializing a 7B model for complex tasks, such as cryptographic proof generation. Multiple users weighed in, agreeing that such a task may be overly ambitious for a small LLM (7B). It was suggested that although a smaller model can outperform larger ones in highly specialized use cases, they’re typically not on par with larger models like GPT-4 or Claude.

  • Data and Compute Considerations: The discussions also touched upon the importance of data size and quality in LLM training, with a member seeking advice on how to utilize their resources effectively, including 32 H100 GPUs. It was highlighted that model size and data preparation are crucial factors in achieving high performance, and the keys to success are case-dependent.

  • Showcasing and Learning through Community Experience: Drsharma24 expressed a desire to learn from the community’s experiences and build a space for discussing successes and strategies around fine-tuning and model training, similar to Hugging Face’s platform. The conversation underscored that the Unsloth AI community could benefit from such knowledge sharing.

  • Financial Viability vs. Pure Experimentation: The chat touched upon the distinction between developing a business use case versus experimenting and learning from model training. A member suggested that business use cases require training data that adequately reflect production environments, while others emphasized the importance of keeping the end goal in mind.

Link mentioned: Dog Awkward GIF - Discover & Share GIFs


Stability.ai (Stable Diffusion) ▷ #general-chat (753 messages🔥🔥🔥):

  • FAQs Disappear from Discord Commands: Users noticed the absence of the /faq command and pondered its removal. The command was indeed gone, leading members to jest that they only noticed its absence after interacting with a bot.

  • Debating GPU Choices for AI: Participants discussed various GPU options like Nvidia’s 4080 and 3090, AMD’s 7900xtx, considering VRAM size and futureproofing. The release of Nvidia’s 5000 series GPUs was hotly anticipated, prompting users to suggest waiting for the new series instead of investing in soon-to-be-outdated graphics cards.

  • Video to Anime Conversion Inquiry: One member inquired about the time taken by an RTX 4080 to convert a video into anime-style footage, asking for benchmarks regarding video conversions using AI.

  • Opinions Clash on AMD vs. Nvidia for AI: The conversation heated up around whether to choose AMD or Nvidia GPUs for AI tasks. While some argued for the superiority of Nvidia, especially with new technologies like the Blackwell architecture, one user defended AMD based on personal success with the brand.

  • Seeking Solutions for Text and Image Upscaling: Users discussed best paths to add text to images using AI and queried about optimal methods for upscaling images. While tools like Davinci Resolve and Kittl were suggested for text, discussions on image upscaling tools were interspersed with mentions of ComfyUI, a versatile platform for AI image manipulation.

Links mentioned:


CUDA MODE ▷ #general (3 messages):

  • Tackling Gradient Details: A member pointed out that setting create_graph=True might be necessary for obtaining certain gradient details in computations.
  • Clarifying Hessian Confusion: The same member later clarified their thinking: it’s not about the diagonal per se, but about calculating the Hessian-vector product with respect to the weights twice.
  • Estimating Hessian Diagonal via Randomness: Another member mentioned a trick from a paper that estimates the Hessian’s diagonal using randomness combined with the Hessian-vector product; both tricks are sketched below.
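
A hedged sketch of both ideas in PyTorch: a Hessian-vector product via double backward (this is where create_graph=True comes in), and a Hutchinson-style estimate of the Hessian diagonal from random probes:

```python
import torch

def hvp(loss, params, vs):
    # First backward with create_graph=True makes the gradient itself
    # differentiable; a second backward then yields H @ v without forming H.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vs))
    return torch.autograd.grad(dot, params)

def hessian_diag(loss_fn, params, n_probes=16):
    # Hutchinson estimator: for Rademacher probes v, E[v * (H v)] = diag(H).
    est = [torch.zeros_like(p) for p in params]
    for _ in range(n_probes):
        vs = [torch.randint_like(p, 0, 2) * 2 - 1 for p in params]
        for e, v, h in zip(est, vs, hvp(loss_fn(), params, vs)):
            e += v * h / n_probes
    return est

# Tiny check on a quadratic whose Hessian diagonal is exactly [2., 6.]
w = torch.tensor([1.0, 2.0], requires_grad=True)
print(hessian_diag(lambda: w[0] ** 2 + 3 * w[1] ** 2, [w])[0])  # tensor([2., 6.])
```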

CUDA MODE ▷ #triton (2 messages):

  • Triton Newcomer’s Gather Procedure Stumbles: A new member faced an IncompatibleTypeErrorImpl when implementing a simple gather procedure in Triton, attempting to copy values from one tensor into another using pointer arithmetic. They later realized the issue involved using the wrong tensor type and noted a potential solution with the newly introduced tl.cast function (PR #3813 on Triton); a minimal gather kernel illustrating the cast is sketched after this list.
  • Kernel Debugging Challenges in PyCharm: The same member struggled with setting breakpoints inside a Triton kernel using PyCharm, despite having set TRITON_INTERPRET to "1" as suggested in the repository documentation, and didn’t succeed with the breakpoint() function either.
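
A minimal gather kernel in that spirit, assuming a CUDA device; the explicit index cast is the kind of thing the tl.cast PR formalizes:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def gather_kernel(src_ptr, idx_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    idx = tl.load(idx_ptr + offs, mask=mask, other=0)
    # Pointer arithmetic needs integer offsets of a consistent type; an explicit
    # cast (idx.to here, or the newer tl.cast) avoids IncompatibleTypeErrorImpl.
    vals = tl.load(src_ptr + idx.to(tl.int64), mask=mask)
    tl.store(out_ptr + offs, vals, mask=mask)

src = torch.randn(1024, device="cuda")
idx = torch.randint(0, 1024, (256,), device="cuda", dtype=torch.int32)
out = torch.empty(256, device="cuda")
gather_kernel[(triton.cdiv(256, 128),)](src, idx, out, 256, BLOCK=128)
assert torch.equal(out, src[idx.long()])
```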

Link mentioned: [Frontend] Add tl.cast function. by jlebar · Pull Request #3813 · openai/triton: This resolves an inconsistency in Triton, that every other function on Tensors has an associated free function — i.e. you can do x.foo and tl.foo(x).


CUDA MODE ▷ #cuda (6 messages):

  • tinygrad Gets NVIDIA Open Driver Patch: A member shared a tinygrad patch for multi-GPU support with NVIDIA’s open driver, providing documentation that might be useful to others experiencing similar installation issues.
  • Kernel Module Consideration for Long Term Support: The long-term support for peer-to-peer memory fix on NVIDIA cards was questioned, leading to a discussion about whether creating a kernel module would be a viable solution.
  • Query on Custom CUDA Extension Installation: A member sought advice on the correct way to install a custom PyTorch/CUDA extension within a setup.py file, highlighting issues with the existing method which can be found in their GitHub repository.
  • Sharing Solutions for CUDA Extension Setups in PyTorch: Another member offered help by linking to pull requests that illustrate how custom CUDA extensions are managed within the PyTorch AO library. They provided links to specifics on the setup process and related PRs (PR#135, PR#186, PR#176).

Links mentioned:


CUDA MODE ▷ #torch (43 messages🔥):

  • PyTorch PR Pains: A contributor, kashimoo, expresses frustration with the slow build times of linear algebra components in PyTorch and a separate PR that was reverted due to issues with Meta’s internal builds. chhillee confirms that such setbacks are common due to PyTorch’s “github first” policy and offers to connect kashimoo with more knowledgeable contributors on the Slack channel.

  • Debugging Symbols for PyTorch Development: kashimoo inquires about building specific directories with debugging symbols to facilitate the use of gdb. While chhillee suggests using an available script on the PyTorch development forum, kashimoo thinks it might not be enough for their purposes.

  • Dynamic Compilation Challenges in PyTorch: benjamin_w reports issues when using dynamic=True with torch.compile(...) in conjunction with Distributed Data Parallel (DDP) in PyTorch 2.3. While the approach worked in PyTorch 2.2.2, it appears to lead to recompilation for each batch in version 2.3. marksaroufim advises against using dynamic=True and suggests manually marking sequence lengths as dynamic instead (a sketch follows this list).

  • Improving Issue Triage on CUDA MODE Discord: marksaroufim and others discuss ways to handle the growing number of issues on the server, proposing the idea of a bot that parses and automatically files issues on GitHub, with jamesmel offering to implement the bot. It’s decided to open issues in cuda mode for now to manage the influx.

  • Torch Compile Optimization for Variable Lengths: Troubleshooting continues as benjamin_w struggles with ConstraintViolationError when using torch._dynamo.mark_dynamic(inputs, index=1) on PyTorch 2.2 & 2.3 for dynamic sequence lengths. They prefer persistent model compilation over multiple batches, but encounter brittle behavior. marksaroufim suggests that creating a GitHub issue would be best for resolving the issue.
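
The suggested pattern for the dynamic-shape issues above: compile without dynamic=True and mark the varying dimension explicitly. A minimal sketch with illustrative shapes:

```python
import torch

model = torch.compile(torch.nn.Linear(64, 64))   # note: no dynamic=True
x = torch.randn(8, 37, 64)
torch._dynamo.mark_dynamic(x, 1)                 # dim 1 (sequence length) varies
model(x)                                         # compiles once with dim 1 dynamic
model(torch.randn(8, 51, 64))                    # new length reuses the same graph
```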

Links mentioned:


CUDA MODE ▷ #algorithms (5 messages):

  • Kudos on Effort Project: A member praised the Effort project on GitHub, finding it to be quite astonishing.
  • Matrix Multiplication Confusion: A mistake was highlighted in a matrix multiplication example, pointing out that the inner dimensions of a 3 x 1 and 3 x 3 matrix do not align for the operation.
  • Quick Correction Promised: The author acknowledged the mix-up regarding vector orientations and expressed intent to correct it, noting a similar mistake had previously been flagged.

CUDA MODE ▷ #cool-links (4 messages):

  • Averting Catastrophic Forgetting: A member found Ziming Liu’s tweet interesting for demonstrating how to avoid catastrophic forgetting in a toy test-case.
  • In Search of Speed: It was noted that the solution to catastrophic forgetting is “currently very slow,” leading to curiosity about potential methods to increase its speed.

CUDA MODE ▷ #torchao (2 messages):

  • FP6 Support Candidate for Custom CUDA Extension: A new candidate for a custom CUDA extension has been identified: FP6 support, following a GitHub issue discussion on PyTorch’s AO repository. An offer to help anyone interested in contributing to this extension was extended.

  • Community Member Shows Interest in FP6: Despite lacking experience, one community member has expressed enthusiasm to contribute to the new FP6 support project and is currently endeavoring to understand the relevant research paper to determine where they could realistically contribute.

Links mentioned:

  • pyto - Overview: pyto has 2 repositories available. Follow their code on GitHub.
  • FP6 dtype! · Issue #208 · pytorch/ao: The feature, motivation and pitch (https://arxiv.org/abs/2401.14112): the DeepSpeed developers introduce an FP6 datatype for cards without FP8 support.

CUDA MODE ▷ #off-topic (9 messages🔥):

  • Seeking Karpathy’s Video Setup Advice: A member asked for recommendations to achieve a video setup akin to Andrej Karpathy’s, with live screenshare and a small camera view. They were linked to a YouTube video by Karpathy as a reference point.

  • OBS Streamlabs: A Go-to for Video Production: In response to an inquiry about simple video set-up, OBS Streamlabs was suggested. The community member mentioned there are plenty of tutorials available for this versatile tool.

  • Enhancing Video Quality with iPhone & Mount: For better video calls or recordings, it was recommended to use an iPhone with a Mac for superior camera and mic quality over typical laptop equipment, citing a KDD Webcam Stand as a useful accessory.

  • Anime Appreciation Break: A member expressed their curiosity about anime preferences, leading to a brief exchange where favorites like Naruto, One Punch Man, Berserk, and Jujutsu Kaisen (JJK) were cited for their high-quality animations and captivating fight scenes.

Link mentioned: Let’s build the GPT Tokenizer: The Tokenizer is a necessary and pervasive component of Large Language Models (LLMs), where it translates between strings and tokens (text chunks). Tokenizer…


CUDA MODE ▷ #triton-puzzles (1 messages):

srush1301: Hmm, yeah this description is wrong. I will update with a clearer version


CUDA MODE ▷ #hqq (4 messages):

  • GreenBitAI Introduces a New Toolkit: A member shared a link to GreenBitAI’s toolkit for fine-tuning, inference, and evaluation of Large Language Models (LLMs), describing it as more of an ML framework augmenting PyTorch, in contrast to BitBLAS, which focuses on matrix multiplication operations.
  • BitBLAS Offers a Promising Kernel for Inference: BitBLAS was noted to offer a fast GEMV kernel for 2-bit operations, which could be beneficial for inference, although the member had not yet tried it.
  • Binary Matmul in GreenBitAI’s Engine: The discussion continues with mention of GreenBitAI’s cutlass kernels, especially one that performs binary matrix multiplication, which is a part of their toolkit enhancing PyTorch.
  • Innovative Gradients Calculation Noted in GreenBitAI Toolkit: A member highlighted that GreenBitAI’s toolkit includes code that calculates the gradients of weights during training, as seen in their q4_layer.py file, and they expressed curiosity regarding the potential VRAM usage since the gradients are not packed.



CUDA MODE ▷ #llmdotc (644 messages🔥🔥🔥):

  • CUDA and Memory Optimization Discussions: The team achieved 167K tokens/second, outperforming PyTorch’s 150K tok/s, by optimizing CUDA kernels and introducing changes like CUDA streams and fused classifiers. They’re discussing the impact of bias kernel optimizations and potential next steps for further gains. See the related discussion and pull request.

  • Scratch Buffers and Atomics: They’ve introduced scratch buffers to handle atomic updates more efficiently. Performing fp32 atomics on a scratch buffer, then reading, rounding, and writing out to bf16, is suggested to avoid slow fp32 atomics in global memory.

  • Profiling Script Updates: Updates have been made to the profiling script, improving robustness against CUDA library updates and separating NVIDIA kernel times from llm.c kernel times. The script changes are tracked in this pull request.

  • PyTorch Padding: There is a debate on whether padding the vocabulary size in the PyTorch baseline makes for a fair performance comparison, with acknowledgment that it’s not straightforward and requires ensuring the padded dimensions are not used in the loss or during sampling.

  • Layernorm and Residual Calculations: The conversation touched on saving the variance and mean of layernorm in fp32 for stability and performance benefits, although this hasn’t been implemented in llm.c for the sake of code simplicity and because bf16 is used for activations (see the sketch below).
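As an illustration of the fp32-statistics idea in the last bullet, here is a rough PyTorch sketch of the general technique; it is not llm.c’s implementation:

```python
import torch

def layernorm_fp32_stats(x_bf16: torch.Tensor,
                         weight: torch.Tensor,
                         bias: torch.Tensor,
                         eps: float = 1e-5) -> torch.Tensor:
    # Upcast only for the reduction: mean/variance computed in fp32
    # are numerically stabler than the same reductions in bf16.
    x = x_bf16.float()
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    y = (x - mean) * torch.rsqrt(var + eps)
    # Apply the affine transform, then cast back to bf16 activations.
    return (y * weight + bias).to(torch.bfloat16)

x = torch.randn(4, 1024, 768, dtype=torch.bfloat16)
w, b = torch.ones(768), torch.zeros(768)
out = layernorm_fp32_stats(x, w, b)
```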



CUDA MODE ▷ #oneapi (1 messages):

neurondeep: also added intel on pytorch webpage


LM Studio ▷ #💬-general (350 messages🔥🔥):

  • Llama.cpp Integration Issues and Solutions: Members discuss issues integrating llama.cpp with LM Studio. Conversations involve the need for certain file versions and the use of the convert-hf-to-gguf script, with one member facing a FileNotFoundError due to missing config.json and resolving it by redownloading files through huggingface-cli. Subsequent issues with conversion and usage are tackled collaboratively.

  • Rolling Back to Previous LM Studio Versions: Users hit a bug in LM Studio 0.2.22 where the chat output includes the entire context, not just the response. After attempts to work around it by rolling back to version 0.2.21, the issue was fixed in the latest update, as confirmed by multiple users.

  • Launch of LM Studio In Terminal (lms) Tool: Discussion on the new lms tool accompanies its release alongside LM Studio 0.2.22, explaining its utility in automating tasks, starting an API server, and managing a model without UI interaction. Subsequent conversation clarifies that lms is a controller for the app, not a standalone tool.

  • Running LM Studio Headless Mode: Several users discuss and attempt various methods to run LM Studio in a headless mode, using commands like xvfb-run to bypass GUI requirements. The conversation concludes that official headless support is not yet available, despite community workarounds.

  • Embedding LMS in Scalable Server Solutions: Members express positivity towards the potential of embedding LM Studio in a high-availability server pattern across clusters, inquiring about configurations using specific presets via CLI or the UI, suggesting future feature enhancements.



LM Studio ▷ #🤖-models-discussion-chat (159 messages🔥🔥):

  • Quest for Quality Story Writing: A user is seeking help to create iQuant versions of Goliath 120B Longlora on their PC for high-quality story writing, requiring a minimum of 8K context for usability; they’ve offered Humblebundle Steam games as a reward for assistance. They highlighted the need for higher quality beyond what models like LLAMA 3 8B can provide, sharing their system prompt details located on Google Docs.

  • Model Recall Experimentation: Several chats involve users testing the recall ability of various models, particularly LLAMA 3’s ability to recall Bible verses. A user has set up a GitHub repository for a “bible recall benchmark” and observed a very low recall rate across the entire Bible.

  • Exploring the Cognitive Horizons: Users have been experimenting with models to see how they recall extensive texts like the Bible and discussing creating instances that can communicate between each other for better outcomes. One user proposed using “agents” that could optimize narrative quality, citing a YouTube video for reference.

  • Template Troubles and Quirky Answers: A user experimenting with a new release, ChatQA 1.5, reports oddities in response templates leading to bizarre responses, even when applying suggested changes such as adding spaces or newlines to the chat template.

  • In Quest of Unrestricted Coding Model: A user inquires about a good small model in the 2B-parameter range for coding applications with minimal censorship, but no suggestions were offered in the discussion. Another user is looking for a model to read documents and PDFs, with Command-R by Cohere suggested for document-understanding tasks, albeit with concerns about its hardware requirements.



LM Studio ▷ #announcements (2 messages):

  • LM Studio Introduces Companion CLI ‘lms’: LM Studio has rolled out a new command-line interface, lms, to ease the load/unload of LLMs and management of local servers. Community members can install the CLI with npx lmstudio install-cli and contribute to its MIT licensed source code at GitHub - lmstudio-ai/lms.

  • LM Studio 0.2.22 Bug Fix Released: A bug affecting model responses by inadvertently including the entire context within them has been fixed in LM Studio 0.2.22. Users encountering this issue can download the updated version from lmstudio.ai.



LM Studio ▷ #⚙-configs-discussion (8 messages🔥):

  • Quick Fix for Llama and Phi-3 Configs: If you delete your configs folder and relaunch the app, it will repopulate with the default configurations. Backing up any important config files first is advised.

  • WSL Woes with LM Studio: Trying to connect to LM Studio through Windows Subsystem for Linux (WSL) can fail when using localhost addresses, since 127.0.0.1 resolves to the VM’s own loopback interface. ipconfig can help find the correct host IP to use.

  • Passing Through Ports for Windows-WSL Communication: A member suggested using a reverse proxy or a port proxy via the netsh interface portproxy add v4tov4 command to communicate between Windows and WSL for LM Studio. According to another member, no additional complexity with listen addresses is necessary.


LM Studio ▷ #🎛-hardware-discussion (4 messages):

  • Seeking the Link to a VRAM Fix: A member mentioned a fix that should be linked, claiming that “it really does work” as a better solution than disabling iGPU in BIOS.
  • GPU Hunt for VRAM: A user inquired about a “cheapish low profile, lowish power 12 GB + GDDR6 GPU” for a second PCI-E slot aimed specifically at utilizing the VRAM.
  • RTX 3060 as a VRAM Solution: In response to a query about a GPU for VRAM usage, another member suggested considering the Nvidia RTX 3060.

LM Studio ▷ #🧪-beta-releases-chat (18 messages🔥):

  • Multi Model Session Context Window Confusion: A member discussed an issue with the Multi Model Session feature where they were unable to change the context window size, defaulting to 2048, and experienced timeouts when requests queued up. They indicated that the tool entered a ‘Rolling Window’ mode and generated irrelevant responses.

  • Ubuntu Users Get Running Tips: In response to a question about running the tool on Ubuntu, a simple instruction set was given: download the Appimage, make it executable, and run the application.

  • Docker Enthusiasts Can Go Headless: An improvement to the software now allows it to run headlessly, which a member noted could enable them to finally create a working Docker image for testing.

  • Configurations and CLI Questions Addressed: Members inquired about persisting settings like GPU offload and CORS through the CLI, prompting another to clarify that model configurations in the “My Models” page default but can be overridden in CLI/SDK per field.

  • Possible Bug Identified in GPU Layer Configuration: An issue was reported regarding a config preset for GPU layers being overridden when loading models via CLI. It was suggested to open a GitHub issue to address this, and a link to the configurations schema was provided for reference on available parameters.



LM Studio ▷ #amd-rocm-tech-preview (32 messages🔥):

  • LM Studio CLI Launches for ROCm: LM Studio introduces lms, a new CLI for managing LLMs and running the local server on the AMD ROCm Preview Beta, now open source on GitHub. Users can download the latest LM Studio 0.2.22 ROCm Preview to use lms, which comes with the additional benefit of OpenCL pre-packaged for new users.

  • Prompt in API Response Bug Acknowledged: A user noted that the prompt is included in the API response, a known issue in the latest build. The LM Studio team quickly acknowledged it and confirmed that a fix was pushed live shortly after, which users have since verified.

  • Large Context Size Exploration: A participant tested the RAM scaling versus context size by attempting a context of 131072 tokens with Phi 3, but it failed. However, they could successfully run a context size of 60000 tokens with 32 GB RAM on a 7900XTX GPU.

  • Seeking Clarification on Embedding Model Issue: User reported an issue when trying to load the embedding model in the new release. An immediate fix was released by LM Studio, and the user confirmed the solution worked after re-downloading from LM Studio ROCm download page.

  • Discussion on Linux Support for ROCm: Participants discussed running ROCm on Linux, with one sharing their experience of using ROCm on Mesa’s OpenCL implementation and hoping for a Linux-supported ROCm build, while another suggested that using LM Studio to download models for a local llama.cpp build could be a workaround.



LM Studio ▷ #model-announcements (1 messages):


LM Studio ▷ #🛠-dev-chat (69 messages🔥🔥):

  • CLI Companion for LM Studio Introduced: LM Studio’s new CLI tool, lms, has been released to facilitate loading LLMs, starting/stopping servers, and debugging. Users can install it directly and it requires LM Studio 0.2.22 or newer.

  • Headless Tutorial for Running LM Studio: A member shared a self-described “poorly written hacky headless tutorial” for running LM Studio without a display, which includes instructions for using xvfb to emulate an X11 session and for bootstrapping lms. Another member confirmed getting it to work on Ubuntu Server after some troubleshooting.

  • Resolving App Exit Issues: There were several messages focused on addressing an issue where the LM Studio app exited upon a command, with discussions about troubleshooting steps such as using ctrl+z then bg, disown -ah, and --no-sandbox flags.

  • Scripting to Streamline Installations: A member has expressed intentions to create a script that will automate headless installations of LM Studio, allowing for a more straightforward setup with one command.

  • Progress Towards Dockerization: A member expressed excitement over being able to create a Docker container for LM Studio, which would ease running and testing models on servers, following the successful headless installation tutorial.



Perplexity AI ▷ #announcements (1 messages):

  • Beta Tester Recruitment for Pages Concluded: The recruitment for beta testers for Pages has met the desired number of participants. The team expressed gratitude and advised everyone to stay tuned for further updates on the development of Pages.

Perplexity AI ▷ #general (308 messages🔥🔥):

  • Technical Difficulties with Perplexity: Users have reported issues with Perplexity not functioning correctly on Safari and Brave browsers, such as not being able to send prompts or register due to unresponsive buttons. Others experience persistent sourcing from previously uploaded files during conversations, mistakenly retrieving data from earlier requests.

  • Subscription and Payment Inquiry: A user inquired about obtaining a refund for an unwanted monthly subscription charge and was advised to contact [email protected] for assistance.

  • Feature Requests and Feedback for Perplexity’s Tools: Members have expressed a desire for improved functionality with voice commands and the continuation of certain features, suggesting enhancements like avoiding premature command termination and enabling continuous listening.

  • Usage and Model Limits Discussed: There is confusion regarding usage limits for different models and tools within Perplexity, with some users unsure about daily query allowances, and others debating the comparative capabilities of different AI models like Gemini 1.5 Pro, Claude Opus, and GPT-4 Turbo.

  • Anticipation for Future AI Developments and Competitor Platforms: The community anticipates new AI models such as the rumored “GPT-5” and potential upcoming Perplexity competitors from OpenAI. Additionally, there are discussions on the distinctions between search engines and knowledge engines, with speculations about how these tech advancements might evolve and integrate with existing platforms.



Perplexity AI ▷ #sharing (22 messages🔥):


Perplexity AI ▷ #pplx-api (41 messages🔥):

  • Sonar Large Availability and Typo Clarification: Sonar Large is available for use via the API, and the model cards documentation shows it listed with a 32k context length. A confusion about the parameter count led to a clarification that Sonar Large is a 70B model, contrary to a typo suggesting it’s 8x7B.

  • Prompt Precision Leads to Better Results: Members noted improved outcomes when using precise terms in prompts, such as specifying https:// in front of URLs. One user’s experience with llama-3-sonar-large-32k-online yielded better results after adjusting prompts to generate markdown lists of competitors.

  • API Client Experiences Variable Results: Even after adjusting prompts, a user reported inconsistent results from the API, which sometimes provided correct competitors and at other times failed. Tweaking AI model settings and prompt optimization was suggested as a solution.

  • Model Transition Guidance Sought: Queries were raised regarding the need to transition from sonar-medium-online to newer models. Advice received suggested trying llama-3-sonar-small-32k-online for better accuracy, with a clear indication that an update to the new models would eventually be necessary.

  • Adjusting AI Parameters to Improve Responses: To improve accuracy, a user tested different frequency_penalty, temperature, and top_p settings, finding that changes to these parameters influenced the relevance and correctness of the AI’s responses. A scripted sweep along these lines is sketched below.
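Since Perplexity’s API is OpenAI-compatible, a parameter sweep like the one described can be scripted. A minimal sketch, assuming the openai Python client; the sampling values and API key are placeholders, and the model name is the one mentioned in the discussion above:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PPLX_API_KEY",            # placeholder
    base_url="https://api.perplexity.ai",   # Perplexity's OpenAI-compatible endpoint
)

# Sweep a few sampling settings to compare answer quality.
for temperature, top_p, freq_penalty in [(0.2, 0.9, 0.5), (0.7, 1.0, 1.0)]:
    resp = client.chat.completions.create(
        model="llama-3-sonar-large-32k-online",
        messages=[{"role": "user",
                   "content": "List competitors of https://example.com as a markdown list."}],
        temperature=temperature,
        top_p=top_p,
        frequency_penalty=freq_penalty,
    )
    print(temperature, top_p, freq_penalty,
          resp.choices[0].message.content[:80])
```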



Nous Research AI ▷ #off-topic (15 messages🔥):

  • Exploring Retrocausality and Morality: The concept of moral non-commutativity in retrocausality was discussed, highlighting the psychological perspective where patients do not distinguish between cause and consequence in moral actions, impacting the integrity of an observer’s moral framework.
  • Seeking Llamacpp Guidance: A member asked for a beginner’s guide to llamacpp after experiencing issues with the model generating nonsensical outputs and a website automatically writing a C function.
  • Using Llama on CPU: It was suggested to use ollama as a backend for llamacpp to avoid directly dealing with C, and a discussion touched on the advances in running large language models like LLMs on CPUs utilizing techniques like quantization and pruning.
  • Waiting for lmstudio Approval: One member expressed frustrations about not being able to use their laptop for model-related tasks due to waiting for approval from lmstudio.
  • Saint Petersburg Transformation Over Time: Photos of Saint Petersburg’s Ligovsky Avenue at Vosstaniya Square from 2002 versus 2024 prompted a joke about the improved color accuracy in cameras.

Nous Research AI ▷ #interesting-links (19 messages🔥):

  • Proprietary Intrigue: A proprietary Twitter post sparked interest, but the details remain undisclosed; no further information was provided.
  • Haystack Goes Embedded: The GitHub repository for haystack-embedded by carsonpo was highlighted, an open-source contribution for embedded machine learning development accessible here.
  • Excitement Over WildChat Dataset: The allenai WildChat dataset has generated conversation; however, access requires agreement to the AI2 ImpACT License. The dataset appears to feature “long multiturn convos” and is hosted on Hugging Face, with anticipation for a new version indicated by a URL to WildChat-1M.
  • Mixed Messages on Dataset Release: Discussion centered around whether the new WildChat dataset had been open-sourced, with confirmation seen via a link on an arXiv abstract.
  • Preference for OPUS in Long Conversations: A member mentioned a preference for the OPUS model versus others when dealing with long conversational contexts, suggesting better performance after “10/20k of prompting.”



Nous Research AI ▷ #general (104 messages🔥🔥):

  • Hermes Upgrade Unveiled: Nous has released Hermes 2 Pro with LLaMA weights, boasting capabilities in QA, function calling, and JSON mode, along with vision-multimodal support. Models and test code are available on Hugging Face.

  • Enabling Advanced Function Calling: BLOC97 explained that an LLM with Function Calling is aware of external function/tool calls to validate answers instead of simulating answers, and teknium shared a GitHub repo with examples of tunes specific to function calling for Hermes.

  • Function Call Dataset Insights: Glaive function-calling dataset V2 was shared to showcase the structure of a data set used for training a model with a function-calling feature. The conversations surrounding the use of these datasets emphasized their potential for advanced LLM applications.

  • The Impact of llama.cpp on Model Performance: Diabolic6045 saw exceptional inference speeds when using Hermes 2 Pro with llama.cpp on an Android device with 8GB RAM, highlighting the efficiency of the stack.

  • Leveraging Hermes 2 Pro with CrewAI and LocalAI: .interstellarninja provided solutions for using Hermes 2 Pro function-calling with CrewAI by sharing a Jupyter notebook. They also pointed to the LocalAI API supporting function-calling with the OpenAI API tool-calls format, detailed in their repository (see the parsing sketch below).
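For readers unfamiliar with the pattern, function-calling tunes like Hermes 2 Pro advertise tools to the model in the system prompt and emit structured tool-call blocks that the caller parses. The sketch below is illustrative only: the tool schema is invented, and the exact prompt wording and tag format should be checked against the model card:

```python
import json
import re

# Illustrative tool schema; the name and fields are made up for this example.
tools = [{
    "name": "get_stock_price",
    "description": "Fetch the latest price for a ticker symbol.",
    "parameters": {"type": "object",
                   "properties": {"symbol": {"type": "string"}},
                   "required": ["symbol"]},
}]

# Hermes-style tunes are given the available tools inside <tools> tags
# and trained to answer with JSON inside <tool_call> tags.
system_prompt = (
    "You are a function calling AI model. You may call one of the functions "
    f"described within <tools>{json.dumps(tools)}</tools>. For each call, "
    "return a JSON object inside <tool_call></tool_call> tags."
)

def extract_tool_calls(completion: str):
    """Parse <tool_call>{...}</tool_call> blocks out of a model completion."""
    return [json.loads(m) for m in
            re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", completion, re.S)]

# A completion of the shape such tunes are trained to emit:
fake_completion = ('<tool_call>{"name": "get_stock_price", '
                   '"arguments": {"symbol": "NVDA"}}</tool_call>')
print(extract_tool_calls(fake_completion))
```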



Nous Research AI ▷ #ask-about-llms (45 messages🔥):

  • ChatML Configurations Revealed: Members discussed the modifications needed to enable ChatML, mentioning token replacement and adjustments in the model configuration, such as replacing the EOS token with `<|im_end|>`.



Nous Research AI ▷ #bittensor-finetune-subnet (1 messages):

  • Enthusiasm for LLM Finetuning: A new member expressed keen interest in finetuning a large language model (LLM) before becoming a miner. They sought advice on how to find datasets suitable for this purpose and the kind of data required for effective finetuning.

Nous Research AI ▷ #rag-dataset (1 messages):

felixultimaforeverromanempire: anyone know fo good free generic data sets?


Nous Research AI ▷ #world-sim (86 messages🔥🔥):

  • Iron Age Update in World-sim: A member mentioned being on world 11 in the game where an Iron Age update was recently implemented.
  • Gaming Nostalgia with "Spore": A member reminisced about spending over 100 hours playing the game "Spore."
  • Anticipation for Upcoming Updates and Celebrations: A member expressed excitement for something coming this weekend and shared that they will turn 18, marking it as a significant birthday.
  • Discussion on AI and Consciousness: Members expressed admiration for a speech by Joscha on consciousness, citing its profound impact, and shared related [YouTube](https://www.youtube.com/watch?v=abWnhmZIL3w) videos on the topic.
  • New Discord Role for Worldsim Updates: A new role was created to tag members for smaller worldsim/worldclient related information, with several members requesting to be added to it, which can be obtained via the channel.

Links mentioned:

  • 37C3 - Synthetic Sentience: https://media.ccc.de/v/37c3-12167-synthetic_sentience Can Artificial Intelligence become conscious? Despite the rapid progress of AI capabilities, the core que...
  • Cyber Animism by Joscha Bach: This is a 1 hour 45 minute talk by Joscha Bach (http://bach.ai/) given in our Center.
  • World Simulation Talks @ AGI House SF: 0:00 Conversation · 1:31 Kickoff by Jeremy Nixon · 6:08 Karan Malhotra of Nous Research · 26:22 Rob Hasfield: CEO of Websim · 1:00:08 Ivan Vendrov of Midjourney [Real ti...

Modular (Mojo 🔥) ▷ #general (99 messages🔥🔥):

  • Mojo Joins the Language Race: A YouTube video featuring Chris Lattner, discussing “Mojo Lang - Tomorrow’s High Performance Python?” was shared, highlighting the new language’s attempt to integrate the best programming techniques from CPU/GPU development.
  • Learning Mojo with a Python Background: Discussion centers on the relationship between Python and the new Mojo language. Members note that while Mojo resembles Python and can use Python objects directly, strong type checking and other systems-programming features make for significant differences. The Mojo documentation is recommended for those looking to understand Mojo’s unique features.
  • Open Source Contributions and Guidance: Members are encouraged to contribute to the open-source Mojo standard library, with links to the GitHub contributing guide and a Modular blog post offering step-by-step instructions for potential contributors.
  • Discussions on Development Coordination: There is an ongoing dialogue about how best to manage contributions and avoid duplication of effort on GitHub issues. One proposal includes the use of a PR template to help link issues and PRs effectively.
  • Assessment of Mojo’s Advantages: Conversations delve into what sets Mojo apart, such as its performance, predictability, and portability. It is also mentioned that Mojo’s build system automatically autotunes for optimal performance on different hardware, as demonstrated in a video by Jeremy Howard on autotuning.



Modular (Mojo 🔥) ▷ #💬︱twitter (3 messages):

  • Modular’s Latest Tweets: Modular shared a tweet, accessible via this link, but the content of the tweet was not discussed.
  • Another Tweet from Modular: A second tweet was shared by Modular, which can be found here, though no further details or discussion points about the tweet were provided.
  • Modular Tweets Again: Modular posted another tweet, which can be seen at this link. There was no accompanying conversation or explanation of its significance in the chat.

Modular (Mojo 🔥) ▷ #✍︱blog (2 messages):

  • Modular Celebrates Community Contributions in Mojo 24.3: Mojo 🔥 24.3 has been released with significant community involvement after the open-sourcing of Mojo’s standard library. The update boasts contributions that bolster the platform’s capabilities, with special thanks to contributors like @LJ-9801, @mikowals, and others listed in the release notes.

  • Unveiling MAX 24.3 with Engine Extensibility: The MAX 24.3 update features the new MAX Engine Extensibility API, enhancing the ability for developers to build and run AI pipelines efficiently. This version offers improved integration for PyTorch, ONNX, and Mojo models, as well as a range of performance optimizations for diverse hardware through the MAX Graph APIs.



Modular (Mojo 🔥) ▷ #announcements (1 messages):

  • MAX ⚡️ and Mojo 🔥 Release 24.3 Goes Live: Release 24.3 is now available, including the latest versions of MAX and Mojo. The installation commands are provided, and the release can be accessed via a simple curl script and Modular CLI commands.
  • Celebrating One Year of Mojo 🔥: This update marks the first anniversary of Mojo, with gratitude extended to the community for their contributions to the release.
  • Launch Blog and Extensibility Features Explained: Interested users can read about the launch on the official blog post and learn about the new MAX extensibility features in a dedicated blog post.
  • Community Contributions Recognized: The changelog mentions 32 significant changes, fixes, and features contributed by the community, highlighting the collaborative efforts in the development process.

Modular (Mojo 🔥) ▷ #ai (4 messages):

  • The Challenge of Simulating Consciousness: The discussion touched on the complexity of simulating consciousness, suggesting that it not only requires scientific understanding but also philosophical insights. It was proposed that starting with simpler organisms could be the key, as their brains might be easier to map and replicate in code.

  • Hoffman’s Work Inspires Future Academia: One member expressed their plans to transfer to UCI to be closer to the work of Donald Hoffman, a professor who is actively working on mapping conscious experiences. This aligns with the view of functionalism, where simulating brain functions might be more feasible than replicating the brain entirely.

  • Aspiring to Explore Consciousness: Another member shared their goal of working on the simulation of consciousness, resonating with the previous discussion on the subject.


Modular (Mojo 🔥) ▷ #tech-news (2 messages):

  • CHERI Blossoming into Daily Use: Chats highlight that the Capability Hardware Enhanced RISC Instructions (CHERI) offers considerable promise in improving hardware security, with potential to nullify 70% of current vulnerability exploits. The discussion was spurred by a recent conference playlist that delves into the advancements within the CHERI ecosystem.

  • A Paradigm Shift in Software Development: With the adoption of CHERI, software development could see a seismic shift, as processes could become orders of magnitude faster, enabling efficient UNIX-style programming with high performance. This potential is discussed in the context of CHERI facilitating lightning-fast IPC and the inherent benefits of such capabilities.

  • Sandboxes Entering the Fast Lane: The conversation moved towards how CHERI’s scalable compartmentalization could fundamentally change environments that utilize sandboxes, impacting web browsers, virtual machines, and even edge computing. A YouTube video was referenced, illustrating this transformative tech.

  • Potential Redundancy of Traditional Security Measures: Speculation abounds that with the rise of CHERI, traditional hardware security like MMU-based memory protection or address space layout randomization might become obsolete, thereby simplifying hardware design and enhancing software speed.

  • Microkernels Could Take Center Stage: One member pondered if CHERI could precipitate a revolution in OS development, where the traditionally high cost of IPC in microkernels is countered, making them a potentially dominant architecture.



Modular (Mojo 🔥) ▷ #🔥mojo (137 messages🔥🔥):

  • Mojo Reference Semantics Still in Flux: Mojo’s semantics for references and lifetimes are being actively designed to offer simpler yet flexible structures than the existing prototype. A design will be shared publicly, and there is an ongoing debate on whether the reference and lifetime semantics are nearly complete or will continue to gain layers on top.
  • Crash Reports and Bug Tracking: Discussions point to a crash report and a bug related to struct lifetime that requires attention. Concerns are raised over compiler crashes that should instead provide meaningful error messages.
  • InlineArray Intrigues and Issues: InlineArray is not yet in the stable build; despite its utility, there are known quirks with large arrays, and related GitHub issues indicate it’s awaiting more stability. The feature is implemented in utils.InlineArray.
  • Debating Mojo’s GPU Support: Mojo is anticipated to soon have GPU support, starting with Nvidia, leveraging MLIR for versatility across platforms. Meanwhile, discussions clarify that it’s a multi-year effort for existing languages to move from LLVM to MLIR, making Mojo’s inherent MLIR integration special.
  • Interest in Snap Package and I/O Functions: There is a request for an official Snap package on the Snap Store for Ubuntu and discussion on the current state of Mojo’s I/O module being basic, necessitating imports from Python for simple user input functionality like reading from stdin.



Modular (Mojo 🔥) ▷ #community-projects (3 messages):

  • Prism CLI Tool Gets Feature Boost: The prism library has been updated with persistent flags and hooks, flag requirements, and flag groups. The README has been overhauled with code samples and animated gifs to demonstrate the new features. Check out the updates on GitHub.

  • Mojo-pytest Now Supports v24.3: The mojo-pytest plugin has been updated to work with Mojo version 24.3, with an open issue aiming to enhance integration for better debug information. Progress can be tracked on this enhancement at Issue #9 on its GitHub repository.

  • NuMojo Outpaces Numpy and Numba: The NuMojo project, previously known as Mojo-Arrays, is undergoing active development and now supports Mojo version 24.3. NuMojo is significantly outperforming NumPy and is also faster than Numba, focusing on expanding the standard library tensor functionality.



Modular (Mojo 🔥) ▷ #community-blogs-vids (3 messages):

  • PyCon Lithuania Talk on MAX: A new YouTube video from PyCon Lithuania discusses MAX, though the video’s title and description were not available.
  • Tutorial on Building Apps with Mojo: A new GitHub-based tutorial titled “Let’s mojo build -D your own -D version=1 app” is available, teaching how to create or integrate workflows with the Mojo language. The tutorial can be found here.
  • Syntax Highlighting Tip for Mojo Tutorial: A suggestion was made to fence code blocks with ‘mojo’ instead of ‘python’ in markdown files so that Mojo code gets proper syntax highlighting.



Modular (Mojo 🔥) ▷ #performance-and-benchmarks (1 messages):

soracc: Good idea


Modular (Mojo 🔥) ▷ #📰︱newsletter (1 messages):

Zapier: Modverse Weekly - Issue 32 https://www.modular.com/newsletters/modverse-weekly-32


Modular (Mojo 🔥) ▷ #nightly (10 messages🔥):

  • Mojo Compiler Language Changes Alert: The 24.3 changelog includes new information on __source_location() and __call_location(), detailed in the Modular Docs Changelog. It seems they require @always_inline functions for full functionality.
  • Nightly Mojo Compiler Dropped: A new nightly Mojo compiler release is announced, which can be updated using modular update nightly/mojo. See what’s changed and review the changes from the last stable release.
  • Docstrings Length Discussion: There’s a conversation about whether docstrings can exceed 80 columns, with a suggestion to consider relaxing this requirement, especially for the standard library.
  • Boost in Nightly Releases Frequency: Nightly releases of the Mojo compiler are becoming more frequent, with an expectation set for daily updates soon, pending internal infrastructure improvements.



OpenAI ▷ #ai-discussions (224 messages🔥🔥):

  • The Ephemeral AI Job Market: Members debated the highest-paying jobs in AI, suggesting that the most sought-after positions are constantly evolving. Some joked that the most lucrative AI careers might be becoming a CEO or dentist.

  • Predicting the Price of Future GPT Versions: Discussions emerged over the possibility of a separate pricing tier for the hypothetical GPT-5, with opinions varying on whether OpenAI would introduce regional pricing or maintain a consolidated price model.

  • Copycat UI Raises Eyebrows: Comments arose about the new HuggingChat UI closely resembling existing AI chat services, with some implying it could be a game-changer for providing consumer-facing products and fostering an open-source AI community.

  • AI’s Existential Debate: A thorough discussion took place on the nature of AI’s growth, human uniqueness, generative abilities, and the blending of hallucination with reality. Concerns were raised about AI overconfidence and its capacity for misleading information.

  • The Transparency of AI Research and Open Source Misconceptions: A series of messages clarified that while OpenAI’s research papers are publicly available, expecting the organization to release fully trained models is unrealistic given their proprietary nature and the computational resources required to run them.


OpenAI ▷ #gpt-4-discussions (23 messages🔥):

  • Nostalgic Throwback to GPT-3 and Codex: Members shared moments of nostalgia, reminiscing about earlier access to GPT-3 and Codex, indicating a continuing interest and appreciation for previous models.
  • Wondering About Voice Chat Rooms: One member inquired about the lack of voice chat rooms within the Discord, and it was clarified that such features are absent due to the challenges of moderation.
  • Confusion Over Chatbot Memory Integration: A user queried about the possibility of integrating the new memory feature with their chatbot in the API, seeking guidance on implementation.
  • Inquiries About GPT-4’s Response Times: Members discussed that GPT-4 appears to be roughly two times slower than its predecessor GPT-3.5, with recent reports of unusual latency and GPT-4 Turbo being 5-10 times slower than usual.
  • ChatGPT Access Issues and Rate Limits: Users reported issues accessing GPT, reaching out for help, and questioning the message rate limits. Suggestions for checking OpenAI’s service status and experiences of unexpected timeouts were mentioned, indicating a fluctuating rationing system potentially due to high demand.

OpenAI ▷ #prompt-engineering (3 messages):

  • Retrieval Challenges in Large Language Models: A member pointed out that mitigating retrieval issues in large language models (LLMs) isn’t possible in the way one might hope. They referred to the search term “LLM Retrieval Needle In A Hay Stack” for more in-depth understanding and emphasized that a foundation model’s retrieval limits can’t be bypassed with algorithms.
  • Python Tool for Word Occurrence: In the context of handling large texts, it was mentioned that a few lines of Python can count unique occurrences of words, a technique potentially useful for data analysis and preprocessing (a minimal sketch follows below).
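A minimal sketch of such a word counter using only the standard library:

```python
from collections import Counter
import re

text = "needle in a haystack, a very large haystack"
# Lowercase, split on word characters, and tally occurrences.
counts = Counter(re.findall(r"[a-z']+", text.lower()))
print(counts.most_common(3))  # e.g. [('a', 2), ('haystack', 2), ('needle', 1)]
```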

OpenAI ▷ #api-discussions (3 messages):

  • Limits of LLM Retrieval Addressed: A member mentioned a search term “LLM Retrieval Needle In A Hay Stack” to indicate that it’s not possible to overcome the foundation model’s retrieval limits with any algorithm.
  • Python Script for Word Counting Shared: Another message pointed out the availability of Python solutions for counting unique word occurrences in large texts.

HuggingFace ▷ #announcements (2 messages):

  • Community Highlights Sparkle with Innovations: New user contributions shine with the unveiling of Moondream 2 for batch processing, FluentlyXL v4, the Portuguese translation of HF Audio course’s Chapter 0 + 1, BLIP finetune for extended captions, and what appears to be a list of community highlights in Portuguese.

  • BLOOM Chat Speaks Multilingually: A new multilingual chat supports conversations in 55 languages, while an Inpainting sketch pad unlocks creativity, and a task from HF’s alignment handbook can now be run in the cloud.

  • Hot Off the Press: Cool AI Developments: AI enthusiasts get treated to a guide on protein optimization with AI, a model NorskGPT-Mistral-7B, the basics of implementing a vision language model from scratch, and insights into Google Search with LLMs. Enthusiasts can also explore Token Merging for LLMs, and expand their knowledge on Model Context and Chat Models in this blog post.

  • HF Dives Deep Into Model Interpretability: New insights into interpretability, plus an in-depth analysis of LLMs, are now available for AI enthusiasts to learn from.

  • AutoTrain Now Open to All Through Configs: Demonstrating the potential of AutoTrain, users can now train models with YAML config files available in the autotrain-advanced GitHub repo, and are encouraged to contribute by creating a pull request. The ease of use makes it possible for individuals with minimal machine learning knowledge to train state-of-the-art models without code, as announced on Twitter.



HuggingFace ▷ #general (163 messages🔥🔥):

  • Voice Synthesis Models Discussed: Members exchanged recommendations for voice synthesis models, such as Xtts v2 and Voice Craft, mentioning their performance and unique features like speech editing. Links to demos were shared for Xtts and Voice Craft, with a member noting Voice Craft’s capabilities in zero-shot text-to-speech.
  • Model Conversion and Fine-Tuning Challenges: Challenges were discussed around converting transformer models to smaller formats, with specific issues mentioned like a model being larger than 2GB and causing errors. Strategies were also discussed for fine-tuning smaller datasets, considering the effectiveness of RAG (Retrieval-Augmented Generation) as an alternative to fine-tuning with limited data.
  • Using LLM Models and Hosting: Questions were raised about deploying large language models (LLMs) in production, with vLLM and TGI suggested as potential serving frameworks. The availability and usage of Llama3 were discussed, with recommendations to try services like Groq for free API access.
  • Bot and Parquet Converter Request: Users expressed the need for a parquet converter-bot for dataset conversion and inquired about the status of a “Dev mode,” suggesting the possibility of maintenance or downtime.
  • Prompt Refinement and Evaluation Inquiry: A user inquired about metrics for evaluating the quality of refined prompts, looking for specific metrics tailored to prompt assessment, without follow-up discussions offering specific solutions or metrics.



HuggingFace ▷ #today-im-learning (9 messages🔥):

  • Quest for Query Refinement: A member is seeking assistance to rephrase a follow-up question (q2) in the pharma domain by incorporating all the details from an initial query (q1).
  • Ray Deployment Inquiry Remains Open: A user asked for help with deploying HuggingFace models on Ray, indicating a shared interest among community members.
  • Training Loop Customization Debate: A comment was made advocating for writing custom training loops, suggesting that modifying examples from diffusers allows for more flexibility in training AI models.
  • A Shortcut through the Neural Nets: An interesting discussion on Kolmogorov-Arnold Networks (KANs) took place, highlighting their promising attributes, such as requiring smaller computational graphs compared to Multi-Layer Perceptrons (MLPs).
  • Fine-Tuning Explained:
    • A member shared a YouTube video offering a high-level overview of fine-tuning AI models.
    • They also linked to a HuggingFace technical guide on fine-tuning with Transformers and Keras.

Links mentioned:

  • KAN: Kolmogorov-Arnold Networks: Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation fun...
  • What is Fine Tuning? In Two Minutes.: A high-level overview of what fine-tuning a genAI model is in two minutes. TL;DR: Tuning a genAI model is like tuning a guitar. Technical overview from @Hug...
  • Fine-tune a pretrained model: no description found

HuggingFace ▷ #cool-finds (7 messages):

  • New MPI Code Repository Unveiled: A link to a GitHub repository called MPI-Codes by Binary-Beast03 was shared, which is aimed at contributing to the development of MPI Codes. Information about it can be accessed at the MPI-Codes GitHub repository.

  • RAG Boosts LangChain’s Email Savvy: LangChain’s LangGraph Agents have been enhanced with Retrieval-Augmented Generation (RAG) to improve intelligent email drafting, with details shared in a Medium post. However, the content is behind a member-only access wall as noted by a follow-up comment.

  • FinBERT Fine-Tuned for Financial Sentiment: ProsusAI’s FinBERT, a BERT-based NLP model, is specifically trained for sentiment analysis in the financial domain and shared with the HuggingFace link. It is fine-tuned on the Financial PhraseBank and detailed in both an academic paper and a companion blog post.

  • Explaining Retrieval-Augmented Generation (RAG): An informative Databricks page was shared, covering how RAG addresses the issues of LLMs not adapting to custom data and the necessity for AI applications to leverage such data for effective results.



HuggingFace ▷ #i-made-this (9 messages🔥):

  • Typo Alert in Model Card: A small typo was pointed out in the model card’s title for Fluently XL V4—it should be “Fluently” instead of “Fluenlty.”
  • Fluently-XL-v4 Showing Off New Colors and Digits: An image generated by Fluently-XL-v4 on a local NVIDIA RTX 3070 mobile boasts impressive results, evidenced in an Instagram post, with well-handled colors and the correct number of fingers, outperforming several other models.
  • Hugging Face Audio Course Receives Brazilian Translation: Chapters 0 and 1 of the Hugging Face audio course have been translated into Portuguese and a PR is open for review here, with a call for help from Brazilian community members for revisions.
  • Introducing LongCap for Image Captioning: A finetuned version of the BLIP model for long captions is shared, which promises to generate detailed image descriptions suitable for prompts in text-to-image generation. A request for assistance in evaluating this model against Google’s DOCCI is open, with a Colab notebook provided for testing.
  • Archiving Community Highlights in Portuguese: A new page created by a community member compiles all posts and links from the Hugging Face Community Highlights since edition #52, with plans to catch up on previous editions and establish a comprehensive database of AI-related content in Portuguese link here.
  • Synthetic Data Generator for LLMs Now on PyPI: A tool for generating and normalizing synthetic data for training large language models has been released and is available on PyPI, potentially aiding fine-tuning efforts across different project use cases.



HuggingFace ▷ #reading-group (6 messages):

  • Curated Collection for LLM Improvement: A member shared their research on improving Large Language Models (LLMs) with a curated list on HuggingFace, inviting thoughts and feedback.
  • Spotlight on React Agents: Another participant highlighted the significance of React agents in elevating LLM output quality, noting the abundance of papers in the field and the challenges of selecting a focus.
  • Unindexed Findings in LLM Research: The curator of the LLM improvement collection expressed excitement about sharing papers that had not been indexed, received upvotes, or associated code in their research compilation.
  • Exploring Reasoning and Acting in LLMs: The curator drew attention to a paper titled ‘ReAct’ that proposes a method for combining reasoning traces and task-specific actions in LLMs for enhanced performance and interpretability. The abstract discusses how interleaving both aspects can improve interface with external information sources and handle exceptions (view the paper).
  • Graph ML Meets LLMs: A member shared preliminary notes for a presentation that looks into the intersection of graph machine learning and LLMs, with an observation that the topic is more extensively explored than initially thought. They shared a medium post summarizing the subject.



HuggingFace ▷ #computer-vision (7 messages):

  • Channel Quest: A member inquired about the existence of a #cv-study-group channel, which they could not find despite its mention in the Community Computer Vision Course page.
  • Fine-Tuning Strategies Shared: A suggestion was made to fine-tune only the classifier weights of a pretrained model for efficiency, and to consider training a model end-to-end with a shallow CNN that rescales images before feeding Yolov4 (see the sketch after this list).
  • Study Group Status Clarifications: There seems to be some confusion among members regarding the presence of a study group; one member clarified there’s no specific study group, while another mentioned that while there is no reading group, someone might know of past study groups.
  • Agreement on Non-Existence of Study Groups: Members agreed that there is no particular reading or study group currently active in the channel.
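A minimal sketch of the freeze-the-backbone approach from the fine-tuning bullet above, using a torchvision ResNet as an arbitrary stand-in for the pretrained model:

```python
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze every pretrained weight...
for p in model.parameters():
    p.requires_grad = False

# ...then replace and train only the classifier head.
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # 10 = your class count
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```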

Link mentioned: Welcome to the Community Computer Vision Course - Hugging Face Community Computer Vision Course: no description found


HuggingFace ▷ #NLP (5 messages):

  • RARR - A Solution for Model Attribution: A member shared the RARR paper, which presents a system for Retrofit Attribution using Research and Revision. RARR aims to automatically find and add attributions to the outputs of text generation models, and make corrections to unsupported content.

  • Zero-shot Classification Confusion: A user reported an issue with a zero-shot classification model producing disproportionate results, with the labels “gun” and “art” yielding an almost even probability split, raising questions about the model’s behavior against expected results. This may reflect a misunderstanding of how the classifier scores text that is unrelated to the provided labels (see the sketch below).
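For context, a typical zero-shot classification setup with transformers looks like the sketch below; the model name is a common default, not necessarily the one the user ran. In single-label mode the scores are normalized across the candidate labels, so even labels unrelated to the text end up splitting the probability mass:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "A quiet landscape painting of rolling hills at dusk.",
    candidate_labels=["gun", "art"],
    multi_label=False,  # scores are softmaxed across the labels
)
print(result["labels"], result["scores"])
```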

Link mentioned: Paper page - RARR: Researching and Revising What Language Models Say, Using Language Models: no description found


HuggingFace ▷ #diffusion-discussions (12 messages🔥):

  • Clarification on Auto-Train Configs: A member clarified that specifying xl: true in auto train configs is optional because the model type can be determined automatically, but it can also be explicitly declared in the configuration.

  • Merging Diffusion Pipelines Technique: One member inquired about using two different StableDiffusionPipelines for partial denoising, switching at a midpoint in the process. Another member described an approach called partial diffusion via mixture of experts, linking to an outstanding pull request for SD 1.5 on the diffusers GitHub repository (see the sketch after this list).

  • Seeking Examples for Partial Diffusion: A member requested examples of partial diffusion with StableDiffusionPipelines. They were directed to a GitHub comparison page that showcases the implementation of the method.

  • Availability of Partial Diffusion for Testing: The same member considered testing the partial diffusion method mentioned in the pull request to determine its suitability for their own test suite, noting a preference for faster inference times.
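The SD 1.5 version of this hand-off lives in the pull request linked above, but diffusers already exposes the same pattern for SDXL through denoising_end/denoising_start; a sketch under that API:

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse at dawn"
# The first pipeline denoises the first 80% of the schedule...
latents = base(prompt, denoising_end=0.8, output_type="latent").images
# ...and the second picks up from the same midpoint.
image = refiner(prompt, image=latents, denoising_start=0.8).images[0]
```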

Link mentioned: Comparing huggingface:main…bghira:partial-diffusion-2 · huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX. - Comparing huggingface:main…bghira:partial-diffusion-2 · huggingface/diffusers


LlamaIndex ▷ #blog (5 messages):

  • Build an Optimized RAG Data Stack: A new tutorial has been shared, featuring a comprehensive guide on constructing an efficient data stack for an agentic RAG support bot. It highlights the importance of various data components besides the vector database and is documented by @tchutch94 and @seldo. Check out the full post here.

  • Step-By-Step RAG Pipeline Guide: Plaban Nayak introduces an open-source RAG pipeline guide using Llama 3 from Meta, @qdrant_engine, and ms-marco-MiniLM-L-2-v2. This guide emphasizes improving RAG application precision with a reranker process. Read more about the guide here.

  • Natural Language Filters for Airbnb Listings: Harshad Suryawanshi provides a walkthrough for creating a RAG application to filter @Airbnb listings with natural language, utilizing @MistralAI’s Mixtral 8x7b tools. A detailed explanation and repository can be found here.

  • LlamaIndex 0.10.34 Release Features Introspective Agents: An announcement for the new LlamaIndex 0.10.34 release was made, highlighting features such as introspective agents that utilize reflection for iterative responses. The notebook contains implementations but warns of potentially sensitive content. Read about these agents and the warning here.

  • Launch of LlamaIndex 0.10.34 with Huggingface Support: The release of LlamaIndex 0.10.34 introduced introspective agents and mentioned upcoming support for huggingface integration. They promise to discuss all new updates separately in the days to follow. Catch the details here.

Link mentioned: Introspective Agents: Performing Tasks With Reflection - LlamaIndex: no description found


LlamaIndex ▷ #general (140 messages🔥🔥):

  • Seeking a Financial Analysis Application: A member is creating an application to generate financial summaries from pandas dataframes containing company income statements. They seek guidance on using the Pandas Query Engine, given the brief examples in the documentation (a minimal sketch follows after this list).
  • Customizing MongoDB with LlamaIndex: A user seeks help on querying directly from MongoDB embeddings with metadata, bypassing document or node submissions to LlamaIndex’s query engine. They shared a tutorial link they previously used and requested alternatives for MongoDB’s collections.aggregate.
  • Llama.cpp Parallel Request Deadlock: One member reported a deadlock when running two concurrent queries against a llama.cpp GGUF model. They inquired about enabling parallel request serving without a server setup to mitigate the issue.
  • Inquiry about Setting Up Trulens with Llama Index: A user asked for assistance on using Trulens with MongoDB and Llama Index, pointing out that they already have embeddings and metadata uploaded. They shared relevant links from the docs and suggested considering alternative tools like Arize and Langfuse.
  • Memory Load Issues with LlamaIndex: A member experienced memory overload issues when running LlamaIndex, with an 8GB model sometimes exceeding 20GB and then reverting to CPU, causing slowdowns. They identified a specific command execution in their code which appeared to spam memory, and mentioned the necessity of waiting for memory cleanup.
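A minimal PandasQueryEngine sketch with a toy income statement; the import path assumes a recent llama-index release, where the engine lives in the experimental package:

```python
import pandas as pd
from llama_index.experimental.query_engine import PandasQueryEngine

# Toy income statement; replace with the real dataframe.
df = pd.DataFrame({
    "year": [2021, 2022, 2023],
    "revenue": [120.0, 150.0, 180.0],
    "net_income": [10.0, 18.0, 25.0],
})

# The engine translates natural-language questions into pandas code.
query_engine = PandasQueryEngine(df=df, verbose=True)
response = query_engine.query("Which year had the highest net income?")
print(response)
```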



LlamaIndex ▷ #ai-discussion (1 messages):

  • Impressive but Costly RAG Performance with OpenAI API: A user shared their positive experience using OpenAI’s assistants API v2 for Retrieval-Augmented Generation (RAG), noting effective answers derived from a testing knowledge base of 500 Wikipedia articles. However, they highlighted the cost concern as a short conversation racked up $1.50 in charges.

Latent Space ▷ #ai-general-chat (25 messages🔥):

  • Seeking Gemini 1.5-compatible tools: A member inquired about tools like Cursor/Aider for Gemini 1.5 full context window but mentioned disappointment with Gemini 1.5 benchmarking, preferring to use Opus or long context that Cursor now supports.
  • Code Interpreter SDK Takes Twitter Stage: Member @mlejva announced the launch of their Code Interpreter SDK on Twitter and solicited community support with a link to their tweet.
  • Prompt Labeling Practices in Question: A user asked the community about best practices for labeling output variables in prompts, particularly with Claude, referencing a tweet from Matt Shumer.
  • OpenAI Assistants API Quickstart Shared: The OpenAI Assistants API Quickstart was highlighted, featuring integration with Next.js and offering streaming chat interfaces, function calling, and a code interpreter; linked tweet here and GitHub repo.
  • SQLite Gets a New Vector Search Extension: There’s a successor to sqlite-vss called sqlite-vec, under development for better vector search within SQLite, shared via the creator’s blog post.



Latent Space ▷ #llm-paper-club-west (33 messages🔥):

  • Mamba Deep Dive Kicks Off: The llm-paper-club-west channel members geared up for a discussion on Mamba with a Notion link shared for a deep dive into the topic: A Mamba Deep Dive.
  • Debating Mamba’s Selective Recall Capability: A member raised a question about whether selective copying in Mamba is akin to a recall test for previously seen tokens, initiating a discussion on the mechanism’s specificity.
  • Technical Difficulties Spur Platform Switch Proposal: Users faced with delays and technical hiccups during a Mamba discussion agreed on switching to Zoom for future meetings to ensure a smoother experience.
  • Exploring the Sensitivity of Mamba in Fine-tuning: The conversation turned towards how the Mamba architecture fares during fine-tuning and its susceptibility to overfitting in comparison to traditional transformers.
  • State Space Models and Induction Heads: There was a brief exchange about whether state space models, specifically in the context of Mamba, could approximate induction heads found in attention layers. Two arXiv papers were shared for further reading: Arxiv State Space Models and Arxiv Multi Token Paper.

Link mentioned: Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It’s the all-in-one workspace for you and your team


Latent Space ▷ #ai-in-action-club (65 messages🔥🔥):

  • Fascination with Suno’s Audio Generation: A member expressed curiosity about the music generation capabilities of suno, wondering if it creates music tracks from scratch. Another member mentioned Suno’s focus on audio tokenizing as their “secret sauce.”
  • Exploring Different Model Architectures: Discussion around musicgen architecture revealed it being part of a buddy’s adventure into finetuning audio models for multimodal applications. Members also touched upon imagebind as an example of multimodal embedding space.
  • Understanding Harmonic Distortion: In a brief conversation about ‘harmonic distortion’, a member described it as incorrect weightings on harmonic tones, which could result in improper frequency ratios or beats. Reference was made to a blog discussing the snake activation function and its potential for reducing harmonic distortion (see the sketch after this list).
  • Generating Audio with Latent Diffusion & Autoencoders: Inquiry into the process of stable audio 2.0 led to a discussion about how audio files are processed through autoencoders to create tokens, with suggestions that entire audio files are compressed for the model.
  • Commercial Viability of Generated Audio: A member inquired about the licensing and commercial use of outputs from Stable Audio 2.0, indicating an interest in the legalities around generated content. There was also mention of potential applications, like separating and replacing audio channels.
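For reference on the snake activation mentioned above: it is commonly written as x + sin²(αx)/α, and its periodic component is what helps audio models fit harmonics. A tiny PyTorch sketch:

```python
import torch

def snake(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Snake activation: x + sin^2(alpha * x) / alpha.
    return x + torch.sin(alpha * x) ** 2 / alpha

x = torch.linspace(-3, 3, 7)
print(snake(x))
```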



Eleuther ▷ #general (49 messages🔥):

  • Intriguing Trends in Multilingual LLMs: An ongoing discussion revolves around how to enhance multilingual capabilities in Large Language Models (LLMs), with references made to research papers such as “Understanding Language Models by Fine-grained Language Identification” and work exploring LLMs’ processing of multilingual inputs. The framework depicted suggests that in initial layers, LLMs convert multilingual inputs into English before generating responses in the original query’s language.

  • Reflecting on the Journey of ML Domains: Users reminisced about abandoned or overshadowed research areas in machine learning, including adversarial robustness, automated architecture search, and domain-specific model training. There’s a feeling of nostalgia and a hint of regret for the paths not taken, sharpened by the way hiring pull from major tech companies has deprioritized these areas.

  • The Changing Landscape of AI Funding and Impact: Users discussed the enormous investments in AI companies and the potential over-saturation leading to diminishing returns on investment. Concerns are raised about how this affects the efficiency of scaling up models and the future direction of AI research.

  • LLMs and System Hierarchy Vulnerabilities: There’s an intricate conversation on how models handle instruction hierarchies and the vulnerabilities that arise when a system prompt is not considered different from a user prompt—a risk particularly important in preventing prompt injections or other attacks on LLMs. A linked paper, “Improving Robustness to Prompt Injections with Instruction Hierarchy”, suggests reinforcing instruction hierarchies could mitigate these vulnerabilities.

  • Job Search for an ML Engineer: A member is reaching out to the community seeking opportunities for a machine learning engineer position outside the US. This person has experience with LLMs, and relevant work includes leading the Polyglot team and contributing to the OSLO project within EleutherAI; they shared personal links to LinkedIn, Google Scholar, GitHub, and an email address for potential contact.


Eleuther ▷ #research (54 messages🔥):

  • Exploring the Depths of Dataset Contamination: Amidst discussions about instruction-finetuning and benchmark effectiveness, participants shared concerns over benchmark dataset leakage in large language models (LLMs), emphasizing the difficulty in measuring leaked information and the cycle of detecting and addressing leaks. Two recent papers on benchmark dataset leakage were highlighted: one focused on detecting data leakages and the other discussing fresh benchmark questions as a solution to prevent unfair comparisons.

  • Chatbot Conversations as a Learning Tool: The idea of using chatbot conversations, particularly those with multiple multiturn interactions with the same user, was contemplated with the notion of utilizing sentiment analysis or user retention (churn) to improve an LLM. Participants were curious about how these could lead to a self-improvement loop within the model, with one member pointing out a paper focusing on indirect preference and another reference to the WildChat dataset for chatbot research.

  • Chess Mastery without Heuristics: A study using a transformer model trained on a dataset of 10 million chess games was brought up, demonstrating the model’s high performance in chess without domain-specific enhancements or explicit search algorithms. The DeepMind paper indicates that training models at scale can lead to competitive levels of play without the approaches traditional chess engines use.

  • Serendipitous Time Travels and Future Returns: A humorous exchange took place where a participant jokingly claimed to have built a time machine and returned from the future, leading to playful interactions about their presence in the ‘ot’ (off-topic) channel and the utilization of a time machine.

  • Gwern’s Peripheral Return: In a meta-discussion, participants noted gwern1782’s selective responses to past mentions after a period of absence from the server, with mentions of using Discord’s search feature to filter through the multitude of notifications.


Eleuther ▷ #scaling-laws (1 messages):

  • Math Problem Solving Outpaces Prediction: The NARRATOR noted that a prediction of Math Word Problem Solving performance exceeding 70% within 2 years turned out to be too pessimistic, as current performance has already passed that mark. The benchmark can be explored in detail at Papers With Code, a free resource with data licensed under CC-BY-SA.

Link mentioned: Papers with Code - MATH Benchmark (Math Word Problem Solving): The current state-of-the-art on MATH is GPT-4-code model (CSV, w/ code, SC, k=16). See a full comparison of 109 papers with code.


Eleuther ▷ #interpretability-general (8 messages🔥):

  • Position Paper Accepted: A recently submitted position paper, whose authors include Vincent Conitzer, Rachel Freedman, and Stuart Russell, has been accepted.
  • Mechanistic Interpretability Workshop at ICML 2024: Neel Nanda announces the first academic Mechanistic Interpretability workshop at ICML 2024, with a call for papers. The event features $1750 in best paper prizes, and submissions can include a variety of formats, with a deadline of May 29th.
  • Star-Studded Panel Discussion Revealed: Neel Nanda confirms that Naomi and StellaAthena will be part of a panel at the Mechanistic Interpretability workshop, with updates and additions to the event’s website pending.
  • Comprehensive Primer on Transformer-Based Language Models: javifer96 highlights the release of a primer on Transformer-based language models, encompassing model components and interpretability methods in a unified notation. Interested parties can find the announcement and more information here.
  • Cross-Model Generalization Using English as Pivot Language: Butanium shares that the team has replicated results across llama models, indicating that using English as a pivot language generalizes well across these models. Further details can be found in their latest tweet.


Eleuther ▷ #lm-thunderdome (2 messages):

  • Inquiry on MT-Bench Inclusion: A member asked about the status of incorporating MT-Bench or similar benchmarks into lm-evaluation-harness and if there are any upcoming conversational AI quality benchmarks.

  • Prometheus 2 As a Potential Improvement: Another member highlighted Prometheus 2, an open-source evaluator LM, suggesting it as a beneficial addition to lm-evaluation-harness. Prometheus 2 is designed to mirror human and GPT-4 judgements and supports various forms of assessment, as noted in the research abstract on Hugging Face.

Link mentioned: Paper page - Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models: no description found


OpenAccess AI Collective (axolotl) ▷ #general (40 messages🔥):

  • LLama-3 8B’s Context Length Breakthrough: Gradient AI announced the extension of LLama-3 8B’s context length from 8k to over 1040k with help from Crusoe Energy’s compute. The result shows that state-of-the-art large language models (LLMs) can handle long contexts after minimal additional training with an adjusted RoPE theta (a hedged sketch follows this list).

  • Conceptualizing Ring Attention: A member discussed trying to grasp the concept of ring attention, using visualization to aid understanding, despite some skepticism from others about the technical accuracy of the approach.

  • Collision with ChatML Training: One user reported problems when training with ChatML, facing an AttributeError associated with SeparatorStyle.GEMMA. The troubleshooting included suggestions like removing conflicting arguments and upgrading fastchat, aiming to resolve training issues with ChatML-configured datasets.

  • Injecting Context in Prompts: In a discussion about fine-tuning prompt design, members exchanged insights on how to include ChatML turns within system prompts for model training and inferred that when context is injected into prompts, ChatML tokens are tokenized correctly without escaping.

  • Hermes 2 Accelerated with llama.cpp: A member expressed admiration for the inference speed of Hermes 2 Pro Llama 3 8B on an Android device with 8GB RAM. This was attributed to an upgrade from llama.cpp which reportedly increased inference speed by 30%.
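
On the context-extension item above: the usual recipe is to raise the RoPE base frequency (theta) before long-context fine-tuning. A minimal sketch with Hugging Face transformers, where the 8x multiplier and 65536-token window are illustrative assumptions, not Gradient AI’s actual recipe:

```python
# Hedged sketch: raise RoPE theta, then fine-tune on long sequences.
# The 8x multiplier and the 65536-token window are assumptions for illustration.
from transformers import AutoConfig, AutoModelForCausalLM

name = "meta-llama/Meta-Llama-3-8B"
config = AutoConfig.from_pretrained(name)
config.rope_theta *= 8                   # larger base -> slower positional rotation
config.max_position_embeddings = 65536   # new target context window

model = AutoModelForCausalLM.from_pretrained(name, config=config)
# ...continue with long-context fine-tuning so the model adapts to the new theta.
```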


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (19 messages🔥):

  • PR Merged for Performance Improvement: A pull request was merged to fix an issue where the orpo trainer was using only one worker for preprocessing. This enhancement is aimed at speeding up the data preprocessing steps in TRL trainer and possibly others like DPOTrainerArgs. The patch is available at GitHub PR #1583.

  • Parametrization Affects Multiple Configs: Clarifications in the conversation indicate that the dataset_num_proc parameter in question affects not only the TRL trainer but also the DPO, SFT, CPO, KTO, and ORPO configurations within the codebase (see the sketch after this list).

  • Minimum Python Version Established for Axolotl: It was confirmed that the minimum Python version required to run Axolotl is 3.10, allowing the use of match..case statements in the codebase.

  • Gradio Configurability Inquiry: A member discussed making hardcoded Gradio options configurable through YAML files. They explored how to pass various configuration options, such as making the Gradio interface private and controlling the IP address and port number.

  • Gradio Tokenization Issues Examined: There was an issue reported with Gradio not using the correct tokens for the llama3 model, which led to a discussion on how the default tokens could be overwriting the already loaded tokenizer.
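
For context on the dataset_num_proc fix in the first item: TRL’s trainers preprocess with Hugging Face datasets, whose map call is what actually parallelizes. A minimal sketch of that underlying knob, where the file path and preprocessing step are hypothetical:

```python
# Sketch of the mechanism dataset_num_proc feeds into: datasets.map(num_proc=...).
from datasets import load_dataset

ds = load_dataset("json", data_files="train.jsonl", split="train")  # hypothetical file

def preprocess(example):                # hypothetical preprocessing step
    example["n_chars"] = len(example["text"])
    return example

ds = ds.map(preprocess, num_proc=8)     # 8 worker processes instead of the default 1
```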

Link mentioned: FIX: TRL trainer preprocessing step was running in one process by ali-mosavian · Pull Request #1583 · OpenAccess-AI-Collective/axolotl: Description We weren’t passing dataset_num_proc to TRL training config, thus the initial data preprocessing steps in the TRL trainer was running in one process only. Motivation and Context Speeds …


OpenAccess AI Collective (axolotl) ▷ #general-help (7 messages):

  • Epochs and Batch Sizes Inquiry: A member mentioned they usually go with 4 epochs and a batch size of 4 when training their models.
  • Llama3 Model Inference Confusion: A member asked for guidance on how to call inference after training llama3 using the fft script, noting that the regular qlora_model_dir does not seem applicable.
  • SafeTensor to GGUF Conversion Challenge: Discussing conversion from SafeTensors to GGUF, a member expressed difficulties finding a way to convert to various gguf types like Q4_K or Q5_K, after using llama.cpp, which seemed limited in options.
  • Script Solution for Conversion Dilemma: Another member pointed towards a conversion script available in the llama.cpp repository, specifically referencing the GitHub link to convert-gg.sh script.
  • Limited Conversion Options in llama.cpp: Despite the previous suggestion, the same member reiterated the issue, stating the llama.cpp conversion script provides only two gguf conversion options, and they are looking for a broader range of types such as q4k.
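
For the conversion question above, the usual llama.cpp flow is two steps: convert to a full-precision GGUF, then quantize into K-quant types with the quantize tool. Script names and flags vary across llama.cpp versions, so treat this as a sketch:

```bash
# 1) HF safetensors checkpoint -> f16 GGUF (script name varies by llama.cpp version)
python convert-hf-to-gguf.py ./my-model --outfile my-model-f16.gguf --outtype f16
# 2) f16 GGUF -> K-quants such as Q4_K_M or Q5_K_M
./quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M
./quantize my-model-f16.gguf my-model-Q5_K_M.gguf Q5_K_M
```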

Link mentioned: llama.cpp/scripts/convert-gg.sh at master · ggerganov/llama.cpp: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (15 messages🔥):

  • Configuring for Custom Dataset Roles: A user inquired about configuring a dataset with the structure {"messages": [{"role": "system", "content": "…"}, {"role": "user", "content": "…"}, {"role": "assistance", "content": "…"}]}. They were advised to use the UserDefinedDatasetConfig class to align the structure with the system’s expectations.

  • Preprocessing Conversations for ShareGPT: In response to a dataset structure question, it was suggested to preprocess messages by concatenating the “content” with the respective role identifier, ensuring it adheres to the sharegpt model’s expected format.

  • Filling in Dataset Configuration Keys: When asked how to fill in certain keys in a dataset configuration block, it was recommended to set the conversation to Llama2ChatConversation, map the field_human to “user”, the field_model to “assistance”, and appropriately categorize the “system” and “user” as input roles and “assistance” as the output role.
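
A sketch of how that advice could look as an axolotl dataset block; the key names follow the thread, the file path is a placeholder, and the exact schema may differ across axolotl versions (the “assistance” spelling is kept from the user’s dataset):

```yaml
datasets:
  - path: ./my_dataset.jsonl     # placeholder path
    type: sharegpt
    conversation: llama-2        # i.e. Llama2ChatConversation
    field_human: user
    field_model: assistance
    roles:
      input: [system, user]      # treated as input turns
      output: [assistance]       # treated as the trainable output turn
```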


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (32 messages🔥):

  • DeepSpeed Stage 3 Quality Concerns Cleared: DeepSpeed Stage 3, also known as ZeRO-3, does not inherently degrade model quality; it optimizes memory usage during training. It’s essential to implement and integrate ZeRO-3 correctly within the training pipeline to avoid issues from misconfiguration. The DeepSpeed documentation can guide configuration, and a minimal config sketch follows this list.

  • Flash Attention with DeepSpeed for Fine-Tuning: It is possible to use both Flash Attention and DeepSpeed Stage 3 for fine-tuning models, requiring integration of Flash Attention into the model and DeepSpeed Stage 3 setup within the training script. Proper configuration is crucial to leverage both technologies effectively.

  • Speed Improvements with DeepSpeed Stage 3: DeepSpeed Stage 3 can speed up very large model training, allowing for larger batch sizes and reducing the need for complex parallelism strategies. However, the extent of speedup can vary based on model architecture, hardware setup, and data loading efficiency.

  • Training with LLaMA 3 Instruct on Axolotl: Instructions for training with LLaMA 3 Instruct involve setting up an environment, creating a YAML config file, and initiating training and inference through commands using Accelerate and Axolotl. Adjustments specific to LLaMA 3 Instruct’s implementation may be required. Axolotl GitHub

  • Understanding VRAM Usage with Axolotl Configurations: Utilizing the simple qlora.yaml config in Axolotl examples uses both GPUs equally, but transitioning to FSDP or DeepSpeed Stage 3 might not show significant VRAM reduction due to various factors including model compatibility and overhead in managing sharded models. Configurations may need fine-tuning to optimize memory savings.
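
As a companion to the ZeRO-3 points above (see the first item), a minimal DeepSpeed config sketch; the values are illustrative and the DeepSpeed documentation remains the reference for the full schema:

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```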


OpenInterpreter ▷ #general (63 messages🔥🔥):

  • Documentation Confusion Cleared: A member shared a link to Open Interpreter local installation documentation, which specifically includes instructions for Ollama, Jan.ai, and Llamafile with particular emphasis on dolphin-mixtral.
  • Prompt Adjustment for Conciseness: A member advised using the --profile 01 command for Open Interpreter to avoid repetitive recap of steps and plans. They also shared a link to the related system message.
  • Open Source AI Hackathon Announcement: An invite was made to join a team for the Microsoft Open Source AI hackathon in Seattle, with details and registration linked here.
  • Open Interpreter Server Hosting Query: A member inquired whether it’s possible to host a server running Open Interpreter for others to connect to; another confirmed it’s feasible, pointing to the --api_base flag along with --model openai/custom --api_key dummykey (composed into a command in the sketch after this list).
  • Local Model Hosting for Mobile Devices Guidance: Info was sought on setting up a local Open Interpreter model for access by mobile devices, to which links to GitHub documentation were provided, referring to Android device setup and running Open Interpreter locally.
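
Composing the server-hosting flags quoted above into a command; this assumes an OpenAI-compatible endpoint already running on the host (the URL is a placeholder) and is a sketch rather than official Open Interpreter documentation:

```bash
# Client side: point Open Interpreter at a remote OpenAI-compatible server.
interpreter --model openai/custom --api_base "http://my-host:8000/v1" --api_key dummykey
```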


OpenInterpreter ▷ #O1 (10 messages🔥):

  • Speaker Selection: A Delicate Process: The choice of speaker for an electronics project is being closely evaluated, with Ben discussing options with vendors and considering integration with the PCB design. This decision is expected to unfold over weeks, with updates to follow based on validation results.

  • Fair Game: Reviewing Released Products: Discussing product reviews, a member expressed that it is completely valid to review a product that has been officially released, implying confidence in the reviewer’s understanding of the product space.

  • Speed Boost for Whisper RKNN: An improved branch for Whisper RKNN on Rockchip RK3588 SBCs has been shared, boasting a 250% performance increase per rbrisita’s GitHub. The contributor plans to introduce LLM RKNN features next.

  • Troubleshooting Interpreter Errors: One user encountering errors with the interpreter command was advised to add --api_key dummykey to their execution. For further assistance, they were directed to specific Discord channels for issue discussion.

  • Progress on TMC Protocol for iOS: A discussion is underway regarding the implementation of the TMC protocol for iOS, which grants access to native features such as the calendar and iMessage. The member is contemplating the benefits of TMC over standard function calling during development.

Links mentioned:

  • error file - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
  • GitHub - rbrisita/01 at rknn: The open-source language model computer. Contribute to rbrisita/01 development by creating an account on GitHub.

OpenInterpreter ▷ #ai-content (2 messages):

  • Open Source AI Vtuber Kit Released: Nikechan presented an AI Vtuber starter kit requiring an OpenAI key and YouTube Key for operation. The project is available on GitHub and was also announced via Twitter.

  • AI Vtuber Runs Offline: Hensonliga shared their AI Vtuber repository that runs entirely offline without the need for an API and noted the content can be uncensored. The announcement included a YouTube demonstration and a link to the GitHub repository.


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

  • Traffic Surge Causes Blips: OpenRouter experienced higher-than-normal errors due to a significant increase in traffic, causing intermittent issues.
  • Scaling Efforts Underway: An update at 7:30am PT indicated that the scaling process to manage the surge was still in progress, reducing but not entirely eliminating the issues.

OpenRouter (Alex Atallah) ▷ #general (72 messages🔥🔥):

  • Exploring Payment Alternatives: Members inquired about the possibility of OpenRouter supporting additional payment methods like Stripe to include WeChat Pay and Alipay, noting that there’s extra paperwork required.
  • Upcoming AI Model Teasers: Excitement and speculation bubbled around new and large-scale language models like LLaMA-3 and potential forthcoming releases from companies like Soliloquy, while acknowledging proprietary model limitations.
  • Concern Over Model Dumbing After Fine-Tuning: A technical discussion unfolded around the consequences of fine-tuning large language models without access to instruct datasets, suggesting that batching old data with new can prevent catastrophic forgetting.
  • Interest in Easier Payment Integration: There was a suggestion to develop an app integrating with Google payment services for easier transactions.
  • Resolving Gemini Pro Issues: User issues with Gemini Pro messages, specifically conversations starting with an “assistant” role, were addressed with updates and workarounds, including prepending a user-role message to the prompt (see the sketch after this list).
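
A minimal sketch of the prepend workaround mentioned above, assuming an OpenAI-style message list (the filler text is an assumption):

```python
# If the conversation would start with an "assistant" turn, prepend a user turn.
messages = [{"role": "assistant", "content": "Hello! How can I help?"}]
if messages and messages[0]["role"] != "user":
    messages.insert(0, {"role": "user", "content": "Hi."})  # placeholder user turn
```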


AI Stack Devs (Yoko Li) ▷ #app-showcase (1 messages):

angry.penguin: https://huggingface.co/spaces/YupengZhou/StoryDiffusion


AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (15 messages🔥):

  • Mysterious Messages Puzzle AI Devs: AI Stack Devs are puzzled by empty messages or strings of numbers blocking conversation flow in ai-town, using ollama and llama3 8b. Tokenizer issues have been suggested, but there is no definitive answer yet.

  • Godly Praise for Cocktail Peanut: A member gave a shoutout to cocktail peanut simply stating they’re doing “gods work,” but no context was provided on what work is being referred to.

  • AI Society Without Leadership: Members discussed whether AIs elect a leader within simulations, with the consensus being there’s no mayor or elected official. Curiosity was expressed regarding this aspect from the original simulation paper.

  • Simplifying AI Character Roles: A member mentioned the idea that setting up a mayoral election could be easily implemented in the player bios of AI characters.

  • AI Town Experiences and Tools Shared: Links were shared by @.casado promoting @TheoMediaAI’s exploration of AI simulations, and @cocktailpeanut’s new web app that allows for replaying any AI Town by importing a sqlite file. The latter supports Mac & Linux, with a requirement for ollama.

Links mentioned:

  • Tweet from Theoretically Media (@TheoMediaAI): Exploring Two remarkable AI World Simulations: First, the AI-Westworld from @fablesimulation (PUBLIC BETA is OPEN!), and also taking @realaitown for a spin, but recreating the best movie ever (The THI...
  • Tweet from cocktail peanut (@cocktailpeanut): Introducing AI Town Player Did you know that the entire AI Town is stored in a single sqlite file via @convex_dev? I reverse engineered the schema and built a web app that lets anyone REPLAY any A...

AI Stack Devs (Yoko Li) ▷ #ai-town-dev (26 messages🔥):

  • Node Version Hinders Local Backend Progress: A member encountered an error while running convex-local-backend due to an incorrect node version: Wrong node version v19.9.0 installed at node. It was suggested to switch to node version 18 using nvm use 18 (commands sketched after this list).

  • Local Development Halted by Backend Bugs: Another member faced multiple errors when attempting to run convex-local-backend on Ubuntu 18. Issues included problems with the node version, rush buildCacheEnabled, and a type error (Unknown file extension “.ts”).

  • In Search of a Simpler Setup: Frustrated by the complications, the member inquired about a Docker build to simplify the deployment process. An alternative of using ollama locally and convex remotely was mentioned.

  • Sharing Resources for Community Projects: A request was made to share a larger map with another user working on the Pinokio build. Member edgarhnd agreed to share the map.

  • LLama-Farm Project To Connect Local Machines: A project named llama-farm was introduced, designed to connect one or more machines running Ollama to a cloud backend or hosted website, enabling the use of local LLM compute without exposing the machines to public internet requests.
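
The node-version fix from the first item, as commands (assuming nvm is installed):

```bash
nvm install 18    # fetch a node 18.x release
nvm use 18        # switch the current shell to it
node --version    # should now report v18.x instead of v19.9.0
```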

Link mentioned: TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension “.ts” for /app/npm-packages/convex/src/cli/index.ts · Issue #1 · get-convex/convex-backend: I ran the steps in the prerequisites then got this when running just run-local-backend Error: Failed to run convex deploy: TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension “.ts”…


AI Stack Devs (Yoko Li) ▷ #ai-raspberry-pi (2 messages):

  • Emotional Penguins and Discord Bots: A member displayed an emoji expressing deep contemplation or skepticism, possibly gearing up for a discussion or pondering a question related to AI on Raspberry Pi.
  • Channel Meets User’s Interest: Another member expressed that the ai-raspberry-pi channel is perfectly suited for their interests, implying they might engage in or contribute to discussions on AI development using Raspberry Pi.

LAION ▷ #general (26 messages🔥):

  • Implementation Challenges with SoundStream: A member experienced difficulties in implementing the SoundStream paper by Google due to unspecified index names and values. Another member pointed out an existing code repository that might help, available on GitHub.

  • Newbie Welcome and Offers Course: A newcomer to AI-generated art, after finishing a Udemy course on Stable Diffusion, offered to share the course for free in hopes of building connections and learning more advanced skills from the community.

  • Investing Strategies in Chat: Various members humorously discussed their investment strategies, ranging from seeking services that 10x their money to preferring ones that halve their funds.

  • Insights on Model Training Limitations: In the StableDiffusion subreddit, discussions mention that using both T5 text encoder and CLIP might not improve prompt adherence as expected, with some expressing surprise and others nodding to the possibility of high CLIP dropout as a potential factor.

  • StableDiffusion Development Updates: Updates from the Stable Diffusion community indicate a shift in focus from larger models, due to hardware constraints, to architectural and training improvements on smaller models. The conversation also touches on the importance of correctly training with CLIP to avoid biases and limitations.

Link mentioned: GitHub - wesbz/SoundStream: This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf: This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf - wesbz/SoundStream


LAION ▷ #research (7 messages):

  • Bot Banishment Brigade: Two members humorously interacted over the removal of a perceived bot from the discussion, with one member cheerfully noting their timely attention to the chat.
  • Scrutiny Over Dataset Choices: A member queried why experiments are not conducted on standard datasets such as MNIST, CIFAR, or ImageNet but rather on synthetic ones. Another member attributed this choice to the goal of demonstrating interpretability.
  • Interpretability vs Real-world Application: Following a discussion on the focus of experiments for interpretability, another member expressed skepticism, pointing out that methods need to solve real-world tasks to be truly compelling.
  • New Tool on the Block: A link to StoryDiffusion was shared by a member with no additional context provided regarding its purpose or relevance.

LangChain AI ▷ #general (26 messages🔥):

  • Integration Struggles with Text Embedding: A user expressed difficulty in integrating a text embedding model with LangChain, mentioning the need to utilize a SageMaker endpoint rather than an API key. The user sought advice for alternative methods or resources for such integration.

  • LangChain Package Version Confusion: A member raised a question about installing the langchain PyPI package, noting that the version of langchain-openai specified is quite old (<=0.1) and wondering if this is intentional for compatibility reasons, given that the current version of langchain-openai is significantly updated.

  • Looking for Chatbot Enthusiasts: A user inquired about finding a community focused on developing conversational chatbots, seeking recommendations from fellow members.

  • Data Retrieval Query for CSV: A member asked how to embed a single column from a CSV file in a LangChain application and later retrieve data from a different column in the response, for a use case involving email lookup (a hedged sketch follows this list).

  • Hackathon Heads-up!: An announcement was shared about an upcoming hackathon named BeeLoud, where participants are challenged to build AI products within 54 hours, with a potential prize pool of up to $25,000. The event welcomes diverse skill sets and is set to occur on May 10-12, with participants from across the globe.

  • Request for Interview with LangChain Users: A user requested to discuss the biggest challenges faced by those frequently building with AI agents using LangChain or other frameworks, providing a link to schedule a call for detailed conversations.

  • SQL Agent Functionality Query: A conversation was sparked about whether it’s possible to call MSSQL functions using the SQL agent in LangChain, leading to a detailed explanation of using the SqlToolkit for such executions and relevant links for further guidance.

  • LangChain RAG Implementation Insights: A user preparing for a role involving LangChain’s implementation asked for key points and advice on how to prepare for an interview related to LangChain, particularly in context with implementing RAG through LangChain.

  • Handling Large Databases in LangChain: Members discussed various methods for using LLM to query databases, debating between converting database data to natural language text versus using ChatGPT to convert natural language to SQL queries, and considering the challenges of dealing with large databases within such paradigms.
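
For the CSV question above, one hedged sketch: embed one column as the searchable text and carry the other column in metadata, so it can be read back after retrieval. The column names, file, and FAISS store are illustrative assumptions (requires faiss-cpu and an OpenAI key):

```python
import csv

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

rows = list(csv.DictReader(open("contacts.csv")))         # hypothetical file/columns
store = FAISS.from_texts(
    texts=[row["description"] for row in rows],           # the embedded column
    embedding=OpenAIEmbeddings(),
    metadatas=[{"email": row["email"]} for row in rows],  # the column to retrieve
)

hits = store.similarity_search("head of data platform", k=1)
print(hits[0].metadata["email"])                          # answer from the other column
```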

Links mentioned:

  • Build - Beeloud: Can you build the next billion dollar startup in 3 days? Sam Altman and his buddies are betting you can. You’ve officially been challenged to join this hackathon. I accept… Continue readi...
  • 30 Minute Meeting - Leon Chen: no description found
  • Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.

LangChain AI ▷ #langserve (1 messages):

  • Clarifying Feedback Submission Confusion: A member was uncertain about how feedback submission works when using the langserve feedback endpoint. It was explained that an “OK” response from Langserve only indicates successful submission but does not confirm recording by langsmith, as requests may be rejected if deemed unauthenticated or invalid by the server.

LangChain AI ▷ #share-your-work (3 messages):

  • Boosting Email Drafting with RAG: Enhancements to LangChain’s LangGraph Agents now include Retrieval-Augmented Generation (RAG) for more intelligent email drafting capabilities. The Medium article details how this integration can significantly improve the efficiency and quality of AI-generated email communication.

  • LangChain Java Port Available: For developers interested in using LangChain with Java, langchain4j offers a Java version of LangChain, expanding the possibilities for integration into various applications.

  • Dragonfly Integrates with LangChain: A new blog post highlights the integration of Dragonfly, an in-memory data store, with LangChain, to manage chat context and improve performance of AI-powered applications. Detailed information and code snippets for this enhancement can be found in the blog post.


Interconnects (Nathan Lambert) ▷ #ml-questions (13 messages🔥):

  • Hints of Implementation Intentions: A member signaled they are considering implementing something that is not commonly practiced. The specifics are not yet clear, but they expressed interest in beginning the implementation.
  • Technical Report Wait Game: There’s anticipation for a technical report that hasn’t been published yet, which seems to be causing some confusion. The absence of this report is attributed to timescale constraints related to data.
  • Reward Model Contest Alert: There’s mention of a reward model competition by LMSYS with a significant 100k prize, drawing a parallel to older Kaggle competitions and prompting a call for a similar 200k Interconnects contest.
  • Perspectives on Ensembling: The concept of ensembling reward models is recognized, but it’s viewed as suboptimal, though potentially sufficient to give some competitors an edge.
  • PPO’s Connection to Reinforce: There was a discussion suggesting that Proximal Policy Optimization (PPO) could theoretically be reduced to the REINFORCE algorithm with a particular set of hyperparameters, possibly when the step size limiter is turned off. A link to OpenAI’s Spinning Up documentation was shared for further clarification.
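
A minimal sketch of that reduction, following the Spinning Up formulation linked below; this is the thread’s claim made concrete, not a full proof:

```latex
% PPO-clip objective:
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\;
      \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}.

% Turn the step-size limiter off (no clip) and take a single gradient step per
% batch, so the gradient is evaluated at \theta = \theta_old where r_t = 1:
\nabla_\theta L(\theta)\big|_{\theta=\theta_{\mathrm{old}}}
  = \mathbb{E}_t\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\hat{A}_t\right],
% i.e. the REINFORCE policy gradient with a baseline (the advantage estimate).
```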

Link mentioned: Proximal Policy Optimization — Spinning Up documentation: no description found


Interconnects (Nathan Lambert) ▷ #ml-drama (4 messages):

  • Drama Unfolding with Potential Model Leak: A member discussed the possibility that a leaked model with oddly specific quant is actually from GDM, referenced by a tweet from @teortaxesTex. Suspicions arose due to strange details like a sudden 4chan link, a throwaway HF account, and Reddit comments.

  • Quant Leak on 4chan Sparks Curiosity: A user summarized the situation as a “random-ass quant from llama3 dropped on 4chan” potentially originating from GDM.

  • Research Paper Fails to Impress with Missing RewardBench Scores: A member shared a link to a paper that missed reporting on RewardBench scores and hinted at underperformance by adding a facepalm emoji reaction.

  • Prometheus 2: A Challenger to GPT-4?: The paper introduced Prometheus 2, an evaluator language model positioned as a better alternative to proprietary LMs like GPT-4. It claims to align closely with human judgement and to handle various types of assessments.


Interconnects (Nathan Lambert) ▷ #random (5 messages):

  • $100,000 Human Preference Prediction Challenge: LMSYS and Kaggle launch a competition where participants predict user preferences between Language Model (LM) responses. The dataset includes over 55,000 conversations featuring LLMs like GPT-4, Claude 2, Llama 2, and Mistral.

  • A Short Victory Cry: A member simply commented “mogged”.

  • Kaggle’s Appeal to Researchers: A member inquired whether researchers generally have a liking for platforms like Kaggle.

  • Repeated Success Raises Questions: Reacting to the competition announcement, a member expressed disbelief noting, “he can’t keep getting away with this”.

  • Casual Chat About Commitments: The conversation continued in a more casual tone, referring to a ‘John’ who said ‘maybe’ to a member, likely about participating in an event or project.

Link mentioned: Tweet from lmsys.org (@lmsysorg): Exciting news — we’re thrilled to announce that LMSYS + @kaggle are launching a human preference prediction competition with $100,000 in prizes! Your challenge is to predict which responses user…


Interconnects (Nathan Lambert) ▷ #rl (8 messages🔥):

  • Valuable Value Functions in RLHF: One member pondered why reward functions are released but not the value functions obtained during RLHF training, questioning if somehow value functions are not produced. Another clarified that value functions are indeed obtained when using algorithms like PPO.

  • Reward Model Release Practices Questioned: It was noted that the claim that people release reward models or functions as standard practice may be an overstatement of what actually happens in the community.

  • Value of Value Functions Recognized: Despite uncertainties about their release, it is acknowledged that value functions are considered quite valuable, especially in the context of planning.

  • Research Gap on Value Functions?: A member speculated on the absence of research focusing on the value of value functions in classical RL, implying an opportunity for further exploration.

  • Link Between Value and Credit Assignment: The relationship between the value functions in PPO and credit assignment in DPO was noted as a potentially interesting area for future research.


Cohere ▷ #general (21 messages🔥):

  • Search System Design for Large Documents: A member explored ideas for building a search system for large PDFs and considered generating embeddings for semantic search, summarizing documents with LLMs for retrieval, and indexing key information extracted by LLMs.

  • Tokenization Clarification for Llama with Command R+: A member asked whether it’s necessary to add a “<BOS_TOKEN>” when generating text with the llama-cpp-python library and Command R+, after noticing that the token is added automatically during tokenization (see the sketch after this list).

  • Cohere API Key Inquiry for RAG: One user inquired whether it’s possible to use a free Cohere API key for RAG, with another member confirming its availability but noting rate limitations.

  • Discussion on C4AI Command R+ Implementation: Members shared links to the C4AI Command R+ model on HuggingFace and a quantized version, alongside technical parameters for implementation, and discussed running it locally with varying degrees of system requirements.

  • Code Interpreter SDK Announcement: A member shared a demo of the launch of the Code Interpreter SDK on Twitter, with another questioning the uniqueness of this release in light of previous similar technologies.
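
A sketch for the tokenization question above (second item): llama-cpp-python can add BOS itself during tokenization, so also prepending the literal “<BOS_TOKEN>” string would duplicate it. The model path is a placeholder:

```python
from llama_cpp import Llama

llm = Llama(model_path="./c4ai-command-r-plus-Q4_K_M.gguf")  # placeholder path
tokens = llm.tokenize(b"Hello there", add_bos=True)          # BOS added here, once
```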

Links mentioned:

  • command-r: Command R is a Large Language Model optimized for conversational interaction and long context tasks.
  • Tweet from Tereza Tizkova (@tereza_tizkova): 🚀 We are launching the @e2b_dev Code Interpreter SDK 🧠 It's a building block for any AI app - SDK for code interpreting! Use it to build 🔸 Advanced data analysts 🔸 Generative UI 🔸 AI softwar...
  • CohereForAI/c4ai-command-r-plus · Hugging Face: no description found
  • command-r-plus: Command R+ is a powerful, scalable large language model purpose-built to excel at real-world enterprise use cases.

Mozilla AI ▷ #llamafile (19 messages🔥):

  • llamafile as a Linux Service: A systemd script to launch llamafile as a service on Rocky Linux 9 was shared, detailing the execution command and environment configuration needed to run llamafile with specific arguments such as the server port and model path (a hedged unit-file sketch follows this list).
  • Feature Request for Server Base URL: A feature request for the ability to specify a base URL for llamafile in server mode was addressed with a GitHub issue link, expressing the need for proxy support through Nginx to serve llamafile under a subdirectory.
  • Interest in Distil Whisper German Model: There’s curiosity about incorporating whisper models like distil-whisper-large-v3-german for speech recognition and potential for a blog post featuring its application, including a hypothetical pipeline of STT -> LLM -> TTS.
  • Embedding Direction Discrepancies: An issue was discussed where embeddings produced by llamafile and by llama.cpp show a low cosine similarity, indicating differing directions, a problem evidenced by a GitHub issue and tested with Python scripts provided.
  • Conversing with Documents/Code: The question of how to enable llamafile to ingest documents and code for conversational interaction was addressed with a suggestion to use curl API calls, referencing examples from the llama.cpp chat script.
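
A hedged sketch of such a unit file (first item above); the paths, port, user, and model are placeholders rather than the script that was actually shared, and llamafile flags can vary by release:

```ini
# /etc/systemd/system/llamafile.service (placeholder paths and values)
[Unit]
Description=llamafile server
After=network.target

[Service]
ExecStart=/opt/llamafile/llamafile --server --nobrowser --port 8080 -m /opt/models/model.gguf
Restart=on-failure
User=llamafile

[Install]
WantedBy=multi-user.target
```

After saving the unit, it would be enabled with systemctl daemon-reload followed by systemctl enable --now llamafile.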


tinygrad (George Hotz) ▷ #general (4 messages):

  • Tiny Progress Update: One member inquired about progress, to which another confirmed substantial progress made two days ago.
  • Contribution Milestone: A different member shared their enthusiasm for making their first commit to the project and expressed joy when it was successfully committed.

tinygrad (George Hotz) ▷ #learn-tinygrad (13 messages🔥):

  • Clarification on Blobfile’s Importance: The utility of blobfile in examples/llama.py was questioned. It’s clarified that load_tiktoken_bpe depends on blobfile.

  • Forward Pass Compute Graph Troubles: A member had an issue generating the forward-pass compute graph for a simple neural network. They were advised to force computation by re-enabling the commented-out out.item() call or using out.realize(), and to resolve a NameError by installing the missing libraries.

  • Networkx Installed but pydot Missing: The aforementioned error persisted despite having networkx installed, and was eventually resolved by installing pydot.

  • Graphviz Installation Resolves dot Command Error: After implementing the solution to install pydot, a new error about a missing dot command was encountered and solved by installing graphviz.

  • Suggestion to Update Documentation: A member suggested updating the documentation to include a hint that installing graphviz can resolve the sh: dot: command not found error.
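
The fix chain from this thread, condensed into commands; the package names come from the discussion, while GRAPH=1 as tinygrad’s graph-dump switch and the apt package manager are assumptions that may vary by version and distro:

```bash
pip install networkx pydot    # pydot was the missing piece despite networkx being present
sudo apt install graphviz     # provides the `dot` binary the last error mentions
GRAPH=1 python my_net.py      # my_net.py must realize its output, e.g. out.realize()
```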


AI21 Labs (Jamba) ▷ #announcements (1 messages):

  • Jamba-Instruct Takes Center Stage: AI21 Labs announced the launch of Jamba-Instruct, an instruction-tuned version of their hybrid SSM-Transformer Jamba model. They invite feedback and express willingness to accommodate use cases requiring more than the initial 256K context window.

  • Read All About Jamba-Instruct: For an in-depth understanding, AI21 Labs encourages reading the Jamba-Instruct blog post at AI21’s Blog, which details how Jamba-Instruct excels in quality and performance for commercial applications.

Link mentioned: Built for the Enterprise: Introducing AI21’s Jamba-Instruct Model: An instruction-tuned version of our hybrid SSM-Transformer Jamba model, Jamba-Instruct is built for reliable commercial use, with best-in-class quality and performance.


AI21 Labs (Jamba) ▷ #jamba (4 messages):

  • Jamba-Instruct Unveiled: AI21 Labs announced the launch of Jamba-Instruct, shared via a Twitter post.
  • Exploring Larger Context Windows: In response to an inquiry about context windows larger than 256k, an AI21 Labs staff member expressed willingness to explore much higher context windows and invited the member to discuss use cases in a direct message.

Alignment Lab AI ▷ #general-chat (2 messages):

  • Warm Greetings: A member greeted the community with a simple “Hello”.
  • Compute Grants Available: For those seeking fast compute grants, a member shared a link to a Twitter post from @PrimeIntellect: Fast Compute Grants Tweet.

DiscoResearch ▷ #general (2 messages):

  • LLaMA Quantization Quandary: A Discord member highlighted a Reddit thread discussing the impact of quantization on LLaMA 3’s quality compared to LLaMA 2. They linked to an arXiv paper detailing the performance degradation with low-bit quantization, raising questions about post-training quantization methods.
  • Quantization Loses Details: A member suggested that because Meta trained LLaMA past the Chinchilla-optimal point on 15T tokens, the weights carry more information per parameter, which could explain the larger information loss under quantization and the resulting performance hit. This implies a greater risk of degradation as precision is reduced in such heavily trained models.


Skunkworks AI ▷ #off-topic (1 messages):

  • Fast Compute Grants for Skunkworks Projects: A member mentioned they are eager to fund some exciting Skunkworks projects and provided a twitter link for details. If you’re looking for fast compute grants, this could be an opportunity.

Datasette - LLM (@SimonW) ▷ #llm (1 messages):

  • Digital Housekeeping Woes: A member expressed the need for an LLM that could assist with cleaning up the scattered 7B localmodels taking up space across various directories on their hard drive. The frustration stemmed from numerous apps and libraries contributing to the disarray.