> AI News for 4/24/2024-4/25/2024. We checked 7 subreddits and [**373** Twitters](https://twitter.com/i/lists/1585430245762441216) and **27** Discords (**395** channels, and **5506** messages) for you. Estimated reading time saved (at 200wpm): **631 minutes**.

This one takes a bit of parsing but is a very laudable effort from Snowflake, which to date has been fairly quiet in the modern AI wave. Snowflake Arctic is notable for a few reasons, though probably not for the confusing, unrelatable chart they chose to feature above the fold:

(image: Arctic’s headline benchmark chart)

“Enterprise Intelligence” is a framing one could warm to, especially if it explains why they chose to do better on some domains than others:

(image: “Enterprise Intelligence” benchmark breakdown by domain)

What this chart really shows, in not very subtle ways, is that Snowflake is basically claiming to have built an LLM that is better in almost every way than Databricks’, their main rival in the data warehouse wars. (This has got to smell offensive to Jon Frankle and his merry band of Mosaics?)

Downstream users don’t care that much about training efficiency, but the other thing that should catch your eye is the model architecture - taking the right cue from DeepSeekMoE and DeepSpeed-MoE that more experts = better:

(image: MoE architecture comparison across models)

No mention is made of the “shared expert” trick that DeepSeek used.

There is also mention of a 3-stage curriculum:

(image: Arctic’s 3-stage training curriculum)

which echoes a similar strategy seen in the recent Phi-3 paper:

(image: the corresponding curriculum strategy in the Phi-3 paper)

Finally, the model is released under Apache 2.0.

Honestly a great release, with perhaps the only poor decision being that the Snowflake Arctic cookbook is being published on Medium dot com.


Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Image/Video Generation

Other Image/Video AI

Language Models and Chatbots

AI Hardware and Infrastructure

AI Ethics and Societal Impact

Humor/Memes


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

OpenAI and NVIDIA Partnership

  • NVIDIA DGX H200 delivered to OpenAI: @gdb noted NVIDIA hand-delivered the first DGX H200 in the world to OpenAI, dedicated by Jensen Huang “to advance AI, computing, and humanity”. @rohanpaul_ai highlighted the DGX GH200 features like 256 H100 GPUs, 1.3TB GPU memory, and 8PB/s interconnect bandwidth.
  • OpenAI and Moderna partnership: @gdb also mentioned a partnership between OpenAI and Moderna to use AI for accelerating drug discovery and development.

Llama 3 and Apple OpenELM Models

  • Llama 3 models: @winglian has extended the context length of Llama 3 8B to 96k using PoSE and RoPE theta adjustments (a sketch of the RoPE theta trick follows below). @erhartford released Dolphin-2.9-Llama3-70b, a fine-tuned version of Llama 3 70B created in collaboration with others. @danielhanchen noted Llama-3 70b QLoRA finetuning is 1.83x faster & uses 63% less VRAM than HF+FA2, and Llama-3 8b QLoRA fits in an 8GB card.
  • OpenELM models: @rasbt shared details on Apple’s OpenELM paper, which introduces the OpenELM model family in 4 sizes (270M to 3B). Key architecture changes include a layer-wise scaling strategy adopted from the DeLighT paper. Experiments showed no noticeable difference between LoRA and DoRA for parameter-efficient finetuning.
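
Since the context-extension item above leans on the RoPE theta adjustment, here is a minimal illustrative sketch of why raising the base helps. The 500k base matches Llama 3’s published config; the larger value is an arbitrary assumption for illustration:

```python
# Illustrative sketch: RoPE rotates each pair of head dimensions at a
# frequency theta**(-2i/d). Raising theta slows every rotation, stretching
# positional wavelengths so longer contexts stay closer to in-distribution.
def rope_frequencies(head_dim: int, theta: float) -> list[float]:
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

base = rope_frequencies(head_dim=128, theta=500_000.0)        # Llama 3 default
stretched = rope_frequencies(head_dim=128, theta=8_000_000.0)  # hypothetical bump
# The slowest rotation gets a ~15x longer wavelength with the larger base:
print(base[-1] / stretched[-1])  # ~15.3
```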

Snowflake Arctic Model

  • Snowflake releases open-source LLM: @RamaswmySridhar announced Snowflake Arctic, a 480B Dense-MoE model designed for enterprise AI. It combines a 10B dense transformer with a 128x3.66B MoE MLP. @omarsar0 noted it claims to use 17x less compute than Llama 3 70B while achieving similar enterprise metrics like coding, SQL, and instruction following.

Retrieval Augmented Generation (RAG) and Long Context

  • Retrieval heads in LLMs: @Francis_YAO_ discovered retrieval heads, a special type of attention head responsible for long-context factuality in LLMs. These heads are universal, sparse, causal, and significantly influence chain-of-thought reasoning. Masking them out makes the model “blind” to important previous information.
  • XC-Cache for efficient LLM inference: @_akhaliq shared a paper on XC-Cache, which caches context for efficient decoder-only LLM generation instead of just-in-time processing. It shows promising speedups and memory savings.
  • RAG hallucination testing: @LangChainAI demonstrated how to use LangSmith to evaluate RAG pipelines and test for hallucination by checking outputs against retrieved documents.
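
For readers unfamiliar with the pattern, here is a minimal, framework-free sketch of the grounding check that LangSmith-style evaluators automate; the judge prompt and the call_llm callable are illustrative placeholders, not LangSmith’s actual API:

```python
# Ask a judge model whether every claim in the answer is supported by the
# retrieved documents; `call_llm` is any callable that sends a prompt to an LLM.
def is_grounded(answer: str, retrieved_docs: list[str], call_llm) -> bool:
    context = "\n\n".join(retrieved_docs)
    verdict = call_llm(
        "You are a strict grader. Context:\n"
        f"{context}\n\nAnswer to grade:\n{answer}\n\n"
        "Reply GROUNDED if every factual claim in the answer is supported "
        "by the context, otherwise reply HALLUCINATED."
    )
    return verdict.strip().upper().startswith("GROUNDED")
```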

AI Development Tools and Applications

  • CopilotKit for integrating AI: @svpino highlighted CopilotKit, an open-source library that makes integrating AI into applications extremely easy, allowing you to bring LangChain agents into your app, build chatbots, and create RAG workflows.
  • Llama Index for LLM UX: @llama_index showed how to build a UX for your LLM chatbot/agent with expandable sources and citations using create-llama.

Industry News

  • Meta’s AI investments: @bindureddy noted Meta’s weak Q2 forecast and plans to spend billions on AI, seeing it as a sound strategy. @nearcyan joked that Meta’s $36B revenue just gets poured into GPUs now.
  • Keras and Apple research: @fchollet shared a Keras starter notebook for the Automated Essay Scoring competition on Kaggle. @_akhaliq covered Apple’s CatLIP paper on CLIP-level visual recognition with faster pre-training on web-scale image-text data.

AI Discord Recap

A summary of Summaries of Summaries

1. Llama 3 and Phi-3 Releases Spark Excitement and Comparisons: The release of Meta’s Llama 3 (8B and 70B variants) and Microsoft’s Phi-3 models generated significant buzz, with discussions comparing their performance, architectures like RoPE, and capabilities like Phi-3’s function_call tokens. Llama 3’s impressive scores on benchmarks like MMLU and Human Eval were highlighted.

2. Advancements in RAG Frameworks and Multimodal Models: Improvements to Retrieval-Augmented Generation (RAG) frameworks using LangChain’s LangGraph were discussed, featuring techniques like Adaptive Routing and Corrective Fallback. The release of Apple’s OpenELM-270M and interest in models like moondream for multimodal tasks were also covered.

3. Open-Source Tooling and Model Deployment: The open-sourcing of Cohere’s Toolkit for building RAG applications was welcomed, while Datasette’s LLM Python API usage for text embedding was explored. Discussions on batching prompts efficiently involved tools like vLLM, TGI, and llm-swarm.

4. Specialized Models and Niche Applications: The medical Internist.ai 7b model’s impressive performance, even surpassing GPT-4 in evaluations, generated excitement. Unique projects like the AI-powered text RPG Brewed Rebellion and the 01 project for embedding AI into devices were also showcased.


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

  • Fine-Tuning Fixes and Future Projections: Multi-GPU support for Unsloth Pro is in development, with an open-source version expected to launch by May. Meanwhile, PR #377 on Unsloth’s GitHub proposes to address vocabulary resizing issues in models and is eagerly awaited by the community.

  • Llama Runs Better Unencumbered: Removing extraneous tokens from prompts clarified a path to improving Llama models’ finetuning, notably dropping loss from 5 to a more temperate range in the initial epoch. The ideal setup for finetuning the 70B parameter Llama-3 model involves at least 48GB of GPU VRAM, though richer resources are recommended.

  • Optimizing AI’s Brainpower: Debates on the optimal reranker settled on ms-marco MiniLM-L6-v2 for a 15x speed boost over BGE-m3, while pgvector in PostgreSQL sidesteps the need for external APIs (a short reranking sketch follows this list). On the hardware front, new acquisitions are empowering projects such as fine-tuning for large RAG datasets and exploring innovative unsupervised in-context learning.

  • Meta’s Might in the Model Marathon: Meta’s introduction of the Llama-3 series has stirred discussions, with its 8B model and a teased 400B model aiming to challenge GPT-4 on benchmarks. Open source AI is gaining momentum with the release of Llama-3 and Phi-3, which target similar objectives through distinct strategies, as detailed in a shared Substack piece.

  • Technical Tidbits for Training: Tips include utilizing Colab notebooks to navigate fine-tuning glitches, harnessing GPT3.5 or GPT4 for crafting multiple-choice questions, and finetuning continuation on Kaggle. Approaches to sparsify embedding matrices and dynamically adjust context length have been tossed around, with interest in a possible warning system for model-size to GPU-fit mismatches.
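
As referenced above, a short sketch of that cross-encoder reranking step using the sentence-transformers package; the candidate list would come from a pgvector query, which is omitted here:

```python
# Rerank vector-search candidates with the ms-marco MiniLM-L6-v2 cross-encoder.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    # Cross-encoders score (query, document) pairs jointly: slower per pair
    # than a bi-encoder, but much more accurate for the final ordering.
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]
```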


LM Studio Discord

Phi and TinyLLamas Take the Spotlight: Members have been experimenting with phi-3 in LM Studio, using models like PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed to navigate quantization differences, with Q4 outshining Q2 in text generation. Meanwhile, a suite of TinyLlamas models has garnered attention on Hugging Face, presenting opportunities to play with Mini-MOE models of 1B to 2B range, and the community is abuzz with the rollout of Apple’s OpenELM, despite its token limitations.

Navigating GPU Waters: GPU topics were center stage, with discussions on VRAM-intensive phi-3-mini-128k models, strategies for avoiding errors like “(Exit code: 42)” by upgrading to LM Studio v0.2.21, and addressing GPU offload errors. Further, technical advice flowed freely, recommending Nvidia GPUs for AI applications despite some members’ qualms with the brand, and a nod towards 32GB RAM upgrades for robust LLM experimentation.

Tech Tangles in ROCm Realm: The AMD and NVIDIA mixed GPU environment provoked errors with ROCm installs, with temp fixes including removing NVIDIA drivers. However, heyitsyorkie underscored that ROCm within LM Studio is still in tech preview, signaling expected bumps. Wisdom from the community suggested solutions like driver updates, for instance, Adrenalin 24.3.1 for an rx7600, to iron out compatibility and performance concerns.

Mac Mileage Varies for LLMs: Mac users chimed in, suggesting that a minimum of 16GB RAM is ideal for running LLMs smoothly, although the M1 chip on an 8GB RAM setup can handle smaller models if not overloaded with parallel tasks.

Local Server Lore: Strategy-sharing for accessing LM Studio’s local servers highlighted the use of NordVPN’s Meshnet for remote operations and the potential role of Cloudflare in facilitating connections, extending the tried-and-true “localhost:port” setup.
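
For context: LM Studio’s local server speaks the OpenAI chat-completions protocol (default port 1234), so remote access via Meshnet or a Cloudflare tunnel amounts to swapping the host in the base URL. A minimal sketch:

```python
# Point the standard OpenAI client at LM Studio's local server; the API key
# is unused by LM Studio but required by the client. Swap "localhost" for a
# Meshnet/tunnel-reachable host to access the server remotely.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="local-model",  # LM Studio routes this to whichever model is loaded
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```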


Perplexity AI Discord

  • Perplexity Secures Hefty Investment: Perplexity celebrated raising $62.7 million in funding at a $1.04 billion valuation, with heavy hitters like NVIDIA and Jeff Bezos on board, while also launching a Pro service for iOS users with existing Pro-tier subscriptions.

  • Perplexity Pro Users Debating Value: Some users are skeptical about the advantages of Enterprise Pro over Regular Pro on Perplexity, particularly over performance differences, although frustration also bubbles regarding the 50 daily usage limit for Opus searches.

  • Voices for Change on Perplexity Platform: Users showed interest in adjustments like temperature settings for better creative outputs, and while some discussed new voice features, others wished for more, like an iOS watch widget. At the same time, issues with Pro Support’s accessibility were brought to light, indicating potential areas for user experience improvement.

  • API Blues and Groq’s Potential: Within the pplx-api discourse, the community learned that image uploading won’t be part of the API, while for coding assistance, llama-3-70b instruct and mixtral-8x22b-instruct are recommended. Meanwhile, GPT-4 is not yet integrated, with current model details found in the documentation.

  • Content Conversations Across Channels: Various searches on Perplexity AI tackled topics from overcoming language barriers to systems thinking. One analysis provided perspective on translation challenges, while links like “Once I gladly” inferred discussions on temporal shifts in happiness, and “Shift No More” addressed the inevitability of change.


Nous Research AI Discord

Bold RoPE Discussions: The community debated the capabilities of Rotary Position Embedding (RoPE) in models like Meta’s Llama 3, including its effectiveness in fine-tuning versus pretraining and misconceptions about its ability to generalize in longer contexts. The paper on “Scaling Laws of RoPE-based Extrapolation” (arXiv:2310.05209) sparked conversations on scaling RoPE and the challenges of avoiding catastrophic forgetting with increased RoPE base.

AutoCompressors Enter the Ring: A new preprint on AutoCompressors presented a way for transformers to manage up to 30,720 tokens and improve perplexity (arXiv:2305.14788). Jeremy Howard’s thoughts on Llama 3 and its finetuning strategies echoed through the guild (Answer.AI post), and a Twitter thread unveiled its successful context extension to 96k using advanced methods (Twitter Thread).

LLM Education and Holographic Apple Leans Out: The guild discussed a game aimed at instructing about LLM prompt injections (Discord Invite). In hardware inklings, Apple reportedly reduced its Vision Pro shipments by 50% and is reassessing their headset strategy, sparking speculation about the 2025 lineup (Tweet by @SawyerMerritt).

Snowflake’s Hybrid Model and Model Conversation: Snowflake Arctic 480B’s launch of a unique Dense + Hybrid model led to analytical banter over its architecture choices, with a nod to its attention sinks designed for context scaling. Meanwhile, discussion of GPT-3 dynamics led to skepticism about whether it is actually what powers the Rabbit R1.

Pydantic Models for Credible Citations: Pydantic models garnished with validators were touted as a way to ensure proper citations in LLM contexts; the discussion referenced several GitHub repositories (GitHub - argilla-io/distilabel) and tools like lm-format-enforcer for maintaining credible responses.
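
A minimal sketch of that validator pattern, assuming Pydantic v2; the field names and the ID check are illustrative rather than taken from the linked repositories:

```python
# Reject any LLM answer whose cited source IDs don't exist in the retrieved set.
from pydantic import BaseModel, ValidationInfo, field_validator

class CitedAnswer(BaseModel):
    answer: str
    source_ids: list[int]

    @field_validator("source_ids")
    @classmethod
    def ids_must_exist(cls, ids: list[int], info: ValidationInfo) -> list[int]:
        valid = info.context.get("valid_ids", set()) if info.context else set()
        missing = [i for i in ids if i not in valid]
        if missing:
            raise ValueError(f"cited unknown sources: {missing}")
        return ids

# Usage: CitedAnswer.model_validate_json(llm_output, context={"valid_ids": {0, 1, 2}})
```

Validation failures can then be fed back to the model for a retry; constrained-decoding tools like lm-format-enforcer avoid the retry loop by enforcing the schema during generation.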

Stream Crafting with WorldSim: Guildmates swapped experiences with WorldSim and suggested the potential for Twitch streaming shared world simulations. They also shared a custom character tree (Twitter post) and conversed about the application of category theory involving types ontology and morphisms (Tai-Danae Bradley’s work).


CUDA MODE Discord

PyTorch 2.3: Triton and Tensor Parallelism Take Center Stage: PyTorch 2.3 enhances support for user-defined Triton kernels and improves Tensor Parallelism for training Large Language Models (LLMs) up to 100 billion parameters, all validated by 426 contributors (PyTorch 2.3 Release Notes).
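
A minimal sketch of the headline feature, assuming a CUDA machine with triton installed: torch.compile can now trace straight through a user-defined Triton kernel instead of graph-breaking on it:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

@torch.compile  # PyTorch 2.3 traces through the Triton kernel call below
def fused_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)
    return out

a = torch.randn(4096, device="cuda")
print(fused_add(a, a)[:4])
```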

Pre-Throttle GPU Ponders During Power Plays: Engaging discussions occurred around GPU power-throttling architectures like those of A100 and H100, with anticipations around the B100’s design possibly affecting computational efficiency and power dynamics.

CUDA Dwellers Uncover Room for Kernel Refinements: Members shared strategies for optimizing CUDA kernels, including the avoidance of atomicAdd and capitalizing on warp execution advancements post-Volta, which allow threads in a warp to execute diverse instructions.

Accelerated Plenoxels Poses as a CUDA-Sharpened NeRF: Enthusiasm was directed towards Plenoxels for its efficient CUDA implementation of NeRF, as well as expressions of interest in GPU-accelerated SLAM techniques and optimization for kernels targeting attention mechanisms in deep learning models.

PyTorch CUDA Strides, Flash-Attention Quirks, and Memory Management: Source code indicating a memory-efficient handling of tensor multiplications touched upon similarity with COO matrix representation. It also highlighted a potential issue regarding Triton kernel crashes when trying to access expanded tensor indices outside their original range.


Eleuther Discord

Pre-LayerNorm Debate: An engineer highlighted an analysis that pre-layernorm may hinder the deletion of information in a residual stream, possibly leading to norm increases with successive layers.

Tokenizer Version Tussle: Changes between Huggingface tokenizer versions 0.13 and 0.14 are causing inconsistencies, resulting in a token misalignment during model inference, raising concern among members working on NeoX.

Poetry’s Packaging Conundrum: After a failed attempt to utilize Poetry for package management in NeoX development due to its troublesome binary and version management, the member decided it was too complex to implement.

Chinchilla’s Confidence Quandary: A community member questioned the accuracy of the confidence interval in the Chinchilla paper, suspecting an oversampling of small transformers and debating the correct cutoff for stable estimates.

Mega Recommender Revelations: Facebook has published a paper on a 1.5-trillion-parameter HSTU-based Generative Recommender system, which members highlighted for its 12.4% performance improvement and potential implications. Here is the paper.

Penzai’s Puzzling Practices: Users find penzai’s usage non-intuitive, sharing workarounds and practical examples for working with named tensors. Discussion includes using untag+tag methods and the function pz.nx.nmap for tag manipulation.

Evaluating Large Models: A user working on a custom task reported high perplexity and is seeking advice on the CrossEntropyLoss implementation, while another discussion arose over the num_fewshot settings for benchmarks to match the Hugging Face leaderboard.


Stability.ai (Stable Diffusion) Discord

  • RealVis V4.0 Wins Over Juggernaut: Engineers discussed their preference for RealVis V4.0 for faster and more satisfactory image prompt generation over the Juggernaut model, indicating that proven performance still trumps brand-new models.

  • Stable Diffusion 3.0 API Usage Concerns: There was noticeable anticipation for Stable Diffusion 3.0, but some disappointment was voiced upon learning that the new API is not free and only offers limited trial credits.

  • Craiyon, a Tool for the AI Novice: For newcomers requiring assistance with image generation, community veterans recommended Craiyon as a user-friendly alternative to the more complex Stable Diffusion tools that necessitate local installations.

  • AI Model Tuning Challenges Tackled: Conversations spanned from generating specific image prompts to cloud computing resources like vast.ai, handling AI video creation, and fine-tuning issues, with discussions providing insights into training LoRas and adhering to Steam regulations.

  • Exploring Independent AI Ventures: The guild was abuzz with members sharing various AI-based independent projects, like webcomic generation available at artale.io and royalty-free sound designs at adorno.ai.


OpenRouter (Alex Atallah) Discord

Mixtral 8x7b Blank Response Crisis: The Mixtral 8x7b service experienced an issue with blank responses, leading to the temporary removal of a major provider and planning for future auto-detection capabilities.

Model Supremacy Debates Rage On: In discussions, members compared smaller AI models like Phi-3 to larger ones such as Wizard LM and reported that FireFunction from Fireworks (Using function calling models) might be a better alternative due to OpenRouter’s challenges in function calling and adhering to ‘stop’ parameters.

Time-Outs in The Stream: Various users reported an overflow of “OPENROUTER PROCESSING” notifications designed to maintain active connections, alongside issues with completion requests timing out with OpenAI’s GPT-3.5 Turbo on OpenRouter.

The Quest for Localized AI Business Expansion: A member’s search for direct contact information signaled an interest in establishing closer business connections for AI models in China.

Language Barriers in AI Discussions: AI Engineers compared language handling across AI models such as GPT-4, Claude 3 Opus, and L3 70B, noting particularly that GPT-4’s performance in Russian left something to be desired.


HuggingFace Discord

Llama 3 Leapfrogs into Lead: The new Llama 3 language model has been introduced, trained on a whopping 15T tokens and fine-tuned with 10M human annotated samples. It offers 8B and 70B variants, scoring over 80 on the MMLU benchmark and showcasing impressive coding capabilities with a Human Eval score of 62.2 for the 8B model and 81.7 for the 70B model; find out more through Demo and Blogpost.

Phi-3: Mobile Model Marvel: Microsoft’s Phi-3 Instruct model variants gain attention for their compact size (4k and 128k contexts) and their superior performance over other models such as Mistral 7B and Llama 3 8B Instruct on standard benchmarks. Notably designed for mobile use, Phi-3 features ‘function_call’ tokens and demonstrates advanced capabilities; learn more and test them out via Demo and AutoTrain Finetuning.

OpenELM-270M and RAG Refreshment: Apple’s OpenELM-270M model is making a splash on HuggingFace, along with advancements in the Retrieval-Augmented Generation (RAG) framework, which now includes Adaptive Routing and Corrective Fallback features using Langchain’s LangGraph. These and other conversations signify continued innovation in the AI space; details on RAG enhancements are found here, and Apple’s OpenELM-270M is available here.

Batching Discussions Heat Up: The necessity for efficient batching during model inference spurred interest among the community. Aphrodite, tgi, and other libraries are recommended for superior batching speeds, with reports of success using arrays for concurrent prompt processing, suggesting arrays could be used like prompt = ["prompt1", "prompt2"].
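
A small sketch of that array-style batching using the transformers pipeline (model and settings are illustrative; vLLM, TGI, and Aphrodite expose similar list-in, list-out interfaces):

```python
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")
generate.tokenizer.pad_token = generate.tokenizer.eos_token  # gpt2 has no pad token

prompts = ["prompt1", "prompt2"]  # a list is padded and run as a batch
for result in generate(prompts, max_new_tokens=32, batch_size=2):
    print(result[0]["generated_text"])
```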

Trouble with Virtual Environments: A member’s challenges with setting up Python virtual environments on Windows sparked discussions and advice. The recommended commands for Windows are python3 -m venv venv followed by venv\Scripts\activate, with the suggestion to try WSL for improved performance.


LlamaIndex Discord

Trees of Thought: The development of LLMs with tree search planning capabilities could bring significant advancements to agentic systems, as disclosed in a tweet by LlamaIndex. This marks a leap from sequential state planning, suggesting potential strides in AI decision-making models.

Watching Knowledge Dance: A new dynamic knowledge graph tool developed using the Vercel AI SDK can stream updates and was demonstrated by a post that can be seen on the official Twitter. This visual technology could be a game-changer for real-time data representation.

Hello, Seoul: The introduction of the LlamaIndex Korean Community is expected to foster knowledge sharing and collaborations within the Korean tech scene, as announced in a tweet.

Boosting Chatbot Interactivity: Enhancements to chatbot User Interfaces using create-llama have emerged, allowing for expanded source information components and promising a more intuitive chat experience, with credits to @MarcusSchiesser and mentioned in a tweet.

Embeddings Made Easy: A complete tutorial on constructing a high-quality RAG application combining LlamaParse, JinaAI_ embeddings, and Mixtral 8x7b is now available and can be accessed through LlamaIndex’s Twitter feed. This guide could be key for engineers looking to parse, encode, and store embeddings effectively.

Advanced RAG Rigor: In-depth learning is needed for configuring advanced RAG pipelines, with suggestions like sentence-window retrieval and auto-merging retrieval being considered for tackling complex question structures, as pointed out with an instructional resource.

VectorStoreIndex Conundrum: Confusion about embeddings and LLM model selection for a VectorStoreIndex was clarified; gpt-3.5-turbo and text-embedding-ada-002 are the defaults unless overridden in Settings, as stated in various discussions.
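
A short sketch of overriding those defaults via the global Settings object; the import paths assume llama-index >= 0.10 and the model choices are examples:

```python
# Override LlamaIndex's default LLM (gpt-3.5-turbo) and embedding model
# (text-embedding-ada-002) globally before building an index.
from llama_index.core import Settings, VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-4")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# index = VectorStoreIndex.from_documents(documents)  # now uses the overrides
```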

Pydantic Puzzles: Integration of Pydantic with LlamaIndex encountered hurdles with structuring outputs and Pyright’s dissatisfaction with dynamic imports. The discussions haven’t concluded with an alternative to # type:ignore yet.

Request for Enhanced Docs: Requests were made for more transparent documentation on setting up advanced RAG pipelines and configuring LLMs like GPT-4 in LlamaIndex, with a reference made to altering global settings or passing custom models directly to the query engine.


OpenAI Discord

AI Hunts for True Understanding: A debate centered on whether AI can achieve true understanding, with the Turing completeness of autoregressive models like Transformers being a key point. The confluence of logic’s syntax and semantics was considered as potential enabler for meaning-driven operations by the model.

From Syntax to Semantics: Conversations revolved around the evolution of language in the AI landscape, forecasting the emergence of new concepts to improve clarity for future communication. The limitations of language’s lossy nature on accurately expressing ideas were also highlighted.

Apple’s Pivot to Open Source?: Excitement and speculation surrounded Apple’s OpenELM, an efficient, open-source language model introduced by Apple, stirring discussions on the potential impact on the company’s traditionally proprietary approach to AI technology and the broader trend towards openness.

Communication, Meet AI: Members highlighted the importance of effective flow control in AI-mediated communication, exploring technologies like voice-to-text and custom wake words. Discussing the interplay between AI and communication highlighted the need for mechanisms for interruption and recovery in virtual assistant interactions.

RPG Gaming with an AI Twist: The AI-powered text RPG Brewed Rebellion was shared, illustrating the growing trend of integrating AI into interactive gaming experiences, particularly in narrative scenarios like navigating internal politics within a corporation.

Engineering Better AI Behavior: Engineers shared tips on prompt crafting, emphasizing the use of positive examples for better results and pointing out that negative instructions often fail to rein in creative outputs from AI like GPT.

AI Coding Challenges in Gaming and Beyond: Challenges abound when prompting GPT for language-specific coding assistance, as raised by an engineer working on SQF language for Arma 3. Issues such as the model’s pretraining biases and limited context space were discussed, sparking recommendations for alternative models or toolchains.

Dynamic AI Updates and Capabilities: Queries on AI updates and capabilities surfaced, including how to create a GPT expert in Apple Playgrounds and whether new GPT versions could rival the likes of Claude 3. Additionally, the utility of GPT’s built-in browser versus dedicated options like Perplexity AI Pro and You Pro was contrasted, and anticipation for models with larger context windows was noted.


LAION Discord

  • AI Big Leagues - Model Scorecard Insights: The general channel had a lively debate over an array of AI models, with Llama 3 8B likened to GPT-4. Privacy concerns were raised, implying the ‘end of anonymous cloud usage’ due to new U.S. “Know Your Customer” regulations, and there were calls to scrutinize AI image model leaderboards.

  • Privacy at Risk - Cloud Regulations Spark Debate: Proposed U.S. regulations are causing unrest among members about the future of anonymity in cloud services. The credibility of TorrentFreak as a news source was defended following an article it published on cloud service provider regulations.

  • Cutting Edge or Over the Edge - AI Image Models Scrutinized: Discussions questioned the accuracy of AI image model leaderboards, suggesting the possible manipulation of results and adversarial interference.

  • Art Over Exactness? The AI Image Preference Puzzle: Aesthetic appeal versus prompt fidelity was the center of discussions around generative AI outputs, with contrasting preferences revealing the subjective nature of AI-produced imagery’s value.

  • Faster, Leaner, Smarter: Accelerating AI with New Research: Recent discussions in the research channel highlighted MH-MoE, a method improving context understanding in Sparse Mixtures of Experts (SMoE), and a weakly supervised pre-training technique that outpaces traditional contrastive learning by 2.7 times without undermining the quality of vision tasks.


OpenAccess AI Collective (axolotl) Discord

Bold Llama Ascends New Heights: Discussions captivated participants over Llama-3’s potential to scale to a colossal 128k context length through further tuning and augmented training. Interest also percolates around Llama 3’s pretraining learning rate, with speculation that an infinite LR schedule might be in the works to accompany upcoming model variants.

Snowflake’s New Release Causes Flurry of Excitement: The Snowflake Arctic 480B Dense + Hybrid MoE model made waves, flaunting a 4K context window and Apache 2.0 licensing. This generated animated conversations about its intrinsic capabilities and how it could synergize with Deepspeed.

Medical AI Takes A Healthy Leap Forward: The Internist.ai 7b model, meticulously designed by medical professionals, reportedly outshines GPT-3.5 and even scores well on the USMLE examination. It spurred conversation about the promise of specialized AI models, with members captivated by its performance and the audacious claim that it outperforms numerous other 7b models.

Crosshairs on Dataset and Model Training Tangles: Technical discussions dove into the practicalities of Hugging Face datasets, optimizing data usage, and the compatible interplay between optimizers and Fully Sharded Data Parallel (FSDP) setups. On the same thread, members experienced turbulence with fsdp when it comes to dequantization and full fine tunes, indicative of deeper compatibility and system issues.

ChatML’s New Line Quirk Raises Eyebrows: Participants identified a glitch in ChatML and possibly FastChat concerning erratic new line and space insertion. The issue throws a spotlight on the importance of refined token configurations, as it could skew training outcomes for AI models.


tinygrad (George Hotz) Discord

  • Tinygrad Tackles Facial Recognition Privacy: The possibility of porting Fawkes, a privacy tool designed to thwart facial recognition systems, to tinygrad was explored. George Hotz suggested that strategic partnerships are crucial for the success of tinygrad, highlighting the collaboration with comma on hardware for tinybox as an exemplar.

  • Linkup Riser Rebellion and Cool Solutions: There’s a notable struggle with PCIE 5.0 LINKUP risers causing errors, with some engineers suggesting to explore mcio or custom C-Payne PCBs. Additionally, one member reported a venture into water cooling, facing compatibility issues with NVLink adapters.

  • In Pursuit of Tinygrad Documentation: A gap has been flagged regarding normative documentation for tinygrad, adding to the demand for a clear description of the behaviors of tinygrad operations. This included a conversation on the need for a tensor sorting function, and a contributed custom 1D bitonic merge sort for lengths that are powers of two.

  • GPU Colab’s Appetite for Tutorials: George Hotz shared an MNIST tutorial targeting GPU colab users, intended as a resource to help more users harness the potential of tinygrad.

  • Sorting, Looping, and Crashing Kernel Confab: AI engineers grappled with various aspects of tinygrad and CUDA, from the complexities of creating a torch.quantile equivalent to the architectural nuances of tensor cores, like m16n8k16, and enigmatic crashes that defy isolation. Discussion of WMMA thread capacity revealed that a thread might hold up to 128 bits per input.


Modular (Mojo 🔥) Discord

Bold Moves in Benchmarking: The engineering community awaits Mojo’s performance benchmarks, comparing its prowess against languages like Rust and Python amidst skepticism from Rust enthusiasts. Lobsters carries a heated debate on Mojo’s claims of being safer and faster, which is central to Mojo’s narrative in tech circles.

Quantum Conundrums and ML Solutions: Quantum computing discussions touched on the nuances of quantum randomness with mentions of the Many-Worlds and Copenhagen interpretations. There’s a buzz about harnessing geometric principles and ML in quantum algorithms to handle qubit complexity and improve calculation efficiency.

Patching Up Mojo Nightly Builds: The Mojo community logs a null string bug on GitHub (#2392) and enjoys a fresh nightly compiler release with improved overloading for function arguments. Simultaneously, SIMD’s adaptation to EqualityComparable reveals both pros and cons, sparking a search for more efficient stdlib types.

Securing Software Supply Chains: Modular’s blog post highlights the security protocols in place for Mojo’s safe software delivery in light of the XZ supply chain attack. With secure transport and signing systems like SSL/TLS and GPG, Modular puts a firm foot forward in protecting its evolving software ecosystem.

Discord Community Foresees Swag and Syntax Swaps: Mojo’s developer community enjoys a light-hearted suggestion for naming variables and anticipates future official swag; meanwhile, API development sparks discussions on performance and memory management. The MAX engine query redirects to specific channels, ensuring streamlined communication.


Latent Space Discord

A New Angle on Transformers: Engineers discussed enhancing transformer models by incorporating inputs from intermediate attention layers, paralleling the Pyramid network approach in CNN architectures. This tactic could potentially lead to improvements in context-aware processing and information extraction.

Ethical Tussle over ‘TherapistAI’: Controversy arose over levelsio’s TherapistAI, with debates highlighting concerns about AI posing as a replacement for human therapists. This sparked discussions on responsible representations of AI capabilities and ethical implications.

Search for Semantic Search APIs: Participants reviewed several semantic search APIs; however, options like Omnisearch.ai fell short in web news scanning effectiveness compared to traditional tools like newsapi.org. This points to a gap in the current offerings of semantic search solutions.

France Bets on AI in Governance: Talks revolved around France’s experimental integration of Large Language Models (LLMs) into its public sector, noting the country’s forward-looking stance. Discussions also touched upon broader themes such as interaction of technology with the sociopolitical landscape.

Venturing Through Possible AI Winters: Members debated the sustainability of AI venture funding, spurred by a tweet concerning the ramifications of a bursting AI bubble. The conversations involved speculations on the impact of economic changes on AI research and venture prospects.


LangChain AI Discord

LangChain AI Fires Up Chatbot Quest: Discussions centered around utilizing pgvector stores with LangChain for enhancing chatbot performance, including step-by-step guidance and specific methods like max_marginal_relevance_search_by_vector. Members also fleshed out the mechanics behind SelfQueryRetriever and strategized on building conversational AI graphs with methods like createStuffDocumentsChain. The LangChain GitHub repository is pointed out as a resource along with the official LangChain documentation.
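
A hedged sketch of that pgvector-plus-MMR retrieval flow using the community PGVector integration; the connection string, collection name, and embedding model are illustrative:

```python
# Query a pgvector-backed store and diversify results with MMR.
from langchain_community.vectorstores.pgvector import PGVector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
store = PGVector(
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/rag",
    collection_name="chatbot_docs",
    embedding_function=embeddings,
)
query_vec = embeddings.embed_query("How do I reset my password?")
docs = store.max_marginal_relevance_search_by_vector(query_vec, k=4)
for d in docs:
    print(d.page_content[:80])
```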

Template Woes for the Newly Hatched LLaMA-3: One member sought advice on prompt templates for LLaMA-3, citing gaps in the official documentation, reflecting the collective effort to catch up with the latest model releases.

Sharing AI Narratives and Tools: The community showcased several projects: the adaptation of RAG frameworks using LangChain’s LangGraph, an article of which is available on Medium; a union-centric, text-based RPG “Brewed Rebellion,” playable here; “Collate”, a service for transforming saved articles into digest newsletters available at collate.one; and BlogIQ, a content creation helper for bloggers found on GitHub.

Training Day: Embeddings Faceoff: AI practitioners looking to sharpen their knowledge of embedding models could turn to an educational YouTube video shared by a member, aimed at demystifying the best tools in the trade.


Cohere Discord

  • Toolkit Teardown and Praise: Cohere’s Toolkit went open-source, exciting users with its ability to add custom data sources and deploy to multiple cloud platforms, while the GitHub repository was commended for facilitating the rapid deployment of RAG applications.

  • Troubleshooting Takes Center Stage: A member encountered issues while working with Cohere Toolkit on Docker for Mac; meanwhile, concerns about using the Cohere API key on Azure were alleviated with clarification that the key is optional, ensuring privacy.

  • API Anomaly Alert: Disparities between API and playground results when implementing site connector grounding in code were reported, posing a challenge that even subsequent corrections couldn’t fully resolve.

  • Acknowledging Open Source Champions: Gratitude was directed towards Cohere cofounder and key contributors for their dedicated effort launching the open-source toolkit, highlighting its potential benefit to the community.

  • Cohere Critique Critic Criticized: A debate was sparked over an article allegedly critical of Cohere, focusing on the introduction of a jailbreak to Cohere’s LLM that might enable malicious D.A.N-agents, though detractors of the article were unable to cite specifics to bolster their perspective.


OpenInterpreter Discord

  • Top Picks in AI Interpretation: The Wizard 2 8X22b and gpt 4 turbo models have been recognized as high performers in the OpenInterpreter project for their adeptness at interpreting system messages and calling functions. However, reports of erratic behavior in models like llama 3 have raised concerns among users.

  • A Patch for Local Execution: User experiences indicate confusion during local execution of models with OpenInterpreter, with a suggested solution involving the use of the --no-llm_supports_functions flag to resolve specific errors.

  • UI Goes Beyond the Basics: Conversations have emerged around developing user interfaces for AI devices, with engineers exploring options beyond tkinter for compatibility with future microcontroller integrations.

  • Vision Models on the Spotlight: The sharing of GitHub repositories and academic papers has spurred discussions on computer vision models, with a particular focus on moondream for its lightweight architecture and the adaptability of llama3 to various quantization settings for optimized VRAM usage.

  • 01 Project Gains Traction: Members have been engaging with the expansion of the 01 project to external devices, as evidenced by creative implementations shared online, including its integration into a spider as part of a project publicized by Grimes. Installation and execution guidance for 01 has also been addressed, with detailed instructions for Windows 11 and tips for running local models with the command poetry run 01 --local.


Interconnects (Nathan Lambert) Discord

Blind Test Ring Welcomes Phi-3-128K: Phi-3-128K has been ushered into blind testing, with strategic interaction initiations like “who are you” and mechanisms like LMSys preventing the model’s name disclosure to maintain blind test integrity.

Instruction Tuning Remains a Hotbed: Despite the rise of numerous benchmarks for assessing large language models, such as LMentry, M2C, and IFEval, the community still holds strong opinions about the lasting relevance of instruction-following evaluations, highlighted in Sebastian Ruder’s newsletter.

Open-Source Movements Spice Up AI: The open-sourcing of Cohere’s chat interface drew attention and can be found on GitHub, which led to humorous side chats including jokes about Nathan Lambert’s perceived influence in the AI space and musings over industry players’ opaque motives.

AI Pioneers Shun Corporate Jargon: The term “pick your brain” faced disdain within the community, emphasizing the discomfort of industry experts in being approached with corporate cliches during peak times of innovation.

SnailBot Notifies with Caution: The deployment of SnailBot prompted discussions around notification etiquette, while access troubles with the “Reward is Enough” publication sparked troubleshooting conversations, highlighting the necessity of hassle-free access to scientific resources.


Mozilla AI Discord

  • Mlock Malaise Strikes Llamafile Users: Engineers reported “failed to mlock” errors with the phi2 llamafile, lacking explicit solutions or workarounds to address the problem.
  • Eager Engineers Await Phi3 Llamafile Update: The community is directed to use Microsoft’s GGUF files for Phi3 llamafile utilization, with specific guidance available on Microsoft’s Hugging Face repository.
  • B64 Blunder Leaves Images Unrecognized: Encoding woes surfaced as a user’s base64 images in JSON payloads failed to be recognized by the llama model, which reported a multimodal: false flag; no fix was provided in the discussion (a hypothetical payload sketch follows this list).
  • Mixtral Llamafile Docs Get a Facelift: Modifications to Mixtral 8x22B Instruct v0.1 llamafile documentation were implemented, accessible on its Hugging Face repository.
  • False Trojan Alert in Llamafile Downloads: Hugging Face downloads erroneously flagged by Windows Defender as a trojan led to recommendations for using a VM or whitelisting, along with the difficulties in reporting false positives to Microsoft.
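
As flagged in the base64 item, here is a hypothetical sketch of the payload shape that generally works against llamafile’s OpenAI-compatible endpoint (default port 8080); the endpoint path, model name, and data-URI convention follow the llama.cpp server convention and are not a confirmed fix for the issue above:

```python
# Embed a base64 image in an OpenAI-style chat payload (data-URI form).
import base64, json, urllib.request

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "llava",  # illustrative; use whatever the llamafile serves
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```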

DiscoResearch Discord

Batch Your Bots: Discord users investigated how to batch prompts efficiently in Local Mixtral and compared tools like vLLM and the open-sourced TGI. While some preferred using TGI as an API server for its low latency, others highlighted the high throughput and direct Python usage that comes with vLLM in local Python mode, with resources like llm-swarm suggested for scalable endpoint management.
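
A minimal sketch of vLLM’s local Python mode mentioned above; the model name is an example, and prompts passed as a list are batched in a single pass:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["prompt one", "prompt two"], params)  # batched together
for out in outputs:
    print(out.outputs[0].text)
```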

Dive into Deutsch with DiscoLM: Interaction with DiscoLM in German sparked discussions about prompt nuances, such as using “du” versus “Sie”, and how to implement text summarization constraints like word counts. Members also reported challenges with model outputs and expressed interest in sharing quantizations of experimental models, especially in light of the high benchmarks scored by models like Phi-3 on tests like Ger-RAG-eval.

Grappled Greetings: Users debated the formality in prompting language models, acknowledging the variable impact on responses when initiating with formal or informal forms in German.

Summarization Snafus: The struggle is real when trying to cap off model-generated text at a specific word or character limit without abrupt endings. The conversation mirrored the common desire for fine-tuned control over output.

Classify with Confidence: Arousing community enthusiasm was the possibility of implementing a classification mode for live inference in models to match the praised benchmark performance.


Datasette - LLM (@SimonW) Discord

  • Cracking Open the Python API for Datasette: Engineers have been exploring the Python API documentation for Datasette’s LLM, utilizing it for embedding text files and looking for ways to expand its usage.

  • Summarization Automation with Claude: Simon Willison shared his experience using Claude alongside the LLM CLI tool to summarize Hacker News discussions, providing a workflow overview.

  • Optimizing Text Embeddings: Detailed instructions for handling multiple text embeddings efficiently via Datasette LLM’s Python API were shared, with emphasis on the embed_multi() feature as per the embedding API documentation (a short sketch follows this list).

  • CLI Features in Python Environments: There’s a current gap between the LLM CLI’s file-embedding features and its Python API; however, the implementation can be traced in the GitHub repository, giving engineers a reference for porting CLI features to Python scripts.
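
As referenced above, a hedged sketch of the embed_multi() flow per the llm Python API docs; the model alias, database name, and entries are illustrative:

```python
import llm
import sqlite_utils

db = sqlite_utils.Database("embeddings.db")
model = llm.get_embedding_model("3-small")  # any installed embedding model
collection = llm.Collection("notes", db, model=model)
collection.embed_multi(
    [("note-1", "first text to embed"), ("note-2", "second text to embed")],
    store=True,  # also keep the source text alongside the vectors
)
print(collection.count())
```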


Skunkworks AI Discord

  • Say Hello to burnytech: A brief greeting was made by burnytech with a simple “Hi!” on the general channel.

  • Calling All AI Enthusiasts to Toronto: The Ollamas and friends group have organized an AI developer meetup in Toronto, offering networking opportunities, food, and lightning talks. Interested AI professionals and enthusiasts can register via the Toronto AI Meetup Registration Link to participate in the event hosted at the Cohere space with limited availability.


LLM Perf Enthusiasts AI Discord

  • Tweet Tease Leaves Us Guessing: Jeffery Wang shared a tweet leaving the community curious about the contents, with no further discussion occurring.

  • Inference Inconsistency Raises Questions: Members noticed that inference providers like Octo AI and Anyscale have divergent JSON mode implementations for the same open-source models, with Octo AI’s approach being schema-based and Anyscale mirroring OpenAI’s methodology.

  • Contemplating Contextual Capabilities: The community pondered the effectiveness of a tool’s use of full context, noting its superior performance to GPT without detailed explanation of the advancements.


Alignment Lab AI Discord

  • Congratulations, You’ve Been Anderson-ified!: A guild member humorously congratulated someone, presumably for a noteworthy achievement or milestone, likening them to Laurie Anderson, an avant-garde artist known for her multimedia projects and experimental music. The context or reason behind the comparison was not provided.

AI21 Labs (Jamba) Discord

  • Invasion of the Content Snatchers: A Discord user posted an inappropriate link promoting Onlyfans leaks & Daily Teen content, which violates the community guidelines. The content was tagged to alert all members and included a Discord invite link.

PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (774 messages🔥🔥🔥):

  • Troubleshooting Finetuning Problems: Users discussed issues while finetuning with special tokens and formats using LLaMA models. One particular issue was addressed by removing unnecessary tokens (<|start_header_id|>, <|end_header_id|>, <|eot_id|>) from prompts, which improved loss significantly from 5 down to a reasonable range during the first epoch of training.

  • Discussion on AI Model Sizes and Efficiency: Users conversed about using different sizes of LLaMA-3 models, particularly the 70B model, and weighed in on the appropriate batch sizes and GPUs necessary for finetuning such large models efficiently. The conversation also touched upon the balance between using QLoRA for GPU efficiency versus LoRA for better accuracy.

  • GPU Requirement Queries: There was an inquiry about the suitability of different GPUs for finetuning LLaMA-3 models. It was clarified that at least 48GB was required, although renting higher VRAM was recommended for continuous pretraining of different models.

  • Hunter x Hunter Anime Appreciation: Users expressed their opinions on the anime series “Hunter x Hunter,” debating the quality of the Chimera Ant arc and sharing their favorite moments and arcs.

  • Meta Employee Humor: A joke was made about a user potentially being a Meta employee due to their familiarity with long training times for machine learning models. This spawned a friendly quip about the salaries at Meta and a user humorously insisting they were not employed there.



Unsloth AI (Daniel Han) ▷ #random (13 messages🔥):

  • Reranker Choices for Speed and Efficiency: A member highlighted ms-marco MiniLM-L6-v2 as their reranker of choice, finding it 15x faster than BGE-m3 with very similar results for reranking 200 embeddings.

  • PostgreSQL and pgvector for Reranking: Another snippet explained the use of PostgreSQL combined with pgvector extension, implying no need for an external API for reranking tasks.

  • Hardware Acquisitions Power Up Training: A member expressed enthusiasm about obtaining hardware suitable for fine-tuning models, which has enhanced their capabilities in RAG and prompt engineering.

  • Fine-tuned Llama for Large RAG Datasets: It was mentioned that a fine-tuned llama from Unsloth is being used to generate a substantial 180k row RAG ReAct agent training set.

  • Unsupervised In-Context Learning Discussion: A link to a YouTube video was shared, titled “No more Fine-Tuning: Unsupervised ICL+”, discussing an advanced in-context learning paradigm for Large Language Models (Watch the video).

Link mentioned: No more Fine-Tuning: Unsupervised ICL+: A new Paradigm of AI, Unsupervised In-Context Learning (ICL) of Large Language Models (LLM). Advanced In-Context Learning for new LLMs w/ 1 Mio token contex…


Unsloth AI (Daniel Han) ▷ #help (186 messages🔥🔥):

  • Unsloth Pro Multi-GPU Support Is Brewing: Multi-GPU support for Unsloth Pro is currently in the works, as confirmed by theyruinedelise. An open-source version with multi-GPU capabilities is expected around May, while existing Unsloth Pro inquiries are still pending replies.

  • Tuning Advice with Experimental Models: Starsupernova advised using updated Colab notebooks for fixing generation issues after fine-tuning, as seen in the case where model outputs repeated the last token. There’s mention of “cursed model merging,” the need for model retraining after updates, and the potential use of GPT3.5 or GPT4 for generating high-quality multiple-choice questions (MCQs).

  • Dataset Challenges and Solutions: Discussions around dataset handling included issues with key errors during dataset mapping and typing errors with curly brackets; solutions involved loading datasets from Google Drive into Colab and making datasets private on Hugging Face with CLI login.

  • Colab Training Considerations on Kaggle and Local Machines: Users inquired about resuming training from checkpoints on Kaggle due to the 12-hour limit, and starsupernova confirmed that fine-tuning can continue from the last step. There are hints from members about appropriate steps for fine-tuning, such as utilizing the save_pretrained_merged and save_pretrained_gguf functions in one script.

  • Inference and Triton Dependency Clarifications: Theyruinedelise clarified that Triton is a requirement for running Unsloth and mentioned that Unsloth might provide inference and deployment capabilities soon. There was a question about a Triton runtime error specific to SFT training, highlighting potential variability in environment setup.



Unsloth AI (Daniel Han) ▷ #showcase (3 messages):

  • Meta Unveils Llama-3 and Teases 400B Model: Meta has released a new set of models called Llama-3, featuring an 8B parameter model that surpasses the previous 7B in the Llama series. Alongside the release, Meta has also teased an upcoming 400B model poised to match GPT-4 on benchmarks; access remains gated but available upon request.
  • Growth in Open Source AI: There’s excitement around the recent open source releases of Llama-3 and Phi-3, with an acknowledgment that both target similar goals through different approaches. The full details can be found in a shared Substack article.
  • Promotion in the Community: A message encourages sharing the Llama-3 update in another channel (<#1179035537529643040>), suggesting that the community would find this information valuable.

Link mentioned: AI Unplugged 8: Llama3, Phi-3, Training LLMs at Home ft DoRA.: Insights over Information


Unsloth AI (Daniel Han) ▷ #suggestions (75 messages🔥🔥):

  • PR Fix for Model Vocabulary Issue: A pull request (PR #377) has been discussed that addresses the issue of loading models with resized vocabulary. The PR aims to fix tensor shape mismatches and can be found at Unsloth Github PR #377. If merged, subsequent release of dependent training code is expected.
  • Anticipation for PR Merge: There’s a request for the merge of the aforementioned PR, with the contributor expressing eagerness. The Unsloth team has confirmed adding the PR after some minor discussion about .gitignore files impacting the GitHub page’s appearance.
  • Suggestions for Model Training Optimization: Ideas were shared about sparsifying the embedding matrix by removing unused token IDs to allow for training with larger batches, and possibly offloading embeddings to the CPU (a rough sketch follows this list). Implementation may involve modifying the tokenizer or using sparse embedding layers.
  • Model Size Consideration with Quantization: A suggestion was made to implement a warning or auto switch to a quantised version of a model if it does not fit on the GPU, which sparked interest.
  • Dynamic Context Length Adjustment: Discussions involved the possibility of iteratively increasing available context length during model evaluation without needing reinitialization. Suggestions included using laser pruning and freezing techniques, and the mention of updating config variables for the model and tokenizer.
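
As flagged in the sparsification item above, a rough, hypothetical sketch of the idea in PyTorch; remapping the tokenizer to the new IDs is left out:

```python
import torch

def shrink_embedding(embed: torch.nn.Embedding, used_ids: set[int]):
    """Keep only rows for token IDs that actually occur in the training corpus."""
    keep = torch.tensor(sorted(used_ids))
    new = torch.nn.Embedding(len(keep), embed.embedding_dim)
    new.weight.data.copy_(embed.weight.data[keep])
    # Old-id -> new-id mapping; the tokenizer must be remapped to match.
    remap = {int(old): new_id for new_id, old in enumerate(keep)}
    return new, remap
```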

Link mentioned: Fix: loading models with resized vocabulary by oKatanaaa · Pull Request #377 · unslothai/unsloth: This PR is intended to address the issue of loading models with resized vocabulary in Unsloth. At the moment loading models with resized vocab fails because of tensor shapes mismatch. The fix is pl…


LM Studio ▷ #💬-general (298 messages🔥🔥):

  • Puzzled Over Potential Phi-3 Preset: A member asked about a preset for phi-3 in LM Studio, and another provided a workaround by taking the Phi 2 preset and adding specific stop strings. They mentioned using PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed and Phi-3-mini-128k-instruct.Q8_0.gguf to achieve satisfactory results.

  • Quantized Model Quality Queries: Discussions included the varying performance of different quantization levels (Q2, Q3, Q4) for phi-3 mini models. A member reported that Q4 functioned correctly whereas Q2 failed to generate coherent text, indicating the potential impact of quantization on model quality.

  • Finding the Fit for GPUs: Users exchanged information about running LM Studio with various GPU configurations, allowing for LLM usage up to 7b and 13b models on cards like the Nvidia RTX 3060. A member also confirmed phi-3-mini-128k GGUF’s high memory requirements on VRAM.

  • Alleviating Error Exit Code 42: Users who faced the error “(Exit code: 42)” were advised to upgrade to LM Studio v0.2.21 to rectify the issue. Additional advice highlighted that the error could be linked to older GPUs not having enough VRAM.

  • Accessing Local Servers and Networks: Conversations revolved around utilizing a local server setup within LM Studio, like using NordVPN’s Meshnet to remotely access LM Studio servers from other locations by changing “localhost:port” to “serverip:port”. Users discussed ways to enable such configurations, with some suggesting the usage of Cloudflare as a proxy.



LM Studio ▷ #🤖-models-discussion-chat (73 messages🔥🔥):

  • Llama-3 Herd Galore: Hugging Face now hosts a diverse collection of “TinyLlamas” on their repo, featuring Mini-MOE models ranging from 1B to 2B in different configurations. The Q8 version of these models is recommended, and users are advised to review the original model page for templates, usage, and help guidance.
  • Cozy Praise for CMDR+: Discussions revealed high satisfaction with CMDR+, with users describing it as resembling GPT-4 performance levels on high-spec Macbook Pros, potentially surpassing the likes of LLama 3 70B Q8.
  • Loading Errors and Solutions for Phi-3 128k: Users reported errors while trying to load Phi-3 128k models. The issue seems to be an unsupported architecture in the current version of llama.cpp, but information in GitHub pull requests and issues suggests updates are on the way to address this.
  • OpenELM Intrigue and Skepticism: Apple’s new OpenELM models are a topic of curiosity, though skepticism remains due to their 2048 token limit and potential performance on different hardware setups. Users appear eager for support in llama.cpp to try out these models with LM Studio.
  • LongRoPE Piques Curiosity: Discussion about LongRoPE, a method for drastically extending the context window in language models to up to 2048k tokens, has generated interest. The significance of this development has prompted users to share the paper and express astonishment at the extended context capabilities it suggests.



LM Studio ▷ #🧠-feedback (9 messages🔥):

  • GPU Offload Issues Reported: A member noted that having GPU offload enabled by default causes errors for users without GPUs or those with low VRAM GPUs. They recommended turning it off by default and providing a First Time User Experience (FTUE) section with detailed setup instructions.

  • Troubles with GPU Acceleration Needed: Despite the GPU offload issue, another member expressed a need for GPU acceleration. They confirmed that turning off GPU offload allows the application to be used.

  • Solving GPU-Related Errors: In response to questions about errors, it was suggested to turn off GPU offload as a possible solution, linking to additional resources with the identifier <#1111440136287297637>.

  • Regression in Version 2.20 for Some Users: One user reported that after upgrading to version 2.20, they could no longer use the application, marking version 2.19 as the last operational one, even with a similar PC configuration and operating system (Linux Debian).

  • High VRAM Not Helping with Loading Model: A user with 16GB of VRAM expressed confusion over the inability to load models on the GPU, noting a 100% GPU usage rate but still facing issues since the upgrade to version 2.20.


LM Studio ▷ #🎛-hardware-discussion (112 messages🔥🔥):

  • Choosing the Right CPU and GPU for AI Tasks: A member was advised to buy the best CPU they can afford and, for AI tasks, an Nvidia GPU for ease of use and compatibility with applications like Stable Diffusion. The same member voiced their dislike for Nvidia, citing issues such as “melted 4090s” and driver problems.
  • RAM Upgrades for LLM Performance: Members agreed that upgrading to 32GB of RAM would be beneficial for local LLM experiments and implementations. One member shared their own successful LLM activity on a machine equipped with an AMD Ryzen 7840HS CPU and RTX 4060 GPU.
  • Power Efficiency Versus Performance in AI and Gaming Rigs: Discussions about power efficiency revolved around member setups like a 5800X3D and 5700XT with 32GB of RAM, advocating settings like Eco Mode and power-limiting Nvidia GPUs to manage heat (see the sketch after this list).
  • Troubleshooting Model Loading and GPU Offload Errors: Users experiencing errors such as “Failed to load model” due to insufficient VRAM were advised to turn GPU offload off or to use smaller buffer settings. Another member resolved their issue with LM Studio’s GPU usage by setting the GPU_DEVICE_ORDINAL environment variable.
  • Mac Performance for Running LLMs Locally: Members discussed Macs running LLMs, with the consensus that Macs ideally need 16GB or more RAM for efficient operation; an 8GB M1 machine can manage small models, but only without other apps running concurrently.
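
A hedged sketch of the power-limiting step mentioned above (the 250 W cap and GPU index are arbitrary examples, not values from the thread; setting a limit normally requires admin/root rights):

```python
# Cap an Nvidia GPU's board power by shelling out to nvidia-smi.
# The GPU then downclocks under load instead of running hot at full power.
import subprocess

def set_power_limit(watts: int, gpu_index: int = 0) -> None:
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
        check=True,  # raise if nvidia-smi rejects the value or lacks privileges
    )

set_power_limit(250)
```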

LM Studio ▷ #langchain (1 messages):

vic49.: Yeah, dm me if you want to know how.


LM Studio ▷ #amd-rocm-tech-preview (56 messages🔥🔥):

  • Dual GPU Setup Confusion: Users with dual AMD and NVIDIA setups experienced errors when installing ROCm versions of LM Studio. A workaround involved removing NVIDIA drivers and uninstalling the device, though physical card removal was sometimes necessary.

  • Tech Preview Teething Troubles: Some users expressed frustration with installation issues, but heyitsyorkie reminded the community that LM Studio ROCm is a tech preview, and bugs are to be expected.

  • ROCm Compatibility and Usage: Users discussed which GPUs are compatible with ROCm within LM Studio. heyitsyorkie provided clarification, noting only GPUs with a checkmark under the HIPSDK are supported, with nettoneko indicating support is based on the architecture.

  • Installation Success and Error Messages: Certain users reported successful installations after driver tweaks, while others encountered persistent error messages when trying to load models. kneecutter mentioned that a configuration with RX 5700 XT appeared to run LLM models but was later identified to be on CPU, not ROCm.

  • Community Engagement and Advice: Amidst reported glitches, community members actively shared advice, with propheticus_05547 mentioning that AMD Adrenaline Edition might be needed for ROCm support. andreim suggested updating drivers for specific GPU compatibility, like Adrenalin 24.3.1 for an rx7600.


Perplexity AI ▷ #announcements (2 messages):

  • Perplexity’s Impressive Funding Round: Perplexity announced a successful fundraising of $62.7 million at a $1.04 billion valuation, with an investor lineup including Daniel Gross, NVIDIA, Jeff Bezos, and many others. Plans are underway to collaborate with mobile carriers like SK Telecom and SoftBank, along with an imminent Enterprise Pro launch, to accelerate growth and distribution.

  • Pro Service Launches on iOS: The Pro service is now available to iOS users, allowing them to ask any question and receive an answer promptly. This new feature officially starts today for users with a Pro-tier subscription.


Perplexity AI ▷ #general (467 messages🔥🔥🔥):

  • Enterprise Pro vs. Regular Pro: Users questioned the benefits of Enterprise Pro over Regular Pro, with discussions focusing on whether there was any difference in performance or search quality (“I highly doubt it. But you can pay double the money for privacy!”). Concerns about Opus usage limitations remained, as users debated its 50-use per day restriction.

  • Unpacking Perplexity’s Opus Usage Cap: The community expressed frustration over the 50-use daily limit for Opus searches on Perplexity Pro. Several members speculated about the reasons for the restriction, mentioning abuse of trial periods and the resource-intensive nature of Opus.

  • Anticipation for Model Adjustments: There’s a desire for Perplexity to introduce the ability to adjust the temperature setting for Opus and Sonnet models, as it’s deemed important for creative writing use.

  • Voice Features and Tech Wishes: A couple of users discussed new voice features, including an updated UI and the addition of new voices on Perplexity Pro. Others expressed a desire for a Perplexity app for Watch OS and a voice feature iOS widget.

  • Concerns Over Customer Support: Users reported issues with the Pro Support button on Perplexity’s settings page, with one user noting it didn’t work for them despite various attempts on different accounts. There were also comments about a lack of response from the support team when contacted via email.

Links mentioned:


Perplexity AI ▷ #sharing (8 messages🔥):

  • Exploring the Language Barrier: A shared link leads to Perplexity AI’s analysis on overcoming language translation challenges.
  • Joy in the Past Tense: An intriguing moment of reflection is found at Once I gladly, examining how happiness can shift over time.
  • The Constant of Change: The topic of Shift No More brings insights into how the inevitability of change affects our worldview.
  • Tuning into ‘Mechanical Age’: A curious exploration into a song titled ‘Mechanical Age’ suggests a blend of music with the notion of technological progress.
  • Dive Into Systems Thinking: Systems thinking analysis is discussed as a comprehensive approach to understanding complex interactions within various systems.
  • Seeking Succinct Summaries: A search query points to a desire for concise summaries, possibly for efficiency in learning or decision-making, discussed on Perplexity AI.
  • The Search for Answers in Caretaking: One link is directed towards Perplexity AI’s information on using the Langelier Saturation Index for swimming pool care, despite its complexity and outdoor pool limitations.

Perplexity AI ▷ #pplx-api (14 messages🔥):

  • Image Upload Feature Not on the Roadmap: A user inquired about the possibility of uploading images via the Perplexity API, to which the response was a definitive no, and it is not planned for future roadmaps either.
  • Seeking the Best AI Coder: With GPT-4 unavailable on the Perplexity API, a user recommended llama-3-70b-instruct or mixtral-8x22b-instruct as the best coding models available, highlighting their different context lengths.
  • Perplexity API Lacks Real-Time Data: A user integrating the API into a speech assistant reported that the API provided correct event dates but outdated event outcomes. They also inquired about document insertion for comparisons and eagerly awaited more functionalities.
  • GPT-4 Not Supported by Perplexity API: Users inquiring about GPT-4 support on Perplexity API were directed to the documentation where model details, including parameter count and context length, were listed, with the note that GPT-4 is not available.
  • Clarification on Hyperparameters for llama-3-70b-instruct: A user was seeking advice on the optimal hyperparameters for making API calls to llama-3-70b-instruct, providing a detailed Python snippet used for such calls; another user suggested trying out Groq for its free and faster inference but did not confirm if the hyperparameters inquired about were appropriate.
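
A hedged reconstruction of the kind of call being discussed (assumptions: Perplexity’s OpenAI-compatible endpoint at https://api.perplexity.ai and the model id named in the thread; the hyperparameter values are illustrative, not the ones the member asked about):

```python
from openai import OpenAI

client = OpenAI(api_key="pplx-...", base_url="https://api.perplexity.ai")

resp = client.chat.completions.create(
    model="llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    temperature=0.2,  # lower temperatures tend to suit code generation
    top_p=0.9,
    max_tokens=512,
)
print(resp.choices[0].message.content)
```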

Link mentioned: Supported Models: no description found


Nous Research AI ▷ #ctx-length-research (11 messages🔥):

  • Clarifying RoPE and Fine-Tuning vs. Pretraining: A dialogue clarified that the paper discussing Rotary Position Embedding (RoPE) was about fine-tuning, not pretraining, which might contribute to misconceptions about the generalization capabilities of models like llama 3.

  • Misconceptions About RoPE Generalization: A participant pointed out that there is no proof RoPE can extrapolate in longer contexts by itself, indicating potential confusion around its capabilities.

  • llama 3 RoPE Base is Consistent: Another key point is that llama 3 was trained with a RoPE base of 500k right from the start, and there was no change in the base during its training.

  • The Purpose of High RoPE Base: It was proposed that llama 3’s high RoPE base might be aimed at decreasing the decay factor, which could benefit models that handle longer contexts.

  • RoPE Scaling and Model Forgetting: The conversation included a hypothetical scenario: even if a model is retrained with a higher RoPE base after extensive initial training, it might not generalize, since earlier learning can be forgotten; so far the only demonstrated regime is one where pretraining tokens vastly outnumber extrapolation tokens.
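
For reference, the textbook RoPE frequency schedule behind these points (standard formulation, not taken from the thread; $b$ is the base and $d$ the head dimension):

$$\theta_i = b^{-2i/d}, \qquad i = 0, 1, \dots, \tfrac{d}{2}-1$$

At position $m$, each query/key channel pair is rotated by the angle $m\theta_i$, so relative offsets enter attention only through the phases $(m-n)\theta_i$. Raising $b$ from the conventional 10,000 to llama 3’s 500,000 shrinks every $\theta_i$, slowing the rotation and softening long-range decay, consistent with the decay-factor point above.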

Link mentioned: Scaling Laws of RoPE-based Extrapolation: The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest. The mainstream approach to addressing extrapolation with …


Nous Research AI ▷ #off-topic (17 messages🔥):

  • Apple’s Headset Strategy Shake-Up: Apple is reportedly cutting Vision Pro shipments by 50% and reassessing their headset strategy, potentially indicating no new Vision Pro model for 2025. This information was shared via a tweet by @SawyerMerritt and an article on 9to5mac.com.

  • LLM Prompt Injection Game: A game has been created to teach about LLM prompt injections, featuring basic and advanced levels where players try to extract a secret key that GPT-3 or GPT-4 is instructed to withhold. Interested participants can join the discord server through this invite link.

  • Discord Invite Challenges: There was an issue with a discord invite link being auto-deleted. The member intended to share an invite to a game that teaches about LLM prompt injections.

  • Moderation Assist: After an invite link was auto-deleted, a mod offered to pause the auto-delete feature to allow reposting of the original message inviting members to a discord server focused on LLM prompt injections.

Links mentioned:

  • Join the LLM-HUB Discord server!: Check out the LLM-HUB community on Discord: hang out with 54 other members and enjoy free voice and text chat.
  • Tweet from Sawyer Merritt (@SawyerMerritt): NEWS: Apple cuts Vision Pro shipments by 50%, now ‘reviewing and adjusting’ headset strategy. "There may be no new Vision Pro model in 2025" https://9to5mac.com/2024/04/23/kuo-vision-pro-ship...

Nous Research AI ▷ #interesting-links (16 messages🔥):

  • Introducing AutoCompressors: A new preprint discusses AutoCompressors, a concept for transformer-based models that compresses long contexts into compact summary vectors to be used as soft prompts, enabling them to handle sequences up to 30,720 tokens with improved perplexity. Here’s the full preprint.

  • Jeremy Howard Comments on Llama 3: Jeremy Howard details the significant shift in model finetuning strategies and discusses the latest Llama 3 models by Meta in an article linked by a community member. The article and further thoughts can be found here: Answer.AI post.

  • Llama 3’s Contextual Breakthrough: A tweet mentions Llama 3’s achievement of a 96k context with the help of continued pre-training and adjustments to RoPE theta, also noting its availability on Hugging Face as LoRA. Discover how they enhanced context processing in this Twitter thread.
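
A hedged sketch of the RoPE-theta half of that recipe using Hugging Face transformers (the theta and context values here are illustrative, not the ones used in the linked work; the PoSE step is separate and not shown):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Raise the RoPE base and the position limit before continued pre-training.
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B")
config.rope_theta = 8_000_000.0          # example value; Llama 3 ships with 500k
config.max_position_embeddings = 98_304  # roughly a 96k-token target context

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", config=config
)
```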

Links mentioned:


Nous Research AI ▷ #announcements (1 messages):

  • Announcements Channel Upgrade: The Announcements channel can now be followed and integrated into other Discord servers for seamless updates and information sharing.

Nous Research AI ▷ #general (181 messages🔥🔥):

  • Einstein v6.1 Llama3 Unveiled: Einstein v6.1 Llama3 8B, a fine-tuned version of Meta-Llama-3-8B incorporating more conversational data, has been released; fine-tuning ran on 8xRTX3090 + 1xRTXA6000 with sponsorship from sablo.ai.

  • Phi 3 Analysis and Speculation: The Phi 3 model is discussed for its architectural quirks, suggesting possible SWA usage and other features like upcasted RoPE and a fused MLP & QKV. Curiosity was piqued over why the MMLU evals differ significantly between the Phi team and the Llama-3 team.

  • Dense + Hybrid in Snowflake Arctic 480B: Discussion centers around Snowflake’s announcement of a massive 480B parameter model with a unique architecture that has attention sinks to potentially scale context length. Queries arise about the rationale behind its design and the choice to use a residual MoE without attention, with references to maintaining the strength of token embeddings and computational efficiency.

  • LLaMA Pro Post-Pretraining Methodology: Interest is shown in LLaMA Pro’s unique post-pretraining method to improve model’s knowledge without catastrophic forgetting. The mention of techniques such as QDoRA+FSDP and comparisons with a 141B Mistral model spurs examination of the transformer architecture and scaling considerations.

  • GPT-4 and Rabbit R1 Exchanges Spark Skepticism: Confusion spread after claims that GPT-3 powers the Rabbit R1, with some suggesting it’s a miscommunication or a hallucination by the model. The potential misrepresentation led to discussions about model truthfulness and the reliability of information provided by AI systems.

Links mentioned:


Nous Research AI ▷ #ask-about-llms (53 messages🔥):

  • Training Models with Instructions: A discussion was initiated on whether including instructions in the model’s training loss is redundant. It was noted that Axolotl’s train_on_inputs parameter controls this, and that masking out instructions during fine-tuning might be sensible for most tasks (see the sketch after this list).

  • Synthetic Data Strategy Inquiry: One member sought advice on generating diverse synthetic data for training models. Recommendations included exploring GitHub - argilla-io/distilabel, and examining projects such as WizardLM, Airoboros, Phi, Alpaca, and Orca.

  • The Paradox of Overfitting: Debate occurred over the utility of validation sets during training, with some arguing that the validation loss doesn’t necessarily correlate with real-world performance and that checkpoint evaluations might be more efficient. The conclusion seemed to favor minimal epochs and evaluating last epochs for performance.

  • Long Context Management in LLMs: An exchange about the abilities of LLMs, specifically Llama3, in managing long contexts took place. Participants mentioned that merely extending context without proper understanding isn’t sufficient, and techniques like rope scaling were brought up as methods currently being used.

  • Quantized Model Finetuning Quandary: An inquiry about improving performance on a quantized version of a model (Q4_K_M) led to suggestions around data quantity and LORA vs QLORA approaches. A speculation about creating a more powerful clown car MoE was proposed, combining pretraining on high-quality data with post-epoch refinement to match outputs of higher-level models.
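
A minimal sketch of what setting train_on_inputs to false amounts to (an assumption that it follows the common convention of masking prompt tokens with -100 so CrossEntropyLoss ignores them; this is not Axolotl’s actual code):

```python
import torch

def mask_instruction_loss(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Build labels that train only on the response portion of a sequence."""
    labels = input_ids.clone()
    labels[:prompt_len] = -100  # -100 is the ignore_index of torch.nn.CrossEntropyLoss
    return labels

ids = torch.tensor([101, 2023, 2003, 1996, 7953, 102, 3437, 999])
labels = mask_instruction_loss(ids, prompt_len=6)
# Loss is now computed only over the final two (response) tokens.
```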

Links mentioned:


Nous Research AI ▷ #project-obsidian (3 messages):

  • Language Settings in Tenor Links: A Tenor gif was shared (Why not both?) and it displayed a note about language translation based on browser settings, offering an option to change the language if needed.
  • Anticipation for Vision Model Updates: An expression of anticipation for updates on the vision models being developed by qnguyen was mentioned.
  • Deployment Woes at Hackathon: There was a discussion regarding an attempt to deploy nanollava, which was met with frustration due to the hackathon providing Windows VMs, expressed with a disgusted emoji.

Link mentioned: Why Not Both Por Que No Los Dos GIF (Tenor)


Nous Research AI ▷ #bittensor-finetune-subnet (1 messages):

paradox_13: What are the miner rates?


Nous Research AI ▷ #rag-dataset (75 messages🔥🔥):

  • Exploring Grounding with Attention Mechanisms: Discussing how the model grounds its responses, one suggestion involves using <scratch_pad> tags as a form of working memory. This idea seems to align with a goal of improving how the model references specific information within the provided text.

  • Challenges with Code Citations and Hallucinations: A model’s code documentation training led to unexpected behavior with random associations to code-related content, including hallucinated bug locations. It was pointed out that maintaining code syntax integrity is crucial and that special tokens might disrupt the model’s understanding of code grammar.

  • Promoting Valid Citations with Pydantic Models: An extensive discussion on using Pydantic models and validators to ensure proper citations, with suggestions such as ensuring JSON serialization and preserving valid code chunks. The necessity of keeping system prompts concise and minimal for fine-tuning was also highlighted (a validator sketch follows this list).

  • Potential Attention Masking Solutions: There was a proposition for a novel attention-masking technique, allocating special tokens to each line of input text, allowing the model to point to these for citations. However, concerns were raised about the potential risks of such fine-grained attention control and its impact on the model’s ability to generalize.

  • Dataset Synthesis and External Tool Integrations: Mention of using distilabel’s workbench for dataset synthesis was brought up, showcasing methods for function calling and JSON/pydantic data generation. The conversation suggests an ongoing search for effective tools in creating robust datasets for model training. distilabel-workbench on function calling dataset and json_schema_generating_dataset were shared as resources.
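
A hedged sketch of the validator idea from that discussion: reject any citation whose quoted span does not appear verbatim in the source document (class and field names here are illustrative, not from the thread):

```python
from pydantic import BaseModel, model_validator

class Citation(BaseModel):
    quote: str
    source_text: str

    @model_validator(mode="after")
    def quote_must_exist(self) -> "Citation":
        # A citation is valid only if its quote occurs verbatim in the source.
        if self.quote not in self.source_text:
            raise ValueError("citation quote not found verbatim in source")
        return self

Citation(
    quote="attention is all you need",
    source_text="...as the saying goes, attention is all you need...",
)  # validates; a fabricated quote would raise a ValidationError
```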

Links mentioned:


Nous Research AI ▷ #world-sim (102 messages🔥🔥):

  • Category Theory and LLMs: Members in the chat discussed the application of category theory in relation to language models, sharing resources like Tai-Danae Bradley’s work and the concept of using the Yoneda lemma to understand semantic concepts. A detailed theoretical conversation ensued regarding types ontology, morphisms, and transformations.

  • The World Simulation Saga: Users actively shared experiences and ideas about using world simulations, like WorldSim, for various purposes, including narrative expansion and forming superhero universes. A link to Janus posts was mentioned to relate to visual elements in world simulation.

  • Twitch Streaming of WorldSim: There was excitement over the idea of livestreaming shared world simulation experiences on Twitch. One member had even set up a Twitch channel for streaming in French, but plans were disrupted due to the temporary shutdown of WorldSim due to abuse by 4chan users.

  • Evocative AI-Directed Character Trees: A member shared a character family tree for “ClaudeCharacters” they developed, emphasizing the narrative potential and emergent interactions between the characters in a Twitter post.

  • Websim Issues and Tips: A brief interaction revolved around troubleshooting Websim with suggestions of using it for web page simulations. One member shared that copying and pasting the first prompt from a Claude model could be beneficial.

Links mentioned:


CUDA MODE ▷ #general (11 messages🔥):

  • Tackling Suboptimal Compiler Optimization: A member tried to improve performance by disabling compiler optimizations with flags (nvcc -O0 -Xcicc -O0 -Xptxas -O0), but did not achieve the desired results, getting a 19.45 on their metric.

  • Manual Kernel Optimization Insights: On analyzing a matrix multiplication kernel, a member mentioned seeing benefits from caching when manually calculating the kernel performance based on arithmetic intensity, flops, and memory transfers.

  • AWS Instance GPU Selection Limitations: There was a discussion about AWS instances, where a member noted, citing a Modular blog post, that users can’t choose the specific type of GPU, only the class, when setting up an AWS instance.

  • Cultivating CUDA Expertise: One member asked for advice on what to do after finishing PMPP and most of its exercises to further learn about CUDA or get a job in the field. Another member suggested demonstrating skills publicly by optimizing a niche operation.

  • Collaborative CUDA Kernel Development on Discord: Members are encouraged to share and improve upon existing CUDA kernels in the CUDA MODE Discord channel, even hinting at an upcoming collaboration for triton kernels. Another suggested creating a repository as Discord channels are not suitable for long-term storage of such information.


CUDA MODE ▷ #triton (1 messages):

  • PyTorch 2.3 Embraces Triton: PyTorch has released version 2.3, which now includes support for user-defined Triton kernels within torch.compile. This update allows migrating Triton kernels from eager execution without performance loss or breaking the computational graph, as highlighted in the official release blog post (a minimal sketch follows this list).

  • Tensor Parallelism Fine-Tuned for LLMs: The new update enhances the experience of training Large Language Models (LLMs) by offering improved Tensor Parallelism. This feature has been validated on models with up to 100 billion parameters.

  • Speed Boost with Semi-Structured Sparsity: PyTorch 2.3 introduces semi-structured sparsity as a Tensor subclass, which has shown speed improvements of up to 1.6 times over dense matrix multiplication in some cases.

  • Community Effort Behind the 2.3 Update: A total of 426 contributors participated in this release, with 3393 commits made since PyTorch 2.2. The community is thanked for their contributions and encouraged to report any issues with the new release.
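
A minimal sketch of the headline feature (the standard Triton vector-add example; requires CUDA tensors, and is an illustration rather than code from the release post):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# As of PyTorch 2.3, the user-defined kernel can be traced by torch.compile
# without falling back to eager execution or splitting the graph.
compiled_add = torch.compile(add)
```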

Link mentioned: PyTorch 2.3 Release Blog: We are excited to announce the release of PyTorch® 2.3 (release note)! PyTorch 2.3 offers support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kerne…


CUDA MODE ▷ #cuda (32 messages🔥):

  • CUDA File Extensions and Their Usefulness: A user asked about the necessity and benefits of .cuh files in CUDA development; however, there was no follow-up providing definitive benefits.

  • Optimization Exploration in COLMAP MVS: A CUDA newbie inquired about speeding up COLMAP MVS and was advised to use profiling tools to understand GPU utilization better. An initial adjustment of THREADS_PER_BLOCK from 32 to 96 resulted in improved performance.

  • CUDA Profiling Tools Enlighten the Performance Path: Amidst optimization efforts, another user recommended NVIDIA Nsight Compute for in-depth analysis of CUDA applications, which led to the discovery that only 14 blocks were being launched, indicating inefficient GPU usage.

  • Interrogating Kernel Activities for Bottlenecks: After profiling with Nsight Compute, one user needed clarification on how to delve deeper into kernel performance issues but was guided to select a ‘full’ metric profile for more comprehensive information.

  • Modern CUDA Warp Execution Nuance Shared: A user highlighted an update on CUDA execution since Volta, indicating that threads in a warp no longer need to run the exact same instruction, referencing an article with detailed implementation of matrix multiplication in CUDA.

Links mentioned:


CUDA MODE ▷ #torch (9 messages🔥):

  • Tensor Expand and Triton Striding Troubles: Strides of a tensor are modified when using Tensor.expand, as a member explained; the same member noted that Triton kernels crash when indices outside the range of the original tensor are accessed, suggesting the kernel mishandles the expanded tensor’s strides (see the sketch after this list).

  • Incompatibility Alert for flash-attn and CUDA: Version 2.5.7 of flash-attn is reported to be incompatible with the CUDA libraries installed with PyTorch 2.3.0, causing issues like undefined symbol errors and excessively long build times.

  • PyTorch CUDA Operations and Memory Overview: A member shared a link to the PyTorch CUDA documentation, detailing how CUDA tensors are managed in PyTorch and emphasizing that the allocated tensor’s device memory location is respected during operations, regardless of the currently selected GPU.
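
A quick illustration of the stride behaviour in question (standard PyTorch semantics, not code from the thread):

```python
import torch

t = torch.arange(3).reshape(3, 1)
e = t.expand(3, 4)             # no copy: the expanded dimension gets stride 0
print(t.stride(), e.stride())  # (1, 1) (1, 0)

# e is backed by only 3 elements of storage. A custom (e.g. Triton) kernel
# that computes flat offsets as row * 4 + col, instead of honouring the real
# strides, will read past the underlying buffer and can crash.
```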

Link mentioned: CUDA semantics — PyTorch 2.3 documentation: no description found


CUDA MODE ▷ #algorithms (4 messages):

  • Racing Towards Efficient NeRF: Plenoxels, a CUDA-accelerated version of NeRF, was highlighted for its speed and comparable accuracy, with the source code available on GitHub.
  • Wishlist for GPU-powered SLAM: The community expressed interest in seeing a CUDA implementation of Gaussian Splatting SLAM, a technique not yet available in CUDA format.
  • Mobile ALOHA Gets a Speed Boost: The inference algorithms for the Mobile ALOHA robot platform, including ACT and Diffusion Policy, are available on GitHub, promising efficiency advancements for mobile manipulation.
  • On the Hunt for Optimized Kernels: A community member expressed the need for a kernel capable of handling attention and general deep learning operations on binary matrices or ternary representations.

CUDA MODE ▷ #beginner (6 messages):

  • Tensor Core Improvements Noted: A member highlighted that the Tensor cores in later GPU generations have significantly improved, with a rough estimation suggesting twice the speed from the 3000 series to the 4000 series.
  • Balancing Cost and Performance: The 4070 Ti Super was recommended as a cost-effective option, offering a balance between performance and price, being approximately 50% slower but also 50% cheaper than the top-tier 4090, while still being the latest generation.
  • Setup Complexity for 4070 Ti Super: There’s a mention of the setup complexity and effort required to extract full performance from the 4070 Ti Super, implying that it may not be as straightforward as other options.
  • Comparing Dual 4070s to a Single 4090: After a correction indicating 2x4070 GPUs were meant instead of a 2070, a recommendation was made to opt for a single 4090 instead, based on similar price/performance ratios and to avoid dual GPU setup complications.
  • Learning Opportunity with Multi-GPU Setup: One perspective offered suggests that despite potential issues, opting for a dual GPU setup could provide valuable experience in multi-GPU programming.

CUDA MODE ▷ #pmpp-book (5 messages):

  • Clarification on Exercise Location: Mr.osophy specified the location of the query, indicating it is right above exercise 3.d.
  • Definition of Burst Size Clarified: In a discussion about burst size related to memory coalescing, mr.osophy explained that when threads access contiguous memory locations, the system combines multiple loads into one, which is achieved via bursts at the hardware level.
  • Insights from Book Authors’ Slides: Additional insight was provided with a link to slides from the book’s authors, indicating that bursts contain around 128 bytes, contrasting to an assumed uncoalesced size of 8 bytes.

CUDA MODE ▷ #youtube-recordings (5 messages):

  • Clarification on Improving Arithmetic Intensity: The conversation clarified that quantization increases arithmetic intensity by reducing byte size, while sparsity avoids unnecessary computation but may show a lower arithmetic intensity due to a consistent number of writes (see the worked form after this list).
  • Efficiency in Memory Bandwidth and Workload: It was noted that sending less data to the GPU enhances memory bandwidth efficiency. Conversely, sending the same data amount but doing less work decreases arithmetic intensity, yet the workload efficiency still improves.
  • Lecture Citation for Understanding Sparsity and Quantization: A member referenced a specific moment in Lecture 8 at 36:21 to clarify a point regarding sparsity improving the arithmetic intensity during GPU operations.
  • Sharing of Presentation Material: A participant mentioned that the PowerPoint presentation could be shared, indicating a willingness to provide resources or information discussed.
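
For concreteness, the standard roofline bookkeeping behind that clarification (textbook definition, not taken from the lecture):

$$\mathrm{AI} = \frac{\text{FLOPs performed}}{\text{bytes moved}}$$

Quantizing fp16 weights to int8 halves the bytes moved at constant FLOPs, so AI doubles; sparsity instead removes FLOPs while the output writes stay fixed, so measured AI can fall even as total work, and often runtime, goes down.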

CUDA MODE ▷ #torchao (1 messages):

  • CUDA Memory Optimization Advancements: A member shared a simplified version of the main bucketMul function, demonstrating how it handles multiplications with model weights and dispatch parameters to manage memory loads efficiently. It suggests an approach resembling COO but in buckets, also considering activation memory optimization.

Link mentioned: Effort Engine: no description found


CUDA MODE ▷ #off-topic (1 messages):

iron_bound: https://github.com/adam-maj/tiny-gpu


CUDA MODE ▷ #llmdotc (353 messages🔥🔥):

  • PyTorch Version Confusion: There was clarification that PyTorch 2.2.2 was installed for testing, not 2.1, with a reference to the PyTorch package on PyPI.
  • Optimization Challenges with Float Precision: A member expressed difficulties in optimizing the GELU computation with mixed float precision types, noting significant speedup after converting bf16 values to float for the operation and back to bf16 again.
  • CUDA Versions and Atomics Debate: Members discussed minimizing complexity in CUDA kernels, for instance by eliminating the usage of atomicAdd to simplify support for multiple data types. The goal is to find an implementation for layernorm_backward_kernel that avoids atomics without significantly increasing runtime.
  • GPT-2 Training and Multi-GPU Scaling on A100: An ongoing PR (#248) was discussed regarding multi-GPU training scaling with NCCL and the differing performance when power throttled.
  • Discussion on Next-Gen GPUs and Power Efficiency: A detailed and technical discussion took place on how GPUs like the A100, H100, and the expected B100, are power-throttled based on their architectures, power, and thermal dynamics. The conversation delved into how the input data’s bit patterns can influence power consumption and computational efficiency. There was speculation about the coming B100’s architecture changes and its implications for power throttling.

Links mentioned:


CUDA MODE ▷ #massively-parallel-crew (4 messages):

  • LinkedIn Event for Weekend Sessions: A member proposed the idea of creating a LinkedIn event for upcoming weekend sessions. The suggestion was met with approval.

  • Recording Plans for Tomorrow’s Session: A query was raised regarding who will be handling the recording for the next day’s session. No further details were provided.


Eleuther ▷ #general (28 messages🔥):

  • Exploring Counterfactual Research: One member mentioned that counterfactual reasoning is a hot research topic in AI, with many high-quality papers published in recent years.
  • Normalized Performance Evaluation for LLMs Proposed: A member suggested normalizing LLMs’ performance on benchmarks by their perplexity or log-likelihood, which could counter the impact of data contamination where the model may have encountered the evaluation data during training.
  • Parallel Attention and FFN in Models: In response to a question about why some papers depict attention and feed-forward networks (FFN) as parallel operations, a member clarified that some models, like PaLM, indeed use parallel attention and FFN.
  • Discussion on Sliding Window Attention: Members discussed the sliding window attention mechanism, which limits how far back a transformer model can attend by using an attention mask. They also explored challenges in applying this technique to models handling extremely large context lengths.
  • Providing Hashes for the Pile Dataset: A member requested SHA256 hashes for the Pile dataset, receiving a prompt reply with a list of hashes for various parts of the dataset, which are accessible on the Eleuther AI website.

Link mentioned: Hashes — EleutherAI: no description found


Eleuther ▷ #research (324 messages🔥🔥):

  • Facebook Unveils Hefty Recommender System: Facebook’s recent publication reveals an HSTU-based Generative Recommender system with 1.5 trillion parameters, boasting a 12.4% performance improvement in online A/B tests on a major internet platform. The paper is highlighted for its potential implications rather than its content, found here.

  • Attention Variant Potentially Boosts Performance: Facebook’s new model introduces a modification that uses SiLU (phi) and linear functions f1/f2 in place of softmax, along with a relative attention bias (rab)—altering attention mechanisms and replacing the feedforward network with gating. This design is particularly optimized for their vast recommendation system.

  • Netflix-like Services Favor Batch Recommendations: The common practice among large-scale services, like Netflix, is to compute recommendations in daily batches rather than in real-time, aiding utilization and operational efficiency. Twitter and possibly Facebook may follow a similar pattern to enhance user experience.

  • Concerns Over GPT-like Models Repeating Copyright Material: Amidst the discussion on copyright handling with generative AI, a paper proposing a cooperative game theory-based framework for compensation emerges. Some participants argue over methods like RLHF to deter verbatim repetition, while others touch on the potential negative impact of a data licensing regime here.

  • Investigation into Tokenizer Techniques and Impact: Conversations surface around techniques like BPE-dropout and incorporating bytes into token embeddings to improve spelling and other text-level tasks. Participants are intrigued about whether current large language models utilize such methods during training and their possible downstream effects.

Links mentioned:


Eleuther ▷ #scaling-laws (3 messages):

  • Centered vs. Thin-Tailed Regression Discussion: A member criticized a regression analysis as being too thin-tailed and suggested that it should be merely centered. They emphasized that mathematically, the error of the true regression can only be assured to be centered.
  • Debate on Chinchilla Sampling Methodology: The accuracy of the Chinchilla paper’s confidence interval was questioned, exploring whether authors oversampled small parameter transformers and how the cutoff point for stable estimate should be determined. The member is seeking clarification on whether the confidence interval in the paper is indeed mistakenly narrow.

Eleuther ▷ #interpretability-general (6 messages):

  • LayerNorm and Information Deletion: A user found an analysis interesting which suggests that pre-layernorm makes it difficult to delete information from the residual stream. This may contribute to the norm increase with each additional layer.
  • Penzai’s Learning Curve: A member experimenting with penzai expressed that it’s intriguing yet presents a learning curve, mainly due to issues like not being able to call jnp.sum() on a named tensor.
  • Penzai Naming Workaround: In talk of penzai’s idiosyncrasies, a user suggested using untag+tag as a means to work with named tensors, indicating helper functions can aid in navigating the toolkit.
  • Practical Example for Penzai: Demonstrating penzai’s functionality, a member provided an example of using pz.nx.nmap for tag manipulation within tensors.
  • Mysterious Tweet Shared: A user shared a mysterious tweet, but did not provide context or details about its relevance or content.

Eleuther ▷ #lm-thunderdome (19 messages🔥):

  • Mistral Model’s Surprising Performance: A member expressed surprise at lower scores from Mistral 7b in certain tests, speculating that the inability to use incorrect answers as information might be a limitation.

  • Custom Task Troubles: A member is working on a custom task with the goal of evaluating instruction-finetuned models using CrossEntropyLoss. They reported extremely high perplexity values and overflow issues, and are seeking advice on correctly implementing the evaluation, including whether to include the instruction template within doc_to_text.

  • Matching Hyperparameters for Benchmarks: A question was raised about the num_fewshot setting for gsm8k to align with the Hugging Face leaderboard, with another member suggesting using 5 as the number.

  • VLLM Upgrade Question: An inquiry was made about what might be preventing vllm from being upgraded to the latest version, given recent model architecture additions. A member clarified that upgrades are possible unless one is using Tensor Parallel, which was later corrected to mean Data Parallel.

  • LM Evaluation Harness Filter Registration: A newcomer to the community offered to submit a PR to add a register_filter function to the FILTER_REGISTRY for lm_eval, which was welcomed by a community member.

  • Brier Score Evaluation Issue: A member encountered an error while evaluating the Brier score for tasks like ARC, attributing the problem to a mismatch in the expected number of choices. A community member suggested modifying the brier_score_fn to handle individual instance scores, pledging to make changes upstream soon.

Link mentioned: lm-evaluation-harness/lm_eval/api/metrics.py at 3196e907fa195b684470a913c7235ed7f08a4383 · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness


Eleuther ▷ #gpt-neox-dev (27 messages🔥):

  • Tokenization Troubles in Progress: A member noted an issue with token remapping and splitting due to changes in Huggingface’s tokenizers between versions 0.13 and 0.14. The inconsistency between pretrained data and current tokenizer behavior may be causing tokenization misalignment during model inference.
  • Poetry Package Management Pitfalls: Discussions about trying to switch NeoX’s package management to Poetry highlighted problems such as poor binary management and undesirable version bumps, leading one member to decide against implementing this change due to its complexities.
  • Pre-tokenized Pythia Data Potential Pitfall: Conversations went on to appreciate the pre-tokenization of Pythia training data, but concerns were raised about whether the pre-tokenization is consistent with the eventual inputs during model usage, especially if token versions were mixed in the training data.
  • Tokenization Woes and Merger Mechanics: Further dialogue delved into the intricacies and frustrations with tokenizer behavior, particularly with added tokens involving spaces, and theoretical approaches to handling unmergable tokens using the “merges” file.
  • Fast vs Slow Tokenizer Inconsistencies: It was pointed out that preprocessing steps cause tokenizer mismatches, with the merging process itself being stable. A member expressed eagerness to document the tokenizer issue in more detail and called for a better abstraction to handle tokenizer updates.

Stability.ai (Stable Diffusion) ▷ #general-chat (354 messages🔥🔥):

  • Juggernaut Model Troubles and Solutions: A user expressed frustration with the Juggernaut model’s difficulty in producing satisfactory image prompts, leading them to prefer the performance of RealVis V4.0, which generated the desired prompt much quicker.
  • Anticipation for Stable Diffusion 3.0: While discussing the highly anticipated release of Stable Diffusion 3.0, users were redirected to use the API, which is already active for it. However, a user felt disappointed upon realizing the API service isn’t freely accessible but provides limited trial credits.
  • Seeking Assistance with Image Generation: Newcomers to the community sought guidance on generating images using Stability AI, leading veteran users to suggest external tools like Craiyon for easy online generation, as the Stable Diffusion models require local software installation.
  • Discussion on Advanced Model Usage: Members discussed multiple AI-related topics, including strategies for generating specific image prompts, using cloud computes like vast.ai, handling AI video creation, and the challenges in model fine-tuning. Specific guidance on training LoRas and generating content in compliance with Steam regulations was also shared.
  • Explorations of New AI Tools and Projects: Users announced and discussed various independent AI projects, such as generating webcomics, creating royalty-free sound designs, and a bot offering free AI image generation. Some projects included a beta product for webcomic creation at artale.io and a professional sound design generator at adorno.ai.

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

  • Mixtral 8x7b Provider Issue Detected: A major provider for Mixtral 8x7b was found to be sending down blank responses. They have been removed temporarily and there are plans to implement a method for auto-detecting such issues in the future.

OpenRouter (Alex Atallah) ▷ #general (323 messages🔥🔥):

  • Concerns Over Model Contact Information in China: A member inquired about direct contact information for a business team in China, seeking to establish a local connection.
  • Discussions on Model Performance: The performance and use of various models such as Phi-3 and Wizard LM were debated, with some members favorably comparing smaller models to larger alternatives.
  • OpenRouter Streaming Anomalies: Users reported an infinite stream of “OPENROUTER PROCESSING” messages, which was clarified as a standard method to keep connections alive, although one member faced a timeout issue during completion requests with OpenAI’s GPT-3.5 Turbo.
  • OpenRouter’s Functionality Gets Mixed Reviews: Members discussed the pros and cons of OpenRouter’s features, including trouble with function calls and the service not respecting ‘stop’ parameters in some models, despite others recommending platforms like Fireworks’ FireFunction.
  • Debates Over Multi-Lingual Model Competency: Users engaged in a comparison of models like GPT-4, Claude 3 Opus, and L3 70B, particularly focusing on their performance in non-English prompts, with one member noting that GPT-4 responses in Russian sounded unnatural.

Links mentioned:


HuggingFace ▷ #announcements (1 messages):

  • Llama 3 leaps into action: Boasting a training on 15T tokens and fine-tuning on 10M human annotated samples, Llama 3 comes in 8B and 70B versions as both Instruct and Base. The 70B variant has notably become the best open LLM on the MMLU benchmark with a score over 80, and its coding abilities shine with scores of 62.2 (8B) and 81.7 (70B) on Human Eval, now available on Hugging Chat with [Demo](https://huggingface.co/chat/models/meta-llama/Meta-Llama-3-70B-Instruct) and [Blogpost](https://huggingface.co/blog/llama3).
  • Phi-3's MIT Makeover: The recently rolled-out Phi-3 Instruct variants, designed with contexts of 4k and 128k and trained on 3.3T tokens, demonstrate superior performance over Mistral 7B or Llama 3 8B Instruct on standard benchmarks. This model also features specialized "function_call" tokens and is optimized for mobile platforms, including Android and iPhones, with resources available via [Demo](https://huggingface.co/chat/models/microsoft/Phi-3-mini-4k-instruct) and [AutoTrain Finetuning](https://x.com/abhi1thakur/status/1782807785807159488).
  • Open Source Bonanza: HuggingFace unveils FineWeb, a massive 15 trillion token web data set for research, alongside the latest updates to Gradio and Sentence Transformers for developers. Notably, The Cauldron, a large collection of vision-language datasets, emerges to assist in instruction fine-tuning, detailed at [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) and [Sentence Transformers v2.7.0](https://huggingface.co/posts/tomaarsen/476985886331959).
  • HuggingChat Breaks into iOS: The HuggingChat app lands on Apple devices, bringing the power of conversational AI to iPhones, as announced in the latest post available [here](https://huggingface.co/posts/fdaudens/628834201033253).
  • Content to Quench Your AI Thirst: Explore the versatility of transformer agents with the blog post "Jack of All Trades, Master of Some", and get the low-down on deploying open models on Google Cloud in the upcoming HuggingCast, while the Open Chain of Thought Leaderboard offers a new competitive stage for researchers, as introduced at [Leaderboard CoT](https://huggingface.co/blog/leaderboard-cot).

HuggingFace ▷ #general (276 messages🔥🔥):

  • Baffled by Batching: Members discussed batching for model inference, clarifying that prompts can be input as an array (e.g., prompt = ["prompt1", "prompt2"]) to perform parallel generation. It was mentioned that libraries like vllm, Aphrodite, and tgi are optimized for faster batching compared to plain transformers (a minimal sketch follows this list).

  • Mistral’s Training Peculiarities Raised Concerns: There were queries about fine-tuning Mistral 7b, with discussions around hyperparameters for full SFTing mixtral 8x22b on OpenHermes 2.5, such as batch size, learning rates, and total steps.

  • Snowflakes in the AI Summer: Snowflake released two Hybrid Dense+MoE models: snowflake-arctic-instruct and snowflake-arctic-base, with a unique design featuring 480B parameters and only 17B active parameters at a time. A demo was also mentioned, available on HuggingFace Spaces.

  • Search for the Holy Venv Command: One member faced difficulty setting up a Python virtual environment on Windows. They were advised to use python3 -m venv venv followed by venv\Scripts\activate on Windows, and encouraged to explore WSL (Windows Subsystem for Linux) for a better experience.

  • Streaming Experiences and Echoes of Virtual Environments: Discussions included sharing a custom streaming pipeline integration idea for the transformers library, and assistance with issues related to initiating Python virtual environments in VS Code and Git Bash on Windows platforms.
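
A minimal batching sketch along the lines discussed (assumes the transformers text-generation pipeline; the model and prompts are placeholders):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Passing a list of prompts lets the pipeline generate in batches.
outputs = generator(
    ["prompt1", "prompt2"],
    max_new_tokens=32,
    batch_size=2,
)
for out in outputs:
    print(out[0]["generated_text"])
```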

Links mentioned:


HuggingFace ▷ #today-im-learning (3 messages):

  • ZeroShotClassification Limitations Revealed: A member discovered that hf.zeroShotClassification has a limitation, supporting only up to 10 labels at a time, which was a cause for dismay.

  • Finetuning Foibles: During an attempt to finetune Mistral 7B, one member noticed an unusual behavior where multiple files were being uploaded, which differed from their previous experience.


HuggingFace ▷ #cool-finds (9 messages🔥):

  • RAG Frameworks Get a Refresh: An article discusses the improvements to Retrieval-Augmented Generation (RAG) frameworks, featuring Adaptive Routing, Corrective Fallback, and Self-Correction using Langchain’s LangGraph. Read up on how these frameworks are being unified here.

  • A New Architecture for Text-based Games: A novel architecture called deep reinforcement relevance network (DRRN) is introduced to aid in reinforcement learning within natural language spaces, showing promising results in text-based games. The original paper with details can be found on arXiv.

  • Live Learning in French on Twitch: For those who speak French, there was a live event on Twitch by user Micode which may still be relevant and interesting. You can visit the stream here.

  • OpenELM-270M Release by Apple on HuggingFace: Apple has released its OpenELM-270M text generation models, which are now available in the HuggingFace collections. Check out the models here.

  • 6G and AI Join Forces for the Future: An arXiv paper discusses the convergence of 6G and AI, predicting significant transformations in wireless communication systems to support ubiquitous AI services. The full abstract and paper can be accessed here.

Links mentioned:


HuggingFace ▷ #i-made-this (15 messages🔥):

  • Duplicate Space Alert: The space bark-with-custom-voice has been mentioned as a duplicate of suno/bark. The post included visual content but no additional information was provided.

  • Space Inactivity Notice: LipSick Space requires an MP4 at 25fps and a WAV audio file, but it’s currently sleeping due to inactivity.

  • Product Launch on ProductHunt: A member announced the launch of Wizad on ProductHunt, encouraging users to check out and support the launch with an upvote. The message included navigation links to ProductHunt’s various sections but did not provide a direct link to the Wizad product page.

  • New Micro-Musicgen Model Released: A new micro-musicgen model for creating jungle drum sounds swiftly has been shared, micro-musicgen-jungle, alongside another Gradio app, micro-slot-machine, which lets users pull chords from jungle drum outputs. A Tweet detailed the creative process and invited users to challenge themselves with sound design.

  • Transforming “Read Later” Into Newsletters: An app called Collate was introduced that transforms read-later content into a bite-sized daily newsletter. The app invites users to try the newly built, personalized newsletter feature using their own articles or PDFs.

Links mentioned:


HuggingFace ▷ #computer-vision (3 messages):

  • Solid Pods as a Solution: A member suggested that Solid pods may be the answer to an unspecified topic of discussion.
  • Gratitude Expressed for Assistance: A thank you was extended by one member to another, indicating some form of helpful interaction had taken place.
  • In Search of pix2pix Testing Methods: A member is seeking a method for testing instruct pix2pix edit prompts outside of the instruct pix2pix space, pointing out the lack of a gradio_client API and the need for any text input image edit control net as a solution for their demo’s final step.

HuggingFace ▷ #NLP (7 messages):

  • Parallel Prompt Pondering: A member inquired about generating responses from a Large Language Model (LLM) in parallel, wondering if simultaneous requests are possible instead of sequential ones.

  • Curated Conformal Prediction: A user shared a GitHub link to awesome-conformal-prediction, a curated list of resources on Conformal Prediction, suggesting it as a useful asset for direct linkable code implementations.

  • Demystifying Chat Template Training: A question was raised about the SFTTrainer’s internals, specifically regarding the initial input to the LLM during training and constraints on token generation. The member sought detailed resources on the training procedure for better understanding.

  • Seeking Open Source STT Frontend: An individual was searching for any open source web frontend solutions for Speech-to-Text (STT) technologies and asked the community for suggestions.

  • Trustworthy Language Model Unveiled: The announcement of v1.0 of the Trustworthy Language Model (TLM) included links to a hands-on playground (TLM Playground) and supporting resources including a blog and a tutorial. This model aims to address reliability issues with a new confidence score for LLM responses.

Links mentioned:


HuggingFace ▷ #diffusion-discussions (4 messages):

  • Seeking Guidance on Utilizing TTS Model: A member inquired about how best to use a fine-tuned Text to Speech (.bin) model with diffusers, pondering if creating a custom model is necessary or if another method exists.
  • Parameters Tuning with fooocus: For precise parameter tuning, a member suggested trying out fooocus, especially for lcm ip adapter tasks.
  • Troubleshooting Prompt+Model Issues: A discussion pointed to the combination of prompt and model as the probable cause of some issues being faced.
  • LCM and IP-Adapter Collaboration Cheers: Highlighting the effective collaboration between ip-adapter and lcm-lora, a member showed appreciation for these tools, while also expressing interest in hyper-sd advancements.

LlamaIndex ▷ #blog (7 messages):

  • Language Agent Tree Search Innovation: The shift towards LLMs capable of comprehensive tree search planning, rather than sequential state planning, is set to enhance agentic systems significantly. Details of this breakthrough and its implications were shared on LlamaIndex’s Twitter.
  • Real-time Knowledge Graph Visualization: @clusteredbytes showcased a dynamic knowledge graph diagramming tool that streams updates to the front-end, leveraging the @Vercel AI SDK. Discover this engaging visual tech in the shared Twitter post.
  • LlamaIndex KR Community Launch: The LlamaIndex Korean Community (LlamaIndex KR) has launched, aiming to explore and share the capabilities and scalability of LlamaIndex. Korean language materials, use cases, and collaborative project opportunities are highlighted in the announcement tweet.
  • Enhanced UX for LLM Chatbots: Introducing an improved user experience for chatbots/agents with expandable UI elements for source information, now possible with create-llama. The code and concept were attributed to the great work by @MarcusSchiesser, as mentioned in LlamaIndex’s tweet.
  • Tutorial for RAG Applications with Qdrant: A tutorial demonstrates building a top-tier RAG application using LlamaParse, @JinaAI_ embeddings, and @MistralAI’s Mixtral 8x7b. The guide provides insights into parsing, encoding, and storing embeddings detailed on LlamaIndex’s Twitter page.

LlamaIndex ▷ #general (188 messages🔥🔥):

  • Understanding RAG Functionality: RAG seems to perform optimally with straightforward queries, but it encounters difficulties with reverse-structured questions, prompting suggestions to explore more advanced RAG pipelines such as sentence-window retrieval or auto-merging retrieval. An educational video may provide deeper insights into the construction of these sophisticated RAG pipelines.

  • Configuring Multiple Index Changepoint: A user was confused about selecting embedding and LLM models for a VectorStoreIndex created from documents, where a response clarified that by default, gpt-3.5-turbo for LLMs and text-embedding-ada-002 for embeddings are used unless specified in the global Settings or directly in the query engine.

  • Implementing Pydantic in LlamaIndex: When integrating Pydantic into LlamaIndex, users have expressed difficulty in getting Pydantic to structure outputs correctly. Discussions reveal complexities and error messages related to configuring LlamaIndex pipelines, with mentions of an OpenAI API chat.completions.create method being used.

  • Issues with Pydantic Imports and Type Checking: There’s an issue with Pyright’s type checking being unhappy due to LlamaIndex’s dynamic try/except importing strategy for Pydantic, which potentially necessitates the use of # type:ignore comments. A query about finding a better solution was raised without a definitive solution provided.

  • QueryEngine Configuration Details Sought: A user inquired about the need for clearer documentation for setting up advanced RAG pipelines. Another requested information on where to specify GPT-4 instead of the default LLM, with the solution being to change the global Settings or pass the LLM directly to the query engine (a minimal sketch follows).
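
A minimal sketch of those two configuration routes (assumes the llama-index 0.10+ package layout; the model names and data directory are examples):

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Route 1: set global defaults once, used by every index and query engine.
Settings.llm = OpenAI(model="gpt-4")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# Route 2: override per query engine, leaving the globals untouched.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4"))
```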

Links mentioned:


OpenAI ▷ #ai-discussions (128 messages🔥🔥):

  • Discussing AI’s Understanding Capabilities: A deep conversation unfolded regarding whether a model can “truly understand.” It was noted that logic’s unique confluence of syntax and semantics could enable a model to perform operations over meaning, potentially leading to true comprehension. Further, the Turing completeness of autoregressive models, like Transformers, was highlighted for having enough computational power to execute any program.

  • AI in Language and Society: Addressing the relationship between language and AI, it was argued that language evolution with respect to AI will likely create new concepts for clearer communication in the future. Moreover, language’s lossy nature was discussed, considering the limitations it imposes on expressing and translating complete ideas.

  • On the Horizon: Open Source AI Models: Excitement bubbled about Apple’s OpenELM—an efficient, open-source language model family—and its implications for the broader trend of open-source development. Speculations were sparked on whether this signified a shift in Apple’s proprietary stance towards AI and whether other companies might follow.

  • AI-Assisted Communication Enters Discussions: The integration of AI and communication technology was a topic of interest, featuring technologies like voice-to-text software and custom wake words for home voice assistants. Importance was placed on the need for effective communication flow control in AI interactions, such as mechanisms for interruption and recovery during conversations with virtual assistants.

  • Exploration of AI-Enhanced Text RPG: A member shared their creation, Brewed Rebellion—an AI-powered text RPG on Playlab.AI, where players navigate the workplace politics involved in unionizing without getting caught by higher management.

Links mentioned:


OpenAI ▷ #gpt-4-discussions (18 messages🔥):

  • Creating a GPT Expert for Apple Playgrounds: A member is interested in developing a GPT proficient in the Apple Playgrounds app, questioning how to feed data to the model, including Swift 5.9 documentation and whether to include Playgrounds PDFs in the knowledge section, despite their unavailability for download from Apple’s Books app.

  • Taming Custom GPTs to Follow Rules: One user expressed difficulty with Custom GPTs not adhering to instructions, wondering how to ensure compliance. Another suggested sharing the GPT’s instructions to help diagnose the issue.

  • A Query on GPT Updates and Competition: A member asked about the next update for GPT that could outmatch Claude 3, mentioning its superior response and understanding, and humorously inquired about the anticipated release of GPT-5.

  • Debate Over Web Browsing Capabilities: Users discussed the efficacy of GPT’s browsers compared to Perplexity AI Pro and You Pro, noting that GPT sometimes uses fewer sources but the quality seems equivalent or better, and questioning the necessity of a dedicated “Web browser” version.

  • Analysis of Large Text Documents with LLMs: The conversation shifted to analyzing very large documents with language models. An experienced user in text analysis mentioned using Claude 3 Haiku and Gemini 1.5 Pro and positively regarded the information about OpenAI’s 128K context window option available via API, anticipating models with progressively larger context windows.


OpenAI ▷ #prompt-engineering (14 messages🔥):

  • Commanding GPT to Chill Out: Users discussed how to make GPT’s language more casual and avoid cringe-worthy output. Suggestions included focusing on positive instructions (“do this”) rather than negative (“don’t do this”), and utilizing provided positive examples to guide the model’s language style.

  • Prompts over Negative Instructions: One member emphasized that negative instructions are ineffective and should be removed, advising to provide positive examples in prompts to see desired variations in the output.

  • Resource Sharing among Coders: A link to a shared resource on OpenAI Chat was provided, potentially helpful for others facing similar prompt-engineering challenges.

  • Building a Coded Companion for Gaming: A participant sought help in creating a GPT for coding in SQF language for the game Arma 3, expressing difficulty in crafting prompts to refer to specific uploaded documents.

  • The Iceberg of GPT Coding Assistance: In response to a request for help with building a GPT for coding, a member described the challenges when using GPT for language-specific programming tasks, including limited context space and AI’s tendency to hallucinate code, suggesting that a different model or toolchain might be required.


OpenAI ▷ #api-discussions (14 messages🔥):

  • Tackling Overly-Creative GPT Language: A user was concerned with GPT’s use of ‘cringe’ and overly-creative language despite giving it straightforward instructions. It was suggested to use positive examples rather than negative instructions, as GPT tends to ignore negative instructions during inference.

  • The Challenge with Negative Prompts: Echoing advice on prompt construction, another member affirmed that stating “do this” is more effective than “don’t do this” when instructing GPT, as GPT may not differentiate the intended negative instruction.

  • A Community Member’s Code-Oriented Query: One posted about a struggle with creating prompts for GPT to assist in coding with the SQF language for the game Arma 3 and sought guidance on using uploaded documents effectively.

  • The Complexities of Custom Code Training: An expert explained the difficulties of training GPT with custom code, due to its extensive pretraining on other languages and the challenges with managing the toolchain and context. They recommended considering a different model or toolchain for such projects.

  • Simplifying Bot Language with Claude: In a light-hearted response, a community member suggested using Claude as a possible solution for simplifying the language style of GPT.


LAION ▷ #general (126 messages🔥🔥):

  • Discussion on AI Model Capabilities and Comparisons: Participants discussed the performance of models like Llama 3, DeepFloyd (DF), and Snowflake’s Arctic, with comparisons to other models in terms of size, recall abilities, and upscaled image quality. One mentioned that Llama 3 8B is comparable to GPT-4, while another pointed out their script’s potential to impact a model’s training through mass rating submissions.

  • Threats to Anonymity and Privacy in Cloud Services: There were mentions of proposed U.S. regulations for cloud service users, including the “Know Your Customer” rules that may put an end to anonymous cloud usage. Users expressed concerns about the wider implications for privacy and compared it to encryption battles from the past.

  • Skepticism Around AI Image Model Leaderboards: A user expressed suspicion about AI image model leaderboards and the veracity of claimed performance, suggesting possible configuration issues or other influencing factors. They noted adversarial activities could be manipulating results and discussed ways to potentially bypass systems that collect user rating data.

  • Debates Over the Importance of Aesthetic versus Prompt Alignment: The conversation touched on what users prefer in generative AI’s outputs—whether they value image aesthetics or prompt alignment more. Some preferred images with the exact elements from their prompts, even if less aesthetically pleasing, while others favored visually pleasing results.

  • Discussing the Legitimacy of TorrentFreak as a News Source: There was a brief dialogue regarding TorrentFreak’s credibility as a news source after they published an article on proposed regulations involving cloud service providers. A link was shared to validate the source’s credibility, clarifying that although it covers topics like BitTorrent and file sharing, it has a record of balanced political reporting.

Links mentioned:


LAION ▷ #research (5 messages):

  • Exploring Sparse Mixtures of Experts: A new paper introduces Multi-Head Mixture-of-Experts (MH-MoE) that addresses low expert activation in SMoE by splitting tokens into sub-tokens for parallel processing by diverse experts. This method aims to enhance context understanding and reduce overfitting without significantly increasing training and inference costs.

  • Accelerating Pre-training of Vision Models: Research on weakly supervised pre-training offers a 2.7× speed increase over traditional contrastive learning by treating image-text data pre-training as a classification task. It sidesteps the computational intensity of pairwise similarity computations yet preserves high-quality representation for diverse vision tasks.

  • Humor in Efficiency: A member called the novel pre-training method “kind of funny”, likely because of its surprising efficiency and simplicity compared to more complex models like CLIP while still producing competitive results.

  • Back to Basics with BCE and Multilabel Classification: The discussed pre-training approach seems to boil down to identifying concepts via alt-text and then training a multilabel classifier with binary cross-entropy, a strategy that contrasts with more complex systems but achieves rapid and effective training (see the sketch below).
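
As a rough illustration of that recipe (a toy sketch, not the paper’s code): mine a concept vocabulary from alt-text, build multi-hot targets, and train the vision backbone with plain BCE, with no pairwise similarity matrix anywhere. The backbone and all sizes below are placeholders.

```python
import torch
import torch.nn as nn

vocab_size = 10_000  # concepts mined from alt-text (placeholder size)

# Stand-in for a real vision backbone (e.g. a ViT) producing 512-d features
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512), nn.ReLU())
head = nn.Linear(512, vocab_size)
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 224, 224)   # a batch of images
targets = torch.zeros(8, vocab_size)   # multi-hot concept labels per image
targets[torch.arange(8), torch.randint(0, vocab_size, (8,))] = 1.0

logits = head(backbone(images))
loss = criterion(logits, targets)      # multilabel BCE, no contrastive pairs
loss.backward()
```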

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (85 messages🔥🔥):

  • New Line Formatting Issues in ChatML: Members noticed an issue with ChatML and potentially FastChat formatting adding new lines and spaces incorrectly, particularly around end-of-stream tokens, which might affect training. The conversation involves token configurations and whether this issue is seen with models like Llama 3 as well.

  • Continuous Pretraining and Learning Rates Discussions: Queries about Llama 3’s pretrained learning rate (LR) surfaced, with speculation on whether an infinite LR schedule is used given the planned model variants. On the other hand, a member accidentally set a higher LR and noticed the effects only after training.

  • Model Releases and Technical Discussions: Several model-related announcements and discussions took place, such as posting a member’s model in the chat for feedback, issues with small models for generalist tasks, and discussing the potential of 32k Llama with customized RoPE theta.

  • Snowflake’s 480B Dense + Hybrid MoE Release: The release of Snowflake’s 480B Dense + Hybrid MoE model was highlighted, boasting a 4K context window and Apache 2.0 licensing, sparking discussions on its capabilities and integrations with Deepspeed.

  • Market Reactions and Meta’s Q1 Earnings Report: A discussion about markets not reacting favorably to Llama 3 coincided with Meta’s Q1 earnings report, with members noting that CEO Mark Zuckerberg’s AI comments were followed by a stock price drop. This led to a humorous aside about whether Meta’s increased expenditure might be on GPUs for even larger AI models.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (8 messages🔥):

  • Llama-3 Scaling Up: A member mentioned that llama-3 could potentially reach a 128k context length by the afternoon.
  • Tuning vs. Training for Llama-3: Clarification was provided that in addition to tuning with RoPE, a bit of training is also being done using PoSE.
  • Qdora and FSDP Conundrum: Discussion revealed that qdora can operate without fsdp on a single GPU by enabling peft_use_dora: true. However, fsdp itself is necessary for multi-GPU setups.
  • FSDP and Dequantization Issues: It was pointed out that peft’s dora implementation conflicts with fsdp because of the way it handles dequantization.
  • FSDP Issues with Full Fine Tunes: Members are facing difficulties getting fsdp to work with full fine tunes, noting that it was problematic and changes in the underlying system might be a factor.

OpenAccess AI Collective (axolotl) ▷ #general-help (4 messages):

  • Encouragement for Experimentation: A member showed appreciation for advice received, indicating they will attempt to apply it themselves.
  • Good Luck Wishes Exchanged: Another member expressed hope for a successful outcome in response to the planned trial.
  • Fine-Tuning the Phi3 Model: A member discussed the challenges faced while fine-tuning the phi3 model, noting issues with high RAM usage and slow processing times.

OpenAccess AI Collective (axolotl) ▷ #datasets (1 messages):

aillian7: Is there a format for ORPO that i can use for a conversational use case?


OpenAccess AI Collective (axolotl) ▷ #community-showcase (9 messages🔥):

  • Internist.ai Paves the Way for Medical AI: The newly released Internist.ai 7b medical language model, crafted by medical professionals, outperforms GPT-3.5 and has achieved a passing score on the USMLE examination. It demonstrates that a physician-in-the-loop approach with carefully curated datasets can outperform sheer data volume, and there are plans for larger models with more data. Check out the model here!

  • Comparative Performance Unlocking Clinical Applications: Upon manual evaluation by ten medical doctors and a blinded comparison with GPT-4, Internist.ai displayed superior performance across 100 medical open-ended questions.

  • Laughter is the Best Medicine?: Two members expressed their surprise with simple reactions, “shees” and “damn”, followed by another member lauding the model’s capability with an appreciative “damn it’s demolishing all other 7b models 😄”.

  • Llama’s Competitive Edge at 8B: Despite the accolades for Internist.ai, it was noted that Llama 3 8B yields roughly equivalent results, and its larger 8B size may afford it a competitive edge in the field.

  • Training Trials and Tribulations: An update on attempts to train llama3 suggests current difficulties, with plans to proceed once appropriate merges are in place and challenges are overcome.

Link mentioned: internistai/base-7b-v0.2 · Hugging Face: no description found


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (10 messages🔥):

  • Selective Dataset Usage in Hugging Face: Members inquired about how to use only a portion of a dataset from Hugging Face. An extensive explanation was provided on the use of the split parameter in load_dataset, showcasing slicing syntax and methods to load percentages of datasets, create custom splits, and perform random dataset splits using DatasetDict and train_test_split (see the sketch after this list).

  • Optimizer Compatibility with FSDP: A user asked which optimizers are compatible with Fully Sharded Data Parallel (FSDP). It was clarified that while most PyTorch optimizers work with FSDP, the use of optimizer wrappers like FairScale’s Sharded Optimizer can enhance efficiency.

  • FSDP and Optimizer Constraints: A user reported an error stating that FSDP Offload is not compatible with the optimizer paged_adamw_8bit. This points to specific compatibility issues between FSDP features and certain optimized optimizers.
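
For reference, the split-slicing patterns described above look roughly like this with the Hugging Face datasets library (“imdb” is just an example dataset):

```python
from datasets import load_dataset

# Load only the first 10% of the train split
small = load_dataset("imdb", split="train[:10%]")

# Slice by absolute indices
subset = load_dataset("imdb", split="train[100:200]")

# Build custom splits out of one source split
train = load_dataset("imdb", split="train[:80%]")
val = load_dataset("imdb", split="train[80%:]")

# Or split randomly after loading; returns a DatasetDict
ds = load_dataset("imdb", split="train")
splits = ds.train_test_split(test_size=0.1, seed=42)
print(splits["train"].num_rows, splits["test"].num_rows)
```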


tinygrad (George Hotz) ▷ #general (61 messages🔥🔥):

  • Exploring Tinygrad for Privacy-Enhancing Tools: A member inquired about the feasibility of rewriting a privacy-preserving tool against facial recognition systems, like Fawkes, in tinygrad.
  • Riser Troubles and Water Cooling Woes: Users discussed issues with using PCIE 5.0 LINKUP risers for their setups, experiencing many errors, with some suggesting avoiding risers altogether and mentioning alternatives like MCIO and custom C-Payne PCBs. Solutions included a recommendation to consult C-Payne PCB Design for hardware needs, while one user considered a shift to water cooling but faced constraints with NVLink adapters.
  • Seeking Documentation on Tinygrad Operations: A request for normative documentation on tinygrad operations was raised due to the lack of a descriptive explanation of their expected behaviors.
  • Strategic Partnerships for Tinygrad’s Success: George Hotz highlighted that forming partnerships is crucial for tinygrad’s triumph, noting comma’s involvement in making hardware for tinybox as a prime example.
  • MNIST Tutorial Available for Tinygrad: George Hotz shared a link to an MNIST tutorial intended for use in GPU colab to encourage users to try out tinygrad, which can be found on the Tinygrad documentation site.

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (31 messages🔥):

  • Exploring WMMA Thread Capacity: A member sought to understand the limitations of thread memory in WMMA, suggesting that each thread might hold a maximum of 128 bits per input; considering two inputs plus an accumulator, a thread could manage 128 bits multiplied by three.
  • Tensor Core Dimensions and Loops: Clarifications were provided regarding the usage of tensor cores in CUDA for processing matrix multiplication, highlighting that different dimensions like m16n8k16 are available and loops over the K, M, and N dimensions are leveraged.
  • Difficulty in Isolating Kernel Crashes: Members are attempting to isolate a kernel crash by reproducing conditions, but encountering different outcomes when running customized scripts as compared to predefined examples like BEAM=2 python extra/gemm/simple_matmul.py.
  • Implications of Sorting Tensors: A discussion emerged around the non-existence of a tensor sorting function in tinygrad, prompting a collaborator to share a custom 1D bitonic merge sort function they wrote, supporting lengths that are powers of two (a generic sketch follows this list).
  • Manual Gradient Assignments and Quantile Implementations: Queries about manually assigning gradients to tensors and implementing a torch.quantile equivalent in tinygrad were raised, revealing an intention to devise sampling algorithms for diffusion models.
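
For readers unfamiliar with the algorithm, here is a plain-Python sketch of the classic 1D bitonic sorting network (not the collaborator’s actual code); a tinygrad version would express the same compare-exchange stages with tensor min/max over strided views instead of Python loops:

```python
def bitonic_sort(a: list) -> list:
    """In-place ascending bitonic sort; len(a) must be a power of two."""
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:        # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:     # compare-exchange distance within the merge stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and a[i] > a[partner]) or (
                        not ascending and a[i] < a[partner]
                    ):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([7, 3, 1, 8, 5, 2, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```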

Modular (Mojo 🔥) ▷ #💬︱twitter (2 messages):

  • Sneak Peek at Modular’s Twitter: Modular teased followers with an upcoming feature, directing them to their Twitter post with a short link for a sneak peek.
  • More Modular Twitter Buzz: Another tweet by Modular stirred the community, hinting at further developments to be tracked on their official Twitter feed.

Modular (Mojo 🔥) ▷ #✍︱blog (1 messages):

  • Shielding Modular from Supply Chain Strikes: Modular confronts unique security challenges due to frequently shipping new software, as modern delivery mechanisms are increasingly susceptible to attacks. The urgency for robust defenses is underscored by the XZ supply chain attack, driving Modular to prioritize secure software delivery since the launch of Mojo in Fall 2023.

  • Today’s Software Ecosystem Vulnerabilities: With the rise of open-source usage, where codebases comprise an average of over 500 open-source components, the need for robust software delivery systems is critical. Modular’s ecosystem employs SSL/TLS, GPG signing systems, and verified… (message truncated)

Link mentioned: Modular: Preventing supply chain attacks at Modular: We are building a next-generation AI developer platform for the world. Check out our latest post: Preventing supply chain attacks at Modular


Modular (Mojo 🔥) ▷ #ai (4 messages):

  • Quantum Mechanics: Not Just Randomness?: A participant stated that many people misunderstand quantum physics and computing, perceiving them as based on randomness, when in fact there is strong disagreement among physicists on this point. The debate extends into the realms of computer science and ML/AI, where randomness (though not true randomness) plays a foundational role.

  • The Dividing Lines in Quantum Interpretations: The same individual expressed a preference for the Many-Worlds interpretation and the Simulation Hypothesis, highlighting a divide in thought among those studying quantum physics, between those who support the Copenhagen interpretation and others.

  • Despite Quantum Complexities, the Focus May Differ: Another user suggested that the type of processing units, whether quantum or classical, might not be pivotal when considering the overall importance of computational architecture.

  • Geometrical Insights Could Tame Quantum Randomness: A member discussed the amplituhedron, a geometrical structure that may simplify quantum calculations, and proposed that using geometric principles could enhance our understanding and efficiency in quantum algorithms and circuit designs.

  • ML as a Key to Complex Quantum Visualizations: They further speculated that machine learning could possibly be employed to interpret complex, hyper-dimensional visualizations involved in quantum computing, especially as we deal with increasing qubit numbers and phenomena like entanglement.


Modular (Mojo 🔥) ▷ #🔥mojo (21 messages🔥):

  • New Alias Suggestion Sparks Laughter: A playful suggestion was made to rename IntLiteral to LitInt in the Mojo language, with an accompanying joke about mojo format adding alias definitions to the top of all .mojo files.
  • Mojolicious Swag Desires: Members expressed a desire for official Mojo swag, musing over what could be available in the future, from Mojo emoji gear to fantasy MojoCON staff uniforms.
  • API Development Causes Performance Hiccups: A member described issues with an API for HTML causing lag in the Language Server Protocol (LSP) and stalling the mojo build, signaling they are pushing the limits of the current toolset.
  • Concerns About Memory Management and Safety: Discussing custom type creation in Mojo, a user was assured that the operating system would clean up memory post-process, meaning there’s no need for manual memory management in Mojo.
  • Engine Requirements Inquiry Redirected: A query about whether the MAX engine requires AVX support was redirected to a more appropriate channel, suggesting channel-specific queries further inform and declutter discussions.

Modular (Mojo 🔥) ▷ #community-blogs-vids (15 messages🔥):

  • PyConDE Presentation Reflects on Mojo’s First Year: A talk exploring Mojo’s impact and potential as a Python superset at PyConDE in Berlin was highlighted, questioning its ability to make its mark among alternatives like Rust, PyPy, and Julia. The talk titled “Mojo: Python’s faster cousin?” discussed whether Mojo could live up to its promise or remain a programming language footnote, and a recording of the presentation will be available shortly (watch here).

  • Rust Community Skepticism Over Mojo: The community discussed the Rust community’s skepticism towards Mojo, noting that Rustaceans tend to dismiss Mojo’s claims, which some perceive as not backed by substantial evidence. One shared a blog post debating Mojo’s safety and speed, especially in comparison to Rust’s approach to mutable aliased references.

  • Expectations for Mojo on Benchmarks: Members conversed about anticipating Mojo’s performance on future benchmarking tests, with some noting that while benchmarks on GitHub should be viewed as perspectives rather than definitive verdicts, others eagerly await Mojo’s comparison results with other programming languages.

  • Mojo’s Controversial Marketing Strategy: A link to a discussion critiquing Mojo’s marketing approach was shared, suggesting it focused more on targeting programming influencers rather than producing in-depth technical content, which could indicate a strategic but controversial path to popularization.

  • Sharing Mojo Insights with Students: A member shared an event by Google Developer Student Club at Budapest University of Technology and Economics about Python and Mojo, highlighting Mojo’s features and integration with Python. The notes from the event, aimed at helping students understand Mojo better, are available, and the member seeks tips for promoting Mojo to students (event details).

Links mentioned:


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (5 messages):

  • Debating Dual RNG Systems: A discussion was raised about having two versions of random number generators: one optimized for performance and another cryptographically secure that executes in constant time.
  • Exploring Mojo’s Support for RVO: There’s been testing to see if Mojo supports return value optimization (RVO) as C++ does. Unexpected behavior led to opening this issue upon a suggestion from a fellow member.

Link mentioned: Issues · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.


Modular (Mojo 🔥) ▷ #📰︱newsletter (1 messages):

Zapier: Modverse Weekly - Issue 31 https://www.modular.com/newsletters/modverse-weekly-31


Modular (Mojo 🔥) ▷ #🏎engine (4 messages):

  • Explaining Acronyms in Chat: A user inquired about the acronym “QPS,” which another clarified as Queries Per Second.
  • Optimism for Progress: A member expressed their optimism about future improvements, implying that the system’s performance is expected to continue on an upward trend.
  • Performance Parity Curiosity: Noting a peculiar observation, a user mentioned that despite differences in architecture, PyTorch and TensorFlow reported the same performance gain, which they found amusing.

Modular (Mojo 🔥) ▷ #nightly (32 messages🔥):

  • Null String Bug Alert: An issue with print(str(String())) leading to corrupted future prints of String()-values was mentioned, followed by a prompt to file a bug report. The issue was recorded in GitHub as #239two.

  • String Termination Troubles: Discussion on null terminated strings highlighted their role as a frequent bug source in the stdlib, with at least 5 bugs attributed to them. The necessity of these strings for C interoperability but their potential for revision with less dependency on the C++ runtime was recognized by members.

  • Nightly Mojo Compiler Update Rolled Out: A new version of the nightly Mojo compiler has been released, and users are encouraged to update with modular update nightly/mojo. The diff since the last nightly can be viewed here and the changelog here.

  • Function Overload Expansion: There was a chuckle over the new addition of overload methods with a reference to a pull request discussion noting an external call needing 15 different arguments. The preference over the current method was for the usage of tuples or heterogeneous variadic lists for better function implementations.

  • SIMD and EqualityComparable Hack: A member discovered a hack allowing SIMD to conform to EqualityComparable using function overload precedence. The conversation touched on its drawbacks, notably the necessity to implement it in stdlib types and the search for a possibly better approach.

Links mentioned:


Latent Space ▷ #ai-general-chat (77 messages🔥🔥):

  • Transformer Architecture Tweaks Under Discussion: Members were discussing an approach to improve transformer models by taking inputs from intermediate attention layers in addition to the last attention layer, likening the method to the Pyramid network in CNN architectures.

  • TherapistAI Sparks Controversy: A member highlighted the controversy surrounding levelsio’s TherapistAI (https://twitter.com/meijer_s/status/1783032528955183532), criticizing its potentially misleading suggestion that it could replace a real therapist.

  • Semantic Search Solution Inquiry: A discussion about finding a good semantic search API like newsapi.org led to recommendations including Omnisearch.ai (https://omnisearch.ai/), though it wasn’t a fit for scanning the web for news.

  • France Steps Towards LLMs in the Public Sector: There was a conversation regarding France’s experimental incorporation of LLMs into public administration, with insights and opinions shared about France’s innovation and political climate, linking to a tweet about the topic (https://twitter.com/emile_marzolf/status/1783072739630121432).

  • AI Winter Predictions Stir Discussion: Users deliberated over the state and future of AI venture funding prompted by a tweet on AI bubble effects (https://x.com/schrockn/status/1783174294865887521?s=46&t=90xQ8sGy63D2OtiaoGJuww), reflecting on the implications of a potential bubble burst for AI innovation.

Links mentioned:


LangChain AI ▷ #general (47 messages🔥):

  • Exploring pgvector with LangChain: A member asked for guidance on using a pgvector store as context for a chatbot and received detailed steps and resources for initialization and document retrieval (see the sketch after this list). The utilized methods include max_marginal_relevance_search_by_vector from the PGVector store, which finds relevant documents, and OpenAIEmbeddings for generating embeddings. Further technical details can be found in the LangChain GitHub repository.

  • Dissecting SelfQueryRetriever mechanics: The SelfQueryRetriever from LangChain’s langchain_community.retrievers library was discussed, outlining its use in fetching documents from a specific table in a PGVector store. The approach requires setting up several environment variables and retrieving documents via the invoke method.

  • Graph Building for Conversational AI: Advice was shared on creating a graph to decide whether to fetch documents for context or respond without any context using createStuffDocumentsChain from LangChain. The graph construction involves setting up a chat model and invoking it with the appropriate context.

  • Chroma and Vercel AI SDK endorsement: A member recommended checking out Vercel AI SDK and Chroma but did not provide specific reasons or links.

  • Beginning with LangChain: A new user expressed interest in building a chatbot with LangChain, seeking tips and confirming their intent to use LangChain features such as Redis-backed chat memory and function calling. They received a link to LangChain’s chatbot documentation as a resource.

  • Distinguishing Stores and Chat Memory: A user clarified the difference between Redis-backed chat message history and the generic key-value RedisStore: chat message history persists chat messages by session, while RedisStore is more general, storing arbitrary byte values under string keys (also covered in the sketch below).
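
A condensed sketch of the pgvector retrieval flow from the first item, assuming langchain_community’s PGVector integration and a reachable Postgres instance (connection string, collection name, and query are placeholders):

```python
from langchain_community.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings

CONNECTION_STRING = "postgresql+psycopg2://user:pass@localhost:5432/vectordb"

embeddings = OpenAIEmbeddings()
store = PGVector(
    connection_string=CONNECTION_STRING,
    collection_name="chatbot_docs",
    embedding_function=embeddings,
)

query_vector = embeddings.embed_query("How do I reset my password?")

# MMR trades off relevance against diversity among the returned documents
docs = store.max_marginal_relevance_search_by_vector(query_vector, k=4)
context = "\n\n".join(d.page_content for d in docs)
```

And the store-versus-history distinction from the last item, again assuming the community Redis integrations:

```python
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_community.storage import RedisStore

# Per-session chat memory: messages are persisted under the session ID
history = RedisChatMessageHistory(session_id="user-42", url="redis://localhost:6379/0")
history.add_user_message("Hi, remember me?")
history.add_ai_message("Of course!")

# Generic key-value store: arbitrary byte values under string keys
store = RedisStore(redis_url="redis://localhost:6379/0")
store.mset([("doc:1", b"raw bytes payload")])
print(store.mget(["doc:1"]))
```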

Links mentioned:


LangChain AI ▷ #langchain-templates (1 messages):

  • Seeking Template Structure for LLaMA-3: A member inquired about the existence of headers in the LLaMA-3 prompt template for providing context to questions, referencing the official documentation. Concerns were raised about the completeness of the documentation due to the model’s recent release.

LangChain AI ▷ #share-your-work (6 messages):

  • Expanding RAG with LangChain: An article details the integration of adaptive routing, corrective fallback, and self-correction techniques into Retrieval-Augmented Generation (RAG) frameworks using LangChain’s LangGraph. The full exploration can be read on Medium.

  • In Search of Pull Request Partners: A member inquires about where to request a review for a partner pull request, considering if the share-your-work channel is appropriate for such a discussion.

  • The Brewed Rebellion: A new text-based RPG named “Brewed Rebellion” is shared, encouraging players to navigate workplace politics to form a union as a barista at StarBeans. Check out the intrigue at play.lab.ai.

  • Introducing Collate: A platform named Collate has been introduced, transforming saved articles into a daily newsletter digestible in bite-sized form. Feedback is welcome, and you can try it out at collate.one.

  • Clone of Writesonic and Copy.ai Launched: BlogIQ, a new app powered by OpenAI and LangChain that aims to simplify the content creation process for bloggers, is now available on GitHub.

Links mentioned:


LangChain AI ▷ #tutorials (1 messages):

  • Embedding Models Showdown: A member offered insight into the best embedding models available by sharing a video. The content aims to clarify doubts regarding model choices and can be viewed here.

Cohere ▷ #general (42 messages🔥):

  • Cohere Toolkit Goes Open Source: The exciting news that Cohere’s Toolkit is being open-sourced spread, allowing users to add custom data sources and deploy to the cloud. It works with Cohere models on various cloud platforms.

  • Github Repo for Cohere Toolkit Shared: Users shared the GitHub repository of Cohere Toolkit, which garners praise for accelerating the deployment of RAG applications.

  • Toolkit Troubleshooting and Inquiries: A user reported difficulties when working with files in Toolkit on Docker for Mac. Another user queried about the privacy of deploying apps on Azure using the Cohere API key, and it was clarified by @co.elaine that the API key is optional for full privacy.

  • API and Playground Results Mismatch: One user struggled with mismatched results when using the API compared to the playground, specifically when incorporating site connector grounding in their code. Through back and forth help, they made some corrections but still observed slightly different responses between the playground and their script.

  • Support and Acknowledgments for Cohere Team: Users expressed their gratitude and appreciation, particularly towards cofounder @mrdragonfox and @1vnzh, for their efforts and the release of the open-source toolkit, emphasizing how this could benefit the opensource community.

Links mentioned:


Cohere ▷ #project-sharing (6 messages):

  • Misidentification of a Hit Piece: A discussion unfolded addressing the claim that an article was a ‘hit piece’ against Cohere, with the participant admitting to not remembering the specifics of where they read what they cited.
  • The Crux of the Jailbreak Issue: Conversants grappled with the core argument of an article, summarizing its message as adding a jailbreak to Cohere’s LLM could result in creating D.A.N-agents capable of malintent.
  • Lost in Memory, Not in Reason: The critic of the article was challenged for not providing specifics from the article to back up their claim of it being unfairly critical of Cohere.
  • Debating the Responsibility for Backing Claims: It was pointed out that if one is going to label an article as maliciously motivated, they should be prepared to substantiate such attacks with specific points from the work in question.

OpenInterpreter ▷ #general (32 messages🔥):

  • Exploring OpenInterpreter Performance: Members report varying levels of success with different models on OpenInterpreter, with WizardLM 2 8x22B and GPT-4 Turbo identified as top performers for following system message instructions and function calling. The issue of models like llama 3 showing inconsistency has been noted.

  • Local Model Execution Confusion: There seems to be confusion when executing code locally with different models, as interacting with OpenInterpreter in a terminal doesn’t always result in the code actually being run, despite the model writing it out. The use of an additional flag --no-llm_supports_functions was cited as a solution to correct some errors.

  • Need for OpenInterpreter Update: Members discussed updates needed to fix local models with OpenInterpreter, advising the use of the -ak dummykey flag for improvement. If issues persist, members have been redirected to a specific Discord channel to seek help.

  • UI Development for AI Devices: Inquiry into alternatives for building a user interface for an “AI device” started a conversation around options beyond tkinter for UI development, considering future use with microcontrollers.

  • Vision Model Discussions and Demonstrations: Links to GitHub repositories and papers discussing computer vision models have been shared, with the focus on lightweight models like moondream and also mention of running models like llama3 on different quantization settings to manage VRAM usage.

Links mentioned:


OpenInterpreter ▷ #O1 (14 messages🔥):

  • 01 Project Expansion: Members discussed running the 01 project on external devices, inspired by implementations such as Jordan Singer’s Twitter post.
  • Killian’s Spider Scenes: Another example of 01’s flexibility highlighted was embedding the AI into Grimes’ spider, shared by a member with Killian’s tweet.
  • Inquiries on 01 Product Development: There was a question about the current state of the 01 light product design and how to replicate the functionality demonstrated in Killian’s video using the M5 echo device.
  • Windows 11 Installation Guide for 01: A detailed set of instructions was provided for installing the 01 project on Windows 11, including a link to install miniconda and the GitHub repository for the software.
  • Local Model Execution on 01: The chat contained guidance on running local language models such as Ollama on 01, using the command poetry run 01 --local and selecting the desired model.

Links mentioned:


OpenInterpreter ▷ #ai-content (1 messages):

8i8__papillon__8i8d1tyr: https://mlflow.org/


Interconnects (Nathan Lambert) ▷ #news (3 messages):

  • Phi-3-128K Steps into the Blind Test Ring: A member revealed that Phi-3-128K has entered the testing phase, being included in blind tests.
  • Identity Inquiry as Conversation Start: It was mentioned that conversations now begin with the question “who are you,” as a standard interaction.
  • Preventing Model Name Disclosure: Members learned that LMSys is designed to exclude responses that might reveal the model’s name, maintaining the integrity of blind tests.

Interconnects (Nathan Lambert) ▷ #ml-questions (17 messages🔥):

  • Instruction-Tuning Relevancy Quest: A member questioned the relevance and enduring popularity of instruction-following evaluations mentioned in Sebastian Ruder’s article, looking for ways to assess their impact.
  • Novel Evaluation Benchmarks for LLMs: Discourse touched on new benchmarks like LMentry to test large language models in a simple and interpretable manner; M2C, which probes model behavior for different linguistic features; and IFEval, which offers verifiable instruction-following assessments.
  • Simplicity in Complexity: Amidst advancements, one user embraces the simplicity of the MMLU-ChatBotArena for GPU efficacy evaluation, humorously self-identifying as a “simpleton” in this area.
  • HELM on the Horizon: A user noted recent updates from the HELM team on introspection features for model performance analysis, questioning the current status and impact of HELM in the AI community.
  • Repligate on RL Sophistication: A thread from https://x.com/repligate was shared, discussing the potential and challenges of reinforcement learning techniques, like RLHF, for AI models, and praising the interesting outcomes and anomalies in training models like Claude.

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (12 messages🔥):

  • Reflections on Missing Tweets: A member realized the absence of tweets about a blog post was due to having muted the word “AGI”.
  • Cohere Releases Chat Interface Source: Cohere’s decision to open-source their chat interface sparked positive remarks and a suggestion that a member’s advice might have influenced this move. The source code is available on GitHub.
  • Influence in the AI Industry: A few members joked about Nathan Lambert’s growing influence on the AI industry and the playful notion that he might need to be given “laser eyes.”
  • Company Motives Scrutinized: One discussed the odd reasoning behind a company’s actions without specifying the details, suggesting that the reasoning doesn’t actually matter.
  • Disdain for the Phrase “Pick Your Brain”: There was an expression of discomfort with the term “pick your brain”, especially when individuals are busy and need to deny such requests.

Link mentioned: Tweet from Nick Frosst (@nickfrosst): we open sourced our chat interface. https://github.com/cohere-ai/cohere-toolkit/?tab=readme-ov-file


Interconnects (Nathan Lambert) ▷ #posts (10 messages🔥):

  • SnailBot Delivers: SnailBot is seemingly functioning and community feedback is requested on whether the notification tags are bothersome.
  • Link Troubles for “Reward Is Enough”: A member reported issues accessing the “Reward is Enough” article, encountering a content access problem which may be confused with a paywall.
  • No Account, No Access: Another member confirmed that the link is inaccessible without an account on the hosting website.
  • Personal Problem Diagnosis: A member humorously suggested that the issue with accessing the content could be an individual problem.
  • Access Issue Resolved: The user ultimately resolved their access issue with the provided link.

Mozilla AI ▷ #llamafile (25 messages🔥):

  • Llamafile Users Face Mlock Errors: A member reported experiencing a “failed to mlock” error while trying to run the phi2 llamafile on both cmd and powershell, without providing further details or solutions.
  • Anticipation for Phi3 Llamafile Release: Queries about the release of a Phi3 llamafile have been addressed with directions to use Microsoft’s GGUF files, as Microsoft has detailed steps on how to use Phi3 with llamafile.
  • Image Payload Issues with Llama Model: A user encountered issues when sending base64 encoded images in the JSON payload, as the llama model did not recognize the images and marked multimodal: false. The user shared their encoding approach but did not specify if a resolution was found.
  • Mistral Llamafile Requires Update: In response to feedback, modifications have been made to the documentation of the Mixtral 8x22B Instruct v0.1 llamafile on its Hugging Face repository.
  • Windows Defender Misidentifies Llamafile as Trojan: A file from Hugging Face was mistakenly flagged as a trojan by Windows Defender, with a member advising solutions like using a VM or whitelisting the folder in Defender settings, also noting the constraints with reporting false positives to Microsoft.

Links mentioned:


DiscoResearch ▷ #general (7 messages):

  • Batching Prompts in Local Mixtral: A member is looking to send a “batch” of prompts through a local Mixtral using 2 A100s but faces challenges finding a real example to follow. They mention using vLLM previously and consider trying the recently open-sourced TGI, wondering if there are direct Python usages or if TGI is intended only as an API server.

  • LLM-Swarm for API Endpoint Management: Another member points to huggingface/llm-swarm for managing scalable open LLM inference endpoints in Slurm clusters, although they note that it might be overkill for just two GPUs.

  • TGI Primarily an API Server vs vLLM Local Python Mode: Discussing TGI’s purpose, a member suggests that it’s intended mainly as an API server and recommends using asynchronous requests with continuous batching. They emphasize that while vLLM’s local Python mode is convenient for experimentation and development, decoupling components allows for easier LLM swaps on different infrastructures or through APIs.

  • Batch Completion Operations Without an API Server: A contribution was made about using litellm.batch_completion to run batch requests against an API server, but a member specified they prefer to operate directly in Python without an API server, intending to stick with vLLM and the vllm.LLM class (see the sketch after this list).

  • TGI Focused on Low Latency, vLLM on High Throughput: The distinction between TGI and vLLM is highlighted in a message, with TGI being described as API-first and focused on low latency, whereas vLLM is noted for being an easy-to-use library oriented towards cost-effective, high throughput deployment.
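
For the local-Python route, a minimal vLLM batching sketch (model name and sampling settings are illustrative; tensor_parallel_size=2 spreads the model across the two A100s):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=2)

prompts = [
    "Summarize the benefits of mixture-of-experts models.",
    "Write a haiku about GPUs.",
    "Explain continuous batching in one paragraph.",
]
params = SamplingParams(temperature=0.7, max_tokens=256)

# generate() accepts a whole list of prompts and batches them internally
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```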

Links mentioned:


DiscoResearch ▷ #discolm_german (10 messages🔥):

  • Prompt Nuances in German: A user inquired about the impact of using the informal “du” versus the formal “Sie” in prompting the DiscoLM series models in German, seeking insights based on others’ experiences.

  • Challenges with Text Summarization Constraints: A user expressed difficulties in getting DiscoLM models to adhere to a 150-word count or a 1000-character limit for text summarization tasks, noting that the max_new_tokens parameter resulted in mid-sentence cutoffs (see the sketch after this list).

  • Queries on Sharing Quantizations: A member contemplated providing quantizations for experimental versions of the Llama3_DiscoLM_German_8b_v0.1 model, questioning whether it’s better to contribute now or wait for a more advanced release. Another user encouraged sharing, noting that while upcoming improved versions are in the works, they will take some time to be production-ready.

  • Phi-3’s Impressive Benchmarks: A user shared excitement after discovering that Phi-3 scored nearly perfectly on the Ger-RAG-eval, and asked the community for guidance on how to apply a “classification mode” for live inference using the model, similar to benchmark evaluations.

  • Technical Difficulties with DiscoLM-70b: A member reported encountering a “Template not found” error and nonsensical outputs when attempting to interact with DiscoLM-70b using the huggingface/text-generation-inference endpoint, despite the successful run process.
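
On the summarization item: max_new_tokens is a hard token cap, not a length instruction, so the word limit belongs in the prompt while the cap stays generous. A hedged sketch with transformers (the repo ID simply mirrors the model name mentioned and may not match the actual Hugging Face ID):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/Llama3_DiscoLM_German_8b_v0.1"  # illustrative repo ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

text = "..."  # the document to summarize
# "Summarize the following text in at most 150 words:" -- the limit lives in the
# prompt; max_new_tokens is only a safety cap, since hitting it cuts mid-sentence.
prompt = f"Fasse den folgenden Text in höchstens 150 Wörtern zusammen:\n\n{text}"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```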


Datasette - LLM (@SimonW) ▷ #llm (7 messages):

  • Python API mysteries solved: A member inquired about Python API documentation for Datasette’s LLM, to use it programmatically for tasks like embedding a directory of text files. Another user provided a direct link to the detailed explanation on how to utilize the Python API.

  • Claude meets Hacker News with LLM CLI: Simon Willison discussed using Claude in conjunction with the LLM CLI tool for summarizing long Hacker News threads. He elaborated on his process and provided a link to his workflow using the LLM CLI with an LLM plugin.

  • Embedding API Usage Explored: For embedding text files through the Python API, the conversation pointed to LLM’s embedding API documentation, which includes code snippets and usage examples for handling embeddings efficiently, including embed_multi() for multiple strings (see the sketch after this list).

  • Python Equivalent for CLI Embedding Feature: The creator, Simon Willison, clarified that there is no direct Python equivalent for the “embed every file in this directory” CLI feature, but he shared the relevant section in the GitHub repo where one can see how the CLI implements it using the Python API.
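
A brief sketch of that Python API, based on LLM’s documented embeddings interface (the “ada-002” model ID assumes the OpenAI embeddings are configured):

```python
import llm

# Resolve an embedding model by ID (requires the matching plugin/API key)
model = llm.get_embedding_model("ada-002")

# Single string -> list of floats
vector = model.embed("hello world")
print(len(vector))

# embed_multi() takes an iterable of strings and yields one embedding per item
texts = ["first document", "second document", "third document"]
for emb in model.embed_multi(texts):
    print(len(emb))
```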

Links mentioned:


Skunkworks AI ▷ #general (1 messages):

burnytech: Hi!


Skunkworks AI ▷ #off-topic (2 messages):

  • AI Developer Meetup in Toronto: The Ollamas and friends are hosting a local and open-source AI developer meetup at the Cohere space in Toronto, with food and a chance for networking and lightning talks. Interested parties are encouraged to register early due to limited space at the event described in this Toronto AI Meetup Registration Link.

Link mentioned: Toronto Local & Open-Source AI Developer Meetup · Luma: Local & open-source AI developer meetup is coming to Toronto! Join the Ollamas and friends at the Cohere space! Special thank you to abetlen (Andrei), the…


LLM Perf Enthusiasts AI ▷ #general (1 messages):

jeffreyw128: https://twitter.com/wangzjeff/status/1783215017586012566


LLM Perf Enthusiasts AI ▷ #opensource (1 messages):

  • Divergent Inference Provider Implementations: A member queried about the reason inference providers like Octo AI and Anyscale have differing JSON mode implementations for the same open-source models. They noted that Octo AI’s approach resembles a function call requiring a schema, while Anyscale aligns more closely with OpenAI’s implementation.

LLM Perf Enthusiasts AI ▷ #openai (1 messages):

  • Context Utilization in Question: A member mentioned it’s unclear to them how effectively the tool uses the full context available. However, they stated that it performs better than GPT, implying some level of improved effectiveness.

Alignment Lab AI ▷ #general-chat (1 messages):

neilbert.: Congrats! You are now Laurie Anderson!


AI21 Labs (Jamba) ▷ #general-chat (1 messages):

  • Inappropriate Link Shared: A member posted a link promoting Onlyfans leaks & Daily Teen content that is clearly inappropriate for this Discord community, accompanied by emojis suggesting adult content. The message contains a Discord invite link and tags to grab the attention of all members.

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.