Frozen AI News archive

Not much happened today

**RAGFlow**, a deep-document-understanding RAG engine, was open sourced with **16.3k context length** and natural-language instruction support. **Jamba v0.1**, a **52B parameter** MoE model from **AI21 Labs**, was released to mixed user feedback. **Command-R** from **Cohere** is now available in the Ollama library. An analysis of the **GPT-3.5-Turbo** architecture suggests roughly **7 billion parameters** and an embedding size of **4096**, comparable to OpenChat-3.5-0106 and Mixtral-8x7B. AI chatbots, including **GPT-4**, outperformed humans at persuasion in debates, while **Mistral-7B** made amusing mistakes on a math riddle. Hardware highlights include a discounted **HGX H100 640GB** machine with 8 H100 GPUs bought for $58k, and CPU comparisons between the **Epyc 9374F** and **Threadripper 1950X** for LLM inference. GPU recommendations for local LLMs center on VRAM and inference speed, with users testing the **Midnight-miqu-70b-v1.0.q5_k_s** model on a **4090**. Stable Diffusion is influencing gaming habits, and an AI-art evaluation shows bias favoring human-labeled art.
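The parameter estimate above can be sanity-checked with the usual dense-transformer rule of thumb of roughly 12·L·d² non-embedding parameters. A sketch; the 32-layer depth and 32k vocabulary below are illustrative assumptions, not figures from the analysis:

```python
def approx_transformer_params(n_layers: int, d_model: int, vocab_size: int = 32_000) -> int:
    """Rough dense-transformer parameter count: ~12 * L * d^2 for the
    attention and MLP blocks, plus the token-embedding matrix."""
    return 12 * n_layers * d_model**2 + vocab_size * d_model

# With d_model = 4096 and a typical 32-layer stack, the estimate lands
# near 7B, consistent with the GPT-3.5-Turbo speculation above.
estimate = approx_transformer_params(n_layers=32, d_model=4096)
```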

And congrats to Logan on joining Google.


Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence. Comment crawling still not implemented but coming soon.

Open Source Models and Libraries

Model Performance and Capabilities

Hardware and Performance

Stable Diffusion and Image Generation

Miscellaneous

Memes and Humor

AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI Models and Architectures

Retrieval Augmented Generation (RAG)

Tooling and Infrastructure

Research and Techniques

Memes and Humor


AI Discord Recap

A summary of Summaries of Summaries


PART 1: High level Discord summaries

Perplexity AI Discord


LAION Discord

Gecko Climbs to New Heights in Text Embedding: The new Gecko model demonstrates robust performance on the Massive Text Embedding Benchmark (MTEB) and may accelerate diffusion model training, as detailed in its Hugging Face paper and arXiv abstract. Interest in Gecko's practical application is reflected in queries about the availability of its weights.

Aurora-M Lights Up Multilingual LLM Space: The Aurora-M model, with 15.5B parameters, is geared towards multilingual tasks while adhering to guidelines set by the White House EO and is celebrated for processing over 2 trillion training tokens, as highlighted on Twitter and arXiv.

Hugging Face's Diffusers Under the Spotlight: Contributions to Hugging Face's Diffusers stirred debates around efficiency, with a focus on a PR regarding autocast for CUDA in Diffusers and incomplete unification in pipelines, as seen in discussion #551 and PR #7530.

PyTorch Gears Up with 2.6 Stirring Curiosity: Discussions around updates in PyTorch versioning sparked interest, especially regarding the silent addition of bfloat16 support in PyTorch 2.3, and anticipation for new features in the upcoming PyTorch 2.6. Noteworthy contributions include a critique of autocast performance with details in a GitHub thread.
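For context on what bfloat16 changes: it keeps fp32's sign bit and 8 exponent bits but only 7 mantissa bits, so conversion amounts to dropping the low 16 bits of the fp32 encoding. A dependency-free sketch (truncation only; real converters round to nearest even):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    # reinterpret the fp32 bit pattern and keep the top 16 bits
    # (sign + 8 exponent bits + 7 mantissa bits)
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16

def from_bfloat16_bits(b: int) -> float:
    # pad the low 16 mantissa bits with zeros to recover an fp32 value
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]
```

Round-tripping 1.0 or 1.5 is exact, while something like 3.14159 comes back as a nearby coarser value, which is the precision trade bfloat16 makes in exchange for keeping fp32's full exponent range.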

LangChain Event Hooks in AI Engineers With Harrison Chase: Harrison Chase, CEO of LangChain, will speak at an online event on using LangSmith to go from prototype to production on April 17 at 6:30 PM, with registration available here. His company focuses on using LLMs for context-aware reasoning applications.


Unsloth AI (Daniel Han) Discord

Model Might on a Budget: Guild members actively debated cost versus quality in AI modeling, with estimates ranging from $50K to several million dollars for pre-training, depending on dataset size. A strong emphasis was placed on balancing resource efficiency with high-quality outputs.

Scam Shield Tightens: Concerned with an increase in malicious bots and scams, the engineer community underscored the need for robust detection systems to thwart AI misuse and protect Discord servers.

Precision in Saving Space: Tips were shared on conserving space when saving finetuned models on platforms like Google Colab, with one user suggesting a method that saves 8GB of space but warned of a slight loss in accuracy.
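The exact savings depend on the model, but the arithmetic behind space-saving tips like this is simple: checkpoint size scales with bytes per parameter. A back-of-envelope sketch using a hypothetical 7B-parameter model:

```python
def checkpoint_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate on-disk size of a dense model checkpoint."""
    return n_params * bytes_per_param / 1e9

n = 7e9                                  # hypothetical 7B-parameter model
fp32 = checkpoint_size_gb(n, 4)          # 28 GB at full precision
fp16 = checkpoint_size_gb(n, 2)          # 14 GB at half precision
int4 = checkpoint_size_gb(n, 0.5)        # 3.5 GB at 4-bit quantization
```

Dropping from fp16 to 4-bit on a model of this size saves on the order of 10 GB, which is the regime the "saves 8GB with a slight loss in accuracy" suggestion is operating in.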

Training Tactics Tussle: The optimal way to split datasets and the choice between supervised fine-tuning (SFT) and quantization methods were hot topics, with insights into the trade-offs between performance and cost-effectiveness highly sought after.

Integration Enthusiasm for DeepSeek: A user proposed integrating the DeepSeek model into Unsloth 4-bit, showcasing the community's push for model diversity and efficiency improvements, with an accompanying Hugging Face repository and a Google Colab notebook ready for implementation.


Stability.ai (Stable Diffusion) Discord

Cyberrealistic vs. EpicRealism XL: Debate is ongoing about the performance of two Stable Diffusion models: while Cyberrealistic demands precise prompts, EpicRealism XL outshines with broader prompt tolerance for realistic imagery.

SD3 Is Coming: The community is buzzing over the anticipated 4-6 week release timeline for Stable Diffusion 3 (SD3), with some doubt about the timing but evident excitement for improved features, notably fixed text rendering.

Fixing Faces and Hands: The Stable Diffusion aficionados are tackling challenges with rendering facial and hand details, recommending tools such as Adetailer and various embeddings to enhance image quality without sacrificing processing speed.

CHKPT Model Confusion: In the sea of CHKPT models, users seek guidance for best use cases, pointing towards models like ponyxl, dreamshaperxl, juggernautxl, and zavychroma as part of a suggested checkpoint "starter pack" for Stable Diffusion.

Ethics and Performance in Model Development: Discussions touch on the rapid pace of AI development, ethical questions around using professional artwork for AI training, and speculated memory demands for future Stable Diffusion versions, all peppered with light-hearted community banter.


Nous Research AI Discord

DBRX Revealed: A new open-source language model titled DBRX is making waves, claiming top performance on established benchmarks. Watch the introduction of DBRX.

Whisper Models Under the Microscope: WhisperX might replace BetterTransformer given concerns over the latter's high error rates. The community is also mulling over Transforming the Web and Apple's latest paper on reference resolution.

Speed Meets Precision in LLM Operations: LlamaFile boasts 1.3x - 5x improved speed over llama.cpp on CPU for specific tasks, potentially altering future local operations. A configuration file for Hercules fine-tuning resulted in decreased accuracy, stirring debates over settings like lora_r and lora_alpha.
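For readers following the lora_r/lora_alpha debate: LoRA adds a low-rank update ΔW = (alpha/r)·BA to a frozen weight matrix, so lora_r caps the rank of the update and lora_alpha scales its magnitude, which is why retuning one without the other can shift accuracy. A dependency-free toy sketch:

```python
def matmul(X, Y):
    # naive matrix multiply, fine for tiny illustrative matrices
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_delta(B, A, lora_r, lora_alpha):
    # LoRA's weight update: delta_W = (lora_alpha / lora_r) * B @ A
    scale = lora_alpha / lora_r
    return [[scale * v for v in row] for row in matmul(B, A)]

# toy 4x2 @ 2x4 example; real models use thousands of dims and r of 8-64
B = [[0.0, 0.0] for _ in range(4)]   # B starts at zero, so delta_W is zero at init
A = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
delta = lora_delta(B, A, lora_r=2, lora_alpha=4)
```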

Hugging Face Misstep Halts Upload: ModelInfo loading issues caused by safetensors.sharded metadata from Hugging Face are preventing uploads to the chain, driving discussions for fixes.

Brainstorming for WorldSim: WorldSim enthusiasts propose a "LLM Coliseum" with competitive benchmarks, file uploads facilitating pre-written scripts, and speculation on future developments like competitive leaderboards and AI battles.

Traffic Signal Dataset Signals Opportunity: A traffic signal image dataset surfaced, promising to aid vision models despite Hugging Face's viewer compatibility issues.


tinygrad (George Hotz) Discord

Trouble in GPU Paradise: AMD GPUs are causing major headaches for tinygrad users, with system crashes and memory leak errors like "amdgpu: failed to allocate BO for amdkfd". Users share workarounds involving PCIe power cycling but remain unimpressed by AMD's perceived lack of commitment to addressing these bugs.

A Virtual Side-Eye to AMD's Program: An invitation to AMD's Vanguard program drew skepticism from George Hotz and others, sparking a debate over the effectiveness of such initiatives and the need for open-source solutions and better software practices at AMD.

Learning Curve for Linear uOps: A detailed write-up explaining linear uops was shared in the #learn-tinygrad channel, aiming to demystify the intermediate representation in tinygrad, complemented by a tutorial on the new command queue following a significant merge.

Tinygrad Pull Requests Under Scrutiny: Pull Request #4034 addressed confusion around unit test code and backend checks. A focus on maintaining proper test environments for various backends like CLANG and OpenCL is emphasized.

Jigsaw Puzzle of Jitted Functions: A knowledge gap regarding why jitted functions don't show up in command queue logs led to discussions about the execution of jitted versus scheduled operations within tinygrad’s infrastructure.


LM Studio Discord

LM Studio Tangles with Model Troubles: Engage with caution: LM Studio is throwing unknown exceptions, particularly with estopian maid 13B q4 models on RTX 3060 GPUs, and users report crashes during prolonged inferencing. There's a growing need for text-to-speech and speech-to-text functionality, but for now one must tether tools like whisper.cpp for voice capabilities.

In Quest for Localized Privacy: While the quest for privacy in local LLMs continues, one suggestion is to pair LM Studio with AnythingLLM for a confidential setup, though LM Studio itself does not have built-in document support. Meanwhile, Autogen is producing a mere 2 tokens at a time, leaving users to wonder about optimal configurations.

GPU Discussions Heat Up: SLI isn't necessary for multi-GPU setups; what matters for running models is available VRAM. A dual Tesla P40 setup delivers 3-4 tokens/sec on 70B models, while those on a budget admire the P40's VRAM, weighing it against the prowess of the 4090.
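The reported P40 numbers line up with a simple bandwidth-bound view of single-stream decoding, in which each generated token must stream all model weights through GPU memory once. All figures below (a ~40 GB quantized 70B model, ~346 GB/s of P40 memory bandwidth, a 0.5 efficiency factor) are loose illustrative assumptions:

```python
def est_decode_tokens_per_sec(model_size_gb: float,
                              mem_bandwidth_gb_s: float,
                              efficiency: float = 0.5) -> float:
    """Bandwidth-bound decode estimate: tokens/sec is roughly effective
    memory bandwidth divided by the bytes of weights read per token."""
    return mem_bandwidth_gb_s * efficiency / model_size_gb

# a 70B model quantized to ~40 GB on a Tesla P40 (~346 GB/s): ~4 tok/s,
# in the ballpark of the 3-4 tokens/sec users report
p40_70b = est_decode_tokens_per_sec(40, 346)
```

The same formula explains why VRAM capacity and memory bandwidth, rather than raw compute, dominate local-LLM GPU recommendations.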

Top Models for Anonymous Needs: For the discreet engineer, the Nous-Hermes 2 Mistral DPO and Nous-Hermes-2-SOLAR-10.7B models come recommended, particularly for handling NSFW content. Tech hiccups with model downloads and execution have left some discontented, with missing proxy support suspected as the culprit.

Desiring Previous-Generation Functionality: Users miss the convenience of splitting text on each new generation; current LM Studio updates overwrite existing output, prompting requests to restore the previous behavior.


Eleuther Discord

Google Packs the Web in RAM: Engineers noted Google's robust search performance may be due to embedding the web in RAM using a distributed version of FAISS and refined indexing strategies like inverted indexes. The discussion delved into Google's infrastructure choices, hinting at methods for handling complex and precise search queries.
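An inverted index, one of the indexing strategies mentioned, is just a map from each term to the documents containing it; multi-term queries are answered by intersecting the posting lists. A toy sketch:

```python
from collections import defaultdict

def build_inverted_index(docs):
    # map each term to the set of document ids that contain it
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def and_query(index, *terms):
    # AND query: intersect posting lists, smallest first for efficiency
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

docs = {1: "fast vector search", 2: "vector databases in RAM", 3: "search in RAM"}
index = build_inverted_index(docs)
```

Systems like the one described pair this kind of exact term index with approximate-nearest-neighbor structures (e.g. FAISS) for embedding search.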

Sleuthing Google's Programming Paradigms: Participants dissected Google's use of programming strategies that include otherwise shunned constructs like global variables and goto, illustrating a pragmatic approach to problem-solving and efficiency in their systems.

Sparse Autoencoders Reveal Their Secrets: A new visualization library for Sparse Autoencoders (SAEs) has been released, shedding light on their feature structures. Mixed reactions to categorizing SAE features in AI models reflect both the detailed complexities and the abstract challenges of AI interpretability.

New Horizons in Music AI: A paper examining GANs and transformers in music composition was discussed, hinting at potential future directions in music AI, including text-to-music conversion metrics. Meanwhile, gaps in lm-eval-harness benchmarks for Anthropic Claude models suggest a growing interest in comprehensive model evaluation frameworks.

Batch Size Trade-offs in GPT-NeoX: Tuning GPT-NeoX for uneven batch sizes may introduce computational bottlenecks due to load imbalances, as larger batches hold up processing speed.
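The load-imbalance point can be illustrated with a toy model of a synchronous data-parallel step, where the gradient all-reduce forces every rank to wait for the slowest one (a sketch, not GPT-NeoX code):

```python
def sync_step_time(batch_sizes, time_per_sample=1.0):
    # in synchronous data parallelism, every rank blocks at the gradient
    # all-reduce until the rank with the largest batch finishes
    return max(b * time_per_sample for b in batch_sizes)

even = [8, 8, 8, 8]
uneven = [14, 6, 6, 6]  # identical total work, worse wall-clock time
```

With the same total number of samples per step, the uneven split nearly doubles step time, which is exactly the bottleneck uneven batch sizes introduce.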

Bonus Bullet for AI Sportsmanship: Suggestions were made for EleutherAI community participation in the Kaggle AI Mathematical Olympiad competition, with compute grants potentially supporting such "AI in science" initiatives.


OpenAI Discord


LlamaIndex Discord


LangChain AI Discord


HuggingFace Discord


Modular (Mojo 🔥) Discord

Mojo Gets Mighty with MAX Engine: The imminent introduction of the MAX Engine and C/C++ interop in Mojo aims to streamline RL Python training, potentially allowing Python environments to be speedily re-implemented in Mojo, as detailed in the Mojo Roadmap. Meanwhile, Mojo 24.2 has excited developers with its focus on Python-friendly features, whose depth is explored in the MAX 24.2 announcement and the blog post on Mojo open-source.

Tune in to Modular's Frequencies: Modular's busy Twitter activity appears to be part of an outreach or announcement series; those interested in their updates or campaigns can follow along on Modular's Twitter.

Tensors, Tests, and Top-level Code Talk: Open dialogue about the quirks and features of Mojo continued with insights like the need for improved Tensor performance, which was tackled by reducing copy initialization inefficiencies. Engineers also raised issues around top-level code and SIMD implementations, highlighting challenges like Swift-style concurrency and intrinsic function translations, with some guidance available in the Swift Concurrency Manifesto.

Unwrapping the CLI with Prism: The Prism CLI library's overhaul brings new capabilities like shorthand flags and nested command structures, harmonizing with Mojo's 24.2 update. Enhancements include command-specific argument validators, with the development journey and usability of References being a point of focus, as seen on thatstoasty's Prism on GitHub.

Deploy with MAX While Anticipating GPU Support: Questions about using MAX as a Triton backend alternative point to MAX Serving's utility, though currently lacking GPU support; documentation can guide trials via local Docker, found in the MAX Serving docs. Ongoing support and clarifications for prospective MAX adopters are discussed, emphasizing that ONNX models could fit smoothly into the MAX framework.

Nightly Mojo Moves and Documentation: Dedicated Mojo users were alerted about the nightly build updates and directed to use modular update commands, with changes listed in the nightly build changelog. Additionally, valuable guidelines for local Mojo stdlib development and best testing practices are documented, suggesting testing module use over FileCheck and pointing to stdlib development guide.


OpenInterpreter Discord


OpenRouter (Alex Atallah) Discord

Chatbot Prefix Quirk in OpenRouter: The undi95/remm-slerp-l2-13b:extended model is unexpectedly prefixing responses with {bot_name}: in OpenRouter during roleplay chats; however, recent prompt templating changes were ruled out as the cause. The usage of the name field in this scenario is under investigation.

SSL Connection Mystery: A connection attempt to OpenRouter was thwarted by an SSL error described as EOF occurred in violation of protocol, yet the community did not reach a consensus on a solution.

New Book Alert: Architecting with AI in Mind: Obie Fernandez has launched an early release of his book, Patterns of AI-Driven Application Architecture, spotlighting OpenRouter applications. The book is accessible here.

Nitro Model Discussion Heats Up: Despite concerns over availability, it's been affirmed that nitro models remain accessible with more forthcoming. Confusion around the performance of different AI models points to strong interest in optimizing speed and efficiency.

Model Troubleshooting & Logit Bias: Users encountered issues with models like NOUS-HERMES-2-MIXTRAL-8X7B-DPO and debated alternatives such as Nous Capybara 34B for specific tasks, noting its 30k context window for improved performance. Clarifications were made regarding OpenRouter logit bias application, which is currently limited to OpenAI's models only.
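On the mechanics of logit bias: it simply shifts the chosen tokens' logits before the softmax, so large negative values effectively ban tokens and large positive values force them. A dependency-free sketch:

```python
import math

def apply_logit_bias(logits, bias):
    # add per-token biases before the softmax, as OpenAI's logit_bias does;
    # around -100 effectively bans a token, around +100 effectively forces it
    return {tok: l + bias.get(tok, 0.0) for tok, l in logits.items()}

def softmax(logits):
    # numerically stable softmax over a token -> logit dict
    m = max(logits.values())
    exps = {tok: math.exp(l - m) for tok, l in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

logits = {"yes": 2.0, "no": 1.5, "maybe": 0.5}
banned = softmax(apply_logit_bias(logits, {"maybe": -100.0}))
```

Since the bias is applied to provider-side logits, it only works where the API exposes it, which is why OpenRouter's support is currently limited to OpenAI's models.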


Mozilla AI Discord


OpenAccess AI Collective (axolotl) Discord

15 Billion Reasons to Consider AMD: MDEL's successful training of a 15B model on AMD GPUs suggests that AMD may be a viable option in the hardware landscape for large-scale AI models.

The Mystery of the Training Freeze: Post-epoch training hangs were reported without the apparent use of val_set_size or eval_table, with hints suggesting the cause could be due to insufficient storage or yet-unidentified bugs in certain models or configurations.

Axolotl Development Continues Amid Pranks: The Axolotl Dev team approved a PR merge for lisa, added a YAML example for testing, and jovially proposed an April Fool's partnership with OpenAI. However, there are issues with missing documentation and out of memory errors potentially related to DeepSpeed or FSDP training attempts.

Unified Data Dilemma: There's a significant effort to combine 15 datasets into a unified format, with members tackling hurdles from data volume to misaligned translations.

Rigorous Runpod Reviews Requested: Members expressed interest in RunPod's serverless offerings for very large language models and sought insights from community experience.


Latent Space Discord

FastLLM Blasts into the AI Scene: Qdrant announced FastLLM (FLLM), a language model boasting a 1 billion token context window for Retrieval Augmented Generation, though skeptics suggest the timing of its announcement on April 1 may signal a jest.

Visualization for Understanding GPTs: A visual introduction to Transformers and GPTs by popular YouTube channel 3Blue1Brown has garnered attention among AI professionals looking for a clearer conceptual understanding of these architectures.

Engineers Build Open Source LLM Answer Engine: An open source "llm-answer-engine" project unveiled on GitHub has intrigued the community with its use of Next.js, Groq, Mixtral, Langchain, and OpenAI to create a Perplexity-Inspired Answer Engine.

Structured Outputs from LLMs Become Simpler: The engineering crowd noted the release of instructor 1.0.0, a tool aimed at ensuring Large Language Models (LLMs) produce structured outputs that conform to user-defined Pydantic models, assisting in seamless integration into broader systems.
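The core idea behind such tools is to parse the model's raw output against a declared schema and reject mismatches. A dependency-free sketch of that validation step (the person schema is a made-up example; instructor itself validates against full Pydantic models and can re-prompt on failure):

```python
import json

def parse_person(raw: str) -> dict:
    """Parse an LLM's JSON output and enforce a tiny hand-written schema,
    raising ValueError on any field-name or type mismatch."""
    obj = json.loads(raw)
    if not isinstance(obj.get("name"), str):
        raise ValueError("name must be a string")
    age = obj.get("age")
    if isinstance(age, bool) or not isinstance(age, int):
        raise ValueError("age must be an integer")
    return obj

person = parse_person('{"name": "Ada", "age": 36}')
```

A raised validation error is the point at which a tool like instructor would feed the failure back to the model and retry.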

Google Powers Up AI Division: In a pivot to bolster its AI offerings, Google has tapped Logan Kilpatrick to lead AI Studio and advance the Gemini API, signaling the tech giant's intensified commitment to becoming the hub for AI developers.


CUDA MODE Discord


Interconnects (Nathan Lambert) Discord


AI21 Labs (Jamba) Discord

Jamba's Speed Insight: Engineers scrutinized how Jamba's end-to-end throughput efficiency improves with more tokens during the decoding process. Some members questioned the increase, given decoding is sequential, but the consensus highlighted that throughput gains exist even as the context size increases, impacting decoding speed.

Decoding Efficiency Puzzler: A pivotal discussion unfolded around a graph showing Jamba's decoding step becoming more efficient with a larger number of tokens. Confusion was addressed, and it was elucidated that the higher throughput per token affects decoding phase efficiency, countering initial misconceptions.
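One way a per-token throughput gain can arise even though decoding is sequential is amortization: the fixed cost of streaming model weights for a decode step is shared across all tokens processed in that step. A toy cost model (illustrative numbers only, not Jamba's actual figures):

```python
def decode_throughput(batch_size, weight_read_ms=10.0, per_token_ms=0.5):
    """Toy cost model of one decode step: a fixed weight-streaming cost
    amortized over the batch, plus per-token compute. Throughput per
    token rises with the number of tokens processed together."""
    step_ms = weight_read_ms + batch_size * per_token_ms
    return batch_size / step_ms * 1000.0  # tokens per second

single = decode_throughput(1)    # ~95 tok/s
batched = decode_throughput(16)  # ~889 tok/s
```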


PART 2: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (888 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (19 messages🔥):


Perplexity AI ▷ #pplx-api (16 messages🔥):

Links mentioned:


LAION ▷ #general (525 messages🔥🔥🔥):

Links mentioned:


LAION ▷ #research (9 messages🔥):

Links mentioned:


LAION ▷ #learning-ml (1 messages):

Link mentioned: Meetup #3 LangChain and LLM: Using LangSmith to go from prototype to production, Wed. Apr. 17, 2024, 6:30 PM | Meetup: We are delighted to welcome Harrison Chase, Co-Founder and CEO of LangChain, for our third LangChain and LLM France Meetup! Don't miss this opportunity...


Unsloth AI (Daniel Han) ▷ #general (212 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (311 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (4 messages):

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (377 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #off-topic (4 messages):

Link mentioned: DBRX: A New State-of-the-Art Open LLM: Introducing DBRX, an open, general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state-of-the-art for established...


Nous Research AI ▷ #interesting-links (18 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (104 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (37 messages🔥):

Links mentioned:


Nous Research AI ▷ #project-obsidian (4 messages):

Link mentioned: Sayali9141/traffic_signal_images · Datasets at Hugging Face: no description found


Nous Research AI ▷ #bittensor-finetune-subnet (2 messages):


Nous Research AI ▷ #rag-dataset (5 messages):


Nous Research AI ▷ #world-sim (110 messages🔥🔥):

Links mentioned:


tinygrad (George Hotz) ▷ #general (244 messages🔥🔥):

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (31 messages🔥):

Links mentioned:


LM Studio ▷ #💬-general (89 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (44 messages🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (1 messages):


LM Studio ▷ #🎛-hardware-discussion (80 messages🔥🔥):

Links mentioned:


LM Studio ▷ #autogen (1 messages):


Eleuther ▷ #general (54 messages🔥):

Link mentioned: AI Mathematical Olympiad - Progress Prize 1 | Kaggle: no description found


Eleuther ▷ #research (111 messages🔥🔥):

Links mentioned:


Eleuther ▷ #interpretability-general (4 messages):

Links mentioned:


Eleuther ▷ #lm-thunderdome (18 messages🔥):

Links mentioned:


Eleuther ▷ #multimodal-general (3 messages):


Eleuther ▷ #gpt-neox-dev (2 messages):


OpenAI ▷ #annnouncements (1 messages):

Link mentioned: Start using ChatGPT instantly: We’re making it easier for people to experience the benefits of AI without needing to sign up


OpenAI ▷ #ai-discussions (95 messages🔥🔥):

Link mentioned: Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models: With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly co...


OpenAI ▷ #gpt-4-discussions (38 messages🔥):


OpenAI ▷ #prompt-engineering (7 messages):


OpenAI ▷ #api-discussions (7 messages):


LlamaIndex ▷ #announcements (1 messages):

Links mentioned:


LlamaIndex ▷ #blog (4 messages):


LlamaIndex ▷ #general (118 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (4 messages):

Link mentioned: How to build a RAG app using Gemini Pro, LlamaIndex (v0.10+), and Pinecone: Let's talk about building a simple RAG app using LlamaIndex (v0.10+) Pinecone, and Google's Gemini Pro model. A step-by-step tutorial if you're just getting ...


LangChain AI ▷ #general (109 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #share-your-work (5 messages):

Links mentioned:


HuggingFace ▷ #general (79 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):

docphaedrus: https://youtu.be/7na-VCB8gxw?si=azqUL6dGSMCYbgdg


HuggingFace ▷ #cool-finds (5 messages):

Link mentioned: Introducing FastLLM: Qdrant’s Revolutionary LLM - Qdrant: Lightweight and open-source. Custom made for RAG and completely integrated with Qdrant.


HuggingFace ▷ #i-made-this (12 messages🔥):

**Stream of Bot Conscience**: Introducing **LLMinator**, a context-aware streaming chatbot that enables running LLMs locally with Langchain and Gradio, compatible with both CPU and CUDA, from HuggingFace. Check it out on [GitHub](https://github.com/Aesthisia/LLMinator).

**Data Management Made Easier**: DagsHub launches a new integration for Colab with DagsHub Storage Buckets, promising a better data management experience akin to a scalable Google Drive for ML. An example notebook is available on [Google Colab](https://colab.research.google.com/#fileId=https%3a%2f%2fdagshub.com%2fDagsHub%2fDagsHubxColab%2fraw%2fmain%2fDagsHub_x_Colab-DagsHub_Storage.ipynb).

**Python's New Rival, Mojo**: Speculation arises about the Mojo programming language surpassing Python in performance, as discussed in a YouTube video titled "Mojo Programming Language killed Python." Watch the full explanation [here](https://youtu.be/vDyonow9iLo).

**Robotics Showcase**: A member built an advanced line-follower and wall-follower robot with a colour sensor, demonstrated in a YouTube video by SUST_BlackAnt. Find the full presentation [here](https://www.youtube.com/watch?v=9YmcekQUJPs).

**Launch SaaS with OneMix**: The new SaaS boilerplate OneMix claims to accelerate project launches by providing essentials like a landing page, payments, and authentication setup. More details are available at [saask.ing](https://saask.ing) and a demo on [YouTube](https://www.youtube.com/watch?v=NUfAtIY85GU&t=8s&ab_channel=AdityaKumarSaroj).

Links mentioned:


HuggingFace ▷ #reading-group (1 messages):

grimsqueaker: yay! thanks!


HuggingFace ▷ #computer-vision (7 messages):


HuggingFace ▷ #diffusion-discussions (8 messages🔥):

Links mentioned:


HuggingFace ▷ #gradio-announcements (1 messages):


Modular (Mojo 🔥) ▷ #general (8 messages🔥):

Link mentioned: Mojo🔥 roadmap & sharp edges | Modular Docs: A summary of our Mojo plans, including upcoming features and things we need to fix.


Modular (Mojo 🔥) ▷ #💬︱twitter (10 messages🔥):


Modular (Mojo 🔥) ▷ #✍︱blog (1 messages):

Link mentioned: Modular: What’s new in Mojo 24.2: Mojo Nightly, Enhanced Python Interop, OSS stdlib and more: We are building a next-generation AI developer platform for the world. Check out our latest post: What’s new in Mojo 24.2: Mojo Nightly, Enhanced Python Interop, OSS stdlib and more


Modular (Mojo 🔥) ▷ #🔥mojo (47 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #community-projects (2 messages):

Link mentioned: GitHub - thatstoasty/prism: Mojo CLI Library modeled after Cobra.: Mojo CLI Library modeled after Cobra. Contribute to thatstoasty/prism development by creating an account on GitHub.


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (4 messages):


Modular (Mojo 🔥) ▷ #⚡serving (7 messages):

Link mentioned: Get started with MAX Serving | Modular Docs: A walkthrough showing how to try MAX Serving on your local system.


Modular (Mojo 🔥) ▷ #nightly (11 messages🔥):

Links mentioned:


OpenInterpreter ▷ #general (17 messages🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (45 messages🔥):

Links mentioned:


OpenInterpreter ▷ #ai-content (7 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (66 messages🔥🔥):

Links mentioned:


Mozilla AI ▷ #llamafile (41 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (6 messages):


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (18 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (16 messages🔥):


Latent Space ▷ #ai-general-chat (29 messages🔥):

Links mentioned:


CUDA MODE ▷ #general (4 messages):

Links mentioned:


CUDA MODE ▷ #triton (6 messages):

Link mentioned: Accelerating Triton Dequantization Kernels for GPTQ: TL;DR


CUDA MODE ▷ #cuda (3 messages):


CUDA MODE ▷ #torch (2 messages):

Links mentioned:


CUDA MODE ▷ #off-topic (1 messages):

c_cholesky: Thank u 😊


Interconnects (Nathan Lambert) ▷ #news (1 messages):


Interconnects (Nathan Lambert) ▷ #random (2 messages):

Link mentioned: Tweet from Logan Kilpatrick (@OfficialLoganK): Excited to share I’ve joined @Google to lead product for AI Studio and support the Gemini API. Lots of hard work ahead, but we are going to make Google the best home for developers building with AI. ...


Interconnects (Nathan Lambert) ▷ #rl (5 messages):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #sp2024-history-of-open-alignment (1 messages):


AI21 Labs (Jamba) ▷ #jamba (8 messages🔥):