Frozen AI News archive

Evals-based AI Engineering

**Hamel Husain** emphasizes the importance of comprehensive evals in AI product development, highlighting evaluation, debugging, and behavior change as the key iterative steps. **OpenAI** released a Voice Engine demo showcasing advanced voice cloning from 15-second samples, raising safety concerns. Reddit discussions introduced new models like **Jamba** (hybrid Transformer-SSM with MoE), **Bamboo** (a 7B LLM with high sparsity based on Mistral), **Qwen1.5-MoE** (efficient parameter activation), and **Grok 1.5** (128k context length, reportedly surpassing GPT-4 in code generation). Advances in quantization include **1-bit Llama2-7B** models outperforming full precision and the **QLLM** quantization toolbox supporting GPTQ/AWQ/HQQ methods.


Evals are the "eat your vegetables" of AI engineering: everyone knows they should do more of them.


Hamel Husain has yet another banger in his blog series: Your AI Product Needs Evals:

Like software engineering, success with AI hinges on how fast you can iterate. You must have processes and tools for:

  1. Evaluating quality (ex: tests).
  2. Debugging issues (ex: logging & inspecting data).
  3. Changing the behavior of the system (prompt engineering, fine-tuning, writing code).

Many people focus exclusively on #3 above, which prevents them from improving their LLM products beyond a demo. Doing all three activities well creates a virtuous cycle differentiating great from mediocre AI products (see the diagram below for a visualization of this cycle).
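To make step 1 concrete, here is a minimal, hypothetical sketch of an eval harness in Python: plain assertion-style checks over LLM outputs, with failures logged so they feed step 2 (debugging). The `generate` stub and the test cases are placeholders for your own pipeline, not Hamel's implementation.

```python
# Minimal eval-harness sketch (hypothetical; adapt to your own pipeline).

def generate(prompt: str) -> str:
    # Replace with your real LLM call; this echo stub keeps the harness runnable.
    return f"[stub output for: {prompt}]"

TEST_CASES = [
    # (prompt, predicate over the output, description)
    ("Summarize in one sentence: ...", lambda out: len(out) < 300, "summary is concise"),
    ("Extract the date: 'Meeting on 2024-03-29'", lambda out: "2024-03-29" in out, "date extracted"),
]

def run_evals():
    failures = []
    for prompt, check, desc in TEST_CASES:
        output = generate(prompt)
        if not check(output):
            failures.append((desc, prompt, output))  # keep these for step 2: debugging
    print(f"{len(TEST_CASES) - len(failures)}/{len(TEST_CASES)} checks passed")
    return failures

if __name__ == "__main__":
    run_evals()
```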

We are guilty of this at AINews: our eval loop is slow, and product improvement has been correspondingly slower than we would like. Hamel proposes a mental model that centers on evals:

[diagram: Hamel's evals-centered mental model]

Excerpts we liked:

The post has a lot of practical advice on how to make these "sensible things" easy, like using spreadsheets for hand labeling or hooking up LangSmith (which doesn't require LangChain).
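For the LangSmith route, here is a hedged sketch of what "hooking it up without LangChain" might look like, using the standalone langsmith SDK's @traceable decorator. It assumes a LangSmith API key is set in the environment (LANGCHAIN_API_KEY); `call_model` is a placeholder for your own client code.

```python
from langsmith import traceable  # pip install langsmith; no LangChain required

@traceable(run_type="llm", name="my-llm-call")
def call_model(prompt: str) -> str:
    # Your actual OpenAI/Anthropic/local call goes here; the decorator records
    # inputs and outputs as a run you can inspect and hand-label later.
    return "stub response"

if __name__ == "__main__":
    call_model("What are evals?")
```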

[screenshot: excerpts from the post]


Obligatory AI Safety PSA: OpenAI today released some samples of their rumored Voice Engine, which takes a 15-second voice sample and successfully clones the voice across different domains and languages. It's a nice demo and great marketing for HeyGen, but more importantly they are trying to warn us that very, very good voice cloning from small samples is here. Take Noam's word for it (he is at OpenAI but not on the voice team):

[screenshot: Noam's tweet]

Alec Radford does not miss. We also enjoyed Dwarkesh's pod with Sholto and Trenton.




AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, and r/ArtificialInteligence. Comment crawling is still not implemented but coming soon.

New Models and Architectures:

Quantization and Optimization:

Stable Diffusion Enhancements:

Humor and Memes:


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI Models and Architectures

AI Alignment and Factuality

AI Applications and Demos

AI Community and Events


PART 0: Summary of Summaries of Summaries

1) New AI Model Releases and Architectures:

2) Open Source Collaboration and Community Projects:

3) Model Evaluation, Benchmarking and Datasets:

4) Local LLM Deployment and Hardware Optimization:


PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord


Perplexity AI Discord

DBRX Takes the Limelight: DBRX, Databricks' new heavyweight language model, steals the show at Perplexity Labs, outshining GPT-3.5 and running neck and neck with Gemini 1.0 Pro on math and coding tasks. You can test-run DBRX for free at Perplexity Labs, a gauntlet thrown down for AI connoisseurs.

Mark Your Calendars for the Copy AI and Perplexity Alliance: The integration of Copy AI's platform with Perplexity's APIs promises real-time market intelligence for go-to-market teams. Copy AI users get six months of Perplexity Pro, all spelled out in their co-authored blog post.

In Search of Perfection with Perplexity: Users are scratching their heads over the hit-or-miss performance of academic focus mode in Perplexity's search, puzzled by intermittent outages. Discussions centered on improvements to Pro Search and conflicting reports about how uploaded files are handled, with speculation that RAG or GPT-4-32k powers the file processing.

Tuning into Enhanced Scratchpad Tactics: The community exchanged notes on drawing the best out of Perplexity; one user gave a hands-on demo of prompting with <scratchpad> XML tags (see the sketch below), and space enthusiasts flung questions at the AI about Starship and astronautics. Users also threw in finance-flavored queries, probing Amazon's monetary moves and the FTX conundrum.
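A rough sketch of the <scratchpad> tactic against Perplexity's OpenAI-compatible API; the base URL and model id are assumptions based on their public docs at the time, so verify before relying on them.

```python
from openai import OpenAI

client = OpenAI(api_key="pplx-...", base_url="https://api.perplexity.ai")

prompt = (
    "Think step by step inside <scratchpad>...</scratchpad> tags, "
    "then give your final answer outside the tags.\n\n"
    "Roughly how much delta-v does Starship need to reach low Earth orbit?"
)

resp = client.chat.completions.create(
    model="sonar-medium-online",  # assumed model id; check current docs
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```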

API Adventures and Misadventures: Queries abound regarding the Perplexity AI API's unpredictable behavior, with search results sometimes diverging between the API and the steadier web interface. Those thirsting for beta features, including coveted URL citations, can Apply for beta features, keeping API fanatics at the edge of their seats.


Unsloth AI (Daniel Han) Discord


LM Studio Discord


OpenAI Discord

Voice Cloning Sparks Heated Debate: Discussions emerged around OpenAI's Voice Engine, with some excited by its potential to generate natural-sounding speech from a 15-second audio sample, while others raised ethical concerns about the tech's misuse.

Confused Consumers and Missing Models: Confusion reigns among users regarding different versions of GPT-4 implemented in various applications, with contradictory reports about model stability and cutoff dates. Meanwhile, anticipation for GPT-5 is rife, yet no concrete information is available.

Encounters with Errant Equations: Users across multiple channels grappled with transferring LaTeX equations into Microsoft Word, proposing MathML as a potential solution. The intricacies of proper prompt structuring for specific AI tasks, like translations maintaining HTML tags, also took center stage.
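One hedged way to apply the MathML workaround is the third-party latex2mathml package (not an OpenAI or Microsoft tool): convert the LaTeX source to MathML, then paste the markup into Word's equation editor.

```python
import latex2mathml.converter  # pip install latex2mathml

latex = r"\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}"
mathml = latex2mathml.converter.convert(latex)
print(mathml)  # paste the resulting <math>...</math> markup into Word
```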

Meta-Prompting Under the Microscope: AI enthusiasts debated the merits of metaprompting over direct instructions, with experiences suggesting inconsistent results. Precise prompts were underscored as pivotal for optimized AI performance.

Roleplay Resistance in GPT: A peculiar behavior was noted with the gpt-4-0125-preview model regarding roleplay prompts, with the AI refusing to role-play when an example format was given, yet complying when the example was omitted. Users shared workarounds and tactics to guide the AI's responses.


Eleuther Discord

New Fine-Tuning Frontiers: LISA, a new fine-tuning technique, has outperformed LoRA and full-parameter training on instruction-following tasks, and can tune 7B-parameter models on a 24GB GPU. LISA's details and applications can be explored through the published paper and its code.

Chip Chat heats up: AI21 Labs revealed Jamba, a model fusing the Mamba architecture and Transformers, with a claim of 12B active parameters from a 52B total. Meanwhile, SambaNova introduced Samba-1, a Composition of Experts (CoE) model alleging reduced compute needs and higher performance, though transparency concerns persist. Details about Jamba can be found on their official release page, and scrutiny over Samba-1's performance is encouraged via SambaNova's blog.

Sensitive Data Safety Solutions Discussed: Techniques for safeguarding sensitive data during training, including SILO and differential privacy methods, were a topic of serious discussion. Interested researchers can examine the SILO paper and the differential privacy papers at https://arxiv.org/abs/1607.00133 and https://arxiv.org/abs/2110.05679 for more insights.
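As a concrete (and heavily simplified) illustration of the DP-SGD approach described in those papers, here is a sketch using the Opacus library on a toy model; all hyperparameters are illustrative only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine  # pip install opacus

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = DataLoader(TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8)

# DP-SGD: clip per-sample gradients, then add calibrated Gaussian noise.
model, optimizer, data = PrivacyEngine().make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data,
    noise_multiplier=1.0,  # more noise -> stronger privacy, lower utility
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

loss_fn = nn.CrossEntropyLoss()
for x, y in data:
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
```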

Discrepancy Detective Work in Model Weights: Discord members untangled differences in model weights between Transformer Lens (tl) and Hugging Face (hf). The debugging process involved using from_pretrained_no_processing to avoid Transformer Lens's preset weight modifications, as elucidated in this GitHub issue.
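A sketch of that debugging approach, assuming GPT-2 purely for illustration: load the checkpoint both ways and inspect corresponding parameters. Note that Transformer Lens splits the fused QKV matrix, so shapes differ even with processing disabled.

```python
from transformers import AutoModelForCausalLM
from transformer_lens import HookedTransformer

hf_model = AutoModelForCausalLM.from_pretrained("gpt2")
tl_model = HookedTransformer.from_pretrained_no_processing("gpt2")  # skip weight processing

hf_w = hf_model.transformer.h[0].attn.c_attn.weight  # fused QKV in HF's GPT-2
tl_q = tl_model.blocks[0].attn.W_Q                   # per-head Q in Transformer Lens
print(hf_w.shape, tl_q.shape)  # inspect and map between layouts before diffing values
```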

MMLU Optimization Achieved: Efficiency on MMLU tasks has been boosted by extracting the logprobs for multiple answer choices within a single forward call. A user reported memory allocation issues when loading the DBRX base model, resolved once a node misconfiguration was spotted. Further, a pull request aimed at improving context-based task handling in the lm-evaluation-harness awaits review and feedback after the CoLM deadline.
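The MMLU trick amounts to scoring all answer letters from one forward pass instead of one call per choice. A minimal sketch, with GPT-2 as a stand-in model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Question: ...\nChoices:\nA. ...\nB. ...\nC. ...\nD. ...\nAnswer:"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]  # next-token logits, one forward pass
logprobs = torch.log_softmax(logits, dim=-1)
choice_ids = [tok.encode(f" {c}")[0] for c in "ABCD"]  # " A".." D" are single tokens
scores = {c: logprobs[i].item() for c, i in zip("ABCD", choice_ids)}
print(max(scores, key=scores.get), scores)
```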


Nous Research AI Discord

A Peek at AI21's Transformer Hybrid: AI21 Labs has launched Jamba, a transformative SSM-Transformer model with a 256K context window and performance that challenges existing models, openly accessible under the Apache 2.0 license.

LLMs Gearing Up with MoE: The engineering community is charged up about microqwen, speculated to be a more compact version of Qwen, and the debut of Qwen1.5-MoE-A2.7B, a transformer-based MoE model that promises high performance with fewer active parameters.

LLM Training Woes and Wins: Engineers are troubleshooting issues with the Deepseek-coder-33B's full-parameter fine-tuning, exploring structured approaches for a large book dataset, and peeking at Hermes 2 Pro's multi-turn agentic loops. Meanwhile, they're diving into the significance of 'hyperstition' in expanding AI capacities and clarifying heuristic versus inference engines in LLMs.

RAG Pipelines and Data Structuring Strategies: To boost performance and efficiency in retrieval tasks, AI engineers are exploring structured XML with metadata and discussing RAG models. A mention of a ragas GitHub repository indicates ongoing enhancements to RAG systems.
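A small sketch of the structured-XML idea: wrap each retrieved chunk in tags with metadata so the model can ground and cite its answers. The tag names here are illustrative, not a fixed standard.

```python
from xml.sax.saxutils import escape, quoteattr

def to_xml_chunk(doc_id: str, title: str, text: str) -> str:
    # quoteattr handles quoting and escaping for attribute values
    return (
        f"<document id={quoteattr(doc_id)}>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  <content>{escape(text)}</content>\n"
        f"</document>"
    )

chunks = [{"id": "42", "title": "Jamba release notes", "text": "..."}]
context = "\n".join(to_xml_chunk(c["id"], c["title"], c["text"]) for c in chunks)
prompt = f"<documents>\n{context}\n</documents>\n\nAnswer using only the documents above."
print(prompt)
```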

Worldsim, LaTeX, and AI's Cognitive Boundaries: Tips and resources, like the gist for LaTeX papers, are being exchanged on the Worldsim project. Engineers are considering the potential of AI to delve into alternate history scenarios, while carefully differentiating between large language model use-cases.

Across these threads, engineers are navigating the challenges of an evolving AI landscape with a focus on efficiency, structure, and the constant sharing of knowledge and resources.


Modular (Mojo 🔥) Discord

Mojo Gets Juiced with Open Source and Performance Tweaks: Modular has cracked open the Mojo standard library to the open-source community under the Apache 2 license, showcasing this in the MAX 24.2 release. Enhancements include implementations for generalized complex types, workshop sessions on NVIDIA GPU support, and a focus on stabilizing support for MLIR with the syntax set to evolve.

Hype Train Gathers Steam for Modular's Upcoming Reveal: Modular is stoking excitement through a series of cryptic tweets, signaling a new announcement with emojis and a ticking clock. Community members are keeping a keen eye on the official Twitter handle for details on the enigmatic event.

MAX Engine's Leaps and Bounds: With the MAX Engine 24.2 update, Modular introduces support for TorchScript models with dynamic input shapes and other upgrades, as detailed in their changelog. A vivid discussion unfolded around performance benchmarks using the BERT model and GLUE dataset, showcasing the advancements over static shapes.

Ecosystem Flourishing with Community Contributions and Learning: Community projects are syncing up with the latest Mojo version 24.2, with an expressed interest in deeper contributions through understanding MLIR dialects. Modular acknowledges this enthusiasm and plans to divulge more on internal dialects over time, adopting a progressive-disclosure approach to the complex MLIR syntax.

Teasers and Livestreams Galore: Modular is shedding light on their recent developments with a livestream on YouTube covering the open sourcing of Mojo's stdlib and MAX Engine support, whereas tantalizing teasers in the form of tweets here sustain high anticipation for impending announcements.


HuggingFace Discord

Quantum Leaps in Hugging Face Contributions: New advancements have been made in AI research and applications: HyperGraph Representation Learning provides novel insights into data structures, Perturbed-Attention Guidance (PAG) boosts diffusion model performance, and the Vision Transformer model is adapted for medical imaging applications. The HyperGraph paper is discussed on Hugging Face, while PAG's project details are on its project page and the Vision Transformer details on Hugging Face space.

Colab and Coding Mettle: Engineers have been sharing tools and tips ranging from the use of Colab Pro to run large language models to the HF professional coder assistant for improving coding. Another shared their experience with AutoTrain, posting a link to their model.

Model Generation Woes and Image Classifier Queries: Some are facing challenges with models generating infinite text, prompting suggestions to use a repetition penalty and stopping criteria (see the sketch below). Others are seeking advice on fine-tuning a zero-shot image classifier, sharing issues and soliciting expertise in channels like #NLP and #computer-vision.
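A hedged sketch of both fixes in Hugging Face transformers: a repetition_penalty passed to generate(), plus a custom StoppingCriteria that halts on a stop string (GPT-2 used as a stand-in model).

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          StoppingCriteria, StoppingCriteriaList)

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

class StopOnString(StoppingCriteria):
    def __init__(self, stop: str):
        self.stop = stop
    def __call__(self, input_ids, scores, **kwargs) -> bool:
        return self.stop in tok.decode(input_ids[0])  # halt once the stop string appears

ids = tok("The quick brown fox", return_tensors="pt").input_ids
out = model.generate(
    ids,
    max_new_tokens=100,
    repetition_penalty=1.2,  # discourage degenerate loops
    stopping_criteria=StoppingCriteriaList([StopOnString("\n\n")]),
)
print(tok.decode(out[0]))
```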

Community Learning Announcements: The reading-group channel's next meeting has a confirmed date, strengthening community collaboration. Interested parties can find the Discord invite link to participate in the group discussion.

Real-Time Diffusion Innovations: Marigold's depth estimation pipeline for diffusion models now includes an LCM function, and an improvement allows real-time image transitions at 30fps at 800x800 resolution. Questions on the labmlai diffusion repository indicate ongoing interest in optimizing these models.


LAION Discord


LlamaIndex Discord


OpenInterpreter Discord

International Shipping Hacks for O1 Light: Engineers explored workarounds for international delivery of the O1 Light, including buying through US contacts. It was noted that user-built O1 devices are functional globally.

Local LLMs Cut API Expenses: There's active engagement around using Open Interpreter in offline mode to eliminate API costs. Contributions for running it with local models such as LM Studio were detailed, including commands like `interpreter --model local --api_base http://localhost:1234/v1 --api_key dummykey`, and can be referenced in the official documentation.

Calls for Collaboration on Semantic Search: A call to action was issued for improving local semantic search within the OpenInterpreter/aifs GitHub repository. This highlights a community-driven approach to enhancing the project.

Integrating O1 Light with Arduino's Extended Family: Technical discussions looked at merging O1 Light with Arduino hardware for greater utility. While ESP32 is standard, there's eagerness to experiment with alternatives like Elegoo boards.

O1 Dev Environment Installation Windows Woes: Members reported and discussed issues with installing the 01 OS on Windows systems. A GitHub pull request aims to provide solutions and streamline the setup process for Windows-based developers.


CUDA MODE Discord

Compilers Confront CUDA: While the debate rages on the merits of using compiler technology like PyTorch/Triton versus manual CUDA code creation, members also sought guidance on CUDA courses, including recommendations for the CUDA mode on GitHub and Udacity's Intro to Parallel Programming available on YouTube. A community-led CUDA course by Cohere titled Beginners in Research-Driven Studies (BIRDS) was announced, starting April 5th, advertised on Twitter.

Windows Walks with WSL: Several members provided ease-of-use solutions for running CUDA on Windows, emphasizing Windows Subsystem for Linux (WSL), particularly WSL2, supported by a helpful Microsoft guide.

Circling the Ring-Attention Revolution: In the #[ring-attention] channel, a misalignment of fine-tuning experiments with ring-attention goals halted progress, but insights on resolving modeling_llama.py loss issues spearheaded advancements. The successful training of tinyllama models with extended context lengths up to 100k on substantial A40 VRAM was a hot topic, alongside a Reddit discussion on the hefty VRAM needs for Llama 7B models with QLoRA and LoRA.

Triton Tangle Untangled: The #[triton-puzzles] channel was abuzz with a sync issue in triton-viz linked to a specific pull request, and an official fix was provided, though some still faced installation woes. The use of Triton on Windows was also clarified, pointing to alternative environments like Google Colab for running Triton-based computations.

Zhihu Zeal Over Triton: A member successfully pierced the language barrier on the Chinese platform Zhihu to unearth a trove of Triton materials, stimulating a wish for a glossary of technical terms to aid in navigating non-English content.


OpenRouter (Alex Atallah) Discord

Model Mania: OpenRouter Introduces App Rankings: OpenRouter launched an App Rankings feature for models, exemplified by Claude 3 Opus, to showcase utilization and popularity based on public app usage and tokens processed.

Databricks and Gemini Pro Stir Excitement, But Bugs Buzz: Engineers shared enthusiasm for Databricks' DBRX and Gemini Pro 1.5, although 429 errors suggested rate-limit challenges, and downtime accompanied by 502 and 524 errors signaled room for reliability improvements in model availability.

Claude's Capabilities and API Discussed: The community clarified that Claude in OpenRouter doesn't support prefill features and explored error fixing for Claude 2.1. A side conversation praised ClaudeAI via OpenRouter for better handling roleplay and sensitive content with fewer false positives, noting standardized access and cost parity with official ClaudeAI API.

APIs and Clients Get a Tune-Up: OpenRouter has simplified its API to /api/v1/completions and dropped Groq for Nitro models due to rate limitations, alongside improved support for the OpenAI API client (see the sketch below).
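A sketch of calling OpenRouter through its OpenAI-compatible endpoint with the OpenAI Python client; the model slug follows OpenRouter's public naming conventions, but verify against current docs before relying on it.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)
resp = client.chat.completions.create(
    model="anthropic/claude-3-opus",  # slug as listed on the rankings page
    messages=[{"role": "user", "content": "Hello from OpenRouter"}],
)
print(resp.choices[0].message.content)
```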

Easing Crypto Payments for OpenRouter Users: OpenRouter is cutting the gas costs of cryptocurrency payments by using Base, an Ethereum L2 chain, aiming for a more economical user experience.


AI21 Labs (Jamba) Discord


tinygrad (George Hotz) Discord


Latent Space Discord


OpenAccess AI Collective (axolotl) Discord

Jamba Sets the Bar High: AI21 Labs has introduced a new model, Jamba, featuring a 256k token context window, optimized for performance with 12 billion parameters active. Their accelerated progress is underlined by the quick training time, with knowledge cutoff on March 5, 2024, and details can be found in their blog post.

Pushing the Boundaries of Optimization: Discussions have highlighted the effectiveness of bf16 precision in torchTune leading to substantial memory savings over fp32, with these optimizations being applied to SGD and soon to the Adam optimizer. Skepticism remains over whether Axolotl provides the same level of training control as torchTune, particularly in the context of memory optimization.
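The bf16 saving is easy to verify on a toy module: bfloat16 stores 2 bytes per parameter versus 4 for fp32, so parameter memory roughly halves. This sketch measures a plain nn.Linear, not torchTune itself.

```python
import torch
import torch.nn as nn

def param_bytes(m: nn.Module) -> int:
    return sum(p.numel() * p.element_size() for p in m.parameters())

model = nn.Linear(4096, 4096)
print(f"fp32: {param_bytes(model) / 1e6:.1f} MB")   # ~67.1 MB
model = model.to(torch.bfloat16)
print(f"bf16: {param_bytes(model) / 1e6:.1f} MB")   # ~33.6 MB, half of fp32
```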

The Cost of Cutting-Edge: Conversations around GB200-based server prices revealed a steep cost of US$2-3 million each, prompting the community to consider alternative hardware.

Size Matters for Datasets: The hunt for long-context datasets prompted sharing of resources including one from Hugging Face's collections and the MLDR dataset on Hugging Face, which cater to models requiring extensive sequence training.
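For reference, a hedged sketch of pulling MLDR with the datasets library; the repo id and config name follow the public dataset card, so check the card if loading fails.

```python
from datasets import load_dataset

# English config of the multilingual long-document retrieval dataset
mldr = load_dataset("Shitao/MLDR", "en", split="train", trust_remote_code=True)
print(len(mldr), mldr[0].keys())
```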

Fine-Tuning Finesse and Repetition Debate: The community has been engaging in detailed discussions about model training, from sharing prompt-formatting strategies to debating the utility of dataset repetition, referencing a paper on data ordering that supports repetition. New fine-tuning approaches like GaLore for larger models are also being experimented with, despite some memory challenges.


LangChain AI Discord

OpenGPTs: DIY Food Ordering System: A resourceful engineer integrated a custom food ordering API with OpenGPTs, demonstrating the adaptability and potential of LangChain's open-source platform in a demonstration video. They encouraged peer reviews to refine the innovation.

A Smarter SQL AI Chatbot: Members explored methods to enable an SQL AI Chatbot to remember previous interactions, enhancing the bot’s context-retaining abilities for more effective and coherent dialogues.

Gearing Up for Product Recommendations: Engineers discussed the development of a bot that would suggest products using natural language queries, considering the use of vector databases for semantic search or employing an SQL agent to parse user intents like "planning to own a pet."
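A minimal sketch of the vector-search option using sentence-transformers with an in-memory index (any vector database could replace the matrix); the product names and model choice are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
products = ["stainless steel dog bowl", "ergonomic office chair", "cat scratching post"]
product_vecs = model.encode(products, normalize_embeddings=True)

query_vec = model.encode("planning to own a pet", normalize_embeddings=True)
scores = product_vecs @ query_vec  # cosine similarity, since vectors are normalized
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {products[i]}")
```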

Upgrade Your Code Reviews With AI: A new AI pipeline builder designed to automate code review tasks including validation and security checks was introduced, coupled with a demo and a product link, poised to streamline the code review process.

GalaxyAI Throws Down the Gauntlet: GalaxyAI is providing free access to elite AI models such as GPT-4 and Gemini-PRO, presented as an easy-to-adopt option for projects via their OpenAI-compatible API service.

Nurturing Engineer Dialogues: The creation of the <#1222928565117517985> channel fosters concentrated discussion on OpenGPTs and its growth, as evidenced by its GitHub repository.


Interconnects (Nathan Lambert) Discord


DiscoResearch Discord

AI21 Labs Cooks Up Jamba: AI21 Labs has launched Jamba, a model blending Structured State Space models with Transformer architecture, promising high performance. Check out Jamba through its Hugging Face deployment, and read about its groundbreaking approach on the AI21 website.

Translation Titans Tussle: Members are gearing up for a translation battle among DiscoLM, Occiglot, Mixtral, GPT-4, DeepL, and Azure Translate, using the first 100 lines from a dataset like Capybara to compare performance.

Course to Conquer LLMs: A GitHub repository offering a course on Large Language Models, complete with roadmaps and Colab notebooks, was shared.

Token Insertion Tangle Untangled: A debugging success was shared regarding unexpected token insertions believed to be caused by either quantization or the inference engine; providing an added_tokens.json resolved the anomaly.

Training Data Transparency Tremors: The community has asked for more information on the training data used for a certain model, with specific interest in the definition and range of "English data" as stated in the model card or affiliated blog post.


Skunkworks AI Discord


PART 2: Detailed by-Channel summaries and links

Stability.ai (Stable Diffusion) ▷ #general-chat (936 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #announcements (2 messages):

Link mentioned: Copy.ai + Perplexity: Purpose-Built Partners for GTM Teams | Copy.ai: Learn more about how Perplexity and Copy.ai's recent partnership will fuel your GTM efforts!


Perplexity AI ▷ #general (728 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (22 messages🔥):


Perplexity AI ▷ #pplx-api (8 messages🔥):

Link mentioned: pplx-api form: Turn data collection into an experience with Typeform. Create beautiful online forms, surveys, quizzes, and so much more. Try it for FREE.


Unsloth AI (Daniel Han) ▷ #general (351 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (34 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (201 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (6 messages):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (1 messages):

starsupernova: ooo very cool!


LM Studio ▷ #💬-general (237 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (39 messages🔥):

Links mentioned:


LM Studio ▷ #announcements (2 messages):

Links mentioned:


LM Studio ▷ #🧠-feedback (13 messages🔥):


LM Studio ▷ #🎛-hardware-discussion (111 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (80 messages🔥🔥):

Links mentioned:


LM Studio ▷ #langchain (2 messages):

Link mentioned: OllamaPaperBot/simplechat.py at main · eltechno/OllamaPaperBot: chatbot designed to interact with PDF documents based on OpenSource LLM Models - eltechno/OllamaPaperBot


LM Studio ▷ #amd-rocm-tech-preview (54 messages🔥):

Links mentioned:


LM Studio ▷ #crew-ai (2 messages):


OpenAI ▷ #annnouncements (1 messages):

Link mentioned: Navigating the Challenges and Opportunities of Synthetic Voices: We’re sharing lessons from a small scale preview of Voice Engine, a model for creating custom voices.


OpenAI ▷ #ai-discussions (75 messages🔥🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (14 messages🔥):


OpenAI ▷ #prompt-engineering (153 messages🔥🔥):


OpenAI ▷ #api-discussions (153 messages🔥🔥):


Eleuther ▷ #general (272 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (46 messages🔥):

Links mentioned:


Eleuther ▷ #interpretability-general (40 messages🔥):

Links mentioned:


Eleuther ▷ #lm-thunderdome (8 messages🔥):

Link mentioned: Build software better, together: GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.


Nous Research AI ▷ #off-topic (6 messages):

Link mentioned: Cohere int8 & binary Embeddings: Cohere int8 & binary Embeddings - Scale Your Vector Database to Large Datasets#ai #llm #ml #deeplearning #neuralnetworks #largelanguagemodels #artificialinte...


Nous Research AI ▷ #interesting-links (6 messages):

Links mentioned:


Nous Research AI ▷ #general (145 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (26 messages🔥):

Links mentioned:


Nous Research AI ▷ #project-obsidian (1 messages):

night_w0lf: Did it work?


Nous Research AI ▷ #rag-dataset (57 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #world-sim (88 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #general (127 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #💬︱twitter (6 messages):


Modular (Mojo 🔥) ▷ #📺︱youtube (1 messages):

Link mentioned: Modular Community Livestream - New in MAX 24.2: MAX 24.2 is now available! Join us on our upcoming livestream as we discuss everything new in MAX - open sourcing Mojo standard library, MAX Engine support f...


Modular (Mojo 🔥) ▷ #✍︱blog (3 messages):

Links mentioned:


Modular (Mojo 🔥) ▷ #announcements (1 messages):

Links mentioned:


Modular (Mojo 🔥) ▷ #🔥mojo (91 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #community-projects (13 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #community-blogs-vids (1 messages):

Link mentioned: Use locally built standard library in Mojo: Mojo standard library (stdlib) was open-sourced yesterday. It is exciting that the community can now contribute directly to the codebase. After spending some time with the stdlib repository, I want to...


Modular (Mojo 🔥) ▷ #🏎engine (4 messages):


HuggingFace ▷ #announcements (1 messages):

Links mentioned:

"tl;dr: do not depend on benchmark leaderboards…"
@xiaotianhan on Hugging Face: "🎉 🎉 🎉 Happy to share our recent work. We noticed that image resolution…"
@banghua on Hugging Face: "Have we really squeezed out the capacity of a compact chat model? Thrilled to…"


HuggingFace ▷ #general (73 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #cool-finds (4 messages):


HuggingFace ▷ #i-made-this (26 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (3 messages):

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.


HuggingFace ▷ #computer-vision (16 messages🔥):

Links mentioned:


HuggingFace ▷ #NLP (21 messages🔥):

Link mentioned: Evaluate Retrieval Augmented Generation (RAG) Systems: Retrieval Augmented Generation is a powerful framework which improves the quality of responses that you get from LLMs. But if you want to create RAG systems ...


HuggingFace ▷ #diffusion-discussions (4 messages):


LAION ▷ #general (108 messages🔥🔥):

Links mentioned:


LAION ▷ #research (31 messages🔥):

Links mentioned:


LlamaIndex ▷ #blog (5 messages):

Link mentioned: RSVP to LLM x Law Hackathon @Stanford #3 | Partiful: As artificial intelligence (AI) continues to revolutionize industries across the globe, the legal sector is no exception. LLMs, a foundation model capable of understanding and generating natural langu...


LlamaIndex ▷ #general (107 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (2 messages):

Links mentioned:


OpenInterpreter ▷ #general (59 messages🔥🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (54 messages🔥):

Links mentioned:


CUDA MODE ▷ #general (8 messages🔥):

Link mentioned: Tweet from Cohere For AI (@CohereForAI): Our community-led Beginners in Research-Driven Studies (BIRDS) group is kicking off it’s first mini-cohort learning group focused on CUDA Programming for Beginners, beginning on Friday, April 5th 🎉


CUDA MODE ▷ #cuda (9 messages🔥):

Links mentioned:


CUDA MODE ▷ #beginner (8 messages🔥):

Link mentioned: Install WSL: Install Windows Subsystem for Linux with the command, wsl --install. Use a Bash terminal on your Windows machine run by your preferred Linux distribution - Ubuntu, Debian, SUSE, Kali, Fedora, Pengwin,...


CUDA MODE ▷ #ring-attention (57 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #off-topic (5 messages):



CUDA MODE ▷ #triton-puzzles (24 messages🔥):

Link mentioned: [TRITON] Sync with triton upstream by Jokeren · Pull Request #19 · Deep-Learning-Profiling-Tools/triton-viz


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (107 messages🔥🔥):

Links mentioned:


AI21 Labs (Jamba) ▷ #announcements (1 messages):

Links mentioned:


AI21 Labs (Jamba) ▷ #jamba (40 messages🔥):

Links mentioned:


AI21 Labs (Jamba) ▷ #general-chat (56 messages🔥🔥):



tinygrad (George Hotz) ▷ #general (56 messages🔥🔥):


tinygrad (George Hotz) ▷ #learn-tinygrad (39 messages🔥):

Links mentioned:


Latent Space ▷ #ai-general-chat (93 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (45 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (16 messages🔥):

Link mentioned: torchtune/recipes/configs/llama2/7B_full_single_device_low_memory.yaml at main · pytorch/torchtune: A Native-PyTorch Library for LLM Fine-tuning. Contribute to pytorch/torchtune development by creating an account on GitHub.


OpenAccess AI Collective (axolotl) ▷ #general-help (5 messages):


OpenAccess AI Collective (axolotl) ▷ #community-showcase (25 messages🔥):

Link mentioned: In-Context Pretraining: Language Modeling Beyond Document Boundaries: Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to do...


LangChain AI ▷ #announcements (1 messages):

Link mentioned: GitHub - langchain-ai/opengpts: Contribute to langchain-ai/opengpts development by creating an account on GitHub.


LangChain AI ▷ #general (57 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #share-your-work (7 messages):

Links mentioned:


LangChain AI ▷ #tutorials (2 messages):

Link mentioned: Hack OpenGPT to Automate Anything: Welcome to the future of custom AI applications! This demo showcases the incredible flexibility and power of OpenGPTs, an open source project by LangChain. W...


Interconnects (Nathan Lambert) ▷ #news (28 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (25 messages🔥):

Links mentioned:


DiscoResearch ▷ #general (5 messages):

Links mentioned:


DiscoResearch ▷ #discolm_german (3 messages):