Frozen AI News archive

Clémentine Fourrier on LLM evals

**Clémentine Fourrier** of **Huggingface**, who co-presented **GAIA** with **Meta** at **ICLR**, shared her thinking on **LLM evaluation** methods. The blog outlines three main evaluation approaches: **automated benchmarking** using sample inputs/outputs and metrics; **human judges**, covering grading and ranking via **vibe-checks**, **arenas**, and **systematic annotations**; and **models as judges**, using generalist or specialist models with known biases. Challenges include data contamination, subjectivity, and bias in scoring. These evaluations help prevent regressions, rank models, and track progress in the field.


AI News for 5/22/2024-5/23/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (380 channels, and 5410 messages) for you. Estimated reading time saved (at 200wpm): 551 minutes.


Special addendum for the AI Engineer World's Fair callout yesterday: scholarships are available for those who cannot afford full tickets! More speaker announcements are rolling out.

Many people know Huggingface's Open LLM Leaderboard, but you rarely hear from the people behind it. Clémentine Fourrier made a rare appearance at ICLR (to co-present GAIA with Meta, something we cover on the upcoming ICLR pod) and is now back with a blog covering how she thinks about LLM Evals.


This is not going to be groundbreaking for those very close to the problem, but is a good and accessible "state of the art" summary from one of the most credible people in the field.

Our TL;DR: There are 3 main ways to do evals:

- **Automated benchmarking**: score models on curated sample inputs/outputs with an automatic metric.
- **Human judges**: vibe-checks, arena-style pairwise rankings, and systematic annotation.
- **Models as judges**: generalist or specialist grader models, each with known biases.

Evals are used to prevent regressions, to rank models, and to serve as a proxy for progress in the field.
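
To make the automated-benchmarking and model-as-judge approaches concrete, here is a minimal, dependency-free sketch; the sample data, the toy model, and the judge prompt are illustrative stand-ins, not taken from the blog.

```python
# Minimal sketch of two of the evaluation styles described above (illustrative data only).

samples = [
    {"input": "What is the capital of France?", "reference": "Paris"},
    {"input": "2 + 2 = ?", "reference": "4"},
]

def automated_benchmark(generate, samples):
    """Automated benchmarking: compare model outputs to references with a metric (exact match here)."""
    hits = sum(generate(s["input"]).strip().lower() == s["reference"].lower() for s in samples)
    return hits / len(samples)

def judge_prompt(question, answer):
    """Model-as-judge: build a prompt for a grader model; the grader call itself is left abstract."""
    return (
        "You are a strict grader. Score the answer from 1 to 5.\n"
        f"Question: {question}\nAnswer: {answer}\nScore:"
    )

if __name__ == "__main__":
    fake_model = lambda q: "Paris" if "France" in q else "5"   # toy stand-in for an LLM
    print("exact-match accuracy:", automated_benchmark(fake_model, samples))
    print(judge_prompt(samples[1]["input"], fake_model(samples[1]["input"])))
```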


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

NVIDIA Earnings and Stock Performance

Mistral AI Model Updates

Meta's Llama and Commitment to Open Source

Anthropic's Constitutional AI

Google's AI Announcements and Issues

Open Source Debates and Developments

AI Safety and Regulation Discussions

Emerging AI Architectures and Techniques

AI Benchmarking and Evaluation

Emerging Applications and Frameworks

Compute Trends and Developments

AI-Generated Voices and Identities

Miscellaneous AI News and Discussions


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, and r/Singularity. Comment crawling works now but has lots of room to improve!

AI Model Releases and Updates

AI Capabilities and Limitations

AI Ethics and Safety

AI Applications and Use Cases

Stable Diffusion and Image Generation


AI Discord Recap

A summary of Summaries of Summaries

1. Model Performance Optimization and New Releases:

2. Fine-Tuning Strategies and Challenges:

3. Open-Source AI Innovations and Collaborations:

4. AI API Integrations and Community Efforts:

5. GPU Optimization and Technical Workshops:


{% if medium == 'web' %}

PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Perplexity AI Discord

Scheduled Downtime for Database Boost: A scheduled downtime has been announced, commencing at 12:00am EST and lasting approximately 30 minutes, to upgrade the database and improve performance and user experience.

Engineering Excitement Over Free Gemini: Engineering conversations revolved around the free usage of Gemini in AI Studio for high-volume tasks like fine-tuning, spurring discussions on data privacy and cost-saving strategies.

Perplexity Powers Past Performance Hurdles: Notable improvements to Perplexity's web scraping have cut scrape times to 1.52s, down from over 7s previously, while discussions highlighted the importance of parallel processing and efficient tooling in AI applications.

Comparative AI Discourse: Technically-inclined users compared Perplexity with Gemini Pro and ChatGPT, lauding Perplexity's research and writing capabilities and flexible file management, with suggestions to include additional features like CSV support to reach new heights of utility.

API Anomalies and Alternatives Analysis: Community members discussed discrepancies in outputs between web and API versions of the same models, seeking clarifications on the observed inconsistencies, while also sharing their experiences in balancing model accuracy and utilization within API rate limits for platforms like Haiku, Cohere, and GPT-4-free.


LLM Finetuning (Hamel + Dan) Discord

Instruction Finetuning with ColBERT and Task Updates: Engineers discussed finetuning strategies for instruction embeddings, citing frameworks like INSTRUCTOR and TART as references. A project proposal for automating standup transcript ticket updates involved using examples of standup conversions correlated with ticket actions.

CUDA Woes and Workarounds: Persistent CUDA errors while running LLM models like llama 3 8b were a common issue, with remedies including adjusting batch sizes and monitoring GPU usage via nvidia-smi. Docker was recommended for managing CUDA library compatibility, with a link to a Docker image from Docker Hub provided.
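
On the monitoring point, here is a minimal sketch of checking GPU memory from Python by shelling out to `nvidia-smi`; the query fields are standard `nvidia-smi` options, and the script itself is illustrative.

```python
# Quick GPU memory check before (or during) a fine-tuning run.
# Wraps `nvidia-smi --query-gpu=... --format=csv`, which prints one line per GPU.
import subprocess

def gpu_memory():
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    stats = []
    for line in out.strip().splitlines():
        idx, used, total = [int(x) for x in line.split(", ")]
        stats.append({"gpu": idx, "used_mib": used, "total_mib": total})
    return stats

if __name__ == "__main__":
    for s in gpu_memory():
        print(f"GPU {s['gpu']}: {s['used_mib']}/{s['total_mib']} MiB used")
```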

Parameters and Efficient Model Training: Queries emerged about default Axolotl's configuration parameters and optimization strategies for training on A100 and H100 GPUs, where using bf16 and maximizing VRAM usage were among the suggested strategies. Discussions also extended to newer optimizers like Sophia and Adam_LoMo.
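
The same ideas expressed with Hugging Face `TrainingArguments` rather than an Axolotl config; the batch size and accumulation values below are placeholders to tune against available VRAM, not recommendations from the thread.

```python
# Sketch: bf16 mixed precision plus batch size / gradient accumulation tuned to fill VRAM
# on A100/H100-class GPUs. All numbers are illustrative placeholders.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    bf16=True,                       # A100/H100 support bfloat16; usually preferred over fp16 there
    per_device_train_batch_size=8,   # raise until you approach the VRAM limit
    gradient_accumulation_steps=4,   # keeps the effective batch size large if per-device batch is capped
    gradient_checkpointing=True,     # trades compute for memory as sequence lengths grow
    learning_rate=2e-5,
    num_train_epochs=1,
    logging_steps=10,
)
```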

Accelerating Free Credits and Workshop Excitement: Modal's fast credit allocation was commended, and excitement built around a GPU Optimization Workshop featuring representatives from OpenAI, NVIDIA, Meta, and Voltron Data. Additionally, there was anticipation for a recording of an upcoming talk by Kyle Corbitt.

Model Fine-Tuning and Training Factors: Fine-tuning LLMs to generate layouts, troubleshooting Axolotl's dataset paths, and considering LoRA hyperparameters were topics of interest. The use of GPT-4 as a judge for level 2 model evaluations and troubleshooting Axolotl on Modal due to gated model access issues were also discussed.

Deployment Dilemmas: Engineers encountered challenges when deploying trained models to S3 on Modal, with solutions including using the modal volume get command and mounting an S3 bucket as a volume, as described in Modal's documentation.
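
As a rough sketch of the bucket-mount option, assuming Modal's cloud bucket mount API; `modal.App`, `modal.CloudBucketMount`, and the names below are assumptions, so check the current Modal docs before relying on them. The `modal volume get` CLI mentioned above is the simpler alternative for copying files out of a Modal volume.

```python
# Hypothetical sketch: expose an S3 bucket inside a Modal function so trained weights
# can be written straight to it. Bucket name, secret name, and paths are placeholders.
import modal

app = modal.App("export-model")  # assumption: modal.App is the current app entrypoint

@app.function(
    volumes={
        "/bucket": modal.CloudBucketMount(      # assumption: CloudBucketMount mounts an S3 bucket
            "my-model-artifacts",
            secret=modal.Secret.from_name("aws-credentials"),
        )
    }
)
def export_weights():
    import shutil
    # copy locally produced weights into the mounted bucket path
    shutil.copytree("/root/outputs/final_checkpoint", "/bucket/final_checkpoint")
```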

Paper and Tutorial References: The community shared valuable learning resources, such as a YouTube demo on EDA assistant chatbots. They also appreciated illustrative examples from Hamel and Jeremy Howard, with references to both a tweet and a GitHub repo.


HuggingFace Discord


Nous Research AI Discord

Flash Attention Needed for YaRN: Efforts to implement flash attention into the YaRN model are meeting challenges, with some progress but not a perfect fit yet.

Rust Rising Among AI Enthusiasts: Increasing interest and discussions around using Rust for machine learning, with members sharing resources like Rust-CUDA GitHub and rustml - Rust, while recognizing the dominance of Python in AI.

Nous Research Expanding Teams: Nous Research is on the hunt for new talent, as evidenced by their recent hiring announcement and a call to apply via their Google Form.

Python vs Rust in AI Careers: A robust debate over Python's primacy in AI careers with members bringing up alternatives like Rust or Go, alongside sharing insights from AI experts like Yann LeCun's views on focusing beyond LLMs for next-gen AI systems.

RAG's Validity in Question: Proposals were made to enhance RAG model context, emphasizing the need for context accuracy and referencing a debate over the reliability of Google's AI drawing conclusions from outdated sources.


Stability.ai (Stable Diffusion) Discord


LM Studio Discord

Llama Lamentations & Local Model Logistics: There's unrest over Llama 3's 8k context performance, with members revealing it falls short of expectations. Despite being the topic of debate, suggestions for improving its performance, such as introducing longer contexts up to 1M, remain theoretical.

Discussions Turn to Vision Models: OCR discussions saw mixed reviews of vision models like LLaVA 1.6, with users recommending Tesseract for reliable text extraction. Interest in Vision Language Models (VLMs) is evident, but deploying them effectively behind web server APIs requires careful configuration, including API key handling.
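
On the web-server point, here is a minimal sketch of hitting a locally served model through an OpenAI-compatible chat endpoint; the port, path, and dummy API key reflect common LM Studio defaults and are assumptions to adjust for your setup.

```python
# Sketch: query a local OpenAI-compatible server (e.g. LM Studio's) with a simple chat request.
# Endpoint/port and the placeholder API key are assumptions; local servers often ignore the key's value.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    headers={"Authorization": "Bearer lm-studio", "Content-Type": "application/json"},
    json={
        "model": "local-model",  # placeholder; the server usually routes to the loaded model
        "messages": [{"role": "user", "content": "Summarize what OCR is in one sentence."}],
        "temperature": 0.2,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```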

Multimodal Mishaps and Merits: Idefics 2.0 multimodal’s compatibility sparked interest, yet it seems to trip on existing infrastructure like llama.cpp. Meanwhile, Mistral-7B-Instruct v0.3 emerges as part of the dialogue, boasting an extended vocabulary and improved function calling (Model Card). In parallel, Cohere's Aya 23 showcases its talents in 23 languages, promising to sway future conversations (Aya 23 on Huggingface).

GPU Grows but Guides Needed: The adoption of 7900xt graphics cards is underway among members seeking to amp up their tech game. However, guidance for effective environment setups, such as treating an RX 6600 card as gfx1030 on Fedora, remains a precious commodity.

Storage Solved, Support Summoned: One member's move to allocate an M.2 SSD exclusively for LM Studio paints a picture of the ongoing hardware adaptations. On the flip side, GPU compatibility queries like dual graphics card support highlight the community's reliance on shared wisdom.


Modular (Mojo 🔥) Discord


Eleuther Discord

Pythia's Pocketbook: Discussing the cost of training models like Pythia, Stellaathena estimated a bill of $250k for the largest model, mentioning efficiency and discounted GPU-hour pricing in calculations.

Cost-Efficiency Report Needs Reviewers: A forthcoming report on frontier model training costs seeks peer review; interested parties would assess GPU-hours and the influence of GPU types like A100 40GB.

LeanAttention Edging Out FlashAttention?: A recently shared paper introduces LeanAttention, which might outperform FlashAttention, raising debates on its innovation. The community also joked about unorthodox practices to improve model benchmarks, playfully noting, "The secret ingredient is crime."

Interpretability's New Frontiers: A new paper was noted for opening research doors in interpretability, kindling curiosity on its implications for future studies.

Evaluating Large Models: Tech tips were exchanged, such as running the lm eval harness on multi-node SLURM clusters and how to set parameters like num_fewshot for evaluations with challenges reported around reproducibility and internet access on compute nodes.
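
For the parameter side, here is a minimal sketch of invoking the eval harness CLI with `num_fewshot` set; model, task, and paths are placeholders, and the multi-node SLURM wiring is omitted.

```python
# Sketch: drive the lm-evaluation-harness CLI from Python with a few-shot setting.
# Model and task names are placeholders; on air-gapped compute nodes the model weights
# and task data need to be cached locally beforehand.
import subprocess

subprocess.run(
    [
        "lm_eval",
        "--model", "hf",
        "--model_args", "pretrained=EleutherAI/pythia-160m",
        "--tasks", "hellaswag",
        "--num_fewshot", "5",
        "--batch_size", "8",
        "--output_path", "results/pythia-160m-hellaswag.json",
    ],
    check=True,
)
```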


OpenAI Discord


CUDA MODE Discord

Full House at the GPU Optimization Workshop: The GPU optimization workshop drew excellent engagement with more than 2,400 registrants and valuable sessions from experts including Sharan Chetlur (NVIDIA), Phil Tillet (OpenAI), and William Malpica (Voltron Data). Enthusiasts can RSVP for future interactions here, with additional resources available on GitHub.

Breaching CUDA Confusion: A member clarified that __global__ CUDA functions can't be simultaneously __host__ due to their grid launch setup, and they posited the theoretical utility of a __global__ function agnostic of threadIdx and blockIdx.

Tricky Transformation with Triton: One user discussed performance drops when converting a kernel from FP32 to FP6 using triton+compile, speculating on the potential impact of inplace operators.

AI Research Synopsis Spices Up Discussions: A weekly AI research spotlight surfaced, featuring analysis on works like KAN, xLSTM, and OpenAI's GPT-4. The discussion extended to the computationally intensive nature of KANs owing to activation-based edge computation.

The CUDA Cul-de-Sac and Vulkan Ventures: Conversations veered into contributions and coding concerns, including a member's flash-attention repository stalling, GPU model benchmarks like 7900xtx versus 3090, and Vulkan's failure to impress in a heat transfer simulation.

LLM.C Lurches Forward: There was a bustling exchange about llm.c, with members celebrating the integration of HellaSwag evaluation in C, debating CUDA stream optimization for speed, and sharing the challenge of scaling batch sizes without training disruptions.

Please note, some quotes and project links have been shared verbatim as no additional context was provided.


OpenAccess AI Collective (axolotl) Discord


LAION Discord


Interconnects (Nathan Lambert) Discord

OpenAI's Alleged NDA Overreach: OpenAI leadership claimed ignorance over threats to ex-employees' vested equity for not signing NDAs, but documents with leadership's signatures suggest otherwise. Ex-employees were pressured with seven-day windows to sign or face losing millions.

Model Performance Headlines: Gemini 1.5 Pro impressively topped the Reward Bench Leaderboard for generative models, as indicated by Jeff Dean's tweet, while News Corp and OpenAI entered a multi-year deal, allowing AI utilization of News Corp content, as per this announcement.

Merch in a Flash: Nathan Lambert's Shopify store, Interconnects, launches amidst lighthearted uncertainty about operations and with community-driven product adjustments for inclusivity; he assures ethical sourcing.

The Emergence of AI Influencers?: TikTok's teen demographic reportedly resonates with content generated by bots, highlighting the potential for AI-created content to go viral. The platform stands out as a launchpad for careers like Bella Poarch's.

Anthropic AI's Golden Gate Focus: A whimsical experiment by Anthropic AI altered Claude AI's focus to obsess over the Golden Gate Bridge, leading to a mix of amusement and interest in the AI community.


OpenRouter (Alex Atallah) Discord

OpenRouter Swings Open the Gates to Advanced AI Tools: OpenRouter now facilitates the use of Anthropic and Gemini models with a syntax matching OpenAI's, broadening the landscape for AI practitioners. Supported tool calls and function usage instructions can be found in the documentation.

Lumimaid 70B Sashays into the AI Theater: Aimed specifically at roleplay scenarios, the Lumimaid 70B model was tweaked and let loose by the NeverSleep team and details can be scooped from their announcement page.

Calling all Roleplayers to a New Digital Realm: A new roleplaying app granting a free tier has launched, leveraging OpenRouter's multifaceted AI characters, with the creator keen on gathering feedback via RoleplayHub.

Tech Snags and Community Dialogues Tangle in General Channel: Software patches were applied to mend streaming issues with models like Llama-3, and the release of Mistral-7B v0.3 stirred some confusion due to its new vocab/tokenizer; uncertainty lingered over whether it should be a distinct model route or a direct route upgrade. Meanwhile, Cohere's Aya initiative garnered attention, offering multilingual AI research spanning 101 languages; find out more here.

Economies of Scale Kick in for AI Model Access: Sharp price reductions have been executed for several models, including a tempting 30% off for nousresearch/nous-hermes-llama2-13b, among others. These markdowns are stirring up the market for developers and enthusiasts alike.


LlamaIndex Discord


Latent Space Discord


OpenInterpreter Discord


tinygrad (George Hotz) Discord

Challenging the Taylor Takedown: Members questioned the efficacy of Taylor series in approximations, noting that they are only accurate close to the reference point. It was highlighted that range reduction might not be the optimal path to perfect precision, and interval partitioning could offer a better solution.

Range Reduction Rethink: The group debated over the use of range reduction techniques, suggesting alternatives like reducing to [0, pi/4], and referred to IBM's approach as a practical example of interval partitioning found in their implementation.
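
A self-contained illustration of both points, using a coarser reduction to [-pi, pi] rather than the [0, pi/4] interval discussed; the polynomial degree and test values are illustrative.

```python
# Sketch: a fixed-degree Taylor polynomial for sin(x) is accurate near the expansion point
# but degrades badly for large |x|; reducing the argument into a small interval first
# restores accuracy with the same polynomial.
import math

def sin_taylor(x, terms=8):
    """Taylor series of sin about 0, truncated to `terms` terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(terms))

def sin_range_reduced(x, terms=8):
    """Reduce x modulo 2*pi toward the expansion point before applying the same polynomial."""
    r = math.remainder(x, 2 * math.pi)   # r lies in [-pi, pi]
    return sin_taylor(r, terms)

for x in (0.5, 10.0, 100.0):
    print(f"x={x:>6}: naive err={abs(sin_taylor(x) - math.sin(x)):.2e}, "
          f"reduced err={abs(sin_range_reduced(x) - math.sin(x)):.2e}")
```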

IBM's Insights: An IBM source file was mentioned in a suggestion to address range reduction problems by treating fmod as an integer, viewable here.

Mathematical Complexity Calmly Contemplated: There was a consensus that the computations for perfect accuracy are complex, especially for large numbers, though typically not slow—a mix of admiration and acceptance for the scientific intricacies involved.

Shape Shifting in ShapeTracker: The group explored ShapeTracker limitations, concluding that certain sequences of operations like permute followed by reshape lead to multiple views, posing a challenge in chaining movement operations effectively. The utility of tensor masking was discussed, with emphasis on its role in tensor slicing and padding.


Cohere Discord


LangChain AI Discord


DiscoResearch Discord


MLOps @Chipro Discord


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (1009 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #announcements (1 messages):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (81 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #announcements (1 messages):

- **Scheduled Downtime Announced**: Heads up for a **scheduled downtime** tonight at 12:00am EST. The downtime will last around 30 minutes for a database upgrade aimed at improving performance and user experience.

Perplexity AI ▷ #general (897 messages🔥🔥🔥):

- **Gemini Free Usage Pleasantly Surprises**: Members celebrated that **Gemini in AI Studio** is free even for large usage (*"requests in this UI are free"*) and exclaimed about the ability to perform fine-tuning without cost (*"finetuning for free?"*). They discussed possible data privacy concerns, but openness to experimenting with the service prevailed.
- **Perplexity’s Speed Impresses**: **Web scraping** optimizations led to significant performance improvements for **searches using multiple sources**, clocking speeds much faster than previous attempts. One member reported *"web scraping taking 1.52s"* compared to earlier times of over 7s and emphasized proper use of parallel processing.
- **Perplexity vs. Other AI Tools**: Members compared **Perplexity** with other AI tools like **Gemini Pro** and **ChatGPT** regarding file handling and data processing. **Perplexity** received praise for its research and writing capabilities (*"better in both areas"*) and flexible file handling, garnering new insights on **Gemini's role** mainly for its context handling.
- **Integrating Additional Features into Perplexity**: Discussions included potential UI enhancements and tools for **Perplexity**, including the integration of **labs into the main UI** and adding functionalities like history saving and support for formats like **CSV**. The aim is to potentially transform **Perplexity** from a decent tool to the *"best AI website"*.
- **Model Usage and Rate Limits Challenge Members**: Encountering **API rate limits** and exploring various models, members juggled between **Haiku**, **Cohere**, and **GPT-4-free**, sharing frustrations and strategies for optimal usage given free and cost-efficient tiers. They explored alternatives and workarounds while emphasizing the balance between accuracy and context sizes.

Links mentioned:


Perplexity AI ▷ #sharing (7 messages):


Perplexity AI ▷ #pplx-api (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #general (141 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #workshop-1 (14 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (18 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #learning-resources (1 messages):

Link mentioned: lm-hackers/lm-hackers.ipynb at main · fastai/lm-hackers: Hackers' Guide to Language Models. Contribute to fastai/lm-hackers development by creating an account on GitHub.


LLM Finetuning (Hamel + Dan) ▷ #jarvis-labs (32 messages🔥):


LLM Finetuning (Hamel + Dan) ▷ #hugging-face (4 messages):


LLM Finetuning (Hamel + Dan) ▷ #replicate (5 messages):


LLM Finetuning (Hamel + Dan) ▷ #kylecorbitt_prompt_to_model (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #workshop-2 (209 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #axolotl (80 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #zach-accelerate (96 messages🔥🔥):

- **Complimentary GPU Optimization Event**: Filippob82 announced a workshop on GPU optimization featuring speakers from OpenAI, NVIDIA, Meta, and Voltron Data. The event will be livestreamed on YouTube and discussed on [Discord](https://discord.gg/T5sx2MYd5R), with more details in the [README](https://github.com/mlops-discord/gpu-optimization-workshop) and [workshop note](https://docs.google.com/document/d/1TR_5Ax0rPqTj8I2sA7MH-aa4J7TUUt4Ji9272OP8ZJg/edit).

- **Training Tips on A100s and H100s**: Stevenmerrill inquired about general rules for training on A100 and H100 GPUs, with Tddammo recommending to absolutely use bf16 if the GPU supports it and adjust batch sizes to utilize available VRAM. Additionally, sequence lengths can also be increased due to enhanced memory capacity.

- **VRAM Calculation Challenges**: Remek1972 discussed issues with VRAM requirements for larger sequence lengths (e.g., 4096 tokens) on an A6000 GPU, finding that it leads to crashes. The conversation concluded that using mixed precision (bf16) and optimization strategies could mitigate some memory issues but larger models might necessitate offloading or quantization.

- **Paged ADAMW 8-bit Optimizer Discussion**: Lhl mentioned using the paged_adamw_8bit optimizer for efficiency and asked about any potential drawbacks, receiving assurance that performance is equivalent to normal adam. They discussed the community's experience and findings, including the benefits of 8-bit optimization for memory usage.

- **Interest in Latest Optimizers**: Lhl and Tddammo discussed experimenting with new optimizers like Sophia and Adam_LoMo. Recommendations and shared experiences pointed to potential benefits in performance, with Lhl adding links to recent discussions on Twitter regarding new optimizer benchmarks.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #wing-axolotl (56 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #announcements (1 messages):

Links mentioned:


HuggingFace ▷ #general (591 messages🔥🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (3 messages):


HuggingFace ▷ #cool-finds (5 messages):

Links mentioned:


HuggingFace ▷ #i-made-this (14 messages🔥):

- **Excitement builds for open-source protein folding**: A member introduced [ProteinViz](https://huggingface.co/spaces/as-cle-bert/proteinviz), an open-source alternative to AlphaFold3, enabling users to predict protein 3D structures. They also shared a [community blog post](https://huggingface.co/blog/as-cle-bert/what-is-going-on-with-alphafold3) exploring the advancements of AlphaFold3.
  
- **Mistral-7B v0.3 demo impresses**: The ultra-fast [Mistral-7B v0.3 demo](https://huggingface.co/spaces/ehristoforu/mistral-7b-v0.3-chat) was shared, showcasing its capabilities. Users were encouraged to try it out and provide feedback.

- **LayerDiffusion method enabling transparent images**: A user shared [diffuser_layerdiffuse](https://github.com/rootonchair/diffuser_layerdiffuse), a method for generating transparent images from any base model. This technique promises high foreground separation accuracy.

- **SimpleTuner v0.9.6 released**: The latest release of [SimpleTuner](https://github.com/bghira/SimpleTuner/releases/tag/v0.9.6) includes a new randomized aspect bucket feature and custom resolution mapping configs. Users are urged to check out the newer functionalities.

- **Miniature dataset success**: A member celebrated their dataset reaching 1K downloads, highlighting it as part of a blogpost on RAG applications. This miniature dataset contains 3K samples and has gained traction despite the general preference for more visually engaging demos.

Links mentioned:

Release v0.9.6 - debias them buckets · bghira/SimpleTuner: debiased aspect bucketing for training on large, heterogeneous datasets.
What is an Instruction Tuned Model?: What is Instruction Tuning? What are Instruction Tuned models? What is a Pretrained Model?
GitHub - rootonchair/diffuser_layerdiffuse: Create transparent images with Diffusers!
Proteinviz - a Hugging Face Space by as-cle-bert: no description found
GitHub - AstraBert/proteinviz: Your open-source alternative to AlphaFold3 🚀
What is going on with AlphaFold3?: If you are excited about AlphaFold3, but upset because it…


HuggingFace ▷ #NLP (11 messages🔥):


HuggingFace ▷ #diffusion-discussions (5 messages):


Nous Research AI ▷ #ctx-length-research (3 messages):


Nous Research AI ▷ #off-topic (117 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #announcements (1 messages):

Links mentioned:


Nous Research AI ▷ #general (302 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (6 messages):

Link mentioned: Increase error rates on Opus: no description found


Nous Research AI ▷ #rag-dataset (7 messages):

Links mentioned:


Nous Research AI ▷ #world-sim (3 messages):


Stability.ai (Stable Diffusion) ▷ #general-chat (360 messages🔥🔥):

Links mentioned:


LM Studio ▷ #💬-general (152 messages🔥🔥):

- **Llama 3 8k context performance complaints**: "Oh yeah now looking at the Llama 3 models. 8k context sucks." There are discussions on Llama models with higher context lengths up to 1M.
- **Idefics 2.0 multimodal model inquiries**: Users asked whether the Idefics2 model from HuggingFace works in LM Studio. It was noted that Idefics models don't work in llama.cpp, but other vision models like LLaVA are supported.
- **Query on context length affecting performance**: A member asked if increased context size (e.g., 8k, 16k) makes models slower, to which it was confirmed that larger context sizes do indeed slow down performance.
- **ONNX Runtime and GPU driver improvements**: Discussion about new NVIDIA driver updates improving model inference speeds. "Just updated the drivers. I had to reboot because even if they were installed, it kept saying it was using the old ones."
- **Helpful LM Studio resources and usage**: Members shared links to tutorials and resources such as a YouTube video on running LM Studio locally. "Explore LM Studio's new CLI tool, lms, in this video."

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (45 messages🔥):

Links mentioned:


LM Studio ▷ #📝-prompts-discussion-chat (24 messages🔥):

Link mentioned: lmstudio-community/aya-23-8B-GGUF · Hugging Face: no description found


LM Studio ▷ #🎛-hardware-discussion (3 messages):


LM Studio ▷ #🧪-beta-releases-chat (1 messages):


LM Studio ▷ #amd-rocm-tech-preview (7 messages):


LM Studio ▷ #model-announcements (2 messages):


Modular (Mojo 🔥) ▷ #general (2 messages):


Modular (Mojo 🔥) ▷ #💬︱twitter (1 messages):

ModularBot: From Modular: https://twitter.com/Modular/status/1793427278564917459


Modular (Mojo 🔥) ▷ #📺︱youtube (1 messages):

Link mentioned: Mojo Community Meeting #1: Mojo Community Meeting Public Agenda: https://modul.ar/community-meeting-doc


Modular (Mojo 🔥) ▷ #ai (1 messages):


Modular (Mojo 🔥) ▷ #🔥mojo (65 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #📰︱newsletter (1 messages):

Zapier: Modverse Weekly - Issue 35 https://www.modular.com/newsletters/modverse-weekly-35


Modular (Mojo 🔥) ▷ #nightly (120 messages🔥🔥):

Links mentioned:


Eleuther ▷ #general (69 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (38 messages🔥):

Link mentioned: Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers: Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art ...


Eleuther ▷ #interpretability-general (2 messages):


Eleuther ▷ #lm-thunderdome (21 messages🔥):

Links mentioned:


OpenAI ▷ #ai-discussions (59 messages🔥🔥):


OpenAI ▷ #gpt-4-discussions (37 messages🔥):


OpenAI ▷ #prompt-engineering (14 messages🔥):


OpenAI ▷ #api-discussions (14 messages🔥):


CUDA MODE ▷ #general (11 messages🔥):


CUDA MODE ▷ #cuda (3 messages):


CUDA MODE ▷ #torch (3 messages):


CUDA MODE ▷ #announcements (1 messages):

- **GPU Optimization Workshop Announced**: A GPU optimization workshop hosted by a member is scheduled for May 23, 2024. Speakers include Sharan Chetlur from NVIDIA, Phil Tillet from OpenAI, and William Malpica from Voltron Data.
- **Live and Interactive Options**: The event will be livestreamed on YouTube, with discussions on [Discord](https://discord.gg/T5sx2MYd5R). Interested participants can RSVP [here](https://lu.ma/1wu5ppl5).
- **Cost and Capacity Details**: The Zoom call will allow up to 100 people, costing $1 to ensure serious participation. More than 2,400 people have already registered.
- **Additional Resources Provided**: Refer to the [README on GitHub](https://github.com/mlops-discord/gpu-optimization-workshop) and the [shared workshop note](https://docs.google.com/document/d/1TR_5Ax0rPqTj8I2sA7MH-aa4J7TUUt4Ji9272OP8ZJg/edit) for detailed reading materials and information.

Link mentioned: GPU Optimization Workshop · Luma: We’re hosting a workshop on GPU optimization with stellar speakers from OpenAI, NVIDIA, Meta, and Voltron Data. The event will be livestreamed on YouTube, and…


CUDA MODE ▷ #algorithms (4 messages):


CUDA MODE ▷ #cool-links (2 messages):

Links mentioned:


CUDA MODE ▷ #jobs (1 messages):


CUDA MODE ▷ #beginner (5 messages):

Link mentioned: Tweet from François Fleuret (@francoisfleuret): @ntenenz @main_horse If I get the same perf with vanilla python code and with cuda graph replay, I cannot accuse python?


CUDA MODE ▷ #pmpp-book (3 messages):


CUDA MODE ▷ #off-topic (1 messages):

Link mentioned: orphee/sandbox/heat_transfer.cpp at master · orion160/orphee: Contribute to orion160/orphee development by creating an account on GitHub.


CUDA MODE ▷ #llmdotc (81 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #rocm (6 messages):

Link mentioned: GitHub - howiejayz/flash-attention: Fast and memory-efficient exact attention: Fast and memory-efficient exact attention. Contribute to howiejayz/flash-attention development by creating an account on GitHub.


OpenAccess AI Collective (axolotl) ▷ #general (42 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (7 messages):


OpenAccess AI Collective (axolotl) ▷ #community-showcase (2 messages):

Link mentioned: Impact of high-quality, mixed-domain data on the performance of medical language models: Abstract / Objective: To optimize the training strategy of large language models for medical applications, focusing on creating clinically relevant systems th


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (44 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (6 messages):

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.


LAION ▷ #general (86 messages🔥🔥):

Links mentioned:


LAION ▷ #research (14 messages🔥):

Link mentioned: Golden Gate Claude: When we turn up the strength of the “Golden Gate Bridge” feature, Claude’s responses begin to focus on the Golden Gate Bridge. For a short time, we’re making this model available for everyone to inter...


Interconnects (Nathan Lambert) ▷ #news (8 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (8 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (59 messages🔥🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #posts (5 messages):


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

Link mentioned: Chat with 100+ AI Characters for free, uncensored and NSFW | Role Play Hub: RoleplayHub offers unlimited characters and Chats with sexy AI characters, our chatbots are designed to provide you with a personalized experience.


OpenRouter (Alex Atallah) ▷ #general (64 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #blog (4 messages):


LlamaIndex ▷ #general (50 messages🔥):

- **Bigger models better at RAG embeddings still debated**: A user inquired about using bigger AI models for embedding creation in RAG, questioning if larger models provide better embeddings similar to their performance in answering questions. No specific consensus or recommendation for small models was provided.

- **Define custom similarity scores in LlamaIndex**: A query on defining custom similarity scores received guidance with references to the **Hybrid Retriever example** and code snippets highlighting the use of an `alpha` parameter. More details can be found in the [Customizing the stages of querying (LlamaIndex docs)](https://docs.llamaindex.ai/en/latest/understanding/querying/querying#customizing-the-stages-of-querying).

- **Persisted Vector Index: embedding calls are necessary**: A discussion on why external API calls to VoyageAI embeddings are made despite having a locally persisted Vectorstore concluded that the query text itself needs embedding with each new query (see the sketch after this list). Relevant code snippets and explanations clarified that this approach is normal.

- **Issues with memory context in building agents**: Users discussed problems with maintaining context in an agent based on query pipelines. Suggestions included checking memory buffers and adjusting token limits, with reference to the [example in the LlamaIndex docs](https://docs.llamaindex.ai/en/stable/examples/agent/agent_runner/query_pipeline_agent/).

- **ReAct agent clarified**: A user questioned the name 'ReAct' for an agent. The response clarified it refers to the algorithm from the [ReAct paper](https://arxiv.org/abs/2210.03629), which combines reasoning traces and task-specific actions for better LLM performance.
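
Returning to the persisted-index point above, here is a minimal, library-agnostic sketch of why retrieval still needs one embedding call per query even when document vectors are stored; the toy `embed()` stands in for a real embedding API such as VoyageAI.

```python
# Sketch: document vectors can be persisted, but the query itself must be embedded
# at query time so it can be compared against them. embed() below is a toy stand-in.
import math

def embed(text):
    """Toy embedding: character-frequency vector. A real system would call an embedding API here."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# "Persisted" index: documents embedded once, stored alongside their vectors.
index = [(doc, embed(doc)) for doc in ["llamas eat grass", "indexes store vectors", "queries need embeddings"]]

def retrieve(query, k=2):
    q_vec = embed(query)  # the per-query embedding call that cannot be skipped
    return sorted(index, key=lambda d: cosine(q_vec, d[1]), reverse=True)[:k]

print([doc for doc, _ in retrieve("why embed the query?")])
```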

Links mentioned:


Latent Space ▷ #ai-general-chat (43 messages🔥):

Links mentioned:

Talk Summary - RAG for a medical company: the technical and product challenges by Noe Achache · Chris Swart: no description found
Tweet from Tensorlake (@tensorlake): We are super excited to finally announce @tensorlake's open-source, real-time data framework, Indexify. It fits into any LLM stack and provides a foundational building block for bringing your dat...
Tweet from Gradio (@Gradio): 📣 📣 Mistral has released 7B v0.3 models with extended vocabulary from v0.2. 🚀 Base + Instruct checkpoints released 🔤 Extended Vocabulary to 32K 👌 v3 Tokenizer 😍 Function calling Demo+links👇
Tweet from Linus (@thesephist): Embedding features learned with sparse autoencoders can make semantic edits to text ✨ (+ a reading/highlighting demo) I've built an interface to explore and visualize GPT-4 labelled features lea...
Tweet from John Luttig (@absoluttig): despite recent progress and endless cheerleading, open-source AI is a worsening investment for model builders, an inferior option for developers and consumers, and a national security risk. I wrote ab...
Tweet from clem 🤗 (@ClementDelangue): Should we acquire Humane to open-source the pin?
Tweet from Vaibhav (VB) Srivastav (@reach_vb): Let's fucking go! Mistral just released 7B v0.3 🔥 > Base + Instruct model checkpoints released > Extended vocabulary to 32768 > Supports new v3 Tokenizer > Supports function calling ...


Latent Space ▷ #llm-paper-club-west (5 messages):

Link mentioned: LLM Paper Club (Survey Paper Club!) · Zoom · Luma: It's survey day! Pick a paper from here and cover it in 5 minutes: https://app.sli.do/event/bNV6mo3BFGhe8Bqzb1tonb/live/questions


OpenInterpreter ▷ #general (11 messages🔥):

Link mentioned: New Terminal Option: --no_live_response by Steve235lab · Pull Request #1278 · OpenInterpreter/open-interpreter: Describe the changes you have made: Add a new terminal option which allows users to config whether rendering responses while receiving chunks (classic and default behavior) or perform a one-time re...


OpenInterpreter ▷ #O1 (12 messages🔥):

Link mentioned: M5Flow: no description found


OpenInterpreter ▷ #ai-content (1 messages):

Link mentioned: Tweet from TestingCatalog News 🗞 (@testingcatalog): It turns out that you can easily bypass the macOS ChatGPT app waitlist in this way: 1. Launch the app and log in 2. Press CMD+Q when the window changes its size but before the Login alert. 3. Launch ...


tinygrad (George Hotz) ▷ #general (7 messages):


tinygrad (George Hotz) ▷ #learn-tinygrad (7 messages):


Cohere ▷ #general (14 messages🔥):

Link mentioned: CohereForAI/aya-23-35B · Hugging Face: no description found


LangChain AI ▷ #general (9 messages🔥):


LangChain AI ▷ #share-your-work (3 messages):

Link mentioned: What is an Instruction Tuned Model?: What is Instruction Tuning? What are Instruction Tuned models? What is a Pretrained Model? How can I make my Large Language Model follow Instructions?These ...


DiscoResearch ▷ #general (5 messages):

Links mentioned:


MLOps @Chipro ▷ #events (2 messages):


MLOps @Chipro ▷ #general-ml (2 messages):

Link mentioned: Masked Autoencoders Are Scalable Vision Learners: This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the mis...


AI Stack Devs (Yoko Li) ▷ #multi-modal-starter-kit (2 messages):


Mozilla AI ▷ #llamafile (1 messages):

daddyd_: was just reading through this repo the other day, very excited to see your progress!





{% else %}

Unsloth AI (Daniel Han) Discord


Perplexity AI Discord

Scheduled Downtime for Database Boost: A scheduled downtime has been announced, commencing at 12:00am EST and lasting approximately 30 minutes, to upgrade the database and improve performance and user experience.

Engineering Excitement Over Free Gemini: Engineering conversations revolved around the free usage of Gemini in AI Studio for high-volume tasks like fine-tuning, spurring discussions on data privacy and cost-saving strategies.

Perplexity Powers Past Performance Hurdles: Notable improvements to Perplexity's web scraping have cut scrape times to 1.52s, down from over 7s previously, while discussions highlighted the importance of parallel processing and efficient tooling in AI applications.

Comparative AI Discourse: Technically-inclined users compared Perplexity with Gemini Pro and ChatGPT, lauding Perplexity's research and writing capabilities and flexible file management, with suggestions to include additional features like CSV support to reach new heights of utility.

API Anomalies and Alternatives Analysis: Community members discussed discrepancies in outputs between web and API versions of the same models, seeking clarifications on the observed inconsistencies, while also sharing their experiences in balancing model accuracy and utilization within API rate limits for platforms like Haiku, Cohere, and GPT-4-free.


LLM Finetuning (Hamel + Dan) Discord

Instruction Finetuning with ColBERT and Task Updates: Engineers discussed finetuning strategies for instruction embeddings, citing frameworks like INSTRUCTOR and TART as references. A project proposal for automating standup transcript ticket updates involved using examples of standup conversions correlated with ticket actions.

CUDA Woes and Workarounds: Persistent CUDA errors while running LLM models like llama 3 8b were a common issue, with remedies including adjusting batch sizes and monitoring GPU usage via nvidia-smi. Docker was recommended for managing CUDA library compatibility, with a link to a Docker image from Docker Hub provided.

Parameters and Efficient Model Training: Queries emerged about default Axolotl's configuration parameters and optimization strategies for training on A100 and H100 GPUs, where using bf16 and maximizing VRAM usage were among the suggested strategies. Discussions also extended to newer optimizers like Sophia and Adam_LoMo.

Accelerating Free Credits and Workshop Excitement: Modal's fast credit allocation was commended, and excitement built around a GPU Optimization Workshop featuring representatives from OpenAI, NVIDIA, Meta, and Voltron Data. Additionally, there was anticipation for a recording of an upcoming talk by Kyle Corbitt.

Model Fine-Tuning and Training Factors: Fine-tuning LLMs to generate layouts, troubleshooting Axolotl's dataset paths, and considering LoRA hyperparameters were topics of interest. The use of GPT-4 as a judge for level 2 model evaluations and troubleshooting Axolotl on Modal due to gated model access issues were also discussed.

Deployment Dilemmas: Engineers encountered challenges when deploying trained models to S3 on Modal, with solutions including using the modal volume get command and mounting an S3 bucket as a volume, as described in Modal's documentation.

Paper and Tutorial References: The community shared valuable learning resources, such as a YouTube demo on EDA assistant chatbots. They also appreciated illustrative examples from Hamel and Jeremy Howard, with references to both a tweet and a GitHub repo.


HuggingFace Discord


Nous Research AI Discord

Flash Attention Needed for YaRN: Efforts to implement flash attention into the YaRN model are meeting challenges, with some progress but not a perfect fit yet.

Rust Rising Among AI Enthusiasts: Increasing interest and discussions around using Rust for machine learning, with members sharing resources like Rust-CUDA GitHub and rustml - Rust, while recognizing the dominance of Python in AI.

Nous Research Expanding Teams: Nous Research is on the hunt for new talent, as evidenced by their recent hiring announcement and a call to apply via their Google Form.

Python vs Rust in AI Careers: A robust debate over Python's primacy in AI careers with members bringing up alternatives like Rust or Go, alongside sharing insights from AI experts like Yann LeCun's views on focusing beyond LLMs for next-gen AI systems.

RAG's Validity in Question: Proposals were made to enhance RAG model context, emphasizing the need for context accuracy and referencing a debate over the reliability of Google's AI drawing conclusions from outdated sources.


Stability.ai (Stable Diffusion) Discord


LM Studio Discord

Llama Lamentations & Local Model Logistics: There's unrest over Llama 3's 8k context performance, with members revealing it falls short of expectations. Despite being the topic of debate, suggestions for improving its performance, such as introducing longer contexts up to 1M, remain theoretical.

Discussions Turn to Vision Models: OCR discussions saw mixed reviews of vision models like LLaVA 1.6, with users recommending Tesseract for reliable text extraction. Interest in Vision Language Models (VLMs) is evident, but deploying them effectively behind web server APIs requires careful configuration, including API key handling.

Multimodal Mishaps and Merits: Idefics 2.0 multimodal’s compatibility sparked interest, yet it seems to trip on existing infrastructure like llama.cpp. Meanwhile, Mistral-7B-Instruct v0.3 emerges as part of the dialogue, boasting an extended vocabulary and improved function calling (Model Card). In parallel, Cohere's Aya 23 showcases its talents in 23 languages, promising to sway future conversations (Aya 23 on Huggingface).

GPU Grows but Guides Needed: The adoption of 7900xt graphics cards is underway among members seeking to amp up their tech game. However, guidance for effective environment setups, such as treating an RX 6600 card as gfx1030 on Fedora, remains a precious commodity.

Storage Solved, Support Summoned: One member's move to allocate an M.2 SSD exclusively for LM Studio paints a picture of the ongoing hardware adaptations. On the flip side, GPU compatibility queries like dual graphics card support highlight the community's reliance on shared wisdom.


Modular (Mojo 🔥) Discord


Eleuther Discord

Pythia's Pocketbook: Discussing the cost of training models like Pythia, Stellaathena estimated a bill of $250k for the largest model, mentioning efficiency and discounted GPU-hour pricing in calculations.

Cost-Efficiency Report Needs Reviewers: A forthcoming report on frontier model training costs seeks peer review; interested parties would assess GPU-hours and the influence of GPU types like A100 40GB.

LeanAttention Edging Out FlashAttention?: A recently shared paper introduces LeanAttention, which might outperform FlashAttention, raising debates on its innovation. The community also joked about unorthodox practices to improve model benchmarks, playfully noting, "The secret ingredient is crime."

Interpretability's New Frontiers: A new paper was noted for opening research doors in interpretability, kindling curiosity on its implications for future studies.

Evaluating Large Models: Tech tips were exchanged, such as running the lm eval harness on multi-node SLURM clusters and how to set parameters like num_fewshot for evaluations with challenges reported around reproducibility and internet access on compute nodes.


OpenAI Discord


CUDA MODE Discord

Full House at the GPU Optimization Workshop: The GPU optimization workshop drew excellent engagement with more than 2,400 registrants and valuable sessions from experts including Sharan Chetlur (NVIDIA), Phil Tillet (OpenAI), and William Malpica (Voltron Data). Enthusiasts can RSVP for future interactions here, with additional resources available on GitHub.

Breaching CUDA Confusion: A member clarified that __global__ CUDA functions can't be simultaneously __host__ due to their grid launch setup, and they posited the theoretical utility of a __global__ function agnostic of threadIdx and blockIdx.

Tricky Transformation with Triton: One user discussed performance drops when converting a kernel from FP32 to FP6 using triton+compile, speculating on the potential impact of inplace operators.

AI Research Synopsis Spices Up Discussions: A weekly AI research spotlight surfaced, featuring analysis on works like KAN, xLSTM, and OpenAI's GPT-4. The discussion extended to the computationally intensive nature of KANs owing to activation-based edge computation.

The CUDA Cul-de-Sac and Vulkan Ventures: Conversations veered into contributions and coding concerns, including a member's flash-attention repository stalling, GPU model benchmarks like 7900xtx versus 3090, and Vulkan's failure to impress in a heat transfer simulation.

LLM.C Lurches Forward: There was a bustling exchange about llm.c, with members celebrating the integration of HellaSwag evaluation in C, debating CUDA stream optimization for speed, and sharing the challenge of scaling batch sizes without training disruptions.

Please note, some quotes and project links have been shared verbatim as no additional context was provided.


OpenAccess AI Collective (axolotl) Discord


LAION Discord


Interconnects (Nathan Lambert) Discord

OpenAI's Alleged NDA Overreach: OpenAI leadership claimed ignorance over threats to ex-employees' vested equity for not signing NDAs, but documents with leadership's signatures suggest otherwise. Ex-employees were pressured with seven-day windows to sign or face losing millions.

Model Performance Headlines: Gemini 1.5 Pro impressively topped the Reward Bench Leaderboard for generative models, as indicated by Jeff Dean's tweet, while News Corp and OpenAI entered a multi-year deal, allowing AI utilization of News Corp content, as per this announcement.

Merch in a Flash: Nathan Lambert's Shopify store, Interconnects, launches amidst lighthearted uncertainty about operations and with community-driven product adjustments for inclusivity; he assures ethical sourcing.

The Emergence of AI Influencers?: TikTok's teen demographic reportedly resonates with content generated by bots, highlighting the potential for AI-created content to go viral. The platform stands out as a launchpad for careers like Bella Poarch's.

Anthropic AI's Golden Gate Focus: A whimsical experiment by Anthropic AI altered Claude AI's focus to obsess over the Golden Gate Bridge, leading to a mix of amusement and interest in the AI community.


OpenRouter (Alex Atallah) Discord

OpenRouter Swings Open the Gates to Advanced AI Tools: OpenRouter now facilitates the use of Anthropic and Gemini models with a syntax matching OpenAI's, broadening the landscape for AI practitioners. Supported tool calls and function usage instructions can be found in the documentation.

Lumimaid 70B Sashays into the AI Theater: Aimed specifically at roleplay scenarios, the Lumimaid 70B model was tweaked and let loose by the NeverSleep team and details can be scooped from their announcement page.

Calling all Roleplayers to a New Digital Realm: A new roleplaying app granting a free tier has launched, leveraging OpenRouter's multifaceted AI characters, with the creator keen on gathering feedback via RoleplayHub.

Tech Snags and Community Dialogues Tangle in General Channel: Software patches were applied to mend streaming issues with models like Llama-3, and the release of Mistral-7B v0.3 stirred some confusion due to its new vocab/tokenizer; uncertainty lingered over whether it should be a distinct model route or a direct route upgrade. Meanwhile, Cohere's Aya initiative garnered attention, offering multilingual AI research spanning 101 languages; find out more here.

Economies of Scale Kick in for AI Model Access: Sharp price reductions have been executed for several models, including a tempting 30% off for nousresearch/nous-hermes-llama2-13b, among others. These markdowns are stirring up the market for developers and enthusiasts alike.


LlamaIndex Discord


Latent Space Discord


OpenInterpreter Discord


tinygrad (George Hotz) Discord

Challenging the Taylor Takedown: Members questioned the efficacy of Taylor series in approximations, noting that they are only accurate close to the reference point. It was highlighted that range reduction might not be the optimal path to perfect precision, and interval partitioning could offer a better solution.

Range Reduction Rethink: The group debated over the use of range reduction techniques, suggesting alternatives like reducing to [0, pi/4], and referred to IBM's approach as a practical example of interval partitioning found in their implementation.

IBM's Insights: An IBM source file was mentioned in a suggestion to address range reduction problems by treating fmod as an integer, viewable here.

Mathematical Complexity Calmly Contemplated: There was a consensus that the computations for perfect accuracy are complex, especially for large numbers, though typically not slow—a mix of admiration and acceptance for the scientific intricacies involved.

Shape Shifting in ShapeTracker: The group explored ShapeTracker limitations, concluding that certain sequences of operations like permute followed by reshape lead to multiple views, posing a challenge in chaining movement operations effectively. The utility of tensor masking was discussed, with emphasis on its role in tensor slicing and padding.


Cohere Discord


LangChain AI Discord


DiscoResearch Discord


MLOps @Chipro Discord

The full channel-by-channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}