Frozen AI News archive

Ways to use Anthropic's Tool Use GA

**Anthropic** launched general availability of tool use/function calling, with support for streaming, forced use, and vision, on the Anthropic API as well as **Amazon** Bedrock and **Google** Vertex AI. Alex Albert shared five architectures for agentic tool use: delegation, parallelization, debate, specialization, and tool suite experts. **Anthropic** also introduced a self-guided course on tool use. **Yann LeCun** emphasized ethically funded open science, the gradual emergence of superintelligence with safety guardrails, and argued that convolutional networks remain competitive with vision transformers for image/video processing. He also noted growth in AI researchers across industry, academia, and government.


AI News for 5/30/2024-5/31/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (393 channels, and 2911 messages) for you. Estimated reading time saved (at 200wpm): 337 minutes.

Anthropic's tool use/function calling reached GA today across Anthropic, Amazon, and Google platforms, with support for streaming, forced use, and vision...
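As a concrete illustration of the "forced use" option, here is a minimal sketch of a Messages API request body that pins the model to a single tool. The tool name, schema, and prompt are made up for illustration; check Anthropic's tool use docs for the authoritative shape.

```python
import json

def build_forced_tool_request(city: str) -> dict:
    """Build a request body that forces the model to call one specific tool."""
    weather_tool = {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
    return {
        "model": "claude-3-opus-20240229",
        "max_tokens": 1024,
        "tools": [weather_tool],
        # "Forced use": tool_choice pins the model to this one tool,
        # rather than letting it decide whether to call anything.
        "tool_choice": {"type": "tool", "name": "get_weather"},
        "messages": [{"role": "user", "content": f"What's the weather in {city}?"}],
    }

body = build_forced_tool_request("Paris")
print(json.dumps(body["tool_choice"]))
```

In practice you would pass these fields to the SDK's `messages.create` call; the point here is just the `tool_choice` shape that distinguishes forced use from the default auto mode.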


Alex Albert shared 5 architectures for using them in an agentic context:

  1. Delegation: Use cheaper, faster models for cost and speed gains.
  2. Parallelization: Cut latency (but not cost) by running agents in parallel.
  3. Debate: Multiple agents with different roles engage in discussion to reach better decisions.
  4. Specialization: A generalist agent orchestrates, while specialists execute tasks.
  5. Tool Suite Experts: When using 100s or 1000s of tools, specialize agents in tool subsets.
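The delegation pattern (item 1) can be sketched as a simple router. The model IDs and the "hard task" heuristic below are assumptions for the sketch, not from Alex Albert's post:

```python
# Route well-scoped subtasks to a cheap, fast model; escalate hard ones.
CHEAP = "claude-3-haiku-20240307"
STRONG = "claude-3-opus-20240229"

def route(task: str, hard_keywords=("prove", "debug", "plan")) -> str:
    """Pick a model based on a crude difficulty heuristic (illustrative only)."""
    hard = any(k in task.lower() for k in hard_keywords)
    return STRONG if hard else CHEAP

route("summarize this email thread")  # -> cheap, fast model
route("debug this race condition")    # -> escalates to the strong model
```

Real systems replace the keyword heuristic with a classifier or let the orchestrator model decide, but the cost/speed win comes from the same shape: most traffic never touches the expensive model.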

Nothing particularly groundbreaking here, but a handy list of patterns to keep in mind. Anthropic also launched a self-guided course on tool use:



{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

All recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI Research and Development

AI Tools and Applications

Memes and Humor


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity. Comment crawling works now but still has lots of room to improve!

AI Image & Video Generation

AI Ethics & Societal Impact

AI Capabilities & Advancements

OpenAI News & Developments

AI Humor & Memes


AI Discord Recap

A summary of Summaries of Summaries

1. Model Performance Optimization and Benchmarking

2. Fine-Tuning and Prompt Engineering

3. Open-Source AI Developments and Collaborations

4. AI Community Innovations and Knowledge Sharing

5. Hardware Advancements and Compatibility Challenges


PART 1: High level Discord summaries

LLM Finetuning (Hamel + Dan) Discord

These summaries capture the detailed, often granular discussions among AI Engineers in the Discord guild, highlighting a collective effort to optimize LLM fine-tuning and deployment alongside career growth and community building.


HuggingFace Discord

K2 Triumphs Over Llama 2: LLM360's K2 model outpaces Llama 2 70B, achieving better performance with 35% less computational effort; it's touted as fully-reproducible and is accessible under the Apache 2.0 license.

Numbers Are No Match for Positional Embeddings: Researchers cracked the nut on transformers' arithmetic abilities; with tailored positional embeddings, transformers reach 99% accuracy on 100-digit sums, a monumental feat outlined in their paper.
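The core trick is to give each digit a positional signal tied to its significance rather than its absolute token index, so digit columns that should be added together line up across operands. A toy sketch of that indexing follows; the paper's exact embedding scheme differs in detail:

```python
def digit_positions(number: str) -> list[int]:
    """Assign each digit a position counted from the least-significant end
    (1 = ones place), so '17' and '2017' share positions for their low digits."""
    return list(range(len(number), 0, -1))

# Columns that should be summed together get the same index:
digit_positions("123")   # [3, 2, 1]
digit_positions("4567")  # [4, 3, 2, 1]
```

With absolute positions, the ones digit of a 3-digit and a 4-digit operand land at different indices; significance-relative positions remove that mismatch, which is what lets the model generalize to very long sums.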

NeurIPS Throws Down the Merging Gauntlet: With an $8,000 purse, the NeurIPS Model Merging Competition invites contenders to blend optimal AI models. Hugging Face, among others, sponsors this competition, more info in the announcement and competition website.

Data Dive: From 150K Datasets to Clothing Sales: A treasure trove of 150k+ datasets is now at engineers' fingertips for exploration with DuckDB, as explained in a blog post. Meanwhile, a novel clothing sales dataset propelled the development of an image regression model, detailed in this article.

Learning Resources and Courses Amplify Skills: In the perpetually advancing field of AI, engineers can bolster their expertise through Hugging Face courses in Reinforcement Learning and Computer Vision, with more information accessible at Hugging Face - Learn.


Unsloth AI (Daniel Han) Discord

Quantization Quandaries and High-Efficiency Hardware: Unsloth AI guild members highlight challenges with the quantized Phi3 finetune results, noting performance issues without quantization tricks. NVIDIA's new 4nm research chip is generating buzz with its 96 int4 tera operations per second per watt (TOPs/Watt) efficiency, overshadowing Blackwell's 20T/W and reflecting industry-wide advancements in power efficiency, numerical representation, Tensor Cores' efficiency, and sparsity techniques.

Model Fine-Tuning and Upscaling Discussions: AI engineers share insights on fine-tuning strategies, including dataset merging, with one member unveiling an 11.5B-parameter upscaled version of Llama-3. An emerging fine-tuning method, MoRA, suggests a promising avenue for parameter-efficient updates.

Troubleshooting Tools and Techniques: Engineers confront various hurdles, from GPU selection in Unsloth (os.environ["CUDA_VISIBLE_DEVICES"]="0") and troubleshooting fine-tuning errors to handling dual-model dependencies and addressing VRAM spikes during training. Workarounds for issues like Kaggle installation challenges underscore the need for meticulous problem-solving.
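The GPU-selection workaround above hinges on ordering: the environment variable must be set before any CUDA-aware library initializes, or the process has already enumerated all GPUs. A minimal sketch:

```python
import os

# Pin this process to GPU 0. This must run before the first import of
# torch/unsloth, because device enumeration happens at CUDA initialization.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import torch  # safe to import only after the variable is set
```

Setting the variable after `import torch` silently has no effect, which is a common source of the "wrong GPU" confusion discussed in the guild.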

AI in Multiple Tongues: Ghost XB Beta garners attention for its capability to support 9+ languages fluently and is currently navigating through its training stages. This progress reaffirms the guild’s commitment to developing accessible, cost-efficient AI tools for the community, especially emphasizing startup support.

Communal Cooperative Efforts and Enhancements: Guild discussions reveal a collective push for self-deployment and community backing, with members sharing updates and seeking assistance across a spectrum of AI-related endeavors such as the Open Empathic project and Unsloth AI model improvements.


Perplexity AI Discord


CUDA MODE Discord


Stability.ai (Stable Diffusion) Discord


LM Studio Discord

GPU Blues with ROCm? Not Music to Our Ears: Engineers discussed GPU performance with ROCm, lamenting the lack of support for the RX 6700 and older AMD GPUs like the RX 580, which affects token generation speeds and overall performance. Users benchmarking multi-GPU systems with models such as LLAMA 3 8B Q8 reported 91% scaling efficiency with two GPUs compared to one.

VRAM Envy: The release of LM Studio models ignited debates on VRAM adequacy, where the 4070's 12GB was compared unfavorably to the 1070's 20GB, especially concerning suitability for large models like "codestral."

CPU Constraints Cramp Styles: CPU requirements for running LM Studio became a focal point: AVX2 instructions are mandatory, so users with older CPUs fall back to a prior version (0.2.10) that only requires AVX.

Routing to the Right Template: AI engineers shared solutions and suggestions for model templates, such as using Deepseek coder prompt template for certain models, and advised checking tokenizer configurations for optimal formatting with models like TheBloke/llama2_7b_chat_uncensored-GGUF.

New Kids on the Block - InternLM Models: Several InternLM models designed for Math and Coding, ranging from 7B to a Mixtral-style 8x22B, were announced. Models such as AlchemistCoder-DS-6.7B-GGUF and internlm2-math-plus-mixtral8x22b-GGUF were highlighted among the latest tools available for AI engineers.


OpenRouter (Alex Atallah) Discord


OpenAI Discord

Pro Privileges Propel Chat Productivity: Pro users of OpenAI now enjoy enhanced capabilities such as higher rate limits, and exclusive GPT creation, along with access to DALL-E and real-time communication features. The alluring proposition maintains its charm despite the $20 monthly cost, marking a clear divide from the limited toolkit available to non-paying users.

AI Framework Favorites Facilitate Functional Flexibility: The Chat API is recommended over the Assistant API for those developing AI personas with idiosyncratic traits, as it offers superior command execution without surplus functionalities such as file searching.

Bias Brouhaha Besieges ChatGPT: A user's suspension after calling out perceived racism in ChatGPT's outputs opened a forum of contention around inherent model biases, spotlighting the relentless pursuit of attenuating such biases amid the ingrained nuances of training data.

Virtual Video Ventures Verified: Sora and Veo stand as subjects of a speculative spree as the guild contemplates the curated claims and practical potency of the pioneering video generation models, juxtaposed against the realities of AI-assisted video crafting.

API Agitations and Advancements Announced: Persistent problems presented by memory leaks causing lag and browser breakdowns mar the ChatGPT experience, triggering talks on tactical chat session limits and total recall of past interactions to dodge the dreariness of repetition. Meanwhile, the anticipated arrival of real-time voice and visual features in GPT-4 has been slated to debut in an Alpha state for a select circle, broadening over subsequent months as per OpenAI's update.


Nous Research AI Discord

NeurIPS Competition: Merge Models for Glory and Cash: NeurIPS will host a Model Merging competition with an $8K prize, sponsored by Hugging Face and Sakana AI Labs—seeking innovations in model selection and merging. Registration and more info can be found at llm-merging.github.io as announced on Twitter.

AI's Quest to Converse with Critters: A striking $500K Coller Prize is up for grabs for those who can demystify communication with animals using AI, sparking excitement for potential breakthroughs (info). This initiative echoes Aza Raskin's Earth Species Project, aiming to untangle interspecies dialogue (YouTube video).

Puzzling Over Preference Learning Paradox: The community is abuzz after a tweet highlighted unexpected limitations in RLHF/DPO methods—preference learning algorithms are not consistently yielding better ranking of preferred responses, challenging conventional wisdom and suggesting a potential for overfitting.
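For context on that result: the standard per-pair DPO loss only depends on the margin between the chosen and rejected log-likelihood ratios, so it can improve even when the absolute likelihood of the preferred response drops, as long as the rejected one drops faster. A minimal numeric sketch (log-probabilities below are made-up values):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * margin), where the margin compares policy-vs-reference
    log-likelihood ratios of the chosen and rejected responses."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Both responses get *less* likely under the policy, yet the loss still improves
# because the rejected one falls faster -- the effect the tweet describes.
flat = dpo_loss(-10.0, -10.0, -10.0, -10.0)    # margin 0 -> loss = log 2
skewed = dpo_loss(-12.0, -15.0, -10.0, -10.0)  # chosen fell by 2, rejected by 5
assert skewed < flat
```

This is why "lower loss" does not guarantee the preferred response is ranked higher in absolute likelihood, matching the overfitting concern raised in the thread.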

LLMs Reigning Over Real-Time Web Content: A revelation for web users: LLMs can generate web pages in real time, rendering what you see as it loads. This routine faces hiccups with lengthy or substantial pages due to context-length constraints, an area ripe for strategic improvement.

Google Enhances AI-Driven Search: Google has upgraded its AI Overviews for US search users, improving both satisfaction and webpage click quality. Despite some glitches, they're iterating with a feedback loop, detailed in their blog post – AI Overviews: About last week.


LlamaIndex Discord


Eleuther Discord


Modular (Mojo 🔥) Discord


LangChain AI Discord


LAION Discord


OpenAccess AI Collective (axolotl) Discord


DiscoResearch Discord


Interconnects (Nathan Lambert) Discord


Cohere Discord


Latent Space Discord

Adapter Layers Bridge the Gap: Engineers are exploring embedding adapters as a means to improve retrieval performance in AI models, with evidence showcased in a Chroma research report. Their effectiveness can be likened to Frozen Embeddings, which the Vespa team employs to avoid frequent re-embedding in dynamic systems (Vespa's blog insights).
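The idea behind an embedding adapter is small: keep the base embedding model frozen and learn a cheap transform, often just a linear map, applied to query vectors at retrieval time. A dependency-free sketch of the inference side (the training procedure in the Chroma report is more involved):

```python
def apply_adapter(W, v):
    """Apply a learned linear adapter W (list of rows) to a frozen embedding v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

# With an identity adapter the embedding passes through unchanged; training
# nudges W away from identity to better align queries with documents,
# without ever touching (or re-running) the frozen embedding model.
identity = [[1.0, 0.0], [0.0, 1.0]]
apply_adapter(identity, [0.5, -0.25])  # [0.5, -0.25]
```

Because only the tiny adapter is trained, the document index built from frozen embeddings never needs re-embedding, which is exactly the update-avoidance property the Vespa comparison points at.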

ChatGPT Goes Corporate with PwC: The acquisition of ChatGPT Enterprise licenses by PwC for roughly 100,000 employees sparked debates around the estimated value of $30M/year, with member guesses on the cost per user ranging from $8 to $65 per month.
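For scale, the per-seat price implied by the rumored totals is easy to back out (both inputs are the thread's guesses, not confirmed figures):

```python
annual_total = 30_000_000  # rumored $30M/year contract value
seats = 100_000            # reported license count

per_seat_per_month = annual_total / seats / 12
per_seat_per_month  # 25.0 -- squarely inside the $8-$65 guess range
```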

Google's Twin Stars: Gemini 1.5 Flash & Pro: Release updates for Google Gemini 1.5 Flash and Pro have been pushed to general availability, introducing enhancements such as increased RPM limits and JSON Schema mode (Google developers blog post).

TLBrowse Joins the Open Source Universe: TLBrowse, melding Websim with TLDraw, was open-sourced, allowing users to conjure up infinite imagined websites on @tldraw canvas, with access to a free hosted version.


AI Stack Devs (Yoko Li) Discord


OpenInterpreter Discord


Mozilla AI Discord


MLOps @Chipro Discord


Datasette - LLM (@SimonW) Discord


tinygrad (George Hotz) Discord


The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

LLM Finetuning (Hamel + Dan) ▷ #general (86 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #workshop-1 (10 messages🔥):


LLM Finetuning (Hamel + Dan) ▷ #asia-tz (1 messages):

blaine.wishart: Hi everyone...I'm on Hainan for the next 3 months.


LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (18 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #learning-resources (9 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #langsmith (4 messages):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #kylecorbitt_prompt_to_model (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #berryman_prompt_workshop (3 messages):

Link mentioned: ExplainPrompt: no description found


LLM Finetuning (Hamel + Dan) ▷ #whitaker_napkin_math (268 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #workshop-2 (7 messages):


LLM Finetuning (Hamel + Dan) ▷ #abhishek_autotrain_llms (57 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #clavie_beyond_ragbasics (3 messages):

Link mentioned: Announcing the Vespa ColBERT embedder: Announcing the native Vespa ColBERT embedder in Vespa, enabling explainable semantic search using token-level vector representations


LLM Finetuning (Hamel + Dan) ▷ #jason_improving_rag (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #gradio (6 messages):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #axolotl (7 messages):


LLM Finetuning (Hamel + Dan) ▷ #zach-accelerate (9 messages🔥):


LLM Finetuning (Hamel + Dan) ▷ #wing-axolotl (6 messages):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #freddy-gradio (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #charles-modal (12 messages🔥):


LLM Finetuning (Hamel + Dan) ▷ #langchain-langsmith (70 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #allaire_inspect_ai (4 messages):


LLM Finetuning (Hamel + Dan) ▷ #credits-questions (12 messages🔥):

Link mentioned: Tweet from Hamel Husain (@HamelHusain): The $3,500 in compute credits end TODAY. We won't be able to give them out after 11:59 PM PST 5/29/2024 Quoting Eugene Yan (@eugeneyan) PSA: Signups for LLM-conf + finetuning workshop close to...


LLM Finetuning (Hamel + Dan) ▷ #west-coast-usa (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #east-coast-usa (4 messages):

Link mentioned: [NYC] Modal Office Hours · Luma: Have questions about your Modal deployment or just want to learn more? Come by our first office hours in NY! Even if you don't have a particular question in…


LLM Finetuning (Hamel + Dan) ▷ #europe-tz (4 messages):


LLM Finetuning (Hamel + Dan) ▷ #announcements (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #predibase (2 messages):

Link mentioned: Request Free Trial: Try Predibase for free today - Sign up for your trial


LLM Finetuning (Hamel + Dan) ▷ #career-questions-and-stories (8 messages🔥):


LLM Finetuning (Hamel + Dan) ▷ #openai (1 messages):

rubenamtz: 👀 , credits are still cooking?


HuggingFace ▷ #announcements (10 messages🔥):

Links mentioned:


HuggingFace ▷ #general (415 messages🔥🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (3 messages):

Link mentioned: Hugging Face - Learn: no description found


HuggingFace ▷ #cool-finds (6 messages):

Links mentioned:


HuggingFace ▷ #i-made-this (10 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (1 messages):

taha_69513: Thaaaaaaaaaaaaaaaaaaaaaaaaaaaaaanks 🙌


HuggingFace ▷ #computer-vision (3 messages):

Links mentioned:


HuggingFace ▷ #NLP (7 messages):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (205 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (9 messages🔥):

Link mentioned: Tweet from Daniel Han (@danielhanchen): My notes from a NVIDIA research talk: 1) NVIDIA has an research inference 4nm chip doing 96 int4 TOPs/Watt vs Blackwell's 20T/W 2) B200's float4 is exponent=2 and mantissa=2? Maybe I mishear...


Unsloth AI (Daniel Han) ▷ #help (150 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (4 messages):

For further exploration, check out Ghost Alpha on Hugging Face.

Links mentioned:


Perplexity AI ▷ #general (281 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (2 messages):


CUDA MODE ▷ #general (11 messages🔥):

Links mentioned:


CUDA MODE ▷ #torch (1 messages):


CUDA MODE ▷ #announcements (1 messages):


CUDA MODE ▷ #beginner (10 messages🔥):

Links mentioned:


CUDA MODE ▷ #pmpp-book (2 messages):

Link mentioned: Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom ...


CUDA MODE ▷ #llmdotc (127 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #youtube-watch-party (3 messages):

Link mentioned: Join the PMPP UI lectures timezones Discord Server!: Check out the PMPP UI lectures timezones community on Discord - hang out with 37 other members and enjoy free voice and text chat.


CUDA MODE ▷ #bitnet (4 messages):


Stability.ai (Stable Diffusion) ▷ #general-chat (152 messages🔥🔥):

Links mentioned:


LM Studio ▷ #💬-general (60 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (30 messages🔥):

Links mentioned:


LM Studio ▷ #📝-prompts-discussion-chat (2 messages):


LM Studio ▷ #🎛-hardware-discussion (21 messages🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (11 messages🔥):


LM Studio ▷ #model-announcements (1 messages):


OpenRouter (Alex Atallah) ▷ #announcements (7 messages):

Link mentioned: WizardLM-2 8x22B by microsoft | OpenRouter: WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing ...


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):


OpenRouter (Alex Atallah) ▷ #general (97 messages🔥🔥):

Links mentioned:


OpenAI ▷ #ai-discussions (85 messages🔥🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (10 messages🔥):


OpenAI ▷ #prompt-engineering (3 messages):


OpenAI ▷ #api-discussions (3 messages):


Nous Research AI ▷ #ctx-length-research (1 messages):

moonride303: https://x.com/jaseweston/status/1795978611784089799


Nous Research AI ▷ #off-topic (6 messages):


Nous Research AI ▷ #interesting-links (16 messages🔥):

Link mentioned: Tweet from Angelica Chen (@_angie_chen): New work w/@sadhikamalladi, @lilyhzhang, @xinyichen2, @QiuyiRichardZ, Rajesh Ranganath, @kchonyc: Contrary to conventional wisdom, RLHF/DPO does not produce policies that mostly assign higher likeli...


Nous Research AI ▷ #general (63 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (1 messages):


LlamaIndex ▷ #blog (2 messages):

Links mentioned:


LlamaIndex ▷ #general (72 messages🔥🔥):

Links mentioned:


Eleuther ▷ #general (20 messages🔥):

Links mentioned:


Eleuther ▷ #research (34 messages🔥):

Links mentioned:


Eleuther ▷ #scaling-laws (7 messages):


Eleuther ▷ #lm-thunderdome (9 messages🔥):

Links mentioned:


Eleuther ▷ #gpt-neox-dev (1 messages):

gpantaz: Thank you for the reply 🙂


Modular (Mojo 🔥) ▷ #general (13 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #💬︱twitter (1 messages):

ModularBot: From Modular: https://twitter.com/Modular/status/1796606227981726168


Modular (Mojo 🔥) ▷ #ai (3 messages):


Modular (Mojo 🔥) ▷ #🔥mojo (28 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #🏎engine (1 messages):


Modular (Mojo 🔥) ▷ #nightly (9 messages🔥):

Links mentioned:


LangChain AI ▷ #general (41 messages🔥):

Links mentioned:


LangChain AI ▷ #langserve (5 messages):


LangChain AI ▷ #langchain-templates (2 messages):


LangChain AI ▷ #share-your-work (2 messages):

Links mentioned:


LAION ▷ #general (48 messages🔥):

Link mentioned: Tweet from Nirit Weiss-Blatt, PhD (@DrTechlash): Eliezer Yudkowsky's institute published its "2024 Communication Strategy" The main goal (as he argued in TIME magazine) is to 🔻shut down🔻 AI development. So, let's take a look at t...


LAION ▷ #research (2 messages):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (22 messages🔥):

Link mentioned: Paper page - YUAN 2.0: A Large Language Model with Localized Filtering-based Attention: no description found


OpenAccess AI Collective (axolotl) ▷ #general-help (12 messages🔥):


OpenAccess AI Collective (axolotl) ▷ #datasets (2 messages):


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (6 messages):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (7 messages):

Links mentioned:


DiscoResearch ▷ #general (5 messages):


DiscoResearch ▷ #discolm_german (23 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (10 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (7 messages):


Interconnects (Nathan Lambert) ▷ #random (8 messages🔥):


Interconnects (Nathan Lambert) ▷ #retort-podcast (2 messages):

Link mentioned: The Retort AI Podcast | Murky waters in AI policy: Tom and Nate catch up on many AI policy happenings recently. California's


Cohere ▷ #general (11 messages🔥):


Cohere ▷ #project-sharing (2 messages):


Latent Space ▷ #ai-general-chat (12 messages🔥):

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #events (1 messages):

Link mentioned: Tweet from Rosie @ Rosebud AI 🌹 (@Rosebud_AI): Turn your favorite story into a game using AI! 📚 👾 Get ready for our third Game Jam: “Book to Game”. Use Rosebud Game Maker to transform a literary work into an interactive game and bring stories t...


OpenInterpreter ▷ #general (5 messages):


OpenInterpreter ▷ #O1 (6 messages):


Mozilla AI ▷ #llamafile (3 messages):


MLOps @Chipro ▷ #events (2 messages):


Datasette - LLM (@SimonW) ▷ #ai (2 messages):


tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):

helplesness: Why is tensoflow better than pytorch?




{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}