**Dictionary Learning is All You Need.**

AI News for 5/20/2024-5/21/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (376 channels, and 6363 messages) for you. Estimated reading time saved (at 200wpm): 738 minutes. The Table of Contents and Discord Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

A relatively news heavy day, with monster funding rounds from Scale AI and Suno AI, and ongoing reactions to Microsoft Build announcements (like Microsoft Recall), but we try to keep things technical here.

Probably the biggest news is Anthropic’s Scaling Monosemanticity, the third in their modern MechInterp trilogy following from Toy Models of Superposition (2022) and Towards Monosemanticity (2023). The first paper focused on ā€œPrincipal Component Analysisā€ on very small ReLU networks (up to 8 features on 5 neurons), the second applied sparse autoencoders on a real transformer (4096 features on 512 neurons), and this paper now scales up to 1m/4m/34m features on Claude 3 Sonnet. This unlocks all sorts of intepretability magic on a real, frontier-level model:

image.png

image.png

Definitely check out the feature UMAPs

Instead of the relatively highfaluting ā€œsuperpositionā€ concept, the analogy is now ā€œdictionary learningā€, which Anthropic explains as:

borrowed from classical machine learning, which isolates patterns of neuron activations that recur across many different contexts. In turn, any internal state of the model can be represented in terms of a few active features instead of many active neurons. Just as every English word in a dictionary is made by combining letters, and every sentence is made by combining words, every feature in an AI model is made by combining neurons, and every internal state is made by combining features. (further reading in the notes)

Anthropic’s 34 million features encode some very interesting ā€œabstract featuresā€, like code features and even errors:

image.png

sycophancy, crime/harm, self representation, and deception and power seeking:

image.png

The signature proof of complete interpretability research is intentional modifiability, which Anthropic shows off by clamping features from -2x to 10x its maximum values:

{% if medium == ā€˜web’ %} image.png

image.png

image.png

image.png

image.png

image.png

image.png

{% else %}

You’re reading this on email. We’re moving more content to the web version to create more space and save your inbox. Check out the excerpted diagrams on the [web version]({{ email_url }}) if you wish.

{% endif %}

Don’t miss the breakdowns from Emmanuel Ameisen, Alex Albert, Linus Lee and HN.


{% if medium == ā€˜web’ %}

Table of Contents

[TOC]

{% endif %}


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Microsoft Launches Copilot+ PCs for AI Era

  • Copilot+ PCs introduced as the biggest update to Windows in 40 years: @mustafasuleyman noted Copilot+ PCs are the fastest, most powerful AI-ready PCs anywhere, re-inventing PCs for the AI era with the whole stack re-crafted around Copilot.
  • Real-time AI co-creation and camera control demoed on Copilot+ PCs: @yusuf_i_mehdi showed Copilot controlling Minecraft gameplay, while @yusuf_i_mehdi demoed real-time AI co-creation on the PCs.
  • Copilot+ PCs feature photographic memory and fastest performance: @yusuf_i_mehdi highlighted Copilot’s photographic memory of everything done on the PC. He also called them the fastest, most powerful and intelligent Windows PCs ever.

Scale AI Raises $1B at $13.8B Valuation

  • Scale AI raises $1B at $13.8B valuation in round led by Accel: @alexandr_wang announced the funding, stating Scale AI has never been better positioned to accelerate frontier data and pave the road to AGI.
  • Scale AI powers nearly every leading AI model by providing data: As one of the three fundamental AI pillars alongside compute and algorithms, @alexandr_wang explained Scale supplies data to power nearly every leading AI model.
  • Funding to accelerate frontier data and pave road to AGI: @alexandr_wang said the funding will help Scale AI move to the next phase of accelerating frontier data abundance to pave the road to AGI.

Suno Raises $125M to Build AI-Powered Music Creation Tools

  • Suno raises $125M to enable anyone to make music with AI: @suno_ai_ will use the funding to accelerate product development and grow their team to amplify human creativity with technology, building a future where anyone can make music.
  • Suno hiring to build the best tools for their musician community: Suno believes their community deserves the best tools, which requires top talent with technological expertise and genuine love for music. They invite people to join in shaping the future of music.

Open-Source Implementation of Meta’s Automatic Test Generation Tool Released

  • Cover-Agent released as first open-source implementation of Meta’s automatic test generation paper: @svpino shared Cover-Agent, an open-source tool implementing Meta’s February paper on automatically increasing test coverage over existing code bases.
  • Cover-Agent generates unique, working tests that improve coverage, outperforming ChatGPT: @svpino highlighted that while automatic unit test generation is not new, doing it well is difficult. Cover-Agent only generates unique tests that run and increase coverage, while ChatGPT produces duplicate, non-working, meaningless tests.

Anthropic Releases Research on Interpreting Leading Large Language Model

  • Anthropic provides first detailed look inside leading large language model in new research: In a new research paper and blog post titled ā€œScaling Monosemanticityā€, Anthropic offered an unprecedented detailed look inside a leading large language model.
  • Millions of interpretable features extracted from Anthropic’s Claude 3 Sonnet model: Using an unsupervised learning technique, @AnthropicAI extracted interpretable ā€œfeaturesā€ from the activations of Claude 3 Sonnet, corresponding to abstract concepts the model learned.
  • Some extracted features relevant to safety, providing insight into potential model failures: @AnthropicAI found safety-relevant features corresponding to concerning capabilities or behaviors like unsafe code, bias, dishonesty, etc. Studying these features provides insight into the model’s potential failure modes.

Memes and Humor

  • Scarlett Johansson’s voice cloned without permission by OpenAI draws Little Mermaid comparisons: @bindureddy and @realSharonZhou reacted to news that OpenAI cloned Scarlett Johansson’s voice for their AI assistant without permission, drawing comparisons to The Little Mermaid plot.
  • Heated coffee cup collection sadly unused due to electronic mug: @ID_AA_Carmack mused if battery density is good enough for a heated stir stick to bring electronic temperature control to any cup, as his wife’s Ember mug leaves her other cups unused.
  • Linux permissions meme reacting to Microsoft Copilot’s photographic memory: @svpino shared a meme about Linux file permissions in response to Microsoft’s Copilot having a photographic memory.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

OpenAI Controversies and Legal Issues

GPT-4o and Copilot Demos and Capabilities

AI Progress and the Path to AGI

Humor and Memes


AI Discord Recap

A summary of Summaries of Summaries

  1. Optimizing Models to Push Boundaries:
  • Transformer Integrations and Model Contributions Generate Buzz: Engineers are integrating ImageBind with the transformers library, while another engineer’s PR got merged, fixing an issue with finetuned AI models. Moreover, the llama-cpp-agent suggests advancements in computational efficiency by leveraging ZeroGPU.

  • LLM Efficiency Gains with Modular: Modular’s new nightly release, bolstered by improved SIMD optimization and async programming techniques, promises large performance gains with methods like k-means clustering in Mojo.

  • Members highlighted the importance of tools like Torch’s mul_() and the practical uses of vLLM and memory optimization techniques to enhance model performance on limited VRAM systems.

  1. ScarJo Strikes Back at AI Voice Cloning:

    • Scarlett Johansson’s OpenAI lawsuit: Johansson sues OpenAI for voice replication controversy, forcing the company to remove the model and potentially reshaping legal landscapes around AI-generated voice cloning.

    • Discussions highlighted the ethical and legal debates over voice likeness and consent amid industry comparisons to unauthorized content removals featuring musicians like Drake.

  2. New AI Models Set Benchmarks Aflame:

    • Phi-3 Models and ZeroGPU Excite AI Builders: Microsoft launched Phi-3 small (7B) and Phi-3 medium (14B) models with 128k context windows that excel in MMLU and AGI Eval tasks, revealed on HuggingFace. Complementing this, HuggingFace’s new ZeroGPU initiative offers $10M in free GPU access, aiming to boost AI demo creation for independent and academic sectors.

    • Discovering Documentary Abilities of PaliGemma: Merve highlighted the document understanding prowess of PaliGemma through a series of links to Hugging Face and related tweets. Inquiries about Mozilla’s DeepSpeech and various resources from LangChain to 3D Gaussian Splatting reveal the community’s broad interest in various AI technologies.

    • M3 Max for LLMs received praise for performance, particularly with 96GB of RAM, fueling more significant strides in model capabilities and setting new standards for large language model training efficiency.

  3. Collaborative Efforts Shape AI’s Future:

    • Hugging Face’s LangChain Integration: New packages aim to facilitate seamless integration of models into LangChain, offering new architectures and optimizing interaction capabilities for community projects.

    • Memary Webinar presents an open-source long-term memory solution for autonomous agents, addressing critical needs in knowledge graph generation and memory stream management.

  4. AI-Community Buzz with Ethical and Practical AI Implementations:

    • Anthropic’s Responsible Scaling Policy: The increased computing power suggests significant upcoming innovations and aligns with new responsible scaling policies to manage ethical concerns in AI development.

    • Collaborations in AI continue to thrive in events like the PizzerIA meetup in Paris and San Francisco, enhancing the Retrieval-Augmented Generation (RAG) techniques and community engagement in AI innovations.


{% if medium == ā€˜web’ %}

PART 1: High level Discord summaries

LLM Finetuning (Hamel + Dan) Discord

  • PDF Extraction with PyMuPDF and Tesseract: Engineers shared tools and workflows for PDF text extraction using PyMuPDF and OCR, with mentions of fitz and the sort=True option, as well as ABBYY and MarkerAPI for handling complex PDFs.

  • Optimizing LLM Training and Fine-Tuning: Technical discussions highlighted tools like vllm for multiple user services, with references to workflows using pyenv and virtualenv, and dependencies in Axolotl. Insights were shared from Anthropic’s research on model interpretability with a nod to Claude Sonnet’s research.

  • Innovative Learning and Collaboration: Engineers brainstormed over resources like Vik’s Marker API and GitHub repositories for fine-tuning models, with a strong focus on multilingual model fine-tuning and shared problem-solving.

  • Model Serving Tips on Modal: For serving LLM models efficiently, engineers were advised to use modal serve over modal run, with insights on cost management and minimizing idle container times. Modal credits can be obtained through this form and $500 in credits plus $30/month on the free tier are available on signing up.

  • Bangalore Meetup Enthusiasm: There’s keen interest for a Bangalore meetup. Techniques for incorporating new languages into models without impairment, performance discussions on Japanese LLMs, and region-specific meetups were all points of fervor.

  • Course Structure and Engagement: A newly explained course structure includes Fine-Tuning Workshops, Office Hours, and Conference Talks. Technical challenges with Llama3, hyperparameters, and resources for fine-tuning like Stanford’s Pyvene were exchanged among erudite participants.

  • Hugging Face’s Accelerate Touted: Members were encouraged to check out Accelerate, useful for distributing PyTorch code across configurations, with examples provided for starting with nlp_example on Hugging Face’s GitHub. Resources for estimating model memory and FLOPS, like Model Memory Utility, were also highlighted.

  • Axolotl and BitsandBytes Queries: Engineering queries on bitsandbytes and MLX support on macOS were addressed with a particular reference to issues on GitHub. Offers for fine-tuning comparison between OpenAI and Axolotl sparked interest in OpenAI’s 30-minute token-based service.

  • Systematic Prompt Engineering Curiosity: Interest was piqued in Jason’s techniques for systematic prompt engineering, with eager await for his ā€œrecipeā€ during his upcoming workshop session.

  • Gradio’s Approachable Interface Development: Gradio’s maintainer invited queries and demo sharing, advocating for its ease of developing user interfaces for AI models and sharing useful guides like the quickstart tutorial and how to build a chatbot swiftly.


Perplexity AI Discord

  • Perplexity and Tako Unite: Perplexity AI has collaborated with Tako to enhance user experience through advanced knowledge search and visualization, now available in the U.S. and in English, with a mobile version expected soon. Details are available here.

  • Perplexity Powers Rich Discussions: Engineers exchanged insights on using Perplexity AI with a lively debate on platform loyalty, discussions around model use cases with GPT-4 and Claude 3 Opus, and shared excitement for new features like Tako charts. They also banded together when facing service downtime, suggesting a strong user community.

  • Perplexity API Woes and Wins: AI engineers identified challenges integrating Perplexity API with Open WebUI, with particular confusion surrounding model compatibility. Solutions involved proxy servers and precise Docker commands, and engineers actively shared progress and advice.

  • Perplexity: A Portal to Knowledge: Contributions in the sharing channel underlined Perplexity AI’s ability to address a diverse array of topics, from history and mathematics to script creation and technical computing concepts, echoing the platform’s versatility as a knowledge resource.

  • API Integration Tactics and Teething Troubles: The pplx-api channel buzzed with tactical discussions on configuring Docker for optimal Perplexity API usage, verifying the absence of a /models endpoint, and clarifying current limitations like the lack of image support through the API.


HuggingFace Discord

Phi-3 Models and ZeroGPU Excite AI Builders: Microsoft launched Phi-3 small (7B) and Phi-3 medium (14B) models with 128k context windows that excel in MMLU and AGI Eval tasks, revealed on HuggingFace. Complementing this, HuggingFace’s new ZeroGPU initiative offers $10M in free GPU access, aiming to boost AI demo creation for independent and academic sectors.

Discovering Documentary Abilities of PaliGemma: Merve highlighted the document understanding prowess of PaliGemma through a series of links to Hugging Face and related tweets. Inquiries about Mozilla’s DeepSpeech and various resources from LangChain to 3D Gaussian Splatting reveal the community’s broad interest in various AI technologies.

LangChain Memory Trick: Practical advice was offered to incorporate conversation history into LLM-based chatbots using LangChain, addressing a common challenge of bots forgetting prior interactions. Meanwhile, a user critiqued story enhancement abilities of llama3 8b 4bit, unveiling a limitation in the model’s creative processes.

Transformer Integrations and Model Contributions Generate Buzz: Engineers are integrating ImageBind with the transformers library, while another engineer’s PR got merged, fixing an issue with finetuned AI models. Moreover, the llama-cpp-agent suggests advancements in computational efficiency by leveraging ZeroGPU.

Vision Tech Queries and Solutions Exchange: In the computer vision domain, requests for papers on advanced patching techniques in Vision Transformers and methods for zero-shot object detection in screenshots were highlighted. The conversations indicate a need for more sophisticated approaches and zero-shot methodologies in object recognition tasks.


Unsloth AI (Daniel Han) Discord

  • ScarJo Strikes Back at AI Voice Cloning: Scarlett Johansson has sued OpenAI for unauthorized replication of her voice. As a consequence, OpenAI has already taken down the voice model amidst mounting public concern.

  • Phi-3 Debuts on Hugging Face: Microsoft has released the Phi-3-Medium-128K-Instruct model on Hugging Face, touting enhanced benchmarks and an extended context of 128k. Engineers in the guild are currently deliberating its merits and the challenges with its large context window.

  • Colab Conundrum with T4 GPUs Resolved: Imperfect T4 GPU detection by PyTorch on Colab led to notebook chaos until Unsloth’s update was propagated. The fix addresses PyTorch’s incorrect assumption of T4’s bfloat16 support.

  • Discussion Brews Around MoRA: A discussion kicked off about a new fine-tuning method called MoRA, with a link to the arXiv paper provided. Guild members are showing early interest in testing its vanilla implementation in their workflows.

  • Dolphin-Mistral’s Lean Success: There’s buzz around dolphin-mistral-2.6 being refined with around 20k samples to match the instructional performance of the original, which used millions. This novel training approach has piqued interest and a promised paper could detail the process later in the year.


Stability.ai (Stable Diffusion) Discord

  • Scam Alert for AI Enthusiasts: Users are advised to avoid scam subscriptions for Stable Diffusion services and to use only the official stability.ai site for legitimate access.

  • Stable Diffusion Runs Offline Too: Stable Diffusion’s capability to run locally without an internet connection was confirmed, reducing dependencies on constant online connectivity.

  • Tech Support for Stable Diffusion Setup: Community support is at hand for those struggling with the setup of Stable Diffusion and tools like ComfyUI, with users sharing advice on tackling installation issues.

  • EU AI Act Raises Eyebrows: The newly introduced EU AI Act is sparking debate regarding its implications for AI-generated content, including worries about mandatory watermarks and enforcement challenges.

  • Mitigating Hardware Performance Bottlenecks: Discussions on Stable Diffusion performance problems suggest checking system configurations and using diffusers scripts, with a speculation of thermal throttling on new hardware setups.


OpenAI Discord

  • Real-Time AI: GPT-4o’s ability to process video at 2-4 frames per second sparked discussion, and integration of GPT-4o into Microsoft Copilot is anticipated to bring real-time voice and video capabilities. OpenAI’s Sky feature voice resemblance to Scarlett Johansson stirred legal and ethical debates.

  • Model Precision and Characteristics: GPT-4’s 128,000 token context window includes both the prompt and response, while strategies for achieving precise language and specific behavior in responses, akin to the AI in the movie ā€œHerā€, were hot topics.

  • Prompt Engineering for Conciseness: Ingenious prompt crafting techniques were shared to keep GPT-4 outputs within specific character limits, with a focus on clear templates and strategic use of token count to ensure concise and relevant responses.

  • Ethics and Legality in AI: The ability to sell AI-generated art was confirmed, though complexities surrounding copyright issues were highlighted, and community members expressed concerns about GPT-4’s evaluation of numerical values.

  • Safety and Updates: A significant safety update was announced at the AI Seoul Summit with further details available at the OpenAI Safety Update, reinforcing OpenAI’s commitment to responsible AI development.


LM Studio Discord

Run LM Studio as Admin for Log Access: Running LM Studio with admin permissions solves blank server log issues, providing users access to needed log files for troubleshooting.

AVX2 a Must for LM Studio: Understanding that AVX2 instructions are necessary to run LM Studio, users can check CPU compatibility for AVX2 using tools like HWInfo. Older CPUs lacking AVX2 support will face compatibility issues with the software.

Efficient Image Gen via Civit.ai: For improved image quality, members recommended using local models like Automatic1111 and ComfyUI with supporting resources from Civit.ai, cautioning the need for sufficient VRAM and RAM in system specs.

Getting Specific with Models: To ensure response completeness in LM Studio, setting max_tokens to -1 resolves issues of prematurely cut-off responses encountered when the value is set to null. The community also discussed using model-specific prompts, as shown with MPT-7b-WizardLM; referencing Hugging Face for required quantization levels and templates.

ROCm and Linux Bonding Over AMD GPUs: Linux aficionados with AMD GPUs have been invited to test an early version of LM Studio integrated with ROCm, as listed on AMD’s supported GPU list. Success reports have come from users running unsupported GPUs, with users sharing their diverse Linux distribution experiences and findings involving infinity fabric (fclk) speed sync affecting system performance.


Modular (Mojo šŸ”„) Discord

Zooming into Mojo Community Meetings: The Mojo community meeting was held, and though some faced notification issues, the recording is now available on YouTube. There was initial confusion regarding the need for a commercial Zoom account, which was clarified as unnecessary.

Boosted Mojo Performance with k-means Clustering: A blog post taught readers to use the k-means clustering algorithm in Mojo, promising considerable performance improvements compared to Python.

Challenging Code Conundrums and Compiler Chronicles: Discussions included handling null terminators in strings, exploring asynchronous programming, and utilizing the Lightbug HTTP framework within Mojo. Solutions and workarounds were devised within the community, with some technical queries leading to GitHub issue discussions.

Nightly Updates Navigate Compiler Complexities: The latest nightly Mojo compiler release was detailed, with conversations around the pop method in dictionaries, Unicode support in strings, and other GitHub issue and PR delibarations.

Peering into SIMD Optimization: Members engaged in discussions around optimizing SIMD gather and scatter operations in Mojo, conquering challenges such as ARM SVE and memory alignment, with suggestions on minimizing gather/scatter operations and tips for sorting scattered memory for iterative decoders.


CUDA MODE Discord

Kubernetes: Necessity or Overkill?: Some members argue managed Kubernetes services like EKS may efficiently replace on-prem ML servers, despite others noting Kubernetes isn’t essential for ML infrastructure; decision should be tailored to project requirements.

Triton Gets a Makeover: Updates to the Triton library include a pull request improving tutorial readability and new insights into how GPU kernel specifics affect maximum block size.

Wrangling with SASS and Complex Operations: Engineers discuss academic resources on SASS, and deliberate on the merits of ā€œcucomplexā€ versus ā€œcuda::std::complexā€ for atomic operations on advanced NVIDIA architectures.

Torch Tricks for Efficient Memory Use: Users discover that Torch’s native * operator doubles memory usage whereas mul_() doesn’t, and torch.empty_like outperforms torch.empty for CUDA device allocations.

Activation Quantization Takes Center Stage at CUDA: Focus shifts to activation quantization using features like 2:4 sparsity and fp6/fp4 on newer GPUs, with an eye to integrating these into torch.compile for enhanced graph-level optimizations.

Torchao 0.2 Ushers Custom Extensions: The torchao 0.2 release on GitHub introduces custom CUDA and CPU extensions, and the integration of NF4 tensors with FSDP for improved model training.


Eleuther Discord

  • SF Seeks Safety Specialists: A newly established San Francisco office of the UK Artificial Intelligence Safety Institute (AISI) is offering competitive salaries to attract talent. They’re engaging in collaborations, including a UK-Canada AI safety partnership.

  • A Call to Action Against SB 1047: Stakeholders in the AI community are mobilizing against California’s SB 1047, arguing the bill could threaten open-source AI development with its stringent regulatory measures, as detailed in this analysis.

  • FLOP-Sweating the Details: Intricate discussions emerged on the computation of FLOPs for attention mechanisms, referencing the EleutherAI cookbook for FLOP calculations, highlighting the necessity to include QKVO projections.

  • Multi-Modal Models Making Headlines: Discussions centered on improving AI models through multi-modal training, including the benefits observed in CLIP when incorporating audio for zero-shot classification. Performance enhancements without emergent capabilities were noted in models like ImageBind.

  • Efficiency in MoE Spotlighted: New research introduces MegaBlocks, a resource-efficient system for MoE training that forgoes token dropping and utilizes block-sparse operations, offering considerable enhancements in training efficiency.


Nous Research AI Discord

  • Temporal Conquers Workflow Management: Following discussions on workflow orchestration, a guild member has confirmed the selection of Temporal.io over Apache Airflow due to its robust features.

  • Navigating the AI Labyrinth: Members highlight various challenges such as the ineffective LLM leaderboard and Chatbot Arena’s skewed ratings. Microsoft’s Copilot+ presentation stirred chat, and the unveiling of the Yi-1.5 model garnered attention for addressing different context size needs.

  • Research Initiatives Thrive: The Manifold Research Group’s continued progress in the NEKO Project reflects the community’s drive towards developing comprehensive models, further underlined by the Phi-3 Vision’s release aligning vision and text with fine-tuning and optimization techniques.

  • Picturing AI Boundaries: Creative exploration via ASCII and generated simulation images spurred discussions on the functional and symbolical capacities of AI, particularly the applications of WorldSim.

  • Knowledge in Motion: A shared timelapse of an Obsidian knowledge graph and the call for support with public evaluation methods for rerankers reflect the dynamic and collaborative nature of the engineering community.


LAION Discord

Sky Voice Grounded: OpenAI has temporarily halted the use of the Sky voice in ChatGPT due to user feedback; the company is working to address these concerns. The decision strikes a chord with ongoing discussions about AI-generated voices and the ethical considerations inherent in such technologies. Read the tweet

CogVLM2: Use with Caution: The CogVLM2 model, which was noted for its 8K content length support, comes with a controversial license that restricts usage against China’s national interest, stirring discussions about real open-source principles. The license also stipulates that any disputes are subject to Chinese jurisdiction. Review the License

AI Copilot: From Code to Life’s Companion?: Mustafa Suleyman’s teaser of the upcoming Copilot AI that can interact with the physical world in real-time sparked a variety of reactions, reflecting the community’s mixed sentiments towards the increasingly blurred lines between AI assistance and privacy. See the tweet

ScarJo’s Voice DoppelgƤnger Dilemma: The use of a voice resembling that of actress Scarlett Johansson by OpenAI’s voice assistant sparked a debate on ethical boundaries and legal issues around AI’s mimicking of human voices, particularly celebrities.

Sakuga-42M Dataset Disappears Amidst Bot Backlash: High demand and automated downloading led to the removal of the Sakuga-42M dataset from hosting platforms, fueling a conversation on the challenges of maintaining accessible datasets in the face of aggressive web scraping. Hacker News Discussion


Interconnects (Nathan Lambert) Discord

  • OpenAI’s Voice Controversy Halts ā€œSkyā€: OpenAI has halted the use of ā€œSky,ā€ a voice AI resembling Scarlett Johansson, due to legal pressure and negative public perception, highlighting the ethical concerns of voice mimicry and consent. The incident is reminiscent of controversies involving impersonations of public figures like musicians, sparking discussions on accountability and the need for clear ethical guidelines in the AI industry.

  • Anthropic’s Quantum Leap in Compute: Anthropic has ramped up its compute resources to four times that of its previous model Opus, stirring the community’s interest regarding what the company has in the pipeline. Details are scarce, but the magnitude of compute increase points to significant developments.

  • AI Arena Faces Hard Prompts Challenge: The introduction of the ā€œHard Promptsā€ category by LMsysorg has turned up the heat in AI model evaluations, proving particularly strenuous for models like Llama-3-8B which showed a notable performance dip against GPT-4-0314. The rigorous evaluation raises questions about the effectiveness of current judge models, such as Llama-3-70B-Instruct.

  • OpenAI’s Superalignment Commitment Breach: OpenAI faces scrutiny over allegations from a Fortune article that it reneged on a promise to allocate 20% of its computing power to their Superalignment team, leading to a team shake-up. This revelation sparks dialogues on the prioritization between product development and AI safety, with some viewing the company’s move as a predictable deviation from its commitments.

  • The Domain Deal and AI Dataset Dilemma: Nathan Lambert’s purchase of the domain rlhfbook.com for a dealing price of $7/year, and joking banter about the potential legal risks associated with using the AI Books4 dataset to train LLMs, spotlight both the quirky side of AI development and the serious legal considerations of data use. The reference of Microsoft Surface AI experiencing latency raises questions about the balance between local processing and cloud-dependent safety verifications, suggesting an area for potential optimization.


Latent Space Discord

  • ā€œMemory Tuningā€ Raises the Bar for LLMs: Sharon Zhou of Lamini introduced a new technique called ā€œMemory Tuning,ā€ which claims to significantly reduce hallucinations (<5%) in Language Models (LLMs), surpassing the performance of LoRA and traditional fine-tuning methods. Details on early access and further explanations are pending (Sharon Zhou’s Tweet).

  • Scarlett Johansson’s AI Voice Controversy: OpenAI temporarily ceased using an AI-generated voice similar to Scarlett Johansson’s after legal action was suggested by her lawyers, stirring debates about likeness and endorsements (NPR Article).

  • Scaling Up: Scale AI’s Billion-Dollar Injection: Scale AI secured $1 billion in funding at a $13.8 billion valuation, planning to use the investment to enhance frontier data and target profitability by the end of 2024, with Accel leading the round (Fortune Article).

  • Microsoft Unveils Phi 3 Model Lineup: Microsoft released Phi 3 models at MS Build with benchmark performances competitive to Llama 3 70B and GPT 3.5, supporting context lengths up to 128K and released under the MIT license (Tweet about Phi 3 Models).

  • Introducing Pi: The Emotionally Intelligent LLM: Inflection AI announced a shift towards creating more emotive and cognitive AI models, with more than 1 million daily users interacting with their empathetic LLM ā€œPi,ā€ showcasing AI’s transformative potential (Inflection AI’s Announcement).


OpenRouter (Alex Atallah) Discord

  • Rate Limiting Ruffles Feathers: Azure’s GPT-32k model has been hitting token rate limits, with users citing specific issues when making requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview.

  • Phi-3 Models Gain Traction: The community has been exploring Phi-3 models for superior reasoning with data, examining models like Phi-3-medium-4k-instruct, which uses supervised fine-tuning, and Phi-3-vision-128k-instruct, that features direct preference optimization.

  • New Twist on LLM Interaction Coming Up: A novel approach for interaction with LLMs has been circulating, termed ā€œAction Commandsā€, and a discussion thread sharing experiences and seeking feedback is available here.

  • Conciseness vs. Verbosity Debate Continues: Strategies for managing verbosity in models like Wizard8x22 are being evaluated, with some members advocating for a decrease in repetition penalty to ensure more concise outputs.

  • OpenRouter Shows Open Wallet for Non-Profits: OpenRouter discussed its 20% margin pricing policy in response to a user’s Error 400 billing issue and their request for non-profit discounts.


OpenAccess AI Collective (axolotl) Discord

  • Grok Enthusiasts Gear Up: AI engineers are showing enthusiasm for training Grok using the PyTorch version, discussing potential enhancements with torchtune integration, and comparing compute platforms, including Mi300x versus H100s.
  • Sharp Turn in Mistral Finetuning: Members are troubleshooting Mistral 7B finetuning issues, with proposals ranging from full finetuning to Retrieval-Augmented Generation (RAG) techniques to address content retention, as noted in a shared configuration guide.
  • OOM Woes and Wisdom: Out-of-Memory (OOM) errors are a central topic, with a multitude of solutions including gradient accumulation steps, mixed precision training, model parallelism, batch size adjustments, and DeepSpeed ZeRO optimization being put forward to tackle VRAM limitations, with more details on Phorm.ai.
  • M3 Max Takes the Stage: The M3 Max chip earns praise for its LLM performance capabilities, with recommendations to equip it with 96GB of RAM to get the most out of large language models.
  • Code Debacles and Python Queries: Conversations include troubleshooting Syntax Errors in the Transformers library involving CohereTokenizer with the exploration of faster alternatives, as discussed in a GitHub pull request, and the search for a Python library to accelerate the speech-to-text to LLM to speech synthesis chain.

LlamaIndex Discord

Memary Makes Memories: An upcoming webinar focused on memary, an open-source long-term memory system for autonomous agents, promises deep dives on its use of LLMs and neo4j for knowledge graph generation. Scheduled for Thursday at 9am PT, engineers can join by registering here.

Knack for Stacking RAG Techniques: In the realm of retrieval-augmented generation (RAG), @hexapode will share advanced strategies at PizzerIA in Paris, while Tryolabs and ActiveLoop will present at the first in-person meetup in San Francisco next Tuesday—sign up here.

GPT-4o Integrates with LlamaParse: LlamaIndex.TS documentation is enhanced, and GPT-4o now seamlessly works with LlamaParse for analyzing complex documents. Further, you can safely execute LLM-generated code using Azure Container Apps as per their latest offering.

Resolving Twin Data Quandaries: Engineers discussed methods to compute unique hashes for documents to avoid duplicates in Pinecone and examined workarounds for dealing with empty nodes in VectorStoreIndex.

Streamlining Systems and Storage: Insights were shared on how to modify an OpenAI agent’s system prompt using chat_agent.agent_worker.prefix_messages, and the merits of utilizing Airtable over Excel/Sqlite due to its Langchain integration—info available here.


AI Stack Devs (Yoko Li) Discord

  • Emotive AI On The Horizon: Inflection AI is reportedly planning to integrate emotional AI into business bots, raising prospects for more empathetic AI companions, detailed in a VentureBeat article. The conversation also touched on AI characters, with a Just Monika reference from Doki Doki Literature Club clarified through a GIF from Tenor.

  • Cracking AI Town’s Memory Woes: Community feedback indicates that AI characters in AI Town often fail to remember past interactions, leading to repeated dialogues. It was advised to tweak convex/constants.ts to adjust the NUM_MEMORIES_TO_SEARCH and ease the retrieval of past exchanges.

  • Overcome SQL Schema Confusion: Engineers shared SQL queries and tools for exporting AI Town conversation data, including links to GitHub repositories like townplayer and an explanatory Twitter thread, facilitating data manipulation and understanding.

  • Introductions to 3D AI: A tease of an ongoing project involving 3D character chatbots was mentioned, with the recommendation to check out further details in another channel within the community.

  • Animated Explanation Lacks Impact: A playful discussion around the cultural impact of AI waifus was noted, underlining both the humor and significance of AI character development in user interfaces.


LangChain AI Discord

LLMs Tangle with Text Types: LLMs, including structured and unstructured data handlers like Hermes 2 Pro - Mistral 7B and OpenAI’s chatML, don’t have innate preferences for text types but excel with finetuning.

LangChain’s Community Contributions: The langchain-core package is streamlined for base abstractions, while langchain-openai and langchain-community house more niche integrations, detailed in the architectural overview.

Sequential Chains in Action: A YouTube tutorial has been pointed out for setting up sequential chains, where one chain’s output becomes the next one’s input.

Commissions from Chat Customizations: An affiliate program entices with a 25% commission for the ChatGPT Chrome Extension - Easy Folders, detailed here, despite some users reporting issues with the extension’s performance.

Agent Upgrades and PDF Insights: Transitioning from LangChain to the newer LangGraph platform has been expounded in a Medium article, alongside a guide to querying PDFs with Upstage AI solar models, available here.


OpenInterpreter Discord

AI-Empowered DevOps on the Rise: A full-stack junior DevOps engineer is creating a lite O1 AI project with the prospect of providing discreet auditory assistance for various DevOps tasks, seeking community insights for development and practical applications.

OpenInterpreter’s Symbiosis with Daily Tech: Engineers are exploring how Open Interpreter can streamline their workflow, from code referencing across devices to summarizing technical documents, underlining the practical impact of AI in everyday technical tasks.

Combining Voice Tech with OpenInterpreter: A community member is integrating Text-to-Speech with Open Interpreter and has been directed to the relevant GitHub repository to further their project.

Connection Queries and Missing Manuals: One member sought help with linking their laptop to a light app despite the absence of instructions in the provided guides, while another requested advice on assembling 3D printed parts for their version of Open Interpreter lite 01.

Humorous Nod to Misssed Opportunities: The user ashthescholar. lightheartedly noted a missed opportunity in naming conventions, showcasing the playful side of technical communities.


Cohere Discord

  • Codegen-350M-mono Tackles Compatibility: A solution to compatibility issues with using Codegen-350M-mono in Transformers.js is provided through an ONNX version shared by members, indicating successful cross-platform implementation.
  • Translating with CommandR+: For Korean-English translation tasks, CommandR+ has been highlighted as an effective tool, with the Chat API documentation serving as a resource with sample code and usage instructions.

Datasette - LLM (@SimonW) Discord

  • Johansson and OpenAI’s Voice Controversy: OpenAI has paused the use of the Sky voice in GPT-4o, substituting it with Juniper, amid copyright claims and an issued statement from Scarlett Johansson.
  • GPT-4o’s Unified Modal Approach: GPT-4o has augmented its capabilities by integrating a unified model for text, vision, and audio which enhances emotional understanding in interactions but could complicate the model’s performance and potential use cases.
  • Lem’s Take on System Reliability: Engineers shared a perspective from Stanisław Lem’s work, advocating for the construction of resilient rather than perfectly reliable systems, acknowledging the inevitability of system failures.
  • Voice Cloning’s Moral Maze: Engineers discussed the nuanced ethical and legal challenges posed by voice cloning technologies, cautioning against sole reliance on legislation for protection of identity.
  • All Eyes on Qualcomm’s New Kit: Qualcomm’s launch of the Snapdragon Dev Kit for Windows was met with excitement, boasting specs such as a 4.6 TFLOP GPU, 32GB RAM, and 512GB storage; available for $899.99, drawing comparisons to Apple’s Mac Mini. Read more about the dev kit.

DiscoResearch Discord

  • SFT vs Preference Optimization Debate: A community member questioned the necessity of Supervised Fine-Tuning (SFT) when Preference Optimization seems to achieve a similar outcome by adjusting the probability distribution for both desired and undesired outputs.

  • Phi3 Vision Gains Recognition: Phi3 Vision, a 4.2 billion parameter model, received praise for its impressive low-latency live inference capabilities on image streams, with potential applications in robotics highlighted in Jan P. Harries’s post.

  • Model Matchup: Phi3 Vision vs Moondream2: The community compared Phi3 Vision and Moondream2 on image inference tasks, noting Moondream2’s reduced hallucinations but issues with some datasets.

  • New Models from Microsoft: Microsoft introduced new AIs with 7 billion and 14 billion parameters, with mentions of these releases only providing the instruct versions, sparking interest and discussion among community members.

  • Further Discussion Required: The insights provided prompted further discussion, likely leading the community to deep-dive into the efficacy and applications of these models.


Mozilla AI Discord

  • SQLite-VeC In The Spotlight: Alex introduced sqlite-vec, a new SQLite extension for vector search, describing its use for features like RAG and semantic search; the extension is compatible with cosmopolitan and is currently in beta.
  • Diving into ā€˜sqlite-vec’: A detailed blog post by Alex unveils the aspirations for sqlite-vec to outshine sqlite-vss with better performance and easier embedding in applications; binaries and packages will be available for various programming environments.
  • Call to Collaborate and Experiment: Acknowledging that sqlite-vec is in beta, Alex is offering his support to help anyone interested in integrating or troubleshooting the extension within their projects.
  • Community Buzz for Llamafile Integration: The integration possibilities of sqlite-vec with Llamafile have sparked excitement among guild members, highlighting the extension’s potential to advance current project capabilities.

LLM Perf Enthusiasts AI Discord

GPT-4o Outshines Its Predecessors: A Discord guild member detailed a notable performance leap in GPT-4o over GPT-4 and GPT-4-Turbo in the domain of complex legal reasoning, emphasizing the significance of the advancement with a LinkedIn post.


MLOps @Chipro Discord

  • Manifold Research Group Calls for Collaboration: The Manifold Research Group, an open-source R&D lab focusing on generalist models and AI agents, is seeking collaborators and has shared links to their research log, Discord, and GitHub.
  • NEKO Project Charts Course for Open-Source AI: The NEKO Project is ambitiously building a large-scale, open-source generalist model that incorporates a diverse array of modalities, including tasks in control and robotics, details of which are outlined in their project document.

The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

LLM Finetuning (Hamel + Dan) ā–· #general (225 messagesšŸ”„šŸ”„):

  • Mastering PDF Extraction with Python and OCR: Members shared tools and code snippets for PDF text extraction using PyMuPDF and tesseract. One highlighted the efficiency of fitz with the sort=True option, while others discussed OCR solutions like ABBYY and MarkerAPI for handling complex and low-quality PDFs (PyMuPDF tutorial).

  • Exploring and Optimizing LLM Training and Fine-Tuning: Detailed discussions on optimizing LLM training setups, with references to tools like vllm for serving multiple users simultaneously. Users also shared fine-tuning workflows using pyenv, virtualenv, and addressed dependency issues in Axolotl (StarCoder2-instruct).

  • Handling Large Language Models and Memory Optimization: Participants explored methods for handling large language models, particularly on GPUs, and shared insights from new research. Discussions included memory tuning, using vLLM for efficient model serving, and recent findings from Anthropic on model interpretability (Claude Sonnet research).

  • Collaborative Learning and Resource Sharing: Attendees connected over shared resources and tools, such as the Vik’s Marker API for PDF processing and various GitHub repos for fine-tuning models. Many also shared their experience and sought collaboration on multilingual and domain-specific model fine-tuning (Marker API).

  • Workshop Logistics and Participation: Queries about session recordings, managing time zones, and accessing course materials were discussed, with confirmatory responses that all sessions will be recorded. Participants also reflected on the credit distributions from sponsors and the organizational structure of the course’s Discord meetings (Modal examples).

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #workshop-1 (141 messagesšŸ”„šŸ”„):

  • Creative Writing AI Sparks Interest: Members discuss creating AI for assisting in creative writing, focusing on prompt engineering to generate ideas and overcome writer’s block. Fine-tuning is suggested to align the model with specific genres or writing styles.
  • BERT and Sentence Transformers in Action: Members introduce the use of BERT-type models and sentence-transformers like all-MiniLM-L6-v2 for tasks like clustering and semantic search. Sample code shows practical usage of the model for encoding sentences.
  • Legal Document Summarization Debated: Discussion on using LLMs for summarizing legal documents and providing client support. The combination of fine-tuning, RAG, and prompt engineering is explored for tasks like legal research and strategy development.
  • RAG vs Prompting for Customer Support: A member reconsiders using fine-tuning versus prompt engineering for an LLM designed to help customers create feature tickets. Initial thoughts lean towards fine-tuning for tone and procedures, but later prompting is preferred due to practical considerations.
  • Mental Health and Medical AI Use Cases Emerge: Multiple members propose creating AI systems for medical coding, summarizing patient records, and offering mental health advice, utilizing fine-tuning and RAG. Examples include summarizing ICD-10 codes and providing targeted mental health insights.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #asia-tz (49 messagesšŸ”„):

  • Bangalore Meetup Gains Traction: Multiple members expressed interest in organizing a meetup for Bangalore. The idea received significant enthusiasm with users chiming in from Bangalore.

  • Inquiry about Non-English Language Model Fine-Tuning: An interesting exchange occurred on techniques for adding new languages to models without degrading performance. Suggestions included using a 90/10% data mix to minimize catastrophic forgetting and possibly employing techniques like layer freezing.

  • Japanese LLM Performance Discussion: A member shared extensive updates on Japanese language model development, mentioning various models and benchmarks. Links were provided to their benchmark framework and a Hugging Face model that matches GPT-3.5-turbo in Japan.

  • Link to Detailed Review on Training Datasets: A notable mention of the Shisa project and a review on public Japanese training sets provided insights into the challenges and methodologies in Japanese LLM development.

  • Multi-City Meetup Initiatives: Invitations were extended for meetups in various other locales, including NCR, Pune, Singapore, and Malaysia. Enthusiastic responses and commitments were noted from several members in these regions.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #🟩-modal (37 messagesšŸ”„):

  • Unlock Modal Credits, Get Decoding: Members received guidance on obtaining Modal credits by filling out the Modal hackathon credits form after signing up on modal.com. Credits amount to $500, valid for one year, and additional $30/month on the free tier.

  • Stay Active to Save on Modal Costs: A member shared tips on managing Modal service costs by setting container_idle_timeout to minimize charges during testing. Using GPU services prudently for workloads like LLM serving was emphasized, supported by a GitHub example.

  • Fine-tuning and Model Serving Tips: For effective fine-tuning and serving LLM models on Modal, modal serve is recommended for development over modal run. For optimized results, reference the TensorRT-LLM serving guide and engage in batch processing.

  • Smoother Experience with Modal Deployments: Members discussed operational issues like setting container_idle_timeout correctly and avoiding repetitive model loading. Valid usage of modal serve vs. modal deploy was clarified through community insights and links to relevant GitHub projects.

  • Join the Modal Slack for Faster Support: Members were directed to the Modal Slack for specialized support from the engineering team. Questions suitable for the general or LLMS channels were encouraged for quicker, around-the-clock responses.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #jarvis (3 messages):

  • Running Axolotl Interests Users: A user expressed interest in running Axolotl, asking specifically for a member’s attention. They shared a direct link to the relevant discussion.

LLM Finetuning (Hamel + Dan) ā–· #hugging-face (10 messagesšŸ”„):

  • Credits will be sorted soon: Updates on credit distribution will be provided shortly. Appreciation was shown for community patience.

  • Axolotl models search issue acknowledged: Users observed that they can filter but not search for axolotl models on HuggingFace. It’s explained that the search bar uses predefined tags to avoid confusion, and potential UI improvements are discussed to handle additional tags better.

  • Alternative way to filter axolotl models via code: A user shared a code snippet to filter all axolotl models using the Hugging Face API:

    from huggingface_hub import HfApi
    hf_api = HfApi()
    models = hf_api.list_models(filter="axolotl")
  • Positive feedback on hybrid sharding strategy: A member expressed enthusiasm for the energy and efforts focused on the HYBRID_SHARD strategy, which involves sharding models using Fully Sharded Data Parallel (FSDP) and DeepSpeed (DS) techniques.

Link mentioned: Models - Hugging Face: no description found


LLM Finetuning (Hamel + Dan) ā–· #replicate (4 messages):

  • Credits provision query causing confusion: A member expressed that they signed up with their email address but have not received the credits yet. In response, another member assured that the credits issue will be sorted out soon and thanked everyone for their patience.

  • Clarifying Replicate’s use case: A member inquired about the primary use case for Replicate, questioning whether it is meant to offer API endpoints for downstream tasks for firms or individuals. They also mentioned specific features like fine-tuning and custom datasets.

  • Registration mismatches being a common issue: Another member pointed out that their situation mirrored another user’s issue regarding different registration methods between Replicate and a conference. This highlights a recurring concern about consistency in user registration methods.


LLM Finetuning (Hamel + Dan) ā–· #langsmith (5 messages):

  • New Members Join Course: Two new members announced their enrollment in the course. One user mentioned not receiving their LangSmith credit after signing up.

  • Query About Free Credit: A member asked whether setting up billing is necessary to receive an additional 250 free credits on top of the existing 250. Another member reassured that credit allocation will be sorted out soon and updates will be provided.


LLM Finetuning (Hamel + Dan) ā–· #workshop-2 (613 messagesšŸ”„šŸ”„šŸ”„):

  • Debate on Discord Stages and Zoom Chat Integration: Members discussed the pros and cons of using Discord stages. One participant noted that stages ā€œare just audio onlyā€ and another confirmed it, suggesting it’s a voice/video/screenshare channel.

  • New Course Structure Explained: Hamelm outlined the three types of sessions for the course: Fine-Tuning Workshops, Office Hours for deeper Q&A, and Conference Talks. Calendar invite titles have been updated to clarify session types.

  • Technical Discussions on Fine-tuning: In-depth conversations about Llama3 model issues, hyperparameter importance, and multilingual capabilities. Participants referenced specific challenges and shared resources like Stanford’s Pyvene.

  • Resources and Tips Shared: Numerous links, blog posts, and papers were shared for further reading and resource pooling, such as Practical Tips for Finetuning LLMs and Axolotl’s GitHub.

  • Issues with Apple Silicon for Fine-tuning: Users discussed difficulties using Axolotl on Apple M1 due to bitsandbytes not supporting the architecture. Suggestions such as using Docker or mlx were provided as potential workarounds.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #jason_improving_rag (3 messages):

- **Jason's W&B course wows**: A user expressed excitement about Jason's session and mentioned being halfway through his **Weights & Biases (W&B) course**. They used the teacher emoji to show their admiration.
- **Prompt engineering curiosity peaks**: Another user inquired about Jason's systematic approach to prompt engineering, praising his extensive work on optimizing prompts. They were eager to learn his "recipe" during his workshop session.

LLM Finetuning (Hamel + Dan) ā–· #gradio (2 messages):

  • Gradio Maintainer Introduces Himself: Freddy, a maintainer of Gradio, a Python library for developing user interfaces for AI models, invited members to ask questions and share demos. He provided links to Gradio’s quickstart guide and another guide on how to build a chatbot in 5 lines of code.
  • Member Shows Interest in Gradio: A member expressed gratitude for the shared resources and mentioned they will eventually have questions, particularly related to an A1111-extension they had previously worked on and found challenging.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #askolotl (13 messagesšŸ”„):

  • Issue with bitsandbytes on macOS: An issue related to installing bitsandbytes on macOS is discussed in this GitHub thread. The specific error is ā€œNo matching distribution found for bitsandbytes==0.43.0 for macOSā€.
  • MLX support not yet available: A member pointed out that Axolotl does not yet support MLX, referencing an open issue on GitHub. MLX is praised for its efficiency in fine-tuning large language models on consumer hardware.
  • Fine-tuning comparison: OpenAI vs Axolotl: One user shared their experience using OpenAI for fine-tuning, stating it takes about 30 minutes and charges per token. They queried how Axolotl compares in terms of time and cost for fine-tuning.
  • Apple M1 not ideal for fine-tuning: A statement highlighted that Apple ARM (M1) does not support q4 and q8, making it less suitable for fine-tuning. The user was advised to rent a Linux GPU server on RunPod instead.
  • MLX-examples for guidance: For those interested in using MLX, a reference was provided to the MLX examples documentation on GitHub for further guidance.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #zach-accelerate (1 messages):

  • Accelerate your PyTorch with Accelerate: A member shared a presentation on Hugging Face’s Spaces introducing Accelerate, a library that simplifies running PyTorch code across any distributed configuration. The linked Accelerate documentation shows how to implement it with just a few lines of code.

  • Accelerate Features Quicktour: The Quicktour on Hugging Face illustrates Accelerate’s features including a unified command line interface for distributed training scripts, a training library for PyTorch, and Big Model Inference support for large models.

  • Examples to Get You Started: A collection of examples is available on Hugging Face’s GitHub, recommended to start with nlp_example. The examples showcase the versatility of Accelerate in handling various distributed training setups.

  • In-Depth Model Memory Estimators: Members shared links to a memory usage estimator and the TransformerAnalyzer tool which provides detailed FLOPS and other parameter estimates, useful for understanding model requirements.

  • Run Large Language Models Efficiently: The Can I Run it LLM Edition space, discussed on Hugging Face, focuses on inference abilities, highlighting LoRa applicability for efficient large language model deployment.

Links mentioned:


Perplexity AI ā–· #announcements (1 messages):

  • Perplexity AI partners with Tako for advanced knowledge search: ā€œWe’re teaming up with Tako to bring advanced knowledge search and visualization to our users.ā€ This allows users to search, juxtapose, and share authoritative knowledge cards within Perplexity, initially available in the U.S. and in English, with mobile access coming soon. Read about our partnership.

Link mentioned: Tako: no description found


Perplexity AI ā–· #general (735 messagesšŸ”„šŸ”„šŸ”„):

- **Loyalty to platforms debated**: One member shared their experience using Perplexity and Gemini, emphasizing that users have "zero loyalty" and praised Perplexity for its direct answers ([Tenor GIF](https://tenor.com/view/oh-no-homer-simpsons-hide-disappear-gif-16799752)).
- **Perplexity’s feature tips shared**: There was a discussion about using Perplexity with various functionalities, including understanding the API, tweaking search engine options in browsers like Firefox, and handling system prompts.
- **Perplexity temporarily down**: Multiple users reported issues with Perplexity being down; they sympathized over missing the service and speculated on maintenance and updates.
- **Model preferences and uses discussed**: Members compared models like GPT-4o and Claude 3 Opus, discussing their strengths and preferences for tasks such as creative writing and coding ([Spectrum IEEE article](https://spectrum.ieee.org/perplexity-ai)).
- **Interactive features in Perplexity**: Members were curious about and shared tips on using Perplexity's new features like Tako charts, with some mentioning tips like adding `since:YYYY/01/01` to improve search results. 

Links mentioned:


Perplexity AI ā–· #sharing (9 messagesšŸ”„):

  • Historical Questions Answered via Perplexity AI: A member shared a link asking ā€œQui est Adolf?ā€ featuring detailed historical insights. Explore here.

  • Understanding Ideal Structures in Mathematics: A link was posted addressing the question ā€œDoes every ideal?ā€ which delves into complex mathematical theories. Explore here.

  • Script Creation Query via Perplexity: A user shared a search for ā€œCreate a script,ā€ likely aimed at generating specific scripts or code snippets. Explore here.

  • Exploring Technical Concepts in Computing: One member asked ā€œwhat is layer?ā€ in a Perplexity AI search, touching upon detailed discussions in computing or machine learning. Explore here.

  • Discussion on Indoor Topics: Another search titled ā€œtalk about indoorā€ suggested a focus on indoor environments or activities. Explore here.


Perplexity AI ā–· #pplx-api (98 messagesšŸ”„šŸ”„):

- **Struggles with Perplexity API on Open WebUI**: A user reported issues with model compatibility, noting, "it works perfectly fine with OpenAI (Closed) and Groq, but maybe they don’t have the model names setup to work with PPLX." Another user suggested using `api.perplexity.ai` directly but discovered Perplexity doesn't have a `/models` endpoint, causing further complications.
- **Proxy Server Solution and Execution Assistance**: A workaround was proposed to create a local server that proxies the models and chat completions endpoints. A user mentioned completing the proxy and instructing, "you need to add the `--network=host` to your docker command" to fix localhost issues.
- **Docker Configuration Conversations**: Users discussed the intricacies of Docker configurations, with one summarizing the correct command, "docker run -d --network=host -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main," while troubleshooting connection issues.
- **Inquiries about Sending Images**: When asked, "Is there a way to send images via the API?", it was clarified that currently, Perplexity's API only supports text, stating, "they are just using Claude and Openai vision api," and the LLAVA models that support images are not available via API.
- **User Appreciation and Final Adjustments**: One user showed gratitude saying, "Thank you, šŸ™‚" while another user confirmed they needed to align Docker configurations to ensure proper API functionality. This indicates ongoing effort and collaboration to resolve the issues.

Links mentioned:

  • no title found: no description found
  • šŸ” Home | Open WebUI: Open WebUI is an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.

HuggingFace ā–· #announcements (1 messages):

  • Phi-3 Models Take the Stage: Microsoft released Phi-3 small (7B) and Phi-3 medium (14B) models with up to 128k context, achieving impressive scores on MMLU and AGI Eval. Check them out here!

  • $10M of Compute Up for Grabs: Hugging Face announced a $10M commitment to free GPU access through ZeroGPU, facilitating AI demo creation for indie and academic AI builders. Learn more about the initiative here.

  • Transformers 4.41.0 packed with new features: The latest update includes Phi3, JetMoE, PaliGemma, VideoLlava, and Falcon 2, as well as improved support for GGUF, watermarking, and new quant methods like HQQ and EETQ. Full release notes are available here.

  • LangChain Integration Simplified: New langchain-huggingface package facilitates seamless integration of Hugging Face models into LangChain. Check out the announcement and details.

  • CommonCanvas and Moondream Updates: CommonCanvas released the first open-source text-to-image models trained on Creative Commons images, with the largest dataset available on Hugging Face. Moondream now runs directly in browsers via WebGPU, improving user privacy.

Links mentioned:

  • Tweet from clem šŸ¤— (@ClementDelangue): GPU-Poor no more: super excited to officially release ZeroGPU in beta today. Congrats @victormustar & team for the release! In the past few months, the open-source AI community has been thriving. Not...
  • Tweet from Lysandre (@LysandreJik): From a model page to your Local App in seconds, the @huggingface Hub welcomes Local Apps! Suggest your favorite Local App leveraging the Hub there to get them added to the dropdown and ✨ deep linked...
  • Tweet from Omar Sanseviero (@osanseviero): Transformers 4.41.0 has lots of goodiesšŸ¤— 🄳 New models: Phi3, JetMoE, PaliGemma, VideoLlava, and Falcon 2. 🤯 GGUF support with from_pretrained šŸ¤ New quant methods: HQQ and EETQ šŸ” Watermarking sup...
  • Tweet from Philipp Schmid (@_philschmid): We are excited to announce huggingface-langchainšŸš€ A new open-source package to seamlessly integrate the latest open Models from @huggingface into @LangChainAI, supporting local models hosted models! ...
  • Tweet from apolinario (multimodal.art) (@multimodalart): Quite excited that CommonCanvas is JUST out! šŸ–¼ļø • First open source text-to-image models trained fully on openly licensed images (SD2 and SDXL architectures) • The dataset, with ~70M openly license...
  • Tweet from Xenova (@xenovacom): Moondream, your favorite tiny vision language model by @vikhyatk can now run directly in the browser on WebGPU! 🤯 Powered, of course, by Transformers.js and ONNX Runtime Web! šŸ¤— Local inference mean...
  • Tweet from Xenova (@xenovacom): You can now use šŸ¤— Transformers.js with Google Visual Blocks, a visual programming framework that lets you create machine learning pipelines in a no-code graph editor! šŸ› ļø Rapid workflow prototyping ...
  • Tweet from Ilyas Moutawwakil (@IlysMoutawwakil): Optimum-Benchmark on PyPI šŸŽ‰ But why now ? šŸ¤” Because it's getting integrated in Transformers' benchmarking workflow šŸ˜ Your favorite transformers will only get faster and lighter ; Kudos to @...
  • Tweet from Omar Sanseviero (@osanseviero): Curious about LLMs? Join this Fine-Tuning course with top experts! šŸš€ @huggingface is offering $501.42 in GPU credits for can Space demos, fine-tuning, inference, and more! Enjoy šŸ¤— https://maven.co...

HuggingFace ā–· #general (678 messagesšŸ”„šŸ”„šŸ”„):

  • Voice Models Showdown: A user shared links to two notable text-to-speech models, bark by Suno on Hugging Face and the paid service Eleven Labs, and inquired about the underlying models used in Udio.
  • Git LFS Upload Issues: Multiple users discussed troubleshooting issues related to uploading large files using git LFS to Hugging Face repositories. Suggestions included using the upload_file function from the huggingface_hub library.
  • Language Model Specifications: There was a discussion surrounding the largest language models with references to GPT-4 and Google’s 1.5 trillion parameter model, and an exploration into optimizing Falcon-180B and Llama models.
  • Hugging Face Store Anticipation: Users expressed excitement and impatience for the reopening of the Hugging Face merchandise store, highlighting a strong community desire for official swag.
  • Job Application Success: Congratulations and best wishes were shared with members who had applied for roles at Hugging Face, reflecting the community’s support and encouragement.

Links mentioned:


HuggingFace ā–· #today-im-learning (2 messages):

  • Working on ImageBind integration for Transformers: One member mentioned, ā€œWorking on adding ImageBind to transformers.ā€ While details were sparse, this suggests ongoing efforts to enhance the capabilities of the Transformers library.

HuggingFace ā–· #cool-finds (13 messagesšŸ”„):

  • Merve showcases PaliGemma’s document models: ā€œQuoting merve (@mervenoyann) I got asked about PaliGemma’s document understanding capabilitiesā€¦ā€. For more details, refer to the tweet.

  • DeepSpeech inquiry: A member asked, ā€œhas anyone here worked with mozillas deepspeech?ā€, capturing interest around Mozilla’s DeepSpeech project.

  • LangChain to LangGraph transition guide: An in-depth guide on upgrading from legacy LangChain to LangGraph was shared through an article.

  • Leveraging LLMs in Magnolia CMS: A member shared insights into using LLMs for content creation in Magnolia CMS via this Medium post.

  • Curated 3D Gaussian Splatting resources: A comprehensive list of 3D Gaussian Splatting papers and resources, with significant potential in robotics and embodied AI, was highlighted in this GitHub repository.

Links mentioned:


HuggingFace ā–· #i-made-this (15 messagesšŸ”„):

  • Announcing Sdxl Flash Mini: A member announced the release of SDXL Flash Mini in collaboration with Project Fluently. The model is described to be fast and efficient, with less resource consumption while maintaining respectable quality levels SDXL Flash Mini.

  • SDXL Flash Demo by KingNish: Exciting new demo of SDXL Flash available on Hugging Face Spaces, demonstrated by KingNish. This provides a practical showcase of its capabilities SDXL Flash Demo.

  • Tokun Tokenizer Release: Inspired by Andrej Karpathy, a member developed a new tokenizer called Tokun, aimed at significantly reducing model size while enhancing capabilities. Shared both the GitHub project and article about testing.

  • Transformers Library Contribution: A member celebrated their PR merge into the Transformers library, which fixes an issue with finetuned AI models and custom pipelines. Shared the link for the PR here.

  • llama-cpp-agent Using ZeroGPU: The member shared the creation of llama-cpp-agent on Hugging Face Spaces utilizing ZeroGPU technology, indicating a promising advancement in computational efficiency llama-cpp-agent.

Links mentioned:


HuggingFace ā–· #reading-group (4 messages):

  • LLMs struggle with story enhancements: One member found that using llama3 8b 4bit to implement ā€œCreating Suspenseful Stories: Iterative Planning with Large Language Modelsā€ was ineffective. The LLM could critique the plot proficiently but failed to enhance it when fed the critique, exemplifying a notable limitation of current models.
  • Need for better prompts or bigger models: Another member acknowledged the trend where LLMs are better at critiquing than improving based on that critique, suggesting the need for at least 13b models or better prompts like chain-of-thought (CoT) to achieve more effective results.

HuggingFace ā–· #computer-vision (2 messages):

  • Seeking Advanced Vision Transformer Techniques: A user inquired about papers explaining patching techniques in Vision Transformers that are more advanced than VIT. They are looking for in-depth resources to expand their knowledge on this topic.
  • Zero-Shot Object Detection in Screenshots: Another user described a task involving finding all objects similar to a reference image within a webpage screenshot, emphasizing the need for zero-shot methods due to the reference image always changing. They are seeking guidance or solutions on achieving this capability efficiently.

HuggingFace ā–· #NLP (12 messagesšŸ”„):

  • LLMs Forget Conversations, Store Histories Manually: A user expressed difficulty with their bot not considering conversation history. Members advised to manually concatenate previous messages as LLMs inherently do not remember previous exchanges. GitHub repository for the bot.

  • Comparing Runtimes: Gemini 1.5 Flash vs Llama3-70B: A user noted that Llama3-70B provides accurate data pattern analysis and truthful answers, while Gemini Flash tends to hallucinate. This suggests Llama3-70B’s stronger performance in complex data scenarios.

  • Ensemble Model for Hallucination Detection: A member working on a master thesis shared their approach using an ensemble of Mistral 7B models to measure different types of uncertainty. They asked for questions potentially lying outside the model’s training data to test for increased epistemic uncertainty as an indicator of hallucinations.

  • Hosting Fine-Tuned LLMs on HuggingFace: A user asked about hosting a fine-tuned LLM on HuggingFace and using an API for requests. They were confident, saying, ā€œlike 99.9%ā€ sure it can be done.


HuggingFace ā–· #diffusion-discussions (10 messagesšŸ”„):

  • French-to-English translation request in Diffusion channel: A user initially posted in French, then translated their message to English, explaining an issue with the llmcord chatbot not retaining conversation history. Another member suggested that such queries are more appropriate for NLP channels rather than the Diffusion Discussions channel.

  • LLMcord chatbot conversation history tip: Another user recommended a solution for the conversation history problem by sending the history within the prompt. They shared a link to the LangChain documentation which explains how to manage chat message history.

  • Diffusion model denoiser issue and math inquiry: A user shared their struggle with implementing a diffusion model, mentioning success with the forward diffusion process but issues with the denoiser. They asked for advice on which math field to study, specifically inquiring about fields related to gaussians and normal distributions; another user suggested studying variational inference.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #general (402 messagesšŸ”„šŸ”„):

  • Scarlett Johansson sues OpenAI for voice replication: Reported details on Scarlett Johansson suing OpenAI for generating her voice and discussed potential legal implications. Members noted that OpenAI has since removed the voice amidst public backlash.
  • Phi-3 model release shakes things up: Microsoft released the Phi-3-Medium-128K-Instruct model on Hugging Face, boasting improved benchmarks and up to 128k context. Participants debated its performance and potential issues with the context length.
  • Colab issues linked to PyTorch’s T4 GPU detection: Due to PyTorch misidentifying Tesla T4’s capabilities, Colab notebooks misbehaved until an update from Unsloth’s side was implemented. A tweet by Daniel Hanchen confirmed the recognition glitch.
  • Diverse finetuning discussions: Discussions ranged from the use of multiple GPUs to fine-tuning models on Google Cloud vs. Colab. Practical nuances of fine-tuning included dataset handling, epoch configurations, and avoiding dataset shuffling for curriculum learning.
  • Optimizers and FSDP updates: Detailed exchanges about the intricacies of using 8bit optimizers with Fully Sharded Data Parallel (FSDP). Participants shared their troubleshooting methods for saving checkpoint issues and managing optimizer states across different GPUs.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #random (4 messages):

  • New Method Alert: MoRA: A user mentioned a new method called MoRA and expressed interest in trying out its vanilla implementation. Another user responded with enthusiasm, saying it ā€œlooks epic.ā€ arxiv link.

Link mentioned: MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning: Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings sugge…


Unsloth AI (Daniel Han) ā–· #help (246 messagesšŸ”„šŸ”„):

- **Upload models trained with Unsloth**: A user shared a model fine-tuned using Unsloth and uploaded to Hugging Face, asking about the best way to run it, particularly mentioning concerns about Ollama only working with predefined models. Another user recommended tools like Ollama, LM Studio, Jan, and GPT4ALL and pointed out that only the LORA adapters were uploaded.
- **Fine-tuning Mistral with dataset dependency issues**: A user faced issues with Mistral-instruct-7b overly depending on the dataset, giving erroneous or empty outputs for new inputs. Others suggested mixing datasets to help the model generalize better.
- **Issues with TRT and Flash Attention on T4s**: Multiple users experienced errors related to running Unsloth on Google Colab with T4 GPUs due to updates to PyTorch 2.3 and issues with Flash Attention. Specifying the dtype or following updated installation instructions helped mitigate the problem.
- **Use 4bit models due to VRAM limitations**: Users discussed challenges in fine-tuning models on devices with limited VRAM. Mentioned the utilization of 4bit quantized models to fit larger models within VRAM constraints, particularly for hardware like a GTX 3060 with 6GB VRAM.
- **Confirmation of recurring instructions in fine-tuning datasets**: Users explored the effectiveness of using repetitive instructions in fine-tuning datasets. The dialogue indicated curiosity and active experimentation with the approach but no definitive conclusion on its overall impact.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #showcase (13 messagesšŸ”„):

  • Dolphin-Mistral-2.6 Equally Matched with Fewer Samples: A member reported successfully matching the performance of dolphin-mistral-2.6 on instruction following evaluation using only ~20k samples, compared to millions used for the original model. The models kolibri-mistral-0427 and kolibri-mistral-0426-upd were discussed, highlighting differences in training data pipelines.

  • Upcoming Model Release: The user plans to publish the model within a few days and promised to share the training ā€œrecipeā€ soon, albeit with some proprietary data which might impact reproducibility slightly. A possible paper on these findings might be published later this year.

  • Community Reactions: The community reacted enthusiastically to the news, with multiple members congratulating and expressing excitement. One member shared their anticipation for an article detailing the lower sample training method, noting their personal challenge of not reducing training samples below 52k.


Stability.ai (Stable Diffusion) ā–· #general-chat (618 messagesšŸ”„šŸ”„šŸ”„):

- **Navigating Subscription Confusion**: Users expressed confusion over different websites offering subscriptions for Stable Diffusion, with some being identified as scams. The official site [stability.ai](https://stability.ai) was recommended as the legitimate source for accessing Stable Diffusion services.
- **Running Software Offline**: Concerns about running Kohya locally without an internet connection were discussed. Users confirmed that with proper model downloads and setup, it’s possible to run it offline.
- **Stable Diffusion Installation Struggles**: Several users sought help with installing and running Stable Diffusion and associated tools like ComfyUI. Guidance was offered on navigating dependencies and troubleshooting through terminal commands.
- **EU AI Act Worries**: The passing of the EU AI Act caused concern among users, particularly about its potential impact on AI-generated content and the introduction of watermark requirements. Many expressed skepticism about the practicality and enforcement of such regulations.
- **Benchmark Performance Confusion**: A user highlighted performance issues with SD generations on new hardware, suspecting thermal throttling as the cause. Community members suggested checking configurations and using diffusers scripts for better diagnostics.

Links mentioned:


OpenAI ā–· #annnouncements (1 messages):

  • Safety Update Announced at AI Seoul Summit: A new safety update has been shared in conjunction with the AI Seoul Summit. For more details, visit the OpenAI Safety Update.

OpenAI ā–· #ai-discussions (229 messagesšŸ”„šŸ”„):

  • GPT-4o Frame Sampling Discussion: Members discussed the video processing capabilities of GPT-4o, speculating it processes video at 2-4 frames per second. One member shared a link to a Community discussion describing the process of converting videos to frames for the model.
  • Passing Image Buffers to GPT-4o API: A member struggled with passing Buffer objects to the GPT-4o Vision API, and others suggested encoding it as a base64 data URL. They discussed ensuring the correctly set MIME type for the base64 string to avoid silent failures in the API response.
  • Microsoft Copilot and GPT-4o Integration: Members discussed the announcement of GPT-4o integration into Microsoft Copilot, promising real-time voice and video capabilities. They expect availability in the ā€œcoming weeksā€ and speculate on the advantages of the integrated system.
  • Controversy Over Scarlett Johansson’s Voice: Discussion on the controversy about OpenAI’s use of a voice similar to Scarlett Johansson in its Sky voice feature. Community pointed out the potential for legal and ethical implications, following Johansson’s lawyer’s intervention.
  • Microsoft’s New Phi-3 Models: Announcement of new Phi-3 models by Microsoft, including multimodal models integrating language and vision, available on Azure. Members showed mixed reactions and shared the link for further reading.

Links mentioned:


OpenAI ā–· #gpt-4-discussions (38 messagesšŸ”„):

  • Understanding GPT-4’s Context Window: A member asked about the ā€œcontext windowā€ for GPT-4 Omni’s 128,000 tokens, seeking clarification if it referred to the prompt size. Another member clarified that the context window is the maximum size of the prompt and response combined, referring to a help article.

  • Issues with JSON Response Lengths: A member faced issues with receiving large JSON responses despite configuring the system instructions to limit content to 200 tokens. They noted that using GPT-4 turbo resulted in shorter responses and planned to adjust the system instructions further.

  • Selling AI-Generated Art: Discussions affirmed that it is possible to sell AI-generated art, though the copyrightability of such art remains a separate, complex issue. One member mentioned that the public domain can be a source for sellable work since prompting AIs effectively is challenging.

  • Concerns with GPT-4 Evaluating Values: A discussion unfolded about GPT-4 struggling to correctly evaluate simple numerical expressions, revealing that it might benefit from relying on a code interpreter for accuracy.

  • Caution for Downloading Mac GPT App: Members advised waiting for an official prompt on their accounts for downloading the ChatGPT app for macOS, warning against unofficial links.


OpenAI ā–· #prompt-engineering (73 messagesšŸ”„šŸ”„):

  • Set Character Limits in GPT-4: To achieve a 150-character response limit, prompt the model with example outputs around 120 characters to prevent overshooting. One user shared examples of the model attempting this task, illustrating the difficulty.
  • Training Models for Specific Behaviors: For replicating the AI from the movie ā€œHer,ā€ define accurate behavior parameters and use input/output pairs to shape responses. Avoid negative instructions for clearer guidance.
  • Inconsistent Preciseness in Responses: Users discussed the challenge of getting exact answers from the model, such as specific ranges instead of general terms. Repeatedly asking for specifics helps, but models may ā€œdreamā€ or autocomplete when unable to provide accurate data.
  • Managing Token Limits to Avoid Overrun: Setting a max token parameter and crafting specific, succinct prompts can help manage the verbosity of outputs. Including a clear output template and limiting responses to one paragraph or sentence can improve conciseness.
  • Efficient Use of Prompt Engineering for Code: Users shared prompt strategies for efficient code generation, emphasizing precise indentation and role-based character prompts for collaborative coding environments. Examples included detailed prompts for creating and debugging full-stack applications.

OpenAI ā–· #api-discussions (73 messagesšŸ”„šŸ”„):

  • Setting character limits is tricky in GPT-4: A member requested advice on setting a response limit of 150 characters. Advice included giving example outputs of about 120 characters as the model often overshoots (ā€œIt will overshoot, thus the target’s smaller than the limit, so you’re hopefully not overā€).

  • Training model like AI in ā€˜Her’ sparked discussion: A user asked how to train a model to act like the AI in the movie ā€œHer.ā€ Suggestions included input/output pairs and avoiding negative instructions.

  • Exact language use trouble: A member discussed issues with the model giving vague answers despite instructions for precise data, such as nutrition labels or salary ranges. It’s suggested this might be due to autocompletion and instruction conflicts (ā€œit tends to follow the format less carefullyā€).

  • Stopping LLM from running on: Members discussed issues with models producing lengthy responses despite token limits in the API. Suggestions included using specific questions, asking for succinct answers, and employing an output template (ā€œprompt should request it to limit its answers to one succinct sentenceā€).

  • Prompt sharing and improvements: A user offered to share working prompts for building full-stack applications and noted errors in prompt engineering. Another member pointed out this might be more suitable for the Prompt Labs channel, noting their frustration with model verbosity and the usage of the Explore GPTs menu.


LM Studio ā–· #šŸ’¬-general (191 messagesšŸ”„šŸ”„):

  • Troubleshooting LM Studio Server Issues: A user had difficulties with LM Studio server logs being blank and non-responsive. The issue was resolved by running LM Studio with admin permissions to access log files properly.

  • AVX2 Instruction Set Confusion Cleared: Several members clarified that AVX2 instructions are essential for running LM Studio, and users can use tools like HWInfo to check if their CPU supports it. AVX2 is a hard requirement, and older CPUs without this will not support LM Studio.

  • Loading and Managing Models: Users discussed various issues related to downloading and running models in LM Studio. An effective strategy includes downloading models in GGUF format and ensuring all system prompts and settings are correctly configured.

  • Integration of LM Studio with Other Tools: Questions were raised about integrating LM Studio with tools like StarCoderEx and Continue.dev for enhanced functionalities. Some users experienced with these integrations provided helpful links Continue.dev integration instructions.

  • Common GPU and Performance Queries: Addressing frequent performance issues, it was highlighted that GPUs should have at least 8GB VRAM for efficient operation. Users also shared specific errors mentioning insufficient VRAM and outdated drivers as common causes, suggesting updates and GPU offload settings tweaks.

Links mentioned:


LM Studio ā–· #šŸ¤–-models-discussion-chat (57 messagesšŸ”„šŸ”„):

  • Successful model setup requires right prompts: A member asked how to use the MPT-7b-WizardLM model on LMStudio, and another advised using the correct quantization level and template, pointing to model-specific details on Hugging Face.
  • Image generation quality tips: Several members discussed improving image quality with local AI models like Automatic1111 and ComfyUI. Recommendations included using resources from Civit.ai and considering system specs like VRAM and RAM.
  • Phi-3-Small and Medium models released: Members mentioned the release of new Phi-3 models on Hugging Face with context lengths of 4K, 8K, and 128K tokens. Phi-3-Small-8K and Phi-3-Medium-4K were specifically highlighted.
  • Improving LLM response with specialized models: A user mentioned using the codeqwen model for better coding capabilities. Improvement suggestions included using finetuned models and leveraging advanced setups like ComfyUI for specialized tasks.
  • Local vision models struggle with specific prompts: A user reported issues with vision models not adhering to specific prompt queries. Multiple users suggested that local vision models typically do not handle multi-turn conversations effectively.

Links mentioned:


LM Studio ā–· #announcements (9 messagesšŸ”„):

  • Introducing Hugging Face integration with LM Studio: Users can now integrate Hugging Face models directly into LM Studio by clicking ā€œUse this model,ā€ which requires LM Studio 0.2.23 or newer. As highlighted, this feature ensures ā€œNo cloud, no cost, no data sent to anyone, no problemā€.
  • Model Download Customization: In the current version, users must manually choose the file they wish to download after selecting a Hugging Face model. Suggestions like setting a default quantization level or auto-downloading based on available RAM were discussed.
  • Compatibility Limitations: It was noted that not all models would be supported in LM Studio, especially many safetensor models. Only models in the GGUF format are currently compatible.

Link mentioned: Tweet from LM Studio (@LMStudioAI): 1. Browse HF 2. This model looks interesting 3. Use it in LM Studio šŸ‘¾šŸ¤— Quoting clem šŸ¤— (@ClementDelangue) No cloud, no cost, no data sent to anyone, no problem. Welcome to local AI on Hugging Fa…


LM Studio ā–· #šŸ“-prompts-discussion-chat (3 messages):

  • System Prompt tuning stops premature cut-offs: A member suggested that adding ā€œDo not prematurely cut off a responseā€ to the [system] prompt will help fix ongoing issues with incomplete responses. This insight aimed to enhance the chatbot’s response reliability.
  • Direct quotations improve instruction clarity: The member suggested quoting required text directly and adding instructions in the prompt such as ā€œConsidering the following text alone as input, .ā€ This method is proposed to refine the specificity of prompts for better outcomes.
  • Old posts humorously reaffirmed: A member humorously acknowledged the age of a previous post with ā€œDidn’t realize how old that post was. šŸ˜†ā€ This adds a light-hearted touch to the discussion’s context.

LM Studio ā–· #āš™-configs-discussion (1 messages):

  • LM Studio struggles with VPN on Linux: A user reported an issue where LM Studio cannot perform searches for models when connected via a VPN on Linux. They are seeking others who have encountered this issue and any possible solutions.

LM Studio ā–· #šŸŽ›-hardware-discussion (27 messagesšŸ”„):

  • Infinity Fabric Speed Sync Affects Performance: One member highlighted the importance of keeping infinity fabric (fclk) speed in sync with memory speed for optimal performance, suggesting ā€œfclk should be in sync with the memory speed, otherwise you will see performance degrading.ā€
  • Free Services and Energy Concerns: Free services like Groq and OpenRouter are recommended to avoid high costs. One user shared that their powerful rig with 144GB VRAM heats up the house significantly in warm weather.
  • RAM Speed Impact on Models: Upgrading RAM speed from 2133MHz to 3200MHz resulted in a performance increase for the Goliath model, but negligible improvement for other models beyond 2666MHz. It was suggested that iQuant might perform worse once VRAM capacity is exceeded.
  • Experimenting with Different Models: Testing with various Quant models revealed differences in performance between iQuant and regular Quant, with iQuant underperforming when VRAM capacity is exceeded.
  • Running LM Studio on Dual GPUs: A query about running LM Studio with two different GPUs was answered stating it is possible as long as both GPUs are of the same brand, with an example being ā€œboth nvidia or both amd.ā€

LM Studio ā–· #🧪-beta-releases-chat (4 messages):

  • Seeking Model to Impose 60-word Limit: A member sought assistance in making Meta Lama 3 Instruct adhere to a 60-word response limit. Another member suggested listing the attempted methods and their results to better troubleshoot the issue.
  • Searching for a More Suitable Model: The original poster queried if there is a better model than Meta Lama 3 for enforcing a strict response limit. They acknowledged the advice and planned to provide more details on their attempts.

LM Studio ā–· #autogen (13 messagesšŸ”„):

  • Null max_tokens causes cut-off issue: It’s noted that setting max_tokens to null causes responses to cut off after two tokens in LM Studio. The workaround is setting it to -1, which helps the local server function correctly.
  • CLI LMStudio Client solution shared: A member building a CLI LMStudio Client confirmed that setting the max_tokens to -1 resolves the issue of responses being cut off. Another contributor mentioned having to manually edit code in autogpt for it to work.
  • Autogen Studio fix methods debated: There was a discussion on whether this fix applies to the command line version only or can be implemented in Autogen Studio. Some confirmed success by changing the value in the root autogen package, hinting at similar effectiveness in Autogen Studio.
  • Manager agents reliability concerns: It’s suggested that manager agents are only reliable with OpenAI models. Testers have noted bugs and poor performance in selecting appropriate agents, recommending round-robin or hard-coded workflows until improvements are made.
  • Deleting cache might help: To address the cut-off issue, deleting application caches after setting max_tokens to -1 is advised. Members often face this problem and find cache deletion necessary for the fix to work.

LM Studio ā–· #amd-rocm-tech-preview (42 messagesšŸ”„):

  • Calling All Linux Fans with AMD GPUs: A member announced a call for Linux users with new-ish AMD GPUs to test an early version of LM Studio for Linux + ROCm and provided a link to the supported GPU list. Various users, from those with 6600xt to 7900XT GPUs, expressed interest, with comments like ā€œ6900xt reporting for dutyā€ and ā€œ6600xt here.ā€

  • Unsupported GPUs Seem to Work: Several users reported success running ROCm on GPUs not listed as officially supported. One member with a 6600xt mentioned, ā€œIt’s not listed on that supported ROCm list, but I already have ROCm running with it for Stable diffusion.ā€

  • Diverse Linux Distros in the ROCm Testers Group: Users running a range of Linux distributions, including Arch, Fedora, and Ubuntu, shared their experiences. One even noted the successful use of ROCm on a RX 6600xt using ā€œHSA_OVERRIDE_GFX_VERSION=10.3.0.ā€

  • CPU Usage Observations and Discussions: Discussions emerged around the CPU usage with ROCm on Linux, with one member humorously noting, ā€œah yes, 229% cpu usage,ā€ and another suggesting Linux speeds up processes. Comments about Linux’s performance included, ā€œit’s fast doe,ā€ and debate over Linux vs. Windows RAM usage.

  • Arch Linux and ROCm Compatibility Praise: Members praised the ease of setting up ROCm and HIP SDK on Arch Linux. Quickdive noted, ā€œarch makes rocm and hip sdk so easy,ā€ with many agreeing and sharing similar success stories.


Modular (Mojo šŸ”„) ā–· #general (38 messagesšŸ”„):

  • Mojo open community meeting kicks off: Mojo’s open community meeting is live, and you can join via the provided Zoom link. A member inquired about the availability of the recording, which will be shared later.
  • Recording available on YouTube: The recording of the Mojo Community Meeting is now available on YouTube.
  • Zoom account confusion cleared up: Some members were confused about needing a commercial Zoom account to join. It was clarified that only a basic account is needed, though there was a possible misconfiguration initially.
  • Missed meeting woes: Helehex expressed sadness about missing the meeting due to not receiving notification pings. Upcoming meeting details were provided, including a calendar subscription option.
  • IPC in Python discussions: Moosems_yeehaw sought advice on IPC in Python to avoid main thread lag in a Tkinter app example. Various suggestions were given, including threading, message queues, and async IPC modules.

Modular (Mojo šŸ”„) ā–· #šŸ’¬ļø±twitter (2 messages):

  • Modular shares latest tweet: A link to a Modular tweet has been shared.
  • Another tweet from Modular: Another link to a Modular tweet was also shared.

Modular (Mojo šŸ”„) ā–· #āœļø±blog (1 messages):

  • K-means Clustering in Mojo for Speed: A new blog post aims to teach readers how to implement the k-means clustering algorithm from scratch in both Python and MojošŸ”„, emphasizing performance benefits in Mojo. The post also provides a detailed guide for porting Python code to Mojo to achieve significant speed improvements. Read more on Modular’s Blog.

Link mentioned: Modular: Fast⚔ k-means clustering in MojošŸ”„: Guide to porting Python to MojošŸ”„ for accelerated k-means clustering: We are building a next-generation AI developer platform for the world. Check out our latest post: Fast⚔ k-means clustering in MojošŸ”„: Guide to porting Python to MojošŸ”„ for accelerated k-means clusteri…


Modular (Mojo šŸ”„) ā–· #šŸ”„mojo (258 messagesšŸ”„šŸ”„):

  • Learning Mojo and ML with a tutorial: A user asked if they should implement a machine learning tutorial in Mojo to learn both Mojo and ML. Another user recommended trying it, noting that Mojo doesn’t support classes but structs can be used, and some numpy functionalities might need to be implemented.

  • Modular Community Meeting Notice: A user informed the channel about an ongoing Modular Community meeting, sharing a Zoom link. Another user commented on a statement by Chris Lattner during the meeting about moving Tensor out of the standard library.

  • Null Terminator Handling in Strings: A user struggled with handling null terminators when converting bytes to strings and iterating over them. They shared their efforts and the solution found through community help, including using the append(0) method to handle null terminators correctly.

  • Mojo’s Asynchronous Programming Debate: Members discussed the pros and cons of function coloring in asynchronous programming. Some argued for exploring colorless async programming to simplify API usage and reduce burden, while others highlighted the benefits of retaining function coloring for safety and reasoning about code behavior.

  • Lightbug HTTP Framework Usage: A user asked about using the Lightbug HTTP framework for making GET requests and decoding responses. After struggling with the implementation, the maintainer and community provided assistance and moved the conversation to an issue on GitHub for further discussion.

Links mentioned:


Modular (Mojo šŸ”„) ā–· #performance-and-benchmarks (13 messagesšŸ”„):

  • Optimizing SIMD Gather and Scatter in Mojo: A member questioned whether Mojo’s SIMD gather and scatter operations are fully optimized and discussed aligning values to 32-bit boundaries for potential speed improvements. Another member shared their experience, indicating gather and scatter are well-optimized, though alignment benefits are uncertain.

  • Challenges with ARM SVE and SIMD Width: Discussion highlighted the complexities of ARM Scalable Vector Extension (SVE), variable vector widths, and speculative loads across page boundaries. A member noted that LLVM struggles with SVE formats, compounded by limited CPU availability.

  • Consider Reducing SIMD Operations: A member suggested reducing the number of gather/scatter operations by always using the highest SIMD width possible, involving more index manipulation for better performance. They plan to update and share results from their MoCodes project accordingly.

  • Sorting for Scattered Memory Access: Another member recommended sorting an array of pointers to optimize performance when dealing with several kilobytes of scattered memory, particularly for iterative decoders.

  • Vectorized DTypePointer Memset Implementation: A member shared that a vectorized implementation of memset for 100,000 bytes performs 20% faster than LLVM’s call, while the performance advantage flips for 1,000,000 bytes. The member expressed concern about reliability, noting the use of ā€œclobber memory.ā€


Modular (Mojo šŸ”„) ā–· #nightly (31 messagesšŸ”„):

  • New Mojo nightly compiler release: A new nightly build of the Mojo compiler (version 2024.5.2012) has been released. You can view the diff since the last release and changes since the last stable release.

  • Dict pop method issue: Discussed issues with the pop method in dictionaries, particularly related to difficulties in moving values out of a DictEntry and calling __del__ correctly. A proposed solution involves changing the value field type from V to Optional[V].

  • GitHub issue and PR discussions: Users discussed several GitHub issues and PRs, such as issue #2696 regarding a ā€œwhile loop logic causes seg faultā€ and PR #2739 for changing argument messages in assertions to be keyword-only.

  • Delayed nightly release on 5/21: Multiple users noted a delay in the nightly release, attributed to potential CI infra/release issues. It was resolved later, and the nightly build for 5/21 was confirmed to be available.

  • Unicode support in strings proposal: A detailed discussion took place about implementing Unicode support in strings, proposing various internal representations and debating the trade-offs of null termination. The idea is to optimize aggressively, ensuring efficient memory usage and inter-operability.

Links mentioned:


CUDA MODE ā–· #general (5 messages):

  • Managed Kubernetes for ML workloads debated: A member questioned the need for managing on-prem servers for ML serving, suggesting that managed Kubernetes services like EKS could be an alternative. They expressed confusion over the perceived differences between scaling web servers and ML tasks, except for the occasional need for GPUs.

  • Kubernetes not essential for ML infra: It was clarified that Kubernetes is used mainly for infrastructure purposes and is not inherently tied to ML work. The choice between using Kubernetes or not is up to the individual project needs.


CUDA MODE ā–· #triton (13 messagesšŸ”„):

  • Hardware and GPU kernel nuances: The maximum block size is influenced by the hardware, kernel specifics, and dtype, as each thread loads multiple elements to utilize vector instructions on GPUs effectively.
  • CUDA scheduling principles hold true: Blocks are scheduled to one SM and share memory within the block, similar to CUDA, ensuring consistency in GPU processing.
  • Team praises and recommendations: Byronhsu1230 expressed gratitude for Horace’s informative posts, suggesting the need for a Triton compiler article. The team appreciates the valuable insights provided by Horace.
  • Enhancing Triton tutorials: Lancerts shared a GitHub pull request detailing minor changes to Triton tutorials to improve readability and consistency, tested on GPU with successful results.

Link mentioned: Small refactor of the tutorial5 and small change of tutorial1 by lancerts Ā· Pull Request #3959 Ā· triton-lang/triton: Changes are tested on GPU, with parity on the execution. In tutorial 1, change gbps = lambda ms: 12 * size / ms * 1e-6 to gbps = lambda ms: 3 * x.numel() * x.element_size() / ms * 1e-6. This is m…


CUDA MODE ā–· #cuda (2 messages):

  • Seeking SASS Papers: A member asked if anyone has recommendations for papers relating to SASS. The query was straightforward and looking for academic resources.
  • Debate Over cucomplex vs cuda::std::complex: Another member is targeting Volta Ampere and Hopper architectures and discussed using either ā€œcucomplexā€ or ā€œcuda::std::complexā€ for atomic operations. They sought advice on which would be more appropriate for their needs, specifically for atomic add operations on x and y.

CUDA MODE ā–· #torch (21 messagesšŸ”„):

  • Torch native multiplication doubles memory usage: A member noticed that the native * operator in Torch seems to double the memory, even when done in place. After examining the issue, they found that using mul_() resolves this and results in flat memory consumption.

  • torch.empty_like and torch.empty performance difference: A user shared a PSA highlighting that torch.empty_like is much faster than torch.empty, and similarly, torch.empty(..., device='cuda') performs better than torch.empty(...).to('cuda'). Another user confirmed that this behavior is also present in NumPy, notably with np.zeros_like.


- **Member finds the discussion amazing**: One member described the talk as *"amazing."* 
- **Clarification requested**: Another member asked for elaboration on why the talk was considered *"amazing."*

CUDA MODE ā–· #jobs (3 messages):

  • Focus on Activation Quantization at CUDA: ā€œOur top focus is on activation quantization (fp8/int8).ā€ A member discussed the need to fuse small operations around GEMMs with Cutlass epilogue fusion to realize inference acceleration.
  • Next-Gen GPU Features Utilization: The member highlighted plans to use 2:4 sparsity and fp6/fp4 in new GPUs.
  • Torch.compile Backend Development: The team is developing a user-defined backend for torch.compile to enable graph-level optimizations and improve performance through more fusion.
  • Underoptimized vLLM Components: Identified MoE kernel and sampling kernels as underoptimized areas in vLLM that are current priorities.
  • LinkedIn Offers Assistance: Another member from LinkedIn showed interest in collaborating, asking for details on ā€œgraph-level optimization.ā€

CUDA MODE ā–· #beginner (1 messages):

norton1971: anyone please?


CUDA MODE ā–· #torchao (1 messages):

  • torchao 0.2 release hit the spotlight: The new release of torchao 0.2 is now available on GitHub. It features a custom CUDA and CPU extension with binary support, among other enhancements.
  • Custom extensions in action: One member used the new version to set up some fp6 kernels. This highlights the flexibility and extensibility offered by the new custom op registration mechanism.
  • Speedy kernels merged: Speedy kernels for GaLoRe, DoRA, and int4/fp16 were merged by another member. These improvements are aimed at enhancing performance and efficiency.
  • NF4 tensors and FSDP compatibility: Building on previous work, this release supports NF4 tensors that can compose with FSDP. A detailed blueprint was provided for integrating smaller dtypes with FSDP, ensuring better resource utilization.

Link mentioned: Release v0.2.0 Ā· pytorch/ao: What’s Changed Highlights Custom CPU/CUDA extension to ship CPU/CUDA binaries. PyTorch core has recently shipped a new custom op registration mechanism with torch.library with the benefit being th…


CUDA MODE ā–· #off-topic (1 messages):

iron_bound: Ray casting https://frankforce.com/city-in-a-bottle-a-256-byte-raycasting-system/


CUDA MODE ā–· #llmdotc (193 messagesšŸ”„šŸ”„):

- **Debate over moving bounds checks**: Members discussed whether to move bounds checks outside kernels into asserts, expressing concerns over performance implications. One mentioned, "asserts should generally be turned off for performance," and noted potential issues with hidden dimension constraints.

- **GPT-2 reproduction blockers**: A member listed out remaining tasks blocking GPT-2 reproduction, including initialization, weight decay management, and learning rate schedules. Checkpoints save & load functionality were highlighted as essential.

- **Prompt for DataLoader refactor**: One member outlined a refactor to the DataLoader to introduce new features such as proper .bin headers, uint16 data storage, and dataset sharding. The goal is to improve data handling for large datasets like FineWeb.

- **Discussion on CI compatibility**: Members discussed ensuring compatibility with older CUDA versions for fp32.cu files, suggesting the inclusion of C11 and C++14 standards. They emphasized testing with older CUDA versions to catch issues.

- **Merge of dataset refactor**: The DataLoader refactor was merged to master, causing breaking changes. A member advised that pulling the changes would break current implementations and suggested re-running data preprocessing scripts to fix the issues.

Links mentioned:


CUDA MODE ā–· #bitnet (11 messagesšŸ”„):

  • Pre-allocate tensor to speed up unpacking: It’s suggested to avoid using torch.stack and pre-allocate a tensor with torch.empty for faster unpacking when compiled with torch.compile. An example code for unpacking from a uint8 format was shared, highlighting this approach.

  • Implement changes to torchao uint4: Vayuda recommends updating the torchao uint4 implementation to reflect the proposed pre-allocation optimization. Coffeevampir3 has acknowledged and shared a related GitHub notebook with changes.

  • Optimizing unpacking code: Mobicham points out an additional optimization, mentioning the removal of unnecessary type casting to uint8 within the unpacking function. This feedback was addressed in the updated code example by Coffeevampir3.

  • Ensuring numerical correctness and efficiency: Coffeevampir3 suggests packing and unpacking tensor data correctly by adding a shift to handle unsigned integers. The approach is verified with example adjustments to the quantization process.

  • Use opcheck() for custom ops: Vayuda brings attention to using opcheck() to ensure custom operations meet various requirements, hinting at the need for implementing necessary functions in __torch_dispatch__. They query about the existence of a function list based on use cases.


Eleuther ā–· #general (127 messagesšŸ”„šŸ”„):

  • New AI Safety Institute Opens SF Office with Higher Salaries: The UK AISI has announced the opening of an office in San Francisco with adjusted upwards salaries compared to the London office. They are actively seeking talent and collaborating with Canada, as featured in this partnership announcement.

  • Discussion on OpenAI Staff Movement and AISI Hiring: Several members speculated on whether former OpenAI aligners joined the new AISI. Interest in Canadian office openings and criteria for employment were discussed, pointing to the UK-Canada AI safety partnership details.

  • Evaluation of Dropout in Language Models: Members debated the relevance of using dropout in modern language models, with some expressing confusion over its current usage. Alternative strategies like label smoothing were also considered for overfitting mitigation.

  • PSA on California SB 1047 Impact on AI Development: A call to action was made to oppose California’s SB 1047 via legislative engagement. The bill, as described in this analysis, could severely impact open-source AI by introducing unaccountable regulatory measures with potential jail time for developers.

  • Tool and Technique Sharing for AI Model Development: Members shared resources and insights on tools such as the Flash Attention implementation in JAX and its performance boosts over naive implementations. This included links to conversation references like Flash Attention in JAX and related performance benchmarks.

Links mentioned:


Eleuther ā–· #research (72 messagesšŸ”„šŸ”„):

- **Examining Multi-Modal Training in CLIP**: A discussion focused on whether training CLIP with additional modalities like audio improves zero-shot ImageNet classification performance. [ImageBind](https://arxiv.org/abs/2305.05665) was mentioned, which shows improvements in cross-modal retrieval using combined embeddings but does not address non-emergent capability improvements.
  
- **Non-Determinism in GPT-3 at Temperature 0**: In response to a query about non-deterministic behavior in GPT-3 even at temperature 0, several papers and sources were shared, including [a paper on Mixture of Experts attacks](https://arxiv.org/abs/2402.05526) and discussions on consistent hashing overflow in distributed systems.

- **Self-Aware Simulacra Capabilities**: Users shared experiences about language models becoming aware of their fictional status and the implications this has on their subsequent behavior. The consensus is that larger models, like llama 2 70b and custom fine-tunes, can exhibit nuanced understanding and adaptability when guided through this concept gradually.

- **Positive Transfer in Multi-Modal Learning**: The potential benefits of multi-modal training for unimodal tasks were debated, with references to models like Gato and PaLM-E which showed "positive transfer" between tasks, suggesting that additional modalities might indeed enhance task performance.
  
- **Efficient MoE Training with MegaBlocks**: The [MegaBlocks](https://arxiv.org/abs/2211.15841) system was introduced, highlighting its ability to avoid token dropping by reformulating MoE computation with block-sparse operations, achieving significant training efficiency gains without compromising on model quality.

Links mentioned:


Eleuther ā–· #scaling-laws (12 messagesšŸ”„):

  • New paper proposes alternative scaling laws: A link to a new paper on arXiv was shared that proposes an observational approach to building scaling laws from ~80 publicly available models, bypassing the need for training models across many scales. The paper posits that language model performance is a function of a low-dimensional capability space where families vary in training compute efficiencies.

  • FLOP calculations for attention mechanisms debated: Detailed discussions on how to compute FLOPs for forward and backward passes of attention mechanisms included multiple references such as the PALM paper explanation and the EleutherAI cookbook. Members clarified the inclusion of QKVO projections in the FLOP calculations.

  • Optimal scaling laws for Bitnet architecture questioned: A member pondered whether the Chinchilla scaling laws would suggest a higher or lower parameter-to-token ratio for a Bitnet using significantly less compute. Another member suggested that with magically faster computation, the scaling laws would likely remain the same but allow for a larger model due to increased compute budget.

  • Sample efficiency in relation to scaling laws: Sample efficiency’s definition and measurement were questioned as being critical to understanding scaling laws. The discussion focused on how resource management should adapt as datasets grow, implying that efficient scaling is key to resource utilization.

  • Perception of difficulty in training on small datasets: A member clarified that training on a small dataset is often less effective than pre-training on large datasets and fine-tuning, hinting that it’s challenging to bridge this performance gap. This was in context to the general notion that small dataset training is ā€œnotoriously difficultā€.

Link mentioned: Observational Scaling Laws and the Predictability of Language Model Performance: Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of …


Eleuther ā–· #interpretability-general (1 messages):

  • Anthropic’s work on interpretable features excites: A member shared their enthusiasm for Anthropic’s recent work on interpretable features in transformers. They provided a link to the research publication for further reading.

Eleuther ā–· #lm-thunderdome (30 messagesšŸ”„):

  • Batch size tips for evaluation: Members discussed setting the --batch_size parameter, noting it can be set to a positive integer or ā€œautoā€ to optimize memory usage. One suggested using ā€œauto:Nā€ to dynamically re-select the maximum batch size multiple times during evaluation, which helps speed up the process.

  • Naming conventions for translated evals: A user inquired about naming conventions for machine-translated ARC challenge evaluations. Suggestions included names like arc_challenge_mt_language or mt_arc_challenge_language.

  • No dedicated channel for AI Safety events: There was an inquiry about a channel for promoting AI Safety/benchmark events. It’s confirmed that EleutherAI does not have such a dedicated channel.

  • Concerns about benchmark answer randomization: Users discussed the potential bias in multiple-choice questions (MCQs) if answer choices are not randomized. It was mentioned that for SciQ randomization doesn’t matter since choices aren’t in the context, but for MMLU, it’s relevant though currently unimplemented.

  • Concerns over medical benchmarks: A member shared their focus on how medical benchmarks could be harmful, emphasizing the importance of improved benchmark interpretation. There was excitement about upcoming related work, including updates to the Pile dataset and papers on race-based medicine.

Link mentioned: lm-evaluation-harness/lm_eval/tasks/sciq/sciq.yaml at 1710b42d52d0f327cb0eb3cb1bfbbeca992836ca Ā· EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness


Nous Research AI ā–· #off-topic (7 messages):

<ul>
    <li><strong>Temporal.io Wins Out</strong>: A member inquired about experiences with Airflow and Temporal.io, ultimately deciding to go with <strong>Temporal</strong>.</li>
    <li><strong>Manifold Research Group Updates</strong>: A member from <strong>Manifold Research Group</strong> shared their <a href="https://www.manifoldrg.com/research-log-038/">latest research log</a>, detailing progress on projects like the NEKO Project aiming to build a large-scale open-source "Generalist" Model. They are expanding their team and inviting others to join via Discord or GitHub.</li>
    <li><strong>Fictional Civilization Simulation</strong>: Links were shared to a <a href="https://websim.ai/">Websim</a> project that simulates a fictional civilization in ancient Anatolia on the Black Sea coast.</li>
    <li><strong>Course on LLMs Announced</strong>: Details of a new course, "Applying Large Language Models (LLMs) through Project-Based Learning," were shared, focusing on practical applications such as semantic movie search, RAG for food recommendations, and using LLMs for software and website creation. Interested members were encouraged to DM for more information.</li>
</ul>

Links mentioned:


mautonomy: https://fxtwitter.com/vikhyatk/status/1792512588431159480?s=19


Nous Research AI ā–· #general (172 messagesšŸ”„šŸ”„):

  • Yi-1.5 Context Versions Released: The release of the 16k and 32k context versions of Yi-1.5 was announced with a link to Hugging Face. These versions cater to different context size requirements which could influence model performance.

  • LLM Leaderboard Critique: Members criticized the usefulness of the LLM leaderboard, calling it ā€œofficially unusableā€ due to excessive noise and difficulty in filtering relevant models. The LLM leaderboard is flooded with entries, making it hard to discern quality rankings.

  • Chatbot Arena’s Objectivity Questioned: Concerns were raised about the objectivity of Chatbot Arena ratings, particularly regarding user preferences skewing towards simple, easy-to-verify tests. The platform introduced a ā€œHard Promptsā€ category to address these biases, as discussed in their blog post.

  • Microsoft’s AI Event: Members discussed the recent Microsoft event revealing the new Copilot+ PCs and the recording’s availability on YouTube. The event was anticipated but not live-streamed, leading to comments on watching replays for detailed insights.

  • Qwen MoE Model Mentioned: A member highlighted Qwen’s release of a MoE model with 14 billion total parameters but only 2.7 billion active during runtime, described as Qwen1.5-MoE-A2.7B-Chat. It boasts 1.75x faster performance during inference compared to their 7B model.

Links mentioned:


Nous Research AI ā–· #ask-about-llms (1 messages):

  • Struggling to find public evaluation for reranker benchmarking: A member expressed difficulty in finding a public evaluation for a finetuned reranker they made. They observed that other rerankers use various datasets but remained confused about specific queries and benchmarking methods.

Nous Research AI ā–· #project-obsidian (1 messages):

  • Phi-3 Vision unveiled: A member shared that Phi-3 Vision is now available, describing it as a lightweight, state-of-the-art open multimodal model with 128K token context length. It focuses on high-quality, reasoning dense data from both text and vision sources and uses supervised fine-tuning and direct preference optimization for enhanced instruction adherence and safety.
  • Explore Phi-3 Vision resources: Key resources include the Phi-3 Microsoft Blog, the Phi-3 Technical Report, and the Phi-3 Cookbook. There is also a link to Phi-3 on Azure AI Studio for practical implementation.

Link mentioned: microsoft/Phi-3-vision-128k-instruct Ā· Hugging Face: no description found


Nous Research AI ā–· #world-sim (20 messagesšŸ”„):

  • CLI Prompt Creates Whacky Images: Members enjoyed some whacky images generated using a CLI prompt, with one member loving the cat in the generated images. ASCII art was also highlighted, and the images are shared here.

  • Potential of WorldSim: Participants discussed the potential of WorldSim evolving into a global intelligence platform and a new form of collaborative thinking. One commented on its potential as the ā€œworld’s most intelligent toyā€ that could foster a new global state of mind and suggested having another discussion session to further explore these ideas.

  • Symbolic Meaning Knowledge Graph: Inspired by Tek’s mapping, members considered symbolic meaning within AI frameworks, mentioning the creation of starter knowledge graphs and viewing them as a blend of a Rorschach test and a semantic web for AI.

  • Imagining WorldSim Worlds: Members shared generated images representing imagined WorldSim worlds, drawing inspiration from diverse architectural styles and landscapes. These representations can be found here and here.

  • Obsidian Knowledge Graph Timelapse: A member shared an impressive timelapse of an Obsidian user’s knowledge graph formation, describing it as a work of art resembling a synthetic brain in action. Find the timelapse on YouTube.

Links mentioned:


LAION ā–· #general (105 messagesšŸ”„šŸ”„):

  • OpenAI halts Sky voice due to user concerns: An OpenAI status update addressed concerns about the voice choice for ChatGPT, especially the Sky voice. They are pausing the use of Sky while they work on addressing these concerns source.

  • CogVLM2 model gains attention with key features: There was enthusiasm over the release of CogVLM2, highlighting improvements such as 8K content length support and significantly better performance on benchmarks like TextVQA.

  • Mixed reactions to Copilot AI advancements: Mustafa Suleyman’s announcement about the next level of Copilot, which can ā€œsee, hear, speak and help in real timeā€, drew both intrigue and skepticism. Some users found it creepy while others joked about the potential for a backseat gaming version that criticizes everything source.

  • Scarlett Johansson voice controversy at OpenAI: Members debated the ethical and legal implications of OpenAI’s voice assistant allegedly mimicking Scarlett Johansson’s voice from the movie Her. There was consensus that contacting Johansson and replicating her voice led to significant backlash and accusations of ā€œpassing offā€.

  • Sakuga-42M dataset removal clarified: The removal of the Sakuga-42M dataset from both Hugging Face and GitHub was attributed to anti-bot measures enacted by websites due to heavy downloading. It sparked conversation about the difficulties in maintaining open datasets when subjected to high traffic source.

Links mentioned:


LAION ā–· #research (24 messagesšŸ”„):

  • CogVLM2 License Restrictions: A user warns about the restrictive terms in the new CogVLM2 license, which prohibit usage that may undermine China’s national security or public interest. The license and dispute resolution fall under Chinese jurisdiction, raising concerns about ā€œfake open sourceā€ and potential malice in the license terms. CogVLM2 License on GitHub

  • Mamba Architecture Underwhelming for Vision: A recent arXiv paper discusses the Mamba architecture with an SSM token mixer and concludes it’s not ideal for image classification tasks. The study introduces MambaOut models that perform better for image classification but highlights Mamba’s potential for long-sequence visual tasks. Mamba Paper on arXiv

  • Character-Based Embeddings Experiment: A user describes an experiment converting sentence embeddings to character strings and feeding them into a small LLM (Smol 101M) for MS COCO caption predictions. The method, implemented on a Colab T4 instance, produced ā€œkinda relatedā€ captions, suggesting potential use for cheap captioning or proof of concepts.

  • Discussion on Improved Model Papers: Members discussed various model improvements, referencing Meta’s new paper that continues their cm3leon work with enhanced tricks for scaling and efficiency. The conversation included a link to the recent paper Meta Paper on arXiv and comparisons to other advanced models like GPT-4O.

Links mentioned:


Interconnects (Nathan Lambert) ā–· #news (18 messagesšŸ”„):

- **Anthropic scales up compute**: The [latest update from Anthropic](https://www.anthropic.com/news/reflections-on-our-responsible-scaling-policy) mentions using 4 times more compute than Opus, sparking curiosity about their new developments. One user expressed awe with "*yo what is anthropic cookin*".

- **Arena gets tougher with Hard Prompts**: [LMsysorg introduced the "Hard Prompts" category](https://fxtwitter.com/lmsysorg/status/1792625968865026427) to evaluate models on more challenging tasks, causing significant ranking shifts. For example, Llama-3-8B sees a drop in performance compared to GPT-4-0314 under these hard prompts.

- **Controversy over Llama-3-70B-Instruct as Judge**: [Llama-3-70B-Instruct](https://fxtwitter.com/lmsysorg/status/1792625977207468315) is used as the judge model to classify criteria in Arena battles, raising concerns about its effectiveness. One user argued it "*just adds noise*" rather than useful evaluation, although training might mitigate this issue.

- **Vision model Phi-3 Vision debuts**: Users confirmed that Phi-3 Vision, a somewhat larger model compared to its predecessors, is new. This was highlighted in a brief exchange about model releases and sizes. 

Links mentioned:

  • Tweet from Aidan McLau (@aidan_mclau): yo what is anthropic cookin 4Ɨ more compute than opus damm
  • Tweet from lmsys.org (@lmsysorg): Introducing "Hard Prompts" Category in Arena! In response to the community's growing interest in evaluating models on more challenging tasks, we are excited to launch the new "Hard Pr...
  • Tweet from lmsys.org (@lmsysorg): How did we classify these criteria? We adopt Llama-3-70B-Instruct as the judge model to help us label over 1 million Arena battles. Overall our analysis reveals that the quality of Arena user prompts...

Interconnects (Nathan Lambert) ā–· #ml-drama (31 messagesšŸ”„):

  • Nathan Lambert ponders writing about OpenAI drama: Nathan Lambert discusses whether to write another post about OpenAI, suggesting that there’s not much more to add beyond saying ā€œI was rightā€. They propose a title ā€œOpenAI’s Second Very Bad Not Good Weekā€.
  • Scarlett Johansson’s statement on OpenAI: A Twitter user shared a statement from Scarlett Johansson regarding OpenAI using a voice similar to hers without permission, which prompted her to take legal action. The controversy centers around OpenAI’s alleged intentional mimicry of her voice for their ā€œSkyā€ system.
  • Public reaction to Sky Johansson voice issue: Nathan Lambert and others discuss the significant implications of Johansson’s statement, comparing it to previous high-profile issues like the New York Times lawsuit against AI developments. Nathan reflects on the broader impacts and mentions the removal of similar unauthorized content featuring musicians like Drake.
  • OpenAI and the Superalignment team controversy: A Fortune article highlights that OpenAI did not fulfill its promise to allocate 20% of its computing resources to its Superalignment team, leading to resignations and accusations of prioritizing product launches over AI safety. This incident is seen as a predictable outcome by some in the discussion.

Links mentioned:


Interconnects (Nathan Lambert) ā–· #random (30 messagesšŸ”„):

  • Nathan debates and buys domain rlhfbook.com: Nathan Lambert contemplates buying the domain rlhfbook.com and eventually purchases it for $7/year from Porkbun, considering it a bargain and easy to own.

  • Potential legal risks with new AI dataset: A member humorously warns about the potential legal repercussions of using the new AI Books4 dataset for training LLMs, referencing a similar situation with ā€œthe original pile.ā€

  • MSFT Surface AI’s slow performance due to cloud checks: There’s discussion around Microsoft’s new Surface drawing AI, which despite operating locally, experiences latency due to sending safety checks to the cloud. A member cites Ben Thompson’s write-up as a source of this information.

  • Critique of a former colleague’s integrity: Nathan Lambert criticizes a former colleague for misleading claims on her resume about working with notable individuals. He expresses a desire to confront her at a conference about this dishonesty.

Link mentioned: no title found: no description found


Interconnects (Nathan Lambert) ā–· #memes (9 messagesšŸ”„):

  • OpenAI hits pause on Scarlett Johansson-like voice: OpenAI is stopping the use of Sky, a voice AI that sounds like Scarlett Johansson, after much public attention. The company insists that Sky’s voice is not an imitation but is performed by a different actress with her own voice, as detailed in The Verge article.

  • Product decisions from blog impact: One member humorously noted that a lead product person reading their blog paid off, suggesting the impact might have influenced a product call. There was speculation about whether this led to unsubscribing, but it was dismissed with a laugh.

  • Critique of AI labs: A tweet by Liron Shapira criticized AI labs, comparing them to ā€œresponsible adultsā€ but warning, ā€œYOU GUYS DON’T KNOW WHAT YOU’RE DOING AND WE’RE ALL GONNA DIE BECAUSE OF THATā€. This sparked some reactions but no further commentary.

  • Propaganda humor: A member posted, ā€œu are not immune to propaganda (šŸ¤ž)ā€, sharing an emoji-filled response. This light-hearted banter was acknowledged with enjoyment, reflecting the casual nature of the channel.

Links mentioned:


Latent Space ā–· #ai-general-chat (78 messagesšŸ”„šŸ”„):

<ul>
  <li><strong>Memory Tuning Explained</strong>: Sharon Zhou from Lamini introduced "Memory Tuning" as a technique to enhance LLMs' accuracy in critical domains like healthcare and finance, achieving up to <em>"no hallucinations (&lt;5%)"</em>. This method outperforms LoRA and traditional fine-tuning, and Zhou promises more details and early access soon (<a href="https://x.com/realsharonzhou/status/1792578913572429878">link tweet</a>).</li>
  <li><strong>Lawyers demand OpenAI disclose AI voice origin</strong>: Lawyers for Scarlett Johansson are asking OpenAI how it developed its latest ChatGPT voice, which has been compared to Johansson's from the movie "Her." OpenAI has paused using the voice amid public debate, as users point out the tenuous legal arguments around likeness and endorsements (<a href="https://www.npr.org/2024/05/20/1252495087/openai-pulls-ai-voice-that-was-compared-to-scarlett-johansson-in-the-movie-her">NPR article</a>).</li>
  <li><strong>Scale AI raises $1B funding</strong>: Scale AI has announced $1 billion in new funding at a $13.8 billion valuation, led by Accel with participation from prominent investors like Wellington Management and Amazon. CEO Alex Wang stated this positions Scale AI to accelerate the abundance of frontier data and aims for profitability by the end of 2024 (<a href="https://fortune.com/2024/05/21/scale-ai-funding-valuation-ceo-alexandr-wang-profitability/">Fortune article</a>).</li>
  <li><strong>MS Phi 3 Models Released</strong>: Microsoft unveiled the Phi 3 models at MS Build, touting major benchmarks such as the Medium model being competitive with Llama 3 70B and GPT 3.5. The models offer context lengths up to 128K and utilize heavily filtered and synthetic data, released under the MIT license (<a href="https://x.com/reach_vb/status/1792949163249791383">link tweet</a>).</li>
  <li><strong>Emotionally Intelligent AI from Inflection</strong>: Inflection AI's new CEO announced a focus on integrating emotional and cognitive AI abilities, with their empathetic LLM "Pi" now used by over 1 million people daily. This move is aimed at helping organizations harness AI's transformative potential (<a href="https://inflection.ai/redefining-the-future-of-ai">Inflection announcement</a>).</li>
</ul>

Links mentioned:


OpenRouter (Alex Atallah) ā–· #general (76 messagesšŸ”„šŸ”„):

  • GPT-32k faces issues with rate limits: Users reported encountering token rate limit issues with Azure’s GPT-32k model. One user stated, ā€œRequests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded the token rate limit.ā€

  • Phi-3 models discussed for robust performance: Members discussed Phi-3-medium-4k-instruct and Phi-3-vision-128k-instruct for high-quality, reasoning-dense data handling. Both models incorporate supervised fine-tuning and direct preference optimization for enhanced performance.

  • New interaction methods with LLMs: One user shared a thread on a new way of interacting with LLMs using ā€œAction Commands.ā€ They sought feedback from others to see if anyone had similar experiences.

  • Handling verbosity in models: Members discussed handling verbosity in models like Wizard8x22. One suggested lowering the repetition penalty to reduce verbosity, while another noted that different models might be better suited for specific tasks.

  • Discount request and credit issues for non-profits: A user had issues with Error 400 related to billing address and requested discounts for non-profits. An admin explained that OpenRouter passes bulk discounts down to users and keeps a 20% margin.

Links mentioned:


OpenAccess AI Collective (axolotl) ā–· #general (43 messagesšŸ”„):

  • M3 Max shines for LLMs: A user praised the M3 Max for its performance, stating, ā€œit’s amazingā€ and suggested going for 96GB RAM for better use with LLMs.
  • Git Patch Merge Struggles: A user encountered issues with merging a Git patch themselves and discussed updating a specific file for testing. They noted, ā€œusing git is a bit tricky as i pushed it to my repo and not the upstream oneā€.
  • Unsloth and ROCm Compatibility Issues: Another user reported compatibility issues with new unsloth updates on ROCm due to dependencies on xformers. Despite this, ā€œunsloth gradient_checkpointing worked tho and gave a good memory improvementā€.
  • Syntax Error Troubleshooting in Transformers Library: Users collaborated to solve a ValidationError and AttributeError related to CohereTokenizer in the transformers library. They explored alternatives like CohereTokenizerFast and AutoTokenizer as solutions link to GitHub pull request.
  • Seeking Python Library for Faster STT -> LLM -> SST: A user inquired if anyone remembered the name of a Python library designed for faster speech-to-text to LLM to speech synthesis. No specific answer was provided in the logs.

Links mentioned:


OpenAccess AI Collective (axolotl) ā–· #axolotl-dev (8 messagesšŸ”„):

  • Training Grok with Grok-1 PyTorch Version: A user shared their initiative to train Grok using the Grok-1 PyTorch version and sought opinions on the choice. Another user expressed approval and mentioned upcoming torchtune integration with Axolotl.
  • Torchtune Integration Lightens up: There was speculation whether torchtune would replace or be an option beside the Hugging Face backend. Some users had strong opinions, with one suggesting to ā€œDismantle hf.ā€
  • Compute Power Check-in: Interest piqued when someone asked about the compute power being used for this training endeavor. The response was Mi300x, leading to curiosity about user satisfaction and comparisons with H100s.

Link mentioned: hpcai-tech/grok-1 Ā· Hugging Face: no description found


OpenAccess AI Collective (axolotl) ā–· #general-help (15 messagesšŸ”„):

  • Struggles with Mistral 7B finetuning: A member is having issues finetuning Mistral 7B on their data as the model mixes up information despite the loss decreasing. They shared a configuration link and expressed confusion over why the model is not learning properly.
  • Full finetuning vs. LoRA: Another member recommended trying a full finetuning or utilizing Retrieval-Augmented Generation (RAG) for better memory retention in the model, suggesting that LoRA might be more effective for style rather than content retention.
  • Inference Configuration Issues: There was a discussion about ensuring the chat template is added manually during inference since the current setup may not include it automatically. A member shared a link to potential tokenization mismatch issues here.
  • Config Sharing for Troubleshooting: A participant was asked to share their configuration to assist others in understanding the setup and providing better guidance.
  • Next Stable Release Inquiry: A user inquired about the timing of the next stable major release for axolotl.

Links mentioned:


OpenAccess AI Collective (axolotl) ā–· #axolotl-phorm-bot (5 messages):

  • Running into OOM issues despite 24GB VRAM: A user faced Out-of-Memory (OOM) errors during training despite having 24GB of VRAM. They shared their configuration and various settings they tried without success.
  • Phorm suggests solutions for OOM problems: To address the OOM issues, Phorm suggested increasing gradient accumulation steps, enabling mixed precision training, using model parallelism, reducing batch size, and leveraging DeepSpeed ZeRO optimization among other methods. Detailed configurations provided include mixed_precision: ā€˜fp16’ and zero_optimization with stage: 3.
  • DeepSpeed and ZeRO Optimization Strategies: By utilizing DeepSpeed’s ZeRO-2 and ZeRO-3 stages, significant memory footprint reduction can be achieved. Example configs for offloading optimizer and parameter states to the CPU were shared.
  • Mixed Strategies for Managing Memory: Additional methods include CPU and Disk Offloading, utilizing efficient models and operations, memory profiling tools like torch.cuda.memory_summary(), and dynamic padding for variable-length sequences. These techniques can help train larger models by optimal memory management.
  • Phorm.ai for more details: Users are advised to check back for more details on Phorm.ai for further information and updates regarding solutions to prevent OOM errors.

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.


LlamaIndex ā–· #announcements (1 messages):

  • Exciting Webinar on Open-Source Long-Term Memory: A new webinar is scheduled for Thursday at 9am PT, featuring the authors of memary – a fully open-source reference implementation for long-term memory in autonomous agents. Participants can join by signing up here.

  • Deeper Dive into memary: The webinar will include an in-depth discussion and Q&A session about memary, covering its functionalities like extracting agent inputs/responses into a knowledge graph using LLMs and neo4j, utilizing a memory stream for interaction timelines, and ranking popular entities.

Link mentioned: LlamaIndex Webinar: Open-Source Longterm Memory for Autonomous Agents Ā· Zoom Ā· Luma: In this webinar we’re excited to host the authors of memary - a fully open-source reference implementation for long-term memory in autonomous agents šŸ§ šŸ•øļø In…


LlamaIndex ā–· #blog (6 messages):

  • New Webinar on Memary for Autonomous Agents: This Thursday at 9am PT, there will be a deep dive session featuring the authors of memary, a fully open-source reference implementation for long-term memory in autonomous agents. Join the webinar to learn more.
  • PizzerIA Talk on Advanced RAG Techniques: Catch @hexapode at PizzerIA in Paris this Thursday discussing advanced retrieval-augmented generation techniques. Event details are available for those interested.
  • First In-Person Meetup in San Francisco: Next Tuesday, meet the LlamaIndex team and hear from @jerryjliu0, Tryolabs, and ActiveLoop at their SF HQ. RSVP here to join and learn about advancing RAG systems beyond vanilla setups.
  • Upgraded TypeScript Docs for LlamaIndex: LlamaIndex.TS docs got an upgrade including new starter tutorials and step-by-step guides for building agents. Check out the updated documentation.
  • Complex Document RAG with GPT-4o: GPT-4o is now natively integrated with LlamaParse to handle complex PDFs and slide decks using multimodal capabilities. See more details in the announcement.
  • Secure LLM-Generated Code in Azure Sandboxes: Launching today, securely run LLM-generated code in a sandbox using Azure Container Apps dynamic sessions. More information is available in the launch details.

Links mentioned:

  • RSVP to GenAI Summit Pre-Game: Why RAG Is Not Enough? | Partiful: Note: This is an in-person meetup @LlamaIndex HQ in SF! Stop by our meetup to learn about latest innovations in building production-grade retrieval augmented generation engines for your company from ...
  • What is LlamaIndex? | LlamaIndex.TS: LlamaIndex is a framework for building LLM-powered applications. LlamaIndex helps you ingest, structure, and access private or domain-specific data. It's available as a Python package and in Type...
  • Getting started | LlamaIndex.TS: In this guide we'll walk you through the process of building an Agent in JavaScript using the LlamaIndex.TS library, starting from nothing and adding complexity in stages.

LlamaIndex ā–· #general (45 messagesšŸ”„):

  • Tackling Document Hashing for Pinecone: A member sought advice on how to calculate a document or node hash to prevent duplicate entries in Pinecone. They explained their use case involving overlaps in web-scraped content and PDF documents.

  • Changing OpenAI Agent’s System Prompt: One member asked about altering the system prompt for an OpenAI agent without creating a new object. Another suggested using the chat_agent.agent_worker.prefix_messages attribute.

  • Running gguf Format Models with LlamaIndex: A query was raised about using LlamaIndex with a gguf format model from Hugging Face without OpenAI. It was clarified that LlamaIndex can work by loading the model and tokenizer with HuggingFaceLLM.

  • Using Airtable vs. Excel/Sqlite: The advantages of Airtable over Excel and Sqlite were discussed, highlighting Airtable’s integration with Langchain for direct function usage. A link to the Langchain Airtable integration documentation was also shared.

  • Handling Empty Nodes in VectorStoreIndex: A discussion focused on resolving empty node issues when loading indexes using VectorStoreIndex.from_vector_store. It was advised to ensure proper loading of documents into the docstore from the database.

Link mentioned: Airtable | šŸ¦œļøšŸ”— LangChain: * Get your API key here.


AI Stack Devs (Yoko Li) ā–· #ai-companion (7 messages):

  • AI Waifus Spark Quick Banter: A user asserts, ā€œAI waifus save lives!ā€, sparking playful engagement with another user responding, ā€œJust monikaā€.
  • 3D Character Chatbots Project Tease: A user mentions their work on 3D character chatbots at 4Wall AI and directs others to check out a teaser in another channel.
  • Inflection AI Plans to Embed Emotional AI: A user shares a link from VentureBeat about Inflection AI’s plans to integrate emotional AI in business bots, hinting at the possibility of AI waifus understanding and processing emotions.
  • Confusion Over Character Reference: In response to ā€œJust monikaā€, another user asks, ā€œWho dat?ā€ and receives a GIF link from Tenor.com to clarify the reference.

Link mentioned: Ddlc Doki Doki Literature Club GIF - Ddlc Doki Doki Literature Club Just Monika - Discover & Share GIFs: Click to view the GIF


AI Stack Devs (Yoko Li) ā–· #ai-town-discuss (18 messagesšŸ”„):

  • AI Town Conversations Lack Context: A user reported that characters ā€œdo not react at all to what the other character said,ā€ leading to repetitive greetings like ā€œHi! It’s so great to finally talk to you!ā€ Another user suggested there’s a vector memory system that retrieves past conversations but might be affected by settings or configurations.

  • Adjust Convex Settings for Fewer Memory Fetches: To address issues of empty bubbles in AI Town conversations, users were advised to adjust values in convex/constants.ts, particularly changing the NUM_MEMORIES_TO_SEARCH from its default value of 3 to 1.

  • Exporting AI Town Conversations from SQLite: One user struggled to export conversation data due to schema misunderstandings. Another provided a useful SQL query and recommended using DB Browser for SQLite, while a GitHub repo (townplayer) and relevant Twitter thread were shared for more advanced queries and tools related to AI Town data extraction.

Links mentioned:


LangChain AI ā–· #general (18 messagesšŸ”„):

  • Structured Data in LLMs Clarified: A member asked if LLMs handle structured data differently from unstructured text. Another member explained that LLMs process both structured (like JSON) and unstructured text similarly but can be fine-tuned for specific structures, mentioning examples like Hermes 2 Pro - Mistral 7B and OpenAI’s chatML format.

  • Langchain Package Differences Explained: A member asked about the difference between langchain and langchain_community. A response indicated that langchain-core contains base abstractions with lightweight dependencies, while popular integrations are in separate packages like langchain-openai, and less common ones are in langchain-community architecture.

  • Sequential Chains in LangChain: A member shared code illustrating the setup of a sequential chain where the output from one chain serves as the input to another. This was backed by a YouTube tutorial demonstrating this concept.

  • Handling Concurrent Requests in LangServe: Another member reported trouble with handling multiple concurrent requests in langserve. There were no responses to this issue yet.

  • Securing LLM Responses for Sensitive Data: A new user asked if it’s possible to secure LLM responses in RAG applications by hiding sensitive data such as customer names or card numbers. No solution was provided in the discussion.

Links mentioned:


LangChain AI ā–· #share-your-work (3 messages):

  • Launch of Affiliate Program for Easy Folders: A new affiliate program has been launched for the ChatGPT Chrome Extension - Easy Folders. Affiliates earn a 25% commission, and customers get a 10% discount. More details can be found here, and the extension can be downloaded from the Chrome Web Store.
  • Easy Folders Extension Criticized and Praised: Users gave mixed reviews on the Easy Folders extension. One criticized it for adding clutter and slow performance, while another user expressed satisfaction before losing their saved folders and chats.
  • Upgrading from LangChain to LangGraph: A user shared a Medium blog post about transitioning legacy LangChain agents to the new LangGraph platform. Interested users can read more about it here.
  • Query PDF Files with Upstage AI and LangChain: A blog post was shared detailing how to create a PDF query assistant using Upstage AI solar models integrated with LangChain. Check out the blog post here.

Links mentioned:


LangChain AI ā–· #tutorials (1 messages):

bayraktar47: <@1043024658812895333>


OpenInterpreter ā–· #general (13 messagesšŸ”„):

  • OS1 Reference in ā€œHerā€ movie sparks realization: A user shared an interesting observation that Open Interpreter O1 is a nod to OS1 from the movie ā€œHerā€. This revelation sparked curiosity among the members.

  • Seeking help for DevOps AI module: A junior full-stack DevOps engineer is looking to build a lite O1 AI to assist with DevOps tools, configuration terminals, and cloud computing. The goal is to provide these resources through discreet earphones for unobtrusive AI assistance in various work environments.

  • Installation and development setup queries: Members are discussing how Open Interpreter accesses the file system and reviews project structures. Specific questions are being raised regarding more efficient development setups.

  • Daily uses and problem-solving with Open Interpreter: In response to an open question about daily uses and complex problem-solving, multiple users expressed interest in documented success stories and shared their specific use cases. Examples include seamless referencing between devices, querying context-specific data while coding, and summarizing research papers.

  • Integrating Text-to-Speech with Open Interpreter: A member sought advice on combining the Text-to-Speech engine and voice recognition with Open Interpreter. They were directed to the relevant GitHub repository and encouraged to explore additional support channels.

Link mentioned: GitHub - OpenInterpreter/01: The open-source language model computer: The open-source language model computer. Contribute to OpenInterpreter/01 development by creating an account on GitHub.


OpenInterpreter ā–· #O1 (3 messages):

  • Seeking Steps to Connect Laptop to Light App: A member requested guidance on connecting their laptop to a light app, noting that the steps were not listed in the guitar. The details of the app or specific connections were not provided in the message.
  • Junior DevOps Engineer Needs Help with Lite 01 Project: A junior full-stack DevOps engineer expressed the need for assistance in building lite 01, aiming to simplify daily tasks and benefit others in similar roles. They are developing an AI module for providing resources and discreet assistance, seeking help to create an open interpreter lite 01, as pre-orders won’t be available until next fall.
  • Request for Guidance on Assembling 3D Printed Parts: The same junior DevOps engineer showed interest in learning how to assemble parts and a 3D printed case for the open interpreter lite 01. They asked for tips or guidance on the assembly process from someone who had already completed it.

OpenInterpreter ā–· #ai-content (1 messages):

ashthescholar.: missed opportunity to make it moo


Cohere ā–· #general (15 messagesšŸ”„):

  • Codegen-350M-mono in Transformers.js: Members discussed using the Codegen-350M-mono model with Huggingface’s Transformers.js. A link to Xenova’s codegen-350M-mono with ONNX weights was shared as a solution for compatibility issues.
  • CommandR+ for translation: Someone inquired about using CommandR+ for translation, mentioning it works well for Korean to English. They were directed to the Chat API documentation for sample code and further details.

Links mentioned:


Datasette - LLM (@SimonW) ā–· #ai (10 messagesšŸ”„):

  • Sky Voice Mode Paused Amidst Controversy: OpenAI has paused the use of the Sky voice in their GPT-4o demo, amid allegations of imitating Scarlett Johansson. A user noted that Sky has been replaced with Juniper, another feminine voice, while Scarlett Johansson issued a statement addressing this issue.

  • GPT-4o Integrates Multi-Modal Model: According to a user, prior versions of GPT used different models to handle audio and text, resulting in limitations like inability to recognize tone or background noises. GPT-4o now uses a single model for text, vision, and audio, potentially increasing emotional depth but also introducing complexities and potential drawbacks.

  • Resilience Over Perfection: A user referenced Stainslaw Lem’s short story to argue that perfect reliability in complex systems is unattainable. Instead, the focus should be on building resilient systems capable of responding to inevitable failures.

  • Legal Complications in Voice Cloning: Users discussed the legal and ethical implications of voice cloning, especially in light of Scarlett Johansson’s concerns. One user criticized reliance on legislation for protecting likeness, highlighting limitations in enforcement and existing open-source voice cloning technologies.

  • Qualcomm Snapdragon Dev Kit Launch: A member shared enthusiasm for Qualcomm’s new $899.99 Snapdragon Dev Kit for Windows. This dev kit offers significant power with its 4.6 TFLOP GPU, 32GB RAM, and 512GB storage, packaged similarly to Apple’s mini desktop.

Links mentioned:


DiscoResearch ā–· #general (6 messages):

  • Supervised Fine-Tuning (SFT) vs Preference Optimization: A member asked for clarification on the differences between Supervised Fine-Tuning (SFT) and Preference Optimization. They suggested that while SFT pushes up the probability distribution of the SFT dataset, preference optimization adjusts both undesired and desired probabilities, questioning why SFT is necessary.

  • Phi3 Vision impresses with its efficiency: A member expressed their admiration for Phi3 Vision, a 4.2 billion parameter model, praising its performance in low-latency/live inference on image streams. They shared a post on X discussing the potential applications in robotics.

  • Comparing Phi3 Vision and Moondream2: Another member encouraged the use of Moondream2 on the same image as Phi3 Vision to compare results. Feedback indicated that Moondream2 performs well and has reduced hallucinations, though some datasets remain problematic.

  • Microsoft releases new models: Microsoft released 7 billion and 14 billion parameter models. Notably, only the instruct versions are available, as observed by a community member.

Links mentioned:


Mozilla AI ā–· #llamafile (2 messages):

  • Alex introduces sqlite-vec to the community: Alex shared his new project sqlite-vec, a SQLite extension for vector search, and mentioned it might be integrated with Llamafile for features like RAG, memory, semantic search, etc. ā€œIt’s written entirely in C and should work with cosmopolitan, though haven’t tested myself yet.ā€

  • Detailed project description: Alex provided a detailed blog post explaining the potential and progress of sqlite-vec, which aims to replace sqlite-vss and offer a more performant and embeddable solution. The extension is still in beta but available for early trials, with distributions in C/C++ projects and packages on pip/npm/gem platforms.

  • Open for collaboration and support: Alex expressed his willingness to support and help anyone get started with sqlite-vec and address any issues users might encounter during the beta phase. ā€œMore coming soon, but happy to help anyone here get started or get around any issues!ā€

  • Community excitement: A member welcomed Alex and expressed enthusiasm about the project’s potential integrations with Llamafile. ā€œsuper excited about your project, and about the possibilities presented by integrating it with llamafile.ā€

Links mentioned:


LLM Perf Enthusiasts AI ā–· #gpt4 (1 messages):

  • GPT-4o Excels in Legal Reasoning: A member shared their experience running internal evaluation tests on GPT-4o for complex legal reasoning tasks. They reported a ā€œnon-trivial improvementā€ from GPT-4 and GPT-4-Turbo, and linked a LinkedIn post about the release of GPT-4o.

MLOps @Chipro ā–· #general-ml (1 messages):

  • Manifold Research Group seeks collaborators: A representative from Manifold Research Group introduced their OS R&D Lab focused on generalist models and AI Agents. They invited interested individuals to learn more or join the team through their Discord and Github.
  • NEKO Project aims high with open-source generalist models: The NEKO Project is building the first large-scale, open-source generalist model trained on various modalities, including control and robotics tasks. More information can be found in their detailed project document.

Link mentioned: Research log #038: Welcome to Research Log #038! We document weekly research progress across the various initiatives in the Manifold Research Group, and highlight breakthroughs from the broader research community we thi…


{% else %}

The full channel by channel breakdowns are now truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}