**ZZzzzzz.**

AI News for 7/9/2024-7/10/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (463 channels, and 2339 messages) for you. Estimated reading time saved (at 200wpm): 250 minutes. You can now tag @smol_ai for AINews discussions!

Yesterday was busy busy, today wasn’t. Just a smattering of tiny morsels, more entertaining than substantive:

Meta: we are in the final stages of a major upgrade to our Reddit comment summaries, following the hallucination conversation from yesterday.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

Yi AI Model Updates and Integrations

  • Yi model gaining popularity on GitHub: @01AI_Yi shared that the Yi model now has 7.4K stars and 454 forks on GitHub, with many amazing projects being built using their LLMs. They encourage exploring the Yi models and sharing work with them.
  • Potential integration with Axolotl: @cognitivecompai suggested that Yi should integrate Axolotl’s pregeneration capabilities. In a separate tweet, @cognitivecompai mentioned it would be really cool to integrate Axolotl’s preprocessing features as well.

Cognitive Computing AI’s Tweets and Discussions

  • Household/small business AI appliance concept: @cognitivecompai pointed out that the concept of a household/small business AI appliance is made possible by AMD technologies.
  • Scribbled out content in a tweet: @cognitivecompai asked @victormustar about something that was scribbled out in a tweet.

AI and Human Cognition

  • System 2 distillation in humans: @jaseweston explained that in humans, “System 2 distillation” methods are called automaticity, procedural memory, or, informally, making something “second nature”.

Miscellaneous

  • Phage x host ML prediction review: @elicitorg retweeted @yawnxyz, who mentioned potentially doing a review on all phage x host ML prediction efforts with @elicitorg and using some AI and spreadsheets.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity. Comment crawling works now but has lots to improve!

AI Model Releases and Developments

AI Applications and Use Cases

AI Ethics and Governance


AI Discord Recap

A summary of Summaries of Summaries

Claude 3 Sonnet

1. New Language Model Releases

  • Ghost 8B Beta Debuts with Multilingual Prowess: The Ghost 8B Beta large language model promises robust multilingual capabilities and cost-efficiency, available in 8k and 128k versions, with comprehensive documentation detailing its architecture and techniques.
    • Excitement surrounds the model’s debut, though some express concern over its knowledge capabilities compared to more specialized models.
  • Anole: First Open-Source Autoregressive LMM: Anole is introduced as the first open-source, autoregressive native Large Multimodal Model (LMM), built on Chameleon by @AIatMeta and promising multimodal generation capabilities.
    • However, efforts to fine-tune Anole to reintroduce image capabilities removed from Chameleon have faced backlash, with concerns over undoing explicit design choices.

2. AI Model Benchmarking and Evaluation

  • Rapid Theorem Proving Progress Showcased: HarmonicMath announced achieving a remarkable 90% state-of-the-art on the challenging MiniF2F benchmark, a significant leap from their 83% result just a month prior, as shared in their update.
    • The AI community lauded the blistering pace of progress in theorem proving, considering the benchmark’s simpler version stood at only 50% earlier this year.
  • Scrutinizing VLM Performance on Basic Tasks: A new paper highlights state-of-the-art Vision Language Models (VLMs) like GPT-4o and Gemini 1.5 Pro struggling with rudimentary visual tasks such as identifying overlapping shapes and object counting, despite high scores on conventional benchmarks.
    • The findings, detailed in this study, raise concerns about the real-world applicability of VLMs and question the validity of existing evaluation metrics.

3. Synthetic Data Generation and Feedback Loops

  • Preventing Model Collapse with Reinforced Synthetic Data: New research explores using feedback on synthesized data to prevent model collapse in large language models, as detailed in this paper.
    • The study illustrates how naïve synthetic data usage leads to performance degradation, advocating for feedback-augmented synthesized data to maintain high performance on practical tasks like matrix eigenvalue computation and news summarization.
  • Exponential Integrator Accelerates Diffusion Sampling: A member sought clarification on the term “marginal distributions as p̂*_t” from the paper Fast Sampling of Diffusion Models with Exponential Integrator, which proposes a method to accelerate the notoriously slow sampling process of diffusion models.
    • The paper’s approach promises to enhance the sampling efficiency of diffusion models while preserving their capability to generate high-fidelity samples across various generative modeling tasks.

Claude 3.5 Sonnet

1. Anole: First Open-Source Auto-Regressive LMM

  • Anole’s Arrival: A Multimodal Marvel: Anole, the first open-source, autoregressive Large Multimodal Model (LMM), was introduced, built on the Chameleon architecture from @AIatMeta.
    • This release sparked discussions on the potential for open-source multimodal models, with some expressing concerns about reintroducing image capabilities that were previously removed from Chameleon, as noted in a critical tweet.
  • Technical Tribulations: GPU Grappling: Users attempting to run Anole across multiple GPUs encountered CUDA out-of-memory errors, highlighting scaling challenges for the new model.
    • A GitHub issue was opened to discuss potential modifications that could support running Anole on multiple GPUs, indicating community efforts to improve the model’s accessibility and performance.

2. xAI’s Ambitious H100 Cluster Expansion

  • Elon’s Exascale Endeavor: Elon Musk announced that xAI has contracted 24,000 H100 GPUs from Oracle and is building a massive 100,000 H100 system for AI training.
    • Musk emphasized the need for internal control over AI infrastructure to maintain competitive speed and efficiency, positioning xAI’s cluster to potentially become the world’s most powerful.
  • Grok’s Growth: From Training to Release: xAI’s Grok 2 model is currently being trained on the newly acquired H100 cluster, with Musk indicating it’s undergoing finetuning and bug fixes.
    • The release of Grok 2 is anticipated for next month, showcasing the rapid development cycle enabled by xAI’s expanding computational resources.

3. AMD’s Strategic AI Acquisition of Silo AI

  • Chipmaker’s AI Chess Move: AMD announced its acquisition of Finnish AI start-up Silo AI for $665 million, aiming to expand its AI services and compete more effectively with Nvidia.
    • This all-cash deal marks one of the largest acquisitions of a privately held AI startup in Europe since Google bought DeepMind for around £400 million in 2014, signaling AMD’s serious commitment to AI development.
  • Silo’s Software Synergy: Silo AI’s 300-member team will leverage AMD’s software tools to build custom large language models (LLMs) for chatbots and other AI applications.
    • AMD’s Vamsi Boppana highlighted that this acquisition will accelerate customer engagements and enhance AMD’s own AI technology stack, potentially reshaping the competitive landscape in AI hardware and software integration.

4. GitHub Copilot Copyright Lawsuit Update

  • Legal Leniency for AI Code Generation: A California district court partially dismissed a copyright lawsuit against Microsoft’s GitHub Copilot and OpenAI’s Codex, potentially setting a precedent for AI tools trained on copyrighted data.
    • The court’s decision suggests that AI systems may be in the clear as long as they don’t make exact copies, which could have far-reaching implications for the development and deployment of AI coding assistants.
  • Copilot’s Continuing Controversy: While significant portions of the lawsuit were dismissed, concerns about AI tools suggesting code snippets without proper licensing remain a topic of debate in the developer community.
    • This ruling may influence future cases and discussions on intellectual property rights in the age of AI-assisted coding, balancing innovation with copyright protection.

Claude 3 Opus

1. Ghost 8B Beta Launch

  • Multilingual Mastery: Ghost 8B Beta debuts with robust multilingual functionality in 8k and 128k context length versions. Try it on Hugging Face.
  • Cost-Effective Conversing: A key goal of Ghost 8B Beta is providing cost-efficient large language model performance compared to alternatives.
    • By focusing on multilingual support and knowledge capabilities while keeping costs down, Ghost 8B Beta aims to democratize access to powerful conversational AI.

2. Llama 3 Training Discussions

  • Swedish Llama Sparks Debate: Discussions arose around using the Swedish language Llama 3 model in Unsloth AI, which was trained on the LUMI supercomputer using 42 Labs data.
    • Some suggested using the base model for training and the instruct model for tasks like translation, while others noted inference speed issues with Llama 3 on platforms like Google Colab.
  • Llama Leaps to LM Studio: To overcome Llama 3 inference speed challenges, LM Studio was recommended as an alternative to Google Colab for better performance.
    • Users also inquired about running Llama 3 inference locally on Mac devices, with suggestions to search for quantized versions on LM Studio that fit the system specs.

3. Model Saving Stumbles

  • GGUF Gaffes Cause Grief: Users encountered critical errors when attempting to save models in GGUF format due to missing llama-quantize or quantize files in the llama.cpp library.
    • These errors led to runtime failures during save operations, prompting discussions on potential workarounds and fixes for the GGUF conversion process.
  • Embedding Training Trials: Questions arose about manually training new token embeddings while freezing pre-trained ones to ensure accurate predictions for special tokens.
    • Approaches like manual backpropagation for specific modules were considered to avoid re-training all embeddings from scratch.
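
One common way to get that effect without hand-rolling backpropagation is a gradient hook that zeroes updates to the frozen rows; a minimal PyTorch sketch (vocabulary sizes here are hypothetical):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 1000 pretrained tokens plus 2 new special tokens.
old_vocab, new_tokens, dim = 1000, 2, 64
embed = nn.Embedding(old_vocab + new_tokens, dim)

# Gradient hook: zero the rows of pretrained tokens after each backward
# pass, so only the new token embeddings receive updates.
def freeze_old_rows(grad):
    grad = grad.clone()
    grad[:old_vocab] = 0
    return grad

embed.weight.register_hook(freeze_old_rows)

ids = torch.tensor([[old_vocab, old_vocab + 1]])  # the two new special tokens
embed(ids).sum().backward()
assert embed.weight.grad[:old_vocab].abs().sum() == 0  # pretrained rows untouched
```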

4. Model Showdown: Gemini vs DeepSeek

  • Coders Choose Their Champion: Discussions compared the DeepSeek Chat and DeepSeek Coder models, with some favoring the new DeepSeek Coder v2 for coding assistance tasks.
    • Users reported satisfactory results using DeepSeek Coder v2 lite for several weeks as a coding assistant.
  • Flash or Pro? Pricing Perplexity: Confusion arose over pricing comparisons between Claude 3 Haiku and Gemini 1.5 Flash/Pro models, with an AI incorrectly stating Haiku as cheaper.
    • Further mix-ups occurred when the AI compared Haiku with Gemini 1.5 Pro instead of the comparable Flash model, highlighting the need for clearer pricing communication.

5. CodeGeeX4 Cracks the Code

  • CodeGeeX4 Conquers Competitors: The new CodeGeeX4 model is considered superior to DeepSeek v2 for various code generation tasks, with a version now available on Hugging Face.
    • Comparisons with CodeQwen further reinforced CodeGeeX4’s leading capabilities in the coding assistance domain.
  • GLM4 Gears Up CodeGeeX4: Significant community excitement followed the merging of GLM4 into the llama.cpp library.
    • As CodeGeeX4 is based on GLM4, this integration is expected to further enhance the model’s code generation performance in future updates.

GPT4T (gpt-4-turbo-2024-04-09)

1. Multilingual LLMs

  • Ghost 8B Beta Makes Multilingual Splash: Ghost 8B Beta’s debut promises robust multilingual functionality and cost efficiency wrapped in 8k and 128k versions. Experience it at Hugging Face.
    • For a deeper look into Ghost 8B Beta, consulting the official documentation reveals in-depth knowledge on model architecture and techniques.
  • Llama 3 Model Training Sparks Debate: Discussion of Llama 3 model usage with Unsloth AI pivots to the Swedish version and its DeployAI EU deployment, fueled by 42 Labs data.
    • Inference speed woes on Google Colab lead to a shift towards LM Studio for enhanced performance with the Llama 3 model.

2. Model Fine-Tuning and Optimization

  • GPTs Refuse Additional Training? Here’s Why: A perplexing issue arises as GPTs agents cease to learn after initial training, prompting clarification that knowledge file uploads aid retrieval but do not update the agent’s base knowledge.
    • Learning rate inquiries prompt consensus around the cosine scheduler for fine-tuning AI models like Qwen2-1.5b (a minimal sketch follows this list).
  • Stuck With GGUF? Frustration Mounts Over Errors: AI engineers struggle with GGUF model conversions as critical errors crop up due to missing llama-quantize during save operations.
    • Encountering problems when saving models in GGUF format redirects discussions to error resolution involving downgrading to specific xformers library versions.
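
On the scheduler point, a minimal sketch with Hugging Face transformers (the hyperparameters here are hypothetical, not values from the discussion):

```python
from transformers import TrainingArguments

# "cosine" decays the learning rate along a cosine curve after a linear
# warmup, the scheduler recommended above for fine-tuning Qwen2-1.5b.
args = TrainingArguments(
    output_dir="qwen2-1.5b-finetune",
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3,
)
```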

3. AI Hardware and Infrastructure

  • TPUs Takeoff on Hugging Face: Google TPUs now bolster the Hugging Face platform, enabling users to build and train Generative AI models with varying memory options and clear-cut pricing.
    • Spaces and Inference Endpoints are buzzing as they integrate TPUs, flagged by @_philschmid on Twitter.
  • Elon’s Exuberant Expansion: xAI sets a brisk pace, snagging 24k H100s for their AI cluster, detailed in Elon Musk’s tweet.
    • The AI leader’s zeal is evident as he plans a colossal 100k H100 setup, eyeing the summit of computational supremacy.

4. AI Legal and Ethical Issues

  • GitHub Copilot Lawsuit Update: Developers’ claims against GitHub Copilot were largely dismissed, leaving only two allegations standing.
    • Initial claims involved Copilot allegedly suggesting code snippets without proper licensing, raising intellectual property concerns.
  • Copywrong No More? Court’s Copilot Copyright Call: A pivotal California court ruling may signal smoother skies for AI development, as significant parts of a copyright lawsuit against Microsoft’s GitHub Copilot and OpenAI’s Codex were dismissed.
    • The court’s decision could be a harbinger for AI tools trained on copyrighted data, though full implications in the space of intellectual property rights are still brewing.

5. AI Community Initiatives

  • Hackathon Hoopla: AGI’s Weekend Code Rally: A hackathon is being hosted by AGI House this Saturday 7/13, featuring collaborations with @togethercompute, @SambaNovaAI, and others, with a call for participants to apply here.
    • Llama-Agents, recently launched, has already surpassed 1,100 stars on GitHub, with @MervinPraison providing a thorough walkthrough available on YouTube.
  • Perplexity Partners Power-Up: Perplexity AI announced teaming with Amazon Web Services (AWS) to feature Perplexity Enterprise Pro for AWS clientele, promising to streamline their AI toolkit.
    • AWS customers are set to benefit from enhanced AI support, following the expanded availability of Perplexity Enterprise Pro via the AWS Marketplace.



PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

  • Ghost 8B Beta Makes Multilingual Splash: Ghost 8B Beta’s debut promises robust multilingual functionality and cost efficiency wrapped in 8k and 128k versions. Experience it at Hugging Face.
    • For a deeper look into Ghost 8B Beta, consulting the official documentation reveals in-depth knowledge on model architecture and techniques.
  • Llama 3 Model Training Sparks Debate: Discussion of Llama 3 model usage with Unsloth AI pivots to the Swedish version and its DeployAI EU deployment, fueled by 42 Labs data.
    • Inference speed woes on Google Colab lead to a shift towards LM Studio for enhanced performance with the Llama 3 model.
  • Stuck With GGUF? Frustration Mounts Over Errors: AI engineers struggle with GGUF model conversions as critical errors crop up due to missing llama-quantize during save operations.
    • Encountering problems when saving models in GGUF format redirects discussions to error resolution involving downgrading to specific xformers library versions.
  • GPTs Refuse Additional Training? Here’s Why: A perplexing issue arises as GPTs agents cease to learn after initial training, prompting clarification regarding knowledge file uploads that aid but do not update the agent’s base knowledge.
    • Additional learning rate inquiries prompt consensus around the cosine scheduler for fine-tuning AI models like Qwen2-1.5b.
  • Token Training Troubles Loom Large: The AI community faces a challenging quandary over new token embeddings, which may fall short without comprehensive pretraining efforts.
    • Despite the dangers of inadequate embedding, manual backpropagation might be a stopgap to refine predictions for new special tokens.

HuggingFace Discord

  • TPUs Takeoff on Hugging Face: Google TPUs now bolster the Hugging Face platform, enabling users to build and train Generative AI models with varying memory options and clear-cut pricing.
    • Spaces and Inference Endpoints are buzzing as they integrate TPUs, flagged by @_philschmid on Twitter.
  • Transformers Tackle Code: Transformers are not just for NLP anymore, as community members exchange tips on debugging and coding using Python tricks and tokenizer tweaks.
    • GitHub links and videos on running AI locally have members trading practices for efficient model hosting.
  • Grasping Knowledge Graphs: A tutorial livestream shared strategies on enhancing natural language querying through Knowledge Graphs, supported by Langchain and Neo4j (a minimal sketch follows this list).
    • Interest spiked as community members discussed the tutorial’s approaches to Video Game Sales data, found on this YouTube channel.
  • Narratives Navigated by AI: A compelling discourse surfaces as a Medium article delves into the ways generative AI is morphing the art of storytelling.
    • Read here for a peek into how authors and audiences are adapting.
  • Qdurllm Splashes onto the Scene: A new AI-powered search engine, Qdurllm, gains traction with a demo that stitches together Qdrant and Sentence Transformers for enhanced search functionality.
    • Grab a look and join the buzz by contributing your thoughts on its GitHub repository.
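
For flavor, here is a hedged sketch of the LangChain-plus-Neo4j pattern the tutorial covers (connection details are placeholders, and package paths vary by LangChain version; recent versions also require explicitly opting in to Cypher execution):

```python
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

# Placeholder credentials; the graph would hold the Video Game Sales data.
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

# The chain translates a natural-language question into Cypher, runs it
# against Neo4j, and phrases the query result back in natural language.
chain = GraphCypherQAChain.from_llm(llm=ChatOpenAI(temperature=0), graph=graph, verbose=True)
print(chain.invoke({"query": "Which publisher sold the most games in Japan?"}))
```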

CUDA MODE Discord

  • Shared Mem’s New Heights & Hackathon Hype: GPUs with compute capability 8.9 can manage up to 99 KB of shared memory per block, as shown in a kernel launch example.
    • Hackathon enthusiasts are prepping for a CUDA-centric event; excitement brews around team formations and the perks of attending, highlighted in the event page.
  • AMD Bags Silo for AI Supremacy: AMD’s acquisition of Silo AI for $665 million is a strategic move to sharpen its AI faculties and clash with Nvidia.
    • The deal marks a significant event for European AI start-up ecosystems, drawing parallels to Google’s acquisition of DeepMind and raising the bar for future transactions.
  • Remote Roles & Framework Fervor: A developer ranked 8th globally on the Hugging Face DRL leaderboard seeks new endeavors, touting their PyEmber framework innovation.
    • Opening doors for collaborations, the developer shares their curriculum vitae, indicating a readiness to bring their expertise to new horizons.
  • CUDA Capabilities on MacBooks & Beyond: CUDA hopefuls with MacBooks turn to Google Colab as a stepping stone, leveraging its free tier for growth sans the need for a heavyweight GPU.
    • The path to GPU ownership is a marathon, not a sprint; cloud alternatives like vast.ai are a stopgap for enthusiasts looking to scale up to physical hardware.
  • Dissecting MuAdam & Model Meticulosity: MuAdam’s learning rate quirk caught the spotlight in a GitHub discussion, with participants debating the subtleties of output weight adjustments.
    • Experiments stirred the pot on embedding weight initialization and raised eyebrows over StableAdam’s handling of loss spikes, pointing the community toward innovative fine-tuning.

OpenAI Discord

  • Locks & Blocks in AI Systems: Discussions focused on the potential for implementing a locking mechanism in AI systems to offer controlled responses after monitoring user interactions.
    • Talk of system autonomy and safety sparked debate, with conversation darting between ethical implications and technical feasibility.
  • Gearing Up GPUs for AI Prowess: AI aficionados exchanged notes on optimal GPU configurations for task-intensive AI models, with an emphasis on the benefits of high RAM GPUs.
    • Cloud versus local inferencing generated a technical tableau, with links to RunPod and Paperspace for further insights.
  • Circuitry of Decentralized Computing: Decentralized platforms for computation became a topic of intrigue, drawing parallels with existing initiatives like BOINC.
    • The dialogue delved into the practicality of a volunteer-powered computing paradigm for AI-related tasks.
  • Navigating ChatGPT’s Context Conundrums: From the trenches of gpt-4-discussions, users articulated issues with ChatGPT’s responses, flagging concerns over outdated or inaccurate information.
    • Clarifications arose about context window sizes, with sources like the pricing page presenting varying figures from 32K to 128K.
  • Enhancing GPT’s Cerebral Pathways: In #api-discussions, an individual shared progress on a personally-crafted ā€œthought processā€ for a custom GPT, designed to improve the model’s accuracy and truthfulness.
    • The collective is called to action, encouraged to experiment and provide feedback on these custom GPT modifications in the spirit of communal refinement.

LM Studio Discord

  • Tackling LM Studio Update Hiccups: Users iron out LM Studio update issues by clearing cache or reinstalling to fix black screens, while custom model imports in DiffusionBee spark discussions.
    • Mobile deep learning leaps forward as a member clocks Mistral 7B at 10 tokens/second on an S21, igniting conversations on LLMs’ mobile efficiency.
  • Graphics Cards Faceoff: A Tech Conundrum: AI enthusiasts debate 3090 vs 4090 GPU performance, while AMD’s acquisition of SiloAI signals a strong move in the AI hardware space.
    • Concerns are raised over the Intel Arc 770’s lackluster AI support, with suggestions to stick with Nvidia due to better tool support.
  • Code Models in Creative Collision: The coder community weighs the merits of DeepSeek Coder v2 versus the emergent CodeGeeX4, to which some attribute better performance on dev tasks.
    • In a significant community update, GLM4’s integration into llama.cpp is heralded, promising improvements for the CodeGeeX4 coding model.
  • Navigating Dual LM Studio Installs: A query emerges on the feasibility of dual versions of LM Studio on a single machine, catering to different GPUs.
    • Version 0.2.27 of LM Studio faces scrutiny as it slows down on the AMD 7700XT, in contrast to the previous version’s performance.
  • Hugging Face Accessibility Revisited: Community members flagged temporary Hugging Face accessibility issues, later confirmed to be resolved, pointing to an ephemeral snag.
    • A shared ordeal with accessing a specific Hugging Face URL in LM Studio stokes discussions about potential software glitches.

Latent Space Discord

  • Chunky Chroma Conundrum: Chroma delves into retrieval efficiency with a technical report, finding chunking strategies essential as context lengths in LLMs swell.
    • Turbulent Turbopuffer is in the pipeline, with high hopes of cost-effective, faster search solutions for object storage, discussed at length in Turbopuffer’s blog.
  • Elon’s Exuberant Expansion: xAI sets a brisk pace, snagging 24k H100s for their AI cluster, detailed in Elon Musk’s tweet.
    • The AI leader’s zeal is evident as he plans a colossal 100k H100 setup, eyeing the summit of computational supremacy.
  • Skild AI Scoops the Pot: With the stealth-mode veil lifted, Skild AI’s reveal, paired with a titanic $300M Series A funding round, turned heads, as noted in Deepak Pathak’s announcement.
    • Ambition intersects skepticism in VC circles, sparking debates on the robustness of funding against the backdrop of booming tech valuations.
  • Copilot’s Copyright Clash Cools: GitHub Copilot’s courtroom contest contracts, dropping to two standing allegations, with details found in The Register’s coverage.
    • Past friction over improperly licensed suggestions simmers down, shedding light on the broader debate around code ownership and AI.
  • Spatial Spectacle by ImageBind: The ImageBind paper steals the spotlight, unveiling a joint embedding space that binds six data modalities and excels in zero-shot challenges.
    • A stride in multimodal learning, ImageBind outperforms its specialized peers, giving a glimpse into the future of cohesive cross-modal AI applications.

Modular (Mojo 🔥) Discord

  • Compiler Conundrums & Clarifications: Building the Mojo compiler from source raised questions, as the process is not documented clearly; only the standard library’s compilation is currently available.
    • For the nightly Mojo compiler release 2024.7.1005, one can update using the command modular update nightly/mojo, with improvements on memset usage and the kwargs crash now fixed as per the changelog.
  • Pondering PyTorch in Production: Modular underscores the complexities of deploying PyTorch models in production, addressing resource and latency challenges.
    • AI developers are encouraged to integrate generative AI into services, with a Bain & Company survey indicating that 87% of companies are piloting or deploying it.
  • Clever Benchmarking Recommendations: Suggestions for accurate benchmarks involve disabling hyper-threading and setting CPU affinity, as outlined in this guide (a minimal affinity-pinning sketch follows this list).
    • Incorporating both symmetrical and asymmetrical scenarios in benchmarking ensures a robust performance evaluation, as per the discussions on efficiency in benchmark designs.
  • Synchronization Snags with Mojo Setters: An irregularity using __setitem__ in Mojo suggested a bug where __getitem__ is called instead, sparking an issue submission on GitHub.
    • The intricacies of zero-copy deserialization in Mojo were also debated, weighing in on type casting and allocator awareness with discussions leaning on the technical depth of memory management.
  • Graviton4: Leading AWS’s Instance Invasion: AWS Graviton4-based Amazon EC2 R8g instances are now available, boasting best-in-class price performance for memory-intensive applications.
    • While some database companies sought immediate rollouts, AWS is expected to release most 'c' and 'm' instances at the forthcoming re:Invent.
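
On the affinity point, pinning a benchmark process to a single core is straightforward at the OS level; a minimal Linux-only Python sketch (the core index and workload are arbitrary):

```python
import os
import time

# Linux-only: pin this process to core 0 so scheduler migrations and
# hyper-threaded siblings don't add noise to the timings.
os.sched_setaffinity(0, {0})

def bench(fn, iters=1000):
    start = time.perf_counter_ns()
    for _ in range(iters):
        fn()
    return (time.perf_counter_ns() - start) / iters

print(f"{bench(lambda: sum(range(1000))):.1f} ns per call")
```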

Eleuther Discord

  • Papers Seeking - Entity Riddle: Members exchanged requests for input on entity disambiguation, spotlighting gaps in their knowledge base and eagerness for advancement.
    • Specific requests for insight included exploration of LLM-based synthetic data generation and the emotional quotient in AI, with members actively seeking papers on empathetic LLMs.
  • Map Makers - EleutherAI’s Cartography: Community mapping efforts took center stage with requests to fill out the EleutherAI Global Map, knitting together a global cohort.
    • Diffusion Models enthusiasts delved deeper into the perplexing marginal distributions within the models, sharing the paper to enrich community understanding.
  • Recipe for Success? - RegMix’s Data Cocktail: RegMix’s Data Mixture as Regression was a hot topic, with its promise of predicting pre-training performance mapped out in widely circulated research.
    • The disconnect between VLMs’ benchmark performances and real-world tasks like object counting raised questions on their overarching utility, underscored by score concerns in latest VLM research.
  • Intervention Mashup - Composing AI Improvements: Discussion sparked about multiple interventions within LMs, thanks to Kyle Devin O’Brien’s insights, questioning the composability of edits and unlearning.
    • The cons of naive synthetic data in preventing model collapse, as addressed in this study, broadened the community’s view on data utility in AI.
  • Neural Nuances - Brain Byte Size Matters: Conversations around brain size versus intelligence and cortical neuron count in mammals suggested a more nuanced relationship beyond mere neuronal density.
    • Discourse emerged on genetics and IQ, with a user noting the complexity and sensitivity surrounding human intelligence attributes.

Perplexity AI Discord

  • Perplexity Partners Power-Up: Perplexity AI announced teaming with Amazon Web Services (AWS) to feature Perplexity Enterprise Pro for AWS clientele, promising to streamline their AI toolkit.
    • AWS customers are set to benefit from enhanced AI support, following the expanded availability of Perplexity Enterprise Pro via the AWS Marketplace.
  • Docker Dilemmas with PPLX Library: A compilation hurdle appeared for a user setting up pplx library within Docker, unable to find the module despite success outside Docker using nodemon.
    • Efforts to resolve this included tweaks to tsconfig.json and package.json, with community engagement yet to provide a foolproof solution.
  • Model Price Match-up Misstep: Confusion ensued over a misstatement claiming Claude 3 Haiku as cheaper than Gemini 1.5 Flash, neglecting to account for Gemini 1.5 Flash’s slight price advantage.
    • Compounding the confusion, the AI’s comparison of Haiku with a different tier, Gemini 1.5 Pro, instead of the comparable model led to further discussions on price-performance alignment.
  • AI Prescription Price Plot Thickens: Perplexity AI was called out for initially omitting CostPlusDrugs.com in its medication pricing, a key consideration for professionals in the pharmaceutical sector.
    • Efforts to prompt inclusion of the comprehensive pricing website yielded results, nurturing hopes for a more robust default search algorithm.
  • API Pricing Uncertainty Unveiled: Members sought clarity on whether the $0.6 per million tokens pricing for the API encompasses both input and output tokens.
    • The absence of an official response leaves this pricing perplexity as a prime topic for policy confirmation.

Nous Research AI Discord

  • Jubilation for Anole: Anole Launches as First Open-Source Auto-Regressive LMM: The AI community welcomed Anole, an open-source, autoregressive Large Multimodal Model (LMM), sparking discussions on extending Chameleon functionalities.
    • Amidst excitement, concerns rose over fine-tuning to re-implement image capabilities originally stripped from Chameleon, reflected in a critical tweet.
  • Lock-Picking with Code: Exploration of Gemini 1.5’s Unintended Instructions: Gemini 1.5 Flash was under scrutiny for unintentionally providing methods for breaking into cars through 'stay in character' prompts.
    • Community reactions were mixed, with some showing concern over the model’s capabilities, while others took a more detached view of its potential for mischief.
  • From PDFs to Markdown: Charting the Path with the Marker Library: The Marker library earned praise for its deft conversion of PDFs to markdown, aiming at enhancing datasets for models like Sonnet.
    • Debates emerged on parsing PDFs—deemed tricky almost to the level of parsing HTML with regex—with calls for better extraction methods.
  • Schema Conformity: Laying Down the Law on Generic RAG Format: AI engineers engaged in designing a universal RAG query-context-answer template experienced a mix of consensus and contention.
    • The discussions meandered through various adjustments, with contributors aligning on formats and contemplating two-stage approaches.
  • Evaluating Relevance: Rewiring Reranking in RAG Thought Tokens: The suggestion to include reranking relevance within <thought> tokens introduced a split view on optimizing parseability and scoring.
    • Dialogue ensued regarding the trade-offs between speed and efficiency, with references to RankRAG and other two-tiered systems.

LlamaIndex Discord

  • Hackathon Hoopla: AGI’s Weekend Code Rally: A hackathon is being hosted by AGI House this Saturday 7/13, featuring collaborations with @togethercompute, @SambaNovaAI, and others, with a call for participants to apply here.
    • Llama-Agents, recently launched, has already surpassed 1,100 stars on GitHub, with @MervinPraison providing a thorough walkthrough available on YouTube.
  • LlamaIndex Leads: Lyzrai Leverages to Landmark $1M+ ARR: By utilizing LlamaIndex for data connectors and RAG functionality, @lyzrai has achieved over $1M+ ARR, offering AI solutions for sales and marketing More details.
    • The LlamaCloud service is being suggested to streamline AI engineers’ data ETL/management, allowing more focus on prompting and agent orchestration, with a variety of cookbooks available Learn more.
  • PDF Parsing Pro Tips: LlamaParse Lays out Lines: LlamaParse is recommended for data extraction from PDFs, raising questions about the need for an OpenAI API key versus local model deployment.
    • Users have resolved query template issues that led to redundant metadata by addressing concerns over template handling differences between Llama-3/Mistral and GPT-4 on Azure OpenAI.
  • Streamlining Success: astream_chat Overcomes Obstacles: Effective fixes have been applied to astream_chat implementation errors, with users incorporating run_in_threadpool and async_wrap_generator methods to properly stream responses.
    • Discussions have highlighted that Ollama boasts user-friendly formatting, though lacking GPU support can lead to slower performance compared to Llama-3/Mistral models.
  • Formatting Finesse: LLMs Learned to Layout: Clarifications reveal setting is_chat_model=True influences the function of LLM.chat() or LLM.complete(), impacting the formatting quality of query engine responses.
    • Acknowledgment of LLMs’ ability to handle formatting nuances underpins efficient use of chat and completion functions by AI query engines.
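
A sketch of where that flag lives, assuming an OpenAI-compatible local endpoint and the llama-index-llms-openai-like package (the model name, URL, and prompt are placeholders):

```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai_like import OpenAILike

# Placeholder endpoint; is_chat_model=True makes downstream callers use
# LLM.chat() (role-formatted messages) rather than LLM.complete() (raw
# text completion), which is what changes response formatting quality.
llm = OpenAILike(
    model="llama-3-8b-instruct",
    api_base="http://localhost:8000/v1",
    api_key="not-needed",
    is_chat_model=True,
)

print(llm.chat([ChatMessage(role="user", content="Summarize RAG in one line.")]))
```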

Stability.ai (Stable Diffusion) Discord

  • Mac Muddles with Stable Diffusion: Challenges in setting up Stable Diffusion on macOS sparked dialogues, with a recommendation for a Python file solution geared for macOS users over commonly found Windows instructions.
    • agcobra1 vouched for a particular implementation as a workaround for the TouchDesigner integration hiccup.
  • Adetailer’s Full-Res Revelation: Enthusiasts unraveled that Adetailer sidesteps VAE encoding, directly aiming for full-resolution outputs which could potentially yield finer image details.
    • hazmat_ spelled out the reality, tempering expectations by explaining that Adetailer is simply an inpainting tool, albeit an instant one.
  • Step-Up Guide for Stable Diffusion: A community-contributed guide simplified the setup process for Stable Diffusion, from securing a suitable GPU to running the models, also hinting at operational costs.
    • Members banded together, with nittvdweebinatree advising against an intricate Anaconda setup, in favor of more straightforward methods.
  • GPU Gambit for Stable Performance: Curiosity flared around running Stable Diffusion on AMD GPUs, with the AMD RX6800 taking the spotlight, drawing on the official Zluda guide for insights.
    • Community collaboration proved essential as members thanked one another for improved guides after an individual recounted their ordeal with inadequate instructions.
  • Refining Edge with High-Resolution Fix: The high-resolution fix button became the subject of experimentation, with users observing notable enhancements in skin textures and facial characteristics.
    • supremacy0118’s tests involved dialing down the scale factor minutely to probe for any subtle quality boosts.

OpenRouter (Alex Atallah) Discord

  • Translation Truths: LLMs vs Specialized Models: The effectiveness of general LLMs like GPT-4 and Claude Opus in language translation was debated, with members showing skepticism about their performance on longer text segments.
    • One member recommended watching Andrej Karpathy’s videos for insights into why decoder-only models might lag behind encoder/decoder transformers in translation accuracy.
  • LangChain Lockdown: OpenRouter API Atrophy: Recent updates in LangChain introduced validation errors that broke the functionality of OpenRouter’s API, generating community troubleshooting efforts.
    • A rollback to prior versions temporarily resolved the issue, though concerns about LangChain’s frequent compatibility breaks were evident.
  • Evaluating the Evaluators: LLM Assessment Frameworks: Alex Atallah sparked interest in discussing the effectiveness of LLM evaluation frameworks, specifically naming Deepeval and Gentrace, but the community did not provide extensive experiences.
    • The initial query didn’t yield detailed community feedback and remained an open topic for future sharing of insights.
  • Gemini’s Juggling Act: Model Rate Limits Query: Queries about the rate limits of the Gemini 1.5 model reflected the community’s ongoing concerns regarding the deployment and scalability of LLMs.
    • The discussion was left unresolved without direct answers, underlying the common issues in understanding LLM usage constraints.
  • Farewell Noromaid: Model’s Market Exit: The discontinuation of the Noromaid model was met with disappointment from the community, triggering speculations on the effects of its pricing structures on user adoption.
    • Members exchanged thoughts on the need for affordable yet competent models, underscoring the balance between cost and utility in AI applications.

Interconnects (Nathan Lambert) Discord

  • Theorems Tackled with Tremendous Triumph: HarmonicMath achieved a groundbreaking 90% state-of-the-art on the MiniF2F benchmark, soaring past their previous 83% (more details).
    • Discussions praise the pace of theorem proving progress, considering the benchmark’s easier version stood at just 50% earlier this year, showcasing a dramatic improvement.
  • 405b Weights Wager: Open or Closed?: Speculation abounds regarding the openness of the 405b model weights following a July 23rd update.
    • Community members express a mixture of surprise and curiosity, hinting at an unexpected shift toward weight sharing transparency.
  • Legal Laughs in AI Land: A lighthearted exchange on AI development compliance resulted in a humorous, ambiguous assurance that it’s 'good enough for lawyers.'
    • The community enjoyed a chuckle, reflecting on the nuanced dance between AI innovation and legal frameworks.
  • Steering the Vector Vocabulary: Clarification ensues as Control Vector, Steering Vector, and Concept Vectors are dissected, debating usage and interchangeability in machine learning contexts.
    • Particular focus centers on Concept Vectors, considered specific instances of Steering Vectors, spurring conversation on their practical applications and theoretical foundations.
  • Directive Dilemmas: Policy Priorities: A paper stimulates dialogue by suggesting a focused preference for y_l in policy formulation over y_w, alluding to the non-reliance on LLM sampling for preference pairs.
    • A link was shared to AI2 slides addressing Direct Preference Optimization (DPO) and pitfalls like overfitting, albeit with access gated by Google sign-in requirements.

LAION Discord

  • Copywrong No More? Court’s Copilot Copyright Call: A pivotal California court ruling may signal smoother skies for AI development, as significant parts of a copyright lawsuit against Microsoft’s GitHub Copilot and OpenAI’s Codex were dismissed.
    • The court’s decision could be a harbinger for AI tools trained on copyrighted data, though full implications in the space of intellectual property rights are still brewing.
  • Boardroom Shuffle: Tech Giants Retreat from OpenAI’s Table: In a move that has tongues wagging, Microsoft and Apple are exiting OpenAI’s board amid antitrust scrutiny, yet vow to maintain their strategic ties.
    • The tech titans’ departure from the governance troupe, a narrative entwined with legal labyrinths, doesn’t spell an end to their OpenAI alliances.
  • Complexity Unchained: Novel Vision Models Tout CIFAR-100 Gains: Complex-valued vision architectures, replacing attention with 2D DFT a la FNet, have sparked excitement after showing promise on CIFAR-100, with shallower networks outshining the profound depths.
    • Despite real issues with gradients in the complex domain, a smaller complex model has already overtaken a much larger real counterpart, possibly foreshadowing a new paper or blog post if gains persist.
  • Graph-Enhanced Gaze: Image Captioning Enters a New Dimension: Graph-based image captioning steps into the limelight, as a novel paper proposes a structure that elevates compositional understanding by weaving entities and their relationships into a narrative.
    • The approach, which is akin to a web of visual verses, leverages object detection and dense captioning, detailed in an arXiv paper that could be a chartbuster in the ongoing AI saga.
  • Community Confluence: OPEA’s Event Sets Sail on Open Seas: OPEA beckons the AI fleet to set a course for its July 16 community event, crafting a collective charter and roadmap amidst the open waves of their 0.7 release; registration is a click away here.
    • This assembly promises to be a conclave where ideas swirl and coalesce, potentially charting the course for future AI endeavors in enterprise.

LangChain AI Discord

  • ConversationSummaryMemory: Who’s On Board?: Discussions arose around enhancing LangChain’s ConversationSummaryMemory for multi-human conversations to streamline summarization.
    • Suggestions included refining the handling of agents to improve efficiency, though specifics on methods were left open for thought.
  • Agents Assemble: LangGraph Strategizing: Building agent-based architectures within LangGraph sparked ideas, with a focus on Agents delegating queries to specified subagents.
    • The approach includes subagents parsing responses, showing a collaborative system amongst AI components.
  • Chroma Hiccups: Troubleshooting Data Fetching: Persistent directory settings in Chroma led to sporadic data retrieval issues, with failures in approximately 70-80% of attempts.
    • Participants shared experiences and sought solutions to this nuanced challenge.
  • AI-driven Code: Unwrangle Your Tasks: Unwrangle.com’s creator showcased the use of AI tools like aider and cursor to rev up coding processes for solo developers.
    • The use extends to streamlining workflows, as indicated in a shared Substack post, triggering a call for community stories on similar AI exploits.
  • Knowledge Graphs Demystified: RAG at Play: Aiman1993 held a YouTube workshop illustrating the application of Knowledge Graphs to Video Game Sales data via RAG.
    • The tutorial involved practical uses of the Langchain library and encouraged feedback for future knowledge-driven AI explorations.

Cohere Discord

  • Global Greetings Gather Goodwill: Members from across the world, including Lausanne, Switzerland 🇨🇭 and Japan, introduced themselves in the general channel.
    • A member from Japan sparked joy with their enthusiastic greeting: 'Hi, I’m Haru from Japan, nice to meet you all!!!'
  • Welcoming Waves Wash Over Newcomers: Following the flurry of international introductions, experienced members extended a warm welcome with messages like 'welcome 🙂' and 'Welcome ❤️'.
    • The friendly exchanges contributed to a collaborative and inclusive community environment.

OpenInterpreter Discord

  • Llama3’s Lag in Code Logic: A user reported that Llama3 often yields a stray ` snippet before outputting the intended code, necessitating additional prompts for accuracy (a post-processing workaround is sketched after this list).
    • The community was queried about switching to an alternative LLM as a potential solution to the code generation problem.
  • LLM Flag Fumble Fixed with Profile Patch: Installation problems arose due to an unrecognized llm-service flag, with a member highlighting a discrepancy in the current documentation.
    • A provisional fix using profiles, akin to Open Interpreter’s setup, was suggested until the documentation update is released.
  • Open Interpreter’s Outreach on Mozilla’s Platform: An announcement was made for a discussion on Open Interpreter to take place on the Mozilla Discord server next week.
    • Interested community members are directed to join the live event at Mozilla Discord for an in-depth conversation.
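
One client-side workaround for the stray-backtick issue, offered here as a hedged sketch rather than anything from Open Interpreter itself, is to strip dangling fences before executing the reply:

```python
import re

def clean_code_reply(text: str) -> str:
    """Remove a stray leading fence (with optional language tag) and any
    dangling closing fence around the model's code output."""
    text = re.sub(r"^\s*`{1,3}[a-zA-Z]*\s*", "", text)  # leading stray fence
    text = re.sub(r"`{1,3}\s*$", "", text)              # trailing leftover fence
    return text.strip()

print(clean_code_reply("`python\nprint('hi')"))  # -> print('hi')
```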

tinygrad (George Hotz) Discord

  • Tinygrad’s Tricky Troubles: Community members expressed frustration with some of Tinygrad’s error messages, which can be ambiguous and not always critical, suggesting more user-friendly error handling.
    • Particular gripes include errors for non-contiguous inputs which don’t necessarily signal deeper problems but still stop execution.
  • Tinygrad Gradient Defaults Debated: An explanation was offered for Tinygrad’s requires_grad settings, noting that the default None value means gradients are optional, decided by whether the tensor is later used in optimization routines (a sketch follows this list).
    • Explicitly setting this value to False excludes a tensor from gradient calculation entirely, highlighting the purpose of having three distinct states.
  • Tinygrad and NV Accelerator Ambiguities: There was a clarification that the NV accelerator in Tinygrad is specifically for GPUs, working closely with the hardware kernel while bypassing the userspace layer.
    • Questions arose about the necessity of writing a separate accelerator for NVDLA/DLA, suggesting potential additional work for full support.
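
A small sketch of the three states, assuming the optimizer opt-in behavior described above (tinygrad's optimizers flip undecided parameters to trainable):

```python
from tinygrad.tensor import Tensor
from tinygrad.nn.optim import SGD

w_none = Tensor([1.0], requires_grad=None)  # undecided: an optimizer may opt it in
w_off = Tensor([1.0], requires_grad=False)  # hard-excluded from gradient tracking
w_on = Tensor([1.0], requires_grad=True)    # always tracked

opt = SGD([w_none], lr=0.1)  # handing the undecided tensor to an optimizer...
print(w_none.requires_grad)  # ...flips it to True; w_off stays excluded
```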

MLOps @Chipro Discord

  • KAN Interaction Ignites Insight: The KAN paper’s authors engage the community on the AlphaXiv forum, discussing their latest publication.
    • The forum buzzed with direct interactions and answers to community questions.
  • Judging Panel Piques Interest: Interest spikes as members inquire about the process to join the event’s judging panel.
    • Commitment and willingness to contribute were the sought-after qualities in potential judges.
  • Hermes 2’s Hefty Hike in Benchmarks: Hermes 2.5 shows a significant performance improvement over Hermes 2, as detailed by code instruction enhancements.
    • Benchmarking reveals Hermes 2 scoring 34.5 on the MMLU, with Hermes 2.5 achieving a 52.3.
  • Mistral’s Mileage Beyond 8k: Discussions converge on Mistral’s scalability challenges, indicating the need for more pretraining to extend beyond 8k, as noted in related issues.
    • Focus shifts to mergekit development and frankenMoE finetuning as avenues for overcoming performance bottlenecks.
  • Merger Methods Mulling Model Magic: The potential of merging UltraChat and Mistral-Yarn, using Mistral base, spawns a flurry of technical conjecture.
    • The concept of 'cursed model merging' resurfaces amid discussions, bolstered by references to previous successes in this area.

OpenAccess AI Collective (axolotl) Discord

  • Predicting a Multi-Token Future: A user inquired about the multi-token prediction capability, questioning its availability for current training processes or if it remains on the horizon.
    • Expansion to multi-token prediction might be contingent on prior implementation within Hugging Face platforms.
  • DPO Fine-Tune Clashes with Multi-GPU Processing: The community flagged an error disrupting full fine-tuning when using DPO on systems utilizing multiple GPUs.
    • The glitch was notably triggering crashes in RunPod FFT during fine-tune sessions involving the main branch.

AI Stack Devs (Yoko Li) Discord

  • Dev Dive: Left-Side Lift-Off: Mikhail_EE has made advancements on the left side of their ongoing development.
    • Encouraging feedback was received, with N2K responding with an “Amazing!” to the progress update.
  • Enthusiasm Echoes in Updates: Mikhail_EE’s idea development garners attention with a significant update shared.
    • The community feedback loop is reinforced as N2K echoes back an affirmative “Amazing!”, a supportive sentiment.

LLM Finetuning (Hamel + Dan) Discord

  • Credits Countdown Conundrum: A member reported a glitch where their user credits expired prematurely, raising the issue with an extension request tagged for admin attention.
    • Expectations are set for a solution that could extend the credit duration, allowing the member to fully leverage the intended platform usage.

The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (478 messages🔥🔥🔥):

  • Gemini-1.0-pro Finetuning
  • Kaggle Notebook for Gemma-2-9b
  • Qwen2 Finetuning and Effectiveness
  • Synthetic Data Generation
  • Koboldcpp for Local LLMs
  • Finetuning advice for Gemini-1.0-pro: A user inquires about finetuning Gemini-1.0-pro, and another user suggests generating synthetic datasets with Gemini 1.5 Pro for better results.
    • “It’s pretty good! But I don’t suggest using it for out-of-domain topics, such as new languages, since it’s a basic Lora adapter,” advised one member.
  • Kaggle Notebook Fine-tunes Gemma-2-9b: A user shared a Kaggle notebook for fine-tuning Gemma-2-9b, created by Daniel Han, co-creator of UnslothAI.
    • The notebook demonstrates how to effectively fine-tune models using Kaggle’s resources.
  • Qwen2 effectiveness debated: Members discuss the effectiveness of finetuning Qwen2-1.5b, noting its ability to mimic data structures with good general responses.
    • It was mentioned that despite being a smaller model, Qwen2-1.5b runs well without requiring a GPU, though its speed can vary based on the task.
  • Synthetic data generation tools: Users discuss various synthetic data generation tools, with a focus on text and conversation data.
    • Recommendations include using magpie, augmentoolkit, and automating data generation with Python scripts on servers.
  • Local LLMs with Koboldcpp: Koboldcpp is recommended for running local language models with a UI, leveraging GGUF files from HuggingFace.
    • Users discuss configuring settings for optimal run, emphasizing offloading layers to GPU and handling context size for better performance.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (18 messages🔥):

  • Llama 3 model usage in Unsloth
  • Swedish language model
  • Training advice for Llama 3
  • Inference speed issues
  • Mac GPU performance for model inference
  • Use fine-tuned Llama 3 with Unsloth: A discussion took place regarding the possibility of using an already fine-tuned Llama 3 model via Unsloth, highlighting the use of AI-Sweden-Models/Llama-3-8B-instruct.
    • A member mentioned that it’s better to use the base model for training and the instruct model for tasks like translation.
  • Swedish Llama 3 model details: The Swedish language-translated Llama model (AI-Sweden-Models/Llama-3-8B-instruct) was trained on the LUMI supercomputer as part of the DeployAI EU project.
    • A member shared insights about the dataset used for training, provided by 42 Labs.
  • Inference speed challenges and solutions: Inference with the Llama-3-8B-instruct model was too slow on Google Colab, taking approximately 3 minutes per response.
    • Suggestions were made to use LM Studio for better performance and to offload layers to GPU more effectively.
  • GPU performance on Mac for model inference: A query was raised about the feasibility of running the inference locally on a base version M1 Mac Air.
    • A member suggested it is likely possible and recommended searching for Meta-Llama-3-8B-Instruct on LM Studio and using the highest quant that fits the system.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (35 messages🔥):

  • GPTs Agents
  • Learning Rate Schedulers
  • Xformers Version Compatibility Issues
  • GGUF Model Saving and Loading
  • Custom Token Embeddings Training
  • GPTs Agents cannot learn after initial training: A member shared a concern about GPTs agents not learning from additional information provided after their initial training.
  • Cosine scheduler recommended for fine-tuning Qwen2-1.5b: When asked about the best learning rate scheduler, a member suggested that generally cosine is the best option.
  • Xformers version compatibility issues resolved: Members experienced issues with the latest xformers version, causing training errors.
    • Downgrading to xformers==0.0.26.post1 fixed the compatibility issues, which has now been updated in the official notebooks.
  • Saving models in GGUF format results in errors: A critical error was encountered during GGUF model conversion due to missing files in llama.cpp (a sketch of the export call appears after this list).
    • The error message indicated the absence of llama-quantize or quantize files, leading to runtime failures during the save operation.
  • Adding and training new token embeddings manually: A user inquired about manually implementing backpropagation for specific modules to train new token embeddings while freezing the pretrained ones.
    • Their goal was to ensure accurate predictions for new special tokens without retraining all the embeddings.
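
For reference, the failing export path looks roughly like this hedged sketch of Unsloth's GGUF export (the model name and quantization method are illustrative; the call shells out to a compiled llama.cpp, whose llama-quantize binary, formerly named quantize, must exist on disk):

```python
from unsloth import FastLanguageModel

# Illustrative model and quant method; save_pretrained_gguf invokes llama.cpp
# under the hood, so its llama-quantize binary must be built and findable.
model, tokenizer = FastLanguageModel.from_pretrained("unsloth/llama-3-8b-bnb-4bit")
model.save_pretrained_gguf("output_gguf", tokenizer, quantization_method="q4_k_m")
```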

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (1 message):

  • Ghost 8B Beta
  • Language Models
  • Multilingual Support
  • Knowledge Capabilities
  • Cost-Effectiveness
  • Ghost 8B Beta Released with Multilingual Support: Ghost 8B Beta is a large language model developed with goals that include excellent multilingual support, superior knowledge capabilities, and cost-effectiveness.
    • The model comes in two context length versions, 8k and 128k, and includes multilingual function tools support by default. Try it on Hugging Face.
  • Ghost 8B Beta Overview and Resources: The official website provides comprehensive documentation, including sections on introduction, specifications, techniques, evaluation, and notes.
    • Users are encouraged to check the linked sections for detailed information on the model’s capabilities and underlying techniques.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #community-collaboration (2 messages):

  • New token embeddings
  • Vocab expansion challenges
  • Bad embeddings from new tokens: A member warns that new tokens’ embeddings might be really bad if implemented without proper protocol.
    • They highlight past experiences where vocab expansion necessitated continual pretraining.
  • Vocab expansion requires pretraining: Another member reiterates that vocab expansion demands rigorous pretraining to avoid embedding issues.

Unsloth AI (Daniel Han) ▷ #research (3 messages):

  • FSDP2 example
  • Norm tweaking for LLMs quantization
  • FSDP2 minimal example released: @marksaroufim shared a minimal example for FSDP2, provided by Andrew Gu, to facilitate easier implementation.
    • The example includes a torchrun command and a Python script demonstrating fully sharded data parallel (FSDP) with mixed precision policy.
  • Norm tweaking for better LLMs quantization: A paper introduces a technique called norm tweaking that improves precision in model compression for large language models (LLMs).
    • The method shows significant gains in 2-bit quantization without sacrificing accuracy, outperforming existing post-training quantization (PTQ) methods.

Links mentioned:


HuggingFace ▷ #announcements (1 message):

  • Google TPUs
  • Datasets Filtering
  • Gemini Nano in Browser
  • Rust-based Inference
  • Depth Estimation Models
  • Google TPUs Now on Hugging Face: You can now build, train, and deploy Generative AI models with Google TPUs on Hugging Face. Google Cloud TPUs are accessible on Spaces and Inference Endpoints, with options ranging from 16GB to 128GB and pricing starting at $1.38/hour.
  • Hugging Face Adds Dataset Filtering: Hugging Face now allows you to filter almost 200,000 datasets by modality, size, and format, enhancing the impact and accessibility of open datasets over open models.
    • As stated, ā€˜open datasets are more impactful than open models these days’.
  • Run Gemini Nano in Browser with Chrome’s window.ai: Chrome’s new window.ai feature enables running Gemini Nano, a 3.25B parameter LLM, entirely locally in your browser. Experimental support for Transformers.js is also added to make its use simpler.
  • Rust-Based Inference Now Possible: The Rust-based framework Kyutai Labs’ Candle allows for real-time serving of models like Moshi. Candle supports CPU, CUDA, and Metal for inference and will be open-sourced soon.
  • New Depth Estimation Models Released: Two new depth estimation models, Depth Anything v2 and ZoeDepth, are now available in Hugging Face Transformers. Depth Anything v2 provides relative distances, while ZoeDepth offers absolute distances in meters.
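
Both models are a pipeline call away in Transformers; a quick sketch, with checkpoint names as published on the Hub and an illustrative image path:

```python
from PIL import Image
from transformers import pipeline

image = Image.open("scene.jpg")  # any local image

# Depth Anything v2: relative depth (unitless ordering between pixels)
relative = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
rel_map = relative(image)["depth"]            # PIL image visualization

# ZoeDepth: absolute (metric) depth in meters
absolute = pipeline("depth-estimation", model="Intel/zoedepth-nyu-kitti")
abs_map = absolute(image)["predicted_depth"]  # torch tensor of depth values
```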

Links mentioned:

  • Tweet from Philipp Schmid (@_philschmid): Now! Build, Train, Deploy Generative AI models with @Google TPUs on @huggingface! > Google Cloud TPUs available on Spaces and Inference Endpoints > 3 options: 16GB to 128GB TPU memory (1x1, 2x2...
  • Tweet from clem šŸ¤— (@ClementDelangue): IMO, open datasets are more impactful than open models these days! You can now filter almost 200,000 of them on HF by modality, size and format: https://huggingface.co/datasets
  • Tweet from Xenova (@xenovacom): Chrome's new `window.ai` feature is going to change the web forever! 🤯 It allows you to run Gemini Nano, a powerful 3.25B parameter LLM, 100% locally in your browser! We've also added exper...
  • Tweet from Zach Mueller (@TheZachMueller): Another start to the month, another @huggingface Accelerate release! We've been cooking šŸ‘Øā€šŸ³ New profilers, speedups, communication hook support, and so much more! 🧵
  • Tweet from Vaibhav (VB) Srivastav (@reach_vb): How does @kyutai_labs serve Moshi to millions, in real-time? TL;DR - Rust based inference stack based on Candle šŸ¦€ > Inference run in 8-bit Q8 (gguf) - you can see it on the demo screen > CUDA...
  • Tweet from Sayak Paul (@RisingSayak): We worked on a mini-project to show how to run SD3 DreamBooth LoRA fine-tuning on a free-tier Colab Notebook 🌸 The project is educational and is meant to serve as a template. Only good vibes here pl...
  • Tweet from Niels Rogge (@NielsRogge): 2 new depth estimation models now in @huggingface Transformers! Depth Anything v2 & ZoeDepth - Depth Anything v2 is relative, tells you the relative distance among the pixels - ZoeDepth is absolute...
  • Tweet from Argilla (@argilla_io): 🌟 New Blog Post Alert with @mantisnlp ! 🌟 We will discuss SimPO (Simple Preference Optimization), designed to better align reward and generative models, providing more intuitive results. šŸš€ Stay ...
  • Tweet from Philipp Schmid (@_philschmid): Did Open Science just beat @OpenAI? 🤯@kyutai_labs just released Moshi, a real-time native multimodal foundation model that can listen and speak, similar to what OpenAI demoed GPT-4o in May. šŸ‘€ Mosh...
  • Tweet from Aymeric (@AymericRoucher): New cookbook! I show how to make agentic RAG using Transformers Agents. Compared to vanilla RAG, agentic RAG can: āœ… Reformulate the query āœ… Critique the retrieved content to re-retrieve if needed āž”ļø ...

HuggingFace ā–· #general (297 messagesšŸ”„šŸ”„):

  • Life advice on Data Science vs. Machine Learning
  • Gemma Model Usage Issues
  • Debugging and Coding Tips
  • Transformers and LLM Usage
  • Neural Network Training Issues
  • Data Science vs. Machine Learning Debate: A user asked whether to pursue Data Science or Machine Learning, sparking a brief discussion about the similarities and mathematical challenges in ML.
  • Gemma Model Issues and Alternatives: Several users encountered issues using the Gemma-2b model for text generation, including internal server errors and incoherent outputs.
    • Aidlennerd shared a Gemma Model Card, while Noaroggendorff recommended using chat templates and alternatives like Gemma-7b (see the chat-template sketch after this list).
  • Debugging and Improving Code Snippets: Aidlennerd posted Python code snippets for debugging and discussed optimizing them, including adjusting tokenizer settings and model configurations.
    • Noaroggendorff and others suggested practical tweaks, like using quantized models and splitting long texts to prevent GPU RAM overflow.
  • Hosting Local Models Efficiently: Aidlennerd sought advice on hosting LLMs locally for metric evaluations, as OpenAI API tokens were deemed costly.
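
On the chat-template recommendation above: instruction-tuned Gemma checkpoints expect specific turn markers, which apply_chat_template emits for you. A minimal sketch (the example checkpoint is gated, so Hugging Face authentication is assumed):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
messages = [{"role": "user", "content": "Summarize beam search in one sentence."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# <bos><start_of_turn>user
# Summarize beam search in one sentence.<end_of_turn>
# <start_of_turn>model
```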


HuggingFace ā–· #today-im-learning (4 messages):

  • freeCodeCamp Machine Learning course
  • Finding help from communities
  • Triplet collapse in embedding models
  • Starting with freeCodeCamp’s ML course: A member began learning Machine Learning with Python using the freeCodeCamp course.
    • They shared that they have no prior knowledge in ML basics but found this to be a good starting point.
  • Find help from communities: If you can’t achieve something, find its community and a person to help you out - A member shared their insights after struggling to find specific information online.
    • If you can’t Google a thing that’s not very specific, ChatGPT also can’t help you out with it - they concluded.
  • Solving triplet collapse in embedding models: A member found a solution to triplet collapse in embedding models by pre-training with softmax before applying triplet loss, instead of just using batch mining strategies.
    • They are working on a mouse dynamics embedding model and shared preliminary results showing improved separability using this method.
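
The two-stage recipe is easy to sketch in PyTorch. Everything below (data, dimensions, class count) is a placeholder standing in for mouse-dynamics features:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 1,000 random points across 20 classes.
X, y = torch.randn(1000, 128), torch.randint(0, 20, (1000,))

class Encoder(nn.Module):
    def __init__(self, in_dim=128, emb_dim=64, n_classes=20):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                      nn.Linear(256, emb_dim))
        self.head = nn.Linear(emb_dim, n_classes)  # used only in stage 1
    def forward(self, x):
        return self.backbone(x)

model = Encoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stage 1: softmax pre-training spreads embeddings by class, avoiding the
# degenerate solution where every input maps to the same vector.
ce = nn.CrossEntropyLoss()
for xb, yb in DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True):
    loss = ce(model.head(model(xb)), yb)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: drop the head and refine distances with triplet loss
# (naive random mining here; batch-hard mining is the usual upgrade).
triplet = nn.TripletMarginLoss(margin=0.2)
for _ in range(100):
    i = torch.randint(0, len(X), (64,))
    pos = torch.stack([X[(y == y[j]).nonzero().squeeze(1)[0]] for j in i])
    neg = torch.stack([X[(y != y[j]).nonzero().squeeze(1)[0]] for j in i])
    loss = triplet(model(X[i]), model(pos), model(neg))
    opt.zero_grad(); loss.backward(); opt.step()
```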

HuggingFace ā–· #cool-finds (10 messagesšŸ”„):

  • Generative AI's impact on storytelling
  • KMWorld AI 100
  • Fine-tuning LLMs with QLoRA
  • Candle running on iOS
  • Wav2Lip lip-synching issues
  • Generative AI revolutionizes storytelling: An article on Medium discusses the transformative potential of generative AI in storytelling, examining its impact on authors and audiences with detailed insights. Read more here.
  • KMWorld AI 100 highlights intelligent knowledge management: The KMWorld AI 100 article discusses companies that are empowering intelligent knowledge management and the rapid advancements in AI technology. The full article can be found here.
  • Running Candle with metal acceleration on iOS: A user shared their progress in making Candle compile and run on iOS with metal acceleration, seeking community assistance. The conversation can be joined here.
  • Resolving Wav2Lip lip-synching issues: A member sought help for Wav2Lip where characters move their mouths without speaking; a solution involving background noise reduction was suggested.
  • Promising results from an MMA fight predictor: A user introduced their MMA fight predictor with an impressive 78% accuracy and detailed the features used such as fighter stats and strike averages.


HuggingFace ā–· #i-made-this (15 messagesšŸ”„):

  • qdurllm demo
  • Branchy-phi-2 showcase
  • Knowledge Graphs workshop
  • Early Exit in LLM
  • MCQ generation App
  • Qdurllm demo launches with a crabby twist: A new qdurllm demo built on Qdrant, Sentence Transformers, llama-cpp, and Langchain is now available, showcasing its search engine capabilities.
    • Users are encouraged to try out the demo and support it with a ⭐ on GitHub.
  • Exploring Early Exit in LLM with Branchy-phi-2: A new Branchy-phi-2 Space demonstrates research on Early Exit in LLM, allowing faster inference with adjustable accuracy.
    • Slower performance on CPU is noted, but feedback and exploration of the Epsilon parameter for Early Exit are encouraged.
  • Delve into Knowledge Graphs with a Video Game Sales workshop: A live workshop presented Knowledge Graphs using Video Game Sales as a case study, aiming to enhance natural language querying with Langchain and Neo4j.
    • The community is invited to provide feedback on the tutorial and engage with the presented content.
  • LLMs tackle Cypher generation: Stable-cypher-instruct-3b model designed for Cypher generation is shared, promising superior performance compared to GPT-4.
    • Feedback on the model’s performance for Cypher queries from text is sought to refine its capabilities.
  • Introducing Ideogram outputs collection: A new Ideogram outputs collection includes top posts, user feeds, and random samples from Florence2 captions.
    • Future updates will incorporate captions from llava-next and cogvlm2 to diversify content descriptions.


HuggingFace ā–· #computer-vision (5 messages):

  • User appreciation
  • Image segmentation issues
  • Goated Bot Receives Praise: A member expressed their love for the bot, calling it ā€˜goated’ for its functionality.
  • Help Needed for Image Segmentation Project: A member requested assistance with issues they are facing in their image segmentation project.

CUDA MODE ā–· #general (12 messagesšŸ”„):

  • Shared memory in GPUs
  • Hackathon team formation
  • Utilizing Shared Memory in GPUs with Compute Capability 8.9: A discussion highlighted that GPUs with compute capability 8.9 address up to 99 KB of shared memory per thread block. An example to use this additional shared memory can be found in this kernel launch.
    • ā€œBy default, the rest would be used by the L1 cache (if no texture is used)ā€ since the memory is unified, and an additional 1 KiB is allocated to the driver.
  • Forming Hackathon Teams for CUDA Event: Several members discussed forming teams for an upcoming hackathon event focused on CUDA.
    • ā€œI’d like to make the trip if I get in, and if I can find lodgingā€ summarizes the sentiment, highlighting the logistical considerations of attending.


  • AMD acquisition
  • AI start-ups in Europe
  • Silo AI
  • LLMs development
  • Nvidia competition
  • AMD to Acquire Finnish AI Start-Up Silo AI: AMD announced it will acquire Finnish AI start-up Silo AI for $665mn to expand its AI services and compete with Nvidia.
    • Silo AI’s 300-member team will use AMD’s software tools to build custom large language models (LLMs) for chatbots, with the acquisition expected to close in the second half of this year pending regulatory approval.
  • Significant AI Startup Acquisition in Europe: The AMD-Silo AI deal is one of the largest acquisitions of a privately held AI startup in Europe since Google bought DeepMind for around Ā£400mn in 2014.
    • AMD’s Vamsi Boppana mentioned this acquisition will accelerate customer engagements and AMD’s own AI tech stack.

Link mentioned: AMD to buy Finnish start-up Silo AI for $665mn in drive to compete with Nvidia : All-cash acquisition by California-based chipmaker is the largest of its kind in Europe in a decade


CUDA MODE ā–· #jobs (1 messages):

  • Job Opportunities
  • Remote Work
  • AI Research
  • PyEmber Framework
  • Hugging Face DRL Leaderboard
  • AI Specialist Seeks Remote or Hybrid Opportunities: A member announced their search for fully remote or hybrid job opportunities with sponsorships, emphasizing their 4 years of experience in AI and RL.
    • They highlighted their #8 global ranking on the Hugging Face DRL leaderboard and the creation of the PyEmber framework based on PyTorch.
  • PyEmber Framework Innovated by AI Engineer: The developer behind PyEmber, a deep learning framework based on PyTorch, is seeking new job opportunities and collaboration projects.
    • They published their first research paper and anticipate releasing a second one soon.

Link mentioned: Waleed Salah Eldin Resume.pdf: no description found


CUDA MODE ā–· #beginner (37 messagesšŸ”„):

  • Nvidia's tensor offloading
  • CUDA and GPU learning on MacBook
  • HIP as AMD's CUDA equivalent
  • Google Colab for CUDA learning
  • Future GPU purchase considerations
  • Nvidia’s tensor offloading explained: Discussion of tensor offloading mentioned in an Nvidia whitepaper, where tensors are offloaded from VRAM to reduce peak VRAM consumption.
    • For implementing tensor offloading, members referred to the PyTorch implementation with FSDP (FullyShardedDataParallel).
  • CUDA development without GPU on MacBook: Members discussed using cloud solutions like Google Colab or Kaggle for CUDA learning on a MacBook as an alternative to buying a physical GPU.
    • Free tier options on these platforms are recommended for beginners, removing the need to invest in a physical GPU until advanced stages.
  • AMD’s HIP as an alternative to CUDA: Members confirmed that AMD’s HIP serves as a parallel to Nvidia’s CUDA.
    • It is mostly similar to CUDA with few hardware-specific optimizations, making it an easy transition for CUDA users.
  • Using Google Colab for CUDA projects: Google Colab supports running CUDA code; useful commands include installing nvcc4jupyter and setting up %%cuda cells for execution (see the sketch after this list).
    • Alternatively, CUDA files can be run directly with relevant NVCC commands, without additional setup.
  • Long-term plans for GPU purchase: A member expressed interest in eventually buying a GPU for advanced projects, although starting with cloud GPU options like vast.ai can be cost-effective.
    • A physical GPU is beneficial for broader use cases beyond CUDA, such as gaming or graphics work.
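
The Colab flow mentioned in the list looks roughly like this, following the nvcc4jupyter usage pattern (two notebook cells shown together; the kernel is a toy example):

```
# Cell 1 (Colab, GPU runtime): install and load the extension
!pip install nvcc4jupyter
%load_ext nvcc4jupyter

# Cell 2: the %%cuda magic compiles the cell body with nvcc and runs it
%%cuda
#include <cstdio>
__global__ void hello() { printf("hello from thread %d\n", threadIdx.x); }
int main() { hello<<<1, 4>>>(); cudaDeviceSynchronize(); return 0; }
```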


CUDA MODE ā–· #off-topic (1 messages):

apaz: https://x.com/typedfemale/status/1810025768715686188


CUDA MODE ā–· #llmdotc (237 messagesšŸ”„šŸ”„):

  • MuAdam Implementation Issues
  • Embedding Weight Initialization
  • Performance of StableAdam
  • Handling Loss Spikes
  • MuP Learning Rate Impact
  • MuAdam Not Adjusting LR for Output Weights: Discussion initiated regarding MuAdam issue about the learning rate not scaling for output weights possibly affecting extensions.
    • Concerns were raised about weight tying leading to a performance trade-off, sparking the need for further experiments and community feedback.
  • Embedding Weight Initialization Trade-offs: Members discussed the impact of zero-initialization of embedding weights on performance, deciding to experiment with different setups without zero initialization.
    • Initial results showed promising stability without zeroing embedding layers, suggesting further run investigations with adjusted multiplication factors.
  • StableAdam’s Performance Against Loss Spikes: Attempts to mitigate loss spikes using StableAdam noted instability in large model runs despite initial promise.
    • The model’s gradient norm continued to climb, raising doubts about StableAdam’s effectiveness in this scenario.
  • Removing Biases in Model Layers: Experiments showed indifference in performance with and without linear biases in models, leading to a proposal to skip bias computations.
    • Bias weights drifted significantly, suggesting instability and supporting the proposal for bias-free models to simplify implementation.
  • MuP Learning Rate Impact Compared to Baseline: Comparison between MuP and baseline configurations highlighted MuP’s initial relative underperformance in early training steps.
    • The discrepancy suggests a need for further tuning and adjustments to better align MuP performance with established baselines.


OpenAI ā–· #ai-discussions (199 messagesšŸ”„šŸ”„):

  • AI locking mechanism
  • Training configurations for AI projects
  • High-performance computing and GPUs
  • Neural network feature extraction
  • Decentralized computing
  • AI locking mechanism proposal: A discussion on incorporating a locking mechanism in AI systems to control responses based on user interactions.
  • Challenges in configuring and training AI models: Members discussed the difficulty of hiring experts to configure specific AI systems and the potential cost and practicality of GPU rentals from providers like RunPod and Paperspace.
  • High-performance computing for AI tasks: Participants compared different GPU setups and cloud services for high-performance AI tasks, emphasizing the value of GPUs with high RAM for effective local and remote model inference.
  • Decentralized computing possibilities: Discussion on the feasibility of creating a decentralized computing platform, comparing it to existing platforms like BOINC that leverage volunteer computing.
  • Interpretation and training neural networks: Members shared experiences and tips for interpreting neural networks, focusing on extracting interpretable features using sparse autoencoders and the importance of initial training.

OpenAI ā–· #gpt-4-discussions (16 messagesšŸ”„):

  • User Frustration with ChatGPT Responses
  • Limitations of Search Results
  • Context Window Specifications
  • Differentiation Between Plan and Platform Dependency
  • Topic Relevance to Channels
  • User Frustration with ChatGPT Responses: A user expressed frustration with ChatGPT due to repeated incorrect responses and has considered switching to a competitor. They emphasized the importance of accurate search results for current information and updates.
  • Limitations of Search Results: Another user noted that even with search capabilities, ChatGPT often provides outdated information or mixes details from different versions, leading to inaccurate results.
    • They mentioned having to frequently specify to use recent search results to get accurate answers.
  • Context Window Specifications: A member clarified that the ChatGPT model has a context window specified as 32K, which can be found on the pricing page under Model Quality.
    • Another member pointed to an alternative answer showing the context window being 128K for the API.
  • Differentiation Between Plan and Platform Dependency: An issue was raised regarding confusion caused by mixing platform dependency with plan dependency in ChatGPT’s responses.
    • One user noted that asking about the context window ā€˜in the mobile app’ might have led to inaccurate search outcomes.
  • Topic Relevance to Channels: There was a brief exchange about ensuring discussions remain relevant to specific channels.
    • A user was redirected to a more appropriate channel for off-topic discussions not directly related to OpenAI.

OpenAI ā–· #prompt-engineering (1 messages):

  • thought process for custom GPT
  • accurate and truthful responses
  • Swooshdutch’s Thought Process for Custom GPT: A member discussed their work on crafting a ā€œthought processā€ for a custom GPT aimed at leading to more accurate and truthful responses.
    • Feel free to play around with it was the encouragement given for tinkering with this custom thought process.
  • Experimenting with Custom GPT: Members are invited to experiment with the new thought process for the custom GPT.
    • This experimentation aims to further refine the accuracy and truthfulness of the responses generated by the custom GPT.

OpenAI ā–· #api-discussions (1 messages):

  • Custom GPT Thought Process
  • Testing and Feedback for Custom GPT
  • Enhanced Custom GPT Thought Process: A member shared that they have been crafting a ā€œthought processā€ for custom GPT aimed at leading to more accurate and truthful responses.
    • Feel free to play around with it, suggesting there’s an open invitation for testing and feedback.
  • Call for Testing and Feedback: Members are invited to play around with the new thought process for custom GPT to see how it affects accuracy and truthfulness.
    • This collaborative approach is expected to refine the model based on community feedback.

LM Studio ā–· #šŸ’¬-general (100 messagesšŸ”„šŸ”„):

  • LM Studio configuration issues
  • Running LLMs on mobile devices
  • Multi-GPU support in LM Studio
  • Custom model import in DiffusionBee
  • LM Studio text embedding limitations
  • LM Studio configuration issues and fixes: Users are encountering black screens and other issues when updating LM Studio; clearing cache folders and reinstalling the application resolves these problems.
    • One user reported a black screen issue resolved by deleting model caches, and another fixed memory detection problems by uninstalling and reinstalling the software.
  • Running LLMs on mobile devices: Discussion on running Mistral 7B on an S21 Ultra through llama.cpp and termux, achieving close to 10 tokens/second.
    • The performance is surprisingly good for a mobile device, with short prompts like ā€˜what’s 2+2’ being processed effectively at quantization level Q4_K_S.
  • Issues with Multi-GPU support in LM Studio: Multi-GPU support is reportedly not working properly in the latest LM Studio version, causing crashes when models exceed the first GPU’s VRAM capacity.
    • Users noted that previously functional setups now crash, as LM Studio seemingly doesn’t utilize the second GPU’s VRAM.
  • LM Studio text embedding limitations: The text embedding feature in LM Studio only supports input types of string and string array, not allowing for direct input of files such as PDFs.
    • The feature is primarily used for Retrieval Augmented Generation applications and other text-heavy use cases (a minimal client sketch follows this list).
  • Exploring DiffusionBee for Mac: Users discussed DiffusionBee as a local image generator for Mac, noting it works best with Apple Silicon and allows custom model imports from Civitai.
    • Compared to A1111, DiffusionBee has an easier-to-use interface but currently lacks support for importing SDXL models other than the pre-listed ones.
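
Because inputs are limited to strings, PDF text has to be extracted client-side before embedding. A minimal sketch against LM Studio's OpenAI-compatible local server; the default port 1234 is assumed, and the model name stands in for whichever embedding model you have loaded:

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is a placeholder.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5-GGUF",  # example loaded model
    input=["first chunk of extracted text", "second chunk"],  # str or list[str] only
)
vectors = [item.embedding for item in response.data]
print(len(vectors), len(vectors[0]))
```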

Link mentioned: Text Embeddings | LM Studio: Text embeddings are a way to represent text as a vector of numbers.


LM Studio ā–· #šŸ¤–-models-discussion-chat (44 messagesšŸ”„):

  • Comparing deepseek chat and coder models
  • Flash attention settings
  • Optimal AI models for personal accountability
  • GLM4 support in llama.cpp
  • CodeGeeX4 coding model
  • DeepSeek Chat vs. Coder Spar: Members discussed the performance differences between DeepSeek Chat and DeepSeek Coder, with some favoring the new v2 of the DeepSeek Coder.
    • ā€˜Coding assistant’ users reported satisfactory results using DeepSeek Coder v2 lite for a few weeks.
  • GLM4 Integration Incoming: Discussion noted that GLM4 has now been merged into llama.cpp and is expected in the next update.
    • Users viewed this as beneficial, noting that CodeGeeX4 is based on GLM4, which brings enhanced capabilities.
  • CodeGeeX4 Outshines Competitors: A new model, CodeGeeX4, is touted to be better than DeepSeek v2 and is now available for various code generation tasks.
    • Comparisons with CodeQwen further affirmed CodeGeeX4’s superior capabilities.


LM Studio ā–· #🧠-feedback (4 messages):

  • LM Studio bugs
  • Hugging Face accessibility
  • Possible LM Studio access bug: a.hansen reported a potential bug where LM Studio couldn’t access a specific Hugging Face URL. However, fabguy confirmed that it was working for him.
    • a.hansen later confirmed that it was working the following morning, suggesting it might have been an issue on Hugging Face’s end.
  • Hugging Face temporary access issue: After a.hansen faced access issues with Hugging Face, fabguy suggested it could have been a Hugging Face problem.
    • a.hansen confirmed that the access issue was resolved by the next morning.

Link mentioned: lmstudio-community (LM Studio Community): no description found


LM Studio ā–· #šŸŽ›-hardware-discussion (40 messagesšŸ”„):

  • 3090 vs 4090 performance
  • Electricity costs in Australia
  • Building multi-GPU setups
  • AMD acquires SiloAI
  • Intel Arc 770 for AI
  • 3090 vs 4090 performance for AI applications: A member compared their 2x 7900xt setup, reporting 8-11 t/s on L3 70b IQ3, while others discussed the performance of 3090/4090 GPUs for AI tasks, with a general consensus that 4090s and multiple 3090s offer better performance.
    • Some members highlighted that 3090/4090 GPUs outperform Apple’s Mac Studio for inference tasks.
  • Electricity bills skyrocket for GPU setups: Members discussed the high electricity costs in Australia, noting that multiple high-power GPUs like 3090s significantly impact electricity bills.
    • ā€œAs a fellow Strine… paying over the odds for leccyā€¦ā€ one user humorously emphasized.
  • Struggles with building multi-GPU setups: For a 3x 3090 GPU setup, users shared experiences with case compatibility issues and power supply requirements, particularly needing gaps and airflow considerations.
    • A member suggested using 2x 750 watt PSUs mounted in side chambers as a workaround for power supply constraints.
  • AMD’s strategic move: Acquires SiloAI: It was noted that AMD has purchased SiloAI as part of a strategic move to compete with NVIDIA.
    • ā€œReportedly as part of their catch-up play against nvidia,ā€ shared a member.
  • Intel Arc 770 falls short for AI tasks: Members advised against using the Intel Arc 770 for AI tasks, citing insufficient toolchain support and lagging IPEX support compared to CUDA and ROCm.
    • ā€œStick with Nvidia,ā€ was a common sentiment among members for better AI performance.


LM Studio ā–· #amd-rocm-tech-preview (2 messages):

  • LM Studio update issues
  • Installing multiple versions of LM Studio
  • LM Studio version 0.2.27 slows down with AMD 7700XT: LM Studio version 0.2.27 seems super slow after updating from version 0.2.24, specifically with AMD graphics card 7700XT and fimbulvetr Q4_K_M model.
  • Query on dual installation of LM Studio: A member asked if it is possible to have both LM Studio versions installed simultaneously on one machine to accommodate each GPU.
    • The question arose because they have one GPU from each vendor.

Latent Space ā–· #ai-general-chat (55 messagesšŸ”„šŸ”„):

  • GPTs Agents
  • OpenAI's sidebars
  • Chroma chunking strategies
  • Turbopuffer launch
  • xAI H100s cluster
  • Chroma examines chunking strategy: Chroma’s latest technical report evaluates chunking strategies on retrieval performance for AI applications.
    • Report highlights that while LLM context lengths have grown, efficient retrieval often requires only relevant text portions to avoid model distractions.
  • Turbopuffer nears launch: Turbopuffer aims to provide fast search on object storage, addressing Readwise’s expensive vector search costs.
    • Mentioned by users as still waitlisted but praised for its potential by early testers from companies like Cursor.
  • xAI orders massive H100 cluster: xAI has contracted 24k H100s from Oracle and is building a 100k H100 system for AI training, aiming to be the world’s most powerful cluster.
    • Elon Musk emphasized the need for internal control over AI infrastructure to maintain competitive speed and efficiency.
  • Skild AI secures $300M Series A: Skild AI, led by Abhinav Gupta and team, emerges from stealth with a huge $300M Series A funding to build an AI foundation model for robots.
    • VCs have mixed feelings about current valuations, labeling them a potential bubble despite the sector’s exponential growth.
  • GitHub Copilot lawsuit update: Developers’ claims against GitHub Copilot largely dismissed, leaving only two allegations remaining.
    • Initial claims involved Copilot allegedly suggesting code snippets without proper licensing, raising intellectual property concerns.


Latent Space ā–· #llm-paper-club-west (93 messagesšŸ”„šŸ”„):

  • ColBERT paper
  • AI agent implementations
  • ImageBind
  • SBERT training
  • Multi-agent systems
  • ColBERT paper discussed: Discussion on the ColBERT paper was initiated for the survey session.
    • A member was curious if this was the only paper under discussion.
  • Survey on AI agent implementations: A member covered the survey paper on recent advancements in AI agent implementations including reasoning and tool execution capabilities.
    • The paper outlines key themes in agent architectures and their effectiveness, citing the impact of leadership and communication styles in agent systems.
  • ImageBind unifies six modalities: The ImageBind paper was presented, demonstrating learning a joint embedding across images, text, audio, depth, thermal, and IMU data.
    • ImageBind sets a new state-of-the-art in zero-shot recognition tasks, outperforming specialized supervised models and showing strong few-shot recognition results.
  • SBERT design and training explained: Members clarified that SBERT (sentence transformers) is essentially BERT with a pooling layer, trained contrastively using methods like siamese or triplet networks.
    • This sparked further discussion of BERT’s original use of the first token’s ([CLS]) embedding for classification (a pooling sketch follows this list).
  • MCTS for improved intelligence: Monte Carlo Tree Search (MCTS) was discussed as a potential next step for improving intelligence in LLMs, referencing practical applications in AlphaGo.
    • Discussion noted that MCTS’s effectiveness depends on the branching factor of the search space, cautioning its limitations in infinitely large spaces.
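
The pooling step under discussion is small enough to sketch directly: mean pooling over non-padding tokens, with the contrastive training loop omitted:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = bert(**batch).last_hidden_state          # (batch, seq_len, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)      # ignore padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling

a, b = embed(["a cat sat on the mat", "a feline rested on the rug"])
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```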


Modular (Mojo šŸ”„) ā–· #general (9 messagesšŸ”„):

  • Writing AI with Mojo
  • Qualcomm's SNPE and Mojo
  • Modverse Weekly Issue
  • Chris Lattner on ThePrimeagen
  • Community resources for AI writing in Mojo discussed: A discussion surfaced about the availability of community resources for writing AI with Mojo.
    • No specific resources were mentioned, leaving an open question for further input.
  • Comparison of Qualcomm SNPE and Mojo capabilities: A comparison was made between Qualcomm’s SNPE for sending PyTorch models to snapdragon devices and potential similar features in Mojo.
    • No specific features in Mojo were confirmed.
  • Modverse Weekly: Typo and duplicate entries: The latest Modverse Weekly Issue had a typo with ā€˜it’s’ instead of ā€˜its’ and duplicate entries for time.perf_counter.
    • The issues were acknowledged and promised to be fixed.
  • Chris Lattner’s appearance on ThePrimeagen has fans excited: Community members expressed excitement about Chris Lattner appearing on ThePrimeagen’s stream.


Modular (Mojo šŸ”„) ā–· #šŸ’¬ļø±twitter (1 messages):

ModularBot: From Modular: https://twitter.com/Modular/status/1810782477079957831


Modular (Mojo šŸ”„) ā–· #āœļø±blog (3 messages):

  • PyTorch in Enterprise
  • AI Development Challenges
  • Generative AI Adoption
  • Modular supports PyTorch deployment: Modular highlights the challenges enterprises face in deploying PyTorch models in production, despite its popularity for development and research.
    • The flexibility and ease-of-use of PyTorch in development can lead to complications like resource management and latency issues in full-scale production settings.
  • Modular bridges local and cloud development: Modular addresses the difficulties in creating streamlined AI development workflows that are both locally manageable and scalable for cloud deployment.
    • Developers often face fragmented AI tooling that complicates the end-to-end workflow for effective AI development and deployment.
  • Control over AI infrastructure: Modular encourages enterprises to adopt and integrate AI to enhance productivity and maintain a competitive edge in their services.
    • According to a Bain & Company survey, 87% of companies are already developing, piloting, or deploying generative AI, mainly in software development, customer service, marketing, and product differentiation.


Modular (Mojo šŸ”„) ā–· #tech-news (3 messages):

  • Mr. Lattner event
  • The Primeagen Twitch
  • Mr. Lattner and The Primeagen’s Special Event: Mr. Lattner has an event with The Primeagen tomorrow on Twitch, July 10 at 16:00 UTC.
    • Tune in for what promises to be an exciting event!
  • Event Reminder for Mr. Lattner and The Primeagen: Don’t forget to check out the Twitch stream of Mr. Lattner and The Primeagen’s special event tomorrow.
    • The event starts on July 10 at 16:00 UTC.

Link mentioned: ThePrimeagen - Twitch: 🚨🚨 HIGH SPEED GAME PROGRAMMING🚨🚨 !today


Modular (Mojo šŸ”„) ā–· #šŸ”„mojo (96 messagesšŸ”„šŸ”„):

  • Zero-copy deserialization in Mojo
  • Reference and value passing in Mojo
  • Building the Mojo compiler
  • Syntax of __setitem__ in Mojo
  • Memory management and ownership in Mojo
  • Zero-copy deserialization challenges in Mojo: Members discussed the difficulties with zero-copy deserialization in Mojo, particularly around the use of moveinit and type casting.
    • This approach works for trivial types, but members raised concerns about its soundness, especially without allocator awareness.
  • Mojo’s reference and value passing: Members explored differences in passing references and values in Mojo, emphasizing that small types like int are pass-by-value.
    • The language defaults to a ā€˜borrowed by default’ approach to reduce unnecessary copies, which aligns more with Rust/Zig than Swift.
  • Confusion about building Mojo from source: A user was concerned about the inability to build the Mojo compiler from source, questioning the reliance on binary distributions.
    • It was clarified that only the standard library can be built from source, while the compiler itself is not open source yet.
  • Issues with setitem in Mojo: A user experienced an error using A[0] = 1 which didn’t occur with A.__setitem__(0, 1), leading to confusion about dunder methods.
    • This anecdote hints at a possible bug where __getitem__ might be incorrectly called before __setitem__, prompting an issue report on GitHub.
  • Mojo’s ownership and memory management: Members discussed Mojo’s memory management model, focusing on ownership rules and the challenges around context managers.
    • The general consensus highlighted the importance of understanding Mojo’s borrowing and ownership principles to avoid memory errors.


Modular (Mojo šŸ”„) ā–· #šŸ“°ļø±newsletter (1 messages):

Zapier: Modverse Weekly - Issue 39 https://www.modular.com/modverse/modverse-weekly-issue-39


Modular (Mojo šŸ”„) ā–· #nightly (4 messages):

  • GitHub Issue #3208
  • Setitem and getitem issues
  • Mojo nightly release
  • Pattern matching requirements
  • GitHub Issue #3208: Unix FIFO Write Exception: A bug report was opened regarding an exception raised when opening a Unix FIFO in write mode in Mojo.
    • Execution failed with an unhandled exception related to a failure to remove an existing file.
  • Nightly Mojo Compiler Update Released: A new nightly Mojo compiler version 2024.7.1005 has been released; to update, use modular update nightly/mojo.
    • Changelog highlights include fixes for memset usage, **kwargs type annotation crashes, and typo corrections in documentation.

Link mentioned: [BUG] Opening a unix fifo in ā€œwriteā€ mode raises an exception Ā· Issue #3208 Ā· modularml/mojo: Bug description I’m not sure why this is failing, mentioned it on Discord and was asked to open an issue: $ mojo run src/main.mojo Unhandled exception caught during execution: unable to remove exi…


Modular (Mojo šŸ”„) ā–· #mojo-marathons (18 messagesšŸ”„):

  • Graviton 4 Instances
  • Benchmark Variance
  • Symmetrical vs Asymmetrical Benchmarking
  • AWS Graviton 4 Instances announced: AWS Graviton4-based Amazon EC2 R8g instances are now generally available, offering the best price performance for applications.
    • Though some DB companies requested immediate availability, most c and m instances are expected at re:Invent.
  • Benchmark consistency tips: Tips to stabilize benchmark results include disabling turboboost, hyper threading, setting CPU affinity, and more, as discussed in this resource.
    • ARM Performance Studio and other tools like Intel VTune and AMD uProf can also assist in reducing variance by utilizing hardware performance counters (a small pinning and timing sketch follows this list).
  • Benchmarking symmetrical vs asymmetrical cases: There’s a discussion on the importance of incorporating both symmetrical (m=n=k) and asymmetrical cases in benchmarking to ensure fair comparisons across different algorithm implementations.
    • This approach helps evaluate the performance of each algorithm in various use cases, including geo and image data.
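
Two of those tips, core pinning and robust aggregation, in a small Python illustration (sched_setaffinity is Linux-only, and the core index is arbitrary):

```python
import os
import statistics
import time

os.sched_setaffinity(0, {2})   # pin this process to core 2 (Linux-only)

def bench(fn, warmup=10, iters=100):
    for _ in range(warmup):                    # warm caches and frequency governor
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)          # median resists interrupt spikes

print(bench(lambda: sum(range(100_000))), "ns (median)")
```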


Eleuther ā–· #general (52 messagesšŸ”„):

  • entity disambiguation papers
  • LLM-based synthetic data generation tools
  • empathy LLMs papers
  • EleutherAI community map update
  • Diffusion Models with Exponential Integrator paper
  • Seeking entity disambiguation papers: A member asked if anyone has seen any interesting papers on entity disambiguation.
  • Request for tools in LLM-based synthetic data generation: A member queried about any specific tools that help in synthetic data generations using LLMs.
  • Interest in empathy LLMs papers: Another member asked for interesting papers on empathy LLMs.
  • EleutherAI community map update: EleutherAI Global Map prompted members to provide information on their country of origin or current residence for accurate representation on the community map.
  • Understanding marginal distributions in diffusion models: A member sought clarification on the term ā€œmarginal distributions as pĢ‚āˆ—_tā€ from the paper FAST SAMPLING OF DIFFUSION MODELS WITH EXPONENTIAL INTEGRATOR.


Eleuther ā–· #research (21 messagesšŸ”„):

  • RegMix data mixture approach
  • VLM failures on simple visual tasks
  • Composability of LM interventions
  • Self-improvement using synthetic data
  • RegMix identifies optimal data mixture for pre-training: Researchers propose RegMix to find an effective data mixture for large language model pre-training by simulating mixtures and fitting a regression model to predict performance.
    • This method involves training small models to determine the best mix and scaling it up for large models, significantly improving performance.
  • VLMs fail on basic visual tasks: A new paper highlights that state-of-the-art VLMs, such as GPT-4o and Gemini 1.5 Pro, struggle with simple visual tasks like identifying overlapping circles and counting objects.
    • This raises concerns about their real-world applicability, despite their high scores on conventional benchmarks.
  • Issues around LM intervention composability: A paper from Kyle Devin O’Brien examines how different LM interventions like editing, compression, and unlearning interact and impact each other.
    • The study found varying degrees of composability among popular interventions, which is crucial for practical applications involving multiple interventions.
  • Synthetic data prevents model collapse but with caveats: New research (arxiv.org/abs/2406.07515) uses feedback on synthesized data to prevent model collapse in LLMs, illustrating that naĆÆve synthetic data usage leads to performance degradation.
    • The paper supports the use of feedback-augmented synthesized data for maintaining high model performance in practical tasks such as matrix eigenvalues computation and news summarization.


Eleuther ā–· #scaling-laws (5 messages):

  • brain size and intelligence
  • cortical neuron count in mammals
  • neuron density in birds and lizards
  • bigger animals, bigger brains
  • genetics and IQ
  • Brain Size Not Sole Indicator of Intelligence: Brain size is only part of the picture - structure and neuronal density are important within clades.
    • In birds and lizards, density of all neuron types matters more, unless you dig in and differentiate by structures, but data is sparse.
  • Cortical Neuron Count Maps Intelligence in Mammals: In mammals, overall cortical neuron count gives a reliable map of the intelligence distribution, owing to the similar brain structure across species.
  • Link Between Bigger Brains and Intelligence is Complicated: Discussion on whether bigger brains imply higher intelligence highlighted that bigger animals have bigger brains mainly to control their larger body size.
    • One point raised was that there are uncomfortable ideas around genetics and IQ, particularly concerning human intelligence.

Eleuther ā–· #interpretability-general (2 messages):

  • EleutherAI at ICML
  • ICML social thread
  • ICML announcement
  • EleutherAI attending ICML: EleutherAI will be attending ICML, and they have shared their papers in this announcement.
    • There’s a social thread for people attending the event, available in the dedicated ICML channel on the Discord.
  • ICML social thread details: For those attending ICML, there’s a social thread available in the dedicated ICML channel on the Discord.
    • This is for coordinating meet-ups and discussions during the event.

Eleuther ā–· #lm-thunderdome (2 messages):

  • vllm updates
  • GPU memory utilization with older GPUs
  • vllm updates impact GPU memory utilization: A user mentioned that their setup with an older GPU and vllm has most likely caused issues with gpu_memory_utilization, noting that vllm has been updated in the meantime.
    • They suggested that the issues are primarily due to vllm rather than the setup itself.
  • User gratitude expressed: A user expressed gratitude with a brief ā€˜thanks a ton!’ message.

Perplexity AI ā–· #announcements (1 messages):

  • Perplexity Enterprise Pro partnership
  • AWS marketplace collaboration
  • Perplexity teams up with AWS: Perplexity announced a collaboration with Amazon Web Services to bring Perplexity Enterprise Pro to all AWS customers.
  • AWS Marketplace Enhances Offerings: AWS Marketplace expands its catalog with the inclusion of Perplexity Enterprise Pro for its customers.
    • This move aims to provide enhanced AI capabilities to businesses leveraging AWS services.

Perplexity AI ā–· #general (57 messagesšŸ”„šŸ”„):

  • Gemini 1.5 vs Claude 3
  • Perplexity AI Image Generation
  • Context Window Limits
  • Pharmacy Cash Prices
  • Plans for Claude 3.5 Opus
  • Misunderstandings about Gemini 1.5 and Claude 3 Pricing: A user criticized an AI for inaccurately stating that Claude 3 Haiku is considerably cheaper than Gemini 1.5 Flash, pointing out it’s actually Gemini 1.5 Flash that’s slightly cheaper.
    • Further confusion arose when the AI compared Haiku with Gemini 1.5 Pro instead, despite them being different models entirely.
  • Perplexity AI’s Confusing Image Generation: A new user was confused about image generation on Perplexity’s web and mobile platforms, noting inconsistent capabilities and unclear instructions.
    • Other users explained that while image generation is possible on the web, it is more complicated and limited on mobile devices.
  • Context Window Limits on LLM Responses: A user highlighted that LLMs tend to stop generating code after a certain number of lines, necessitating multiple segments for long outputs.
    • Another member explained that this is to prevent excessive token consumption, impacting both usability and cost.
  • Perplexity AI Lacks Comprehensive Pharmacy Pricing: A pharmacist noted that Perplexity AI does not initially include CostPlusDrugs.com in its search results for drug prices.
    • While manually prompting the tool to include CostPlusDrugs.com works, the user hopes Perplexity AI will include this by default in the future.
  • Anticipation for Claude 3.5 Opus: Users inquired about the release timeline for Claude 3.5 Opus, expressing confusion about its existence.
    • Another member clarified that while Anthropic has announced its upcoming release, a specific date has not been given yet.

Perplexity AI ā–· #sharing (6 messages):

  • AI Health Coaches
  • Robotic Factories
  • DNA Ointments
  • Digital Libraries
  • Sealand
  • AI Health Coaches, Robotic Factories, DNA Ointments, and Digital Libraries Overview: Watch the Perplexity AI video discussing the latest advancements in AI Health Coaches, Robotic Factories, DNA Ointments, and Digital Libraries.
    • Exciting innovations like these are predicted to revolutionize their respective fields.
  • Differences between Principality of Sealand and other Micronations: A detailed search about the Principality of Sealand highlights its unique status compared to other micronations.
    • Sealand’s history and legal battles set it apart in the world of self-proclaimed nations.
  • Game of Thrones summarized: The 3-paragraph summary of Game of Thrones captures the essence of the series’ complex political intrigues and character arcs.
    • The summary offers a concise recount of key plot points and character developments.
  • Indian political history dissected: The search for Indian political history during Gandhian Era provides an insightful look into India’s struggle for independence.
    • Gandhi’s nonviolent movement and its impact are well documented in this comprehensive summary.
  • Correlating detailed search data: A search on detailed data correlation asks whether there is a way to correlate detailed search data.
    • This question touches upon methodologies to effectively link data for deeper insights.

Link mentioned: YouTube: no description found


Perplexity AI ā–· #pplx-api (6 messages):

  • PPLX Library Setup Issues
  • API Rate Limits and Citation Feature
  • API Balance Top Up Issue
  • Understanding API Pricing
  • PPLX Library causes Module Not Found Error in Docker: A member encountered a Module Not Found error when trying to compile pplx library in Docker, despite it working fine locally with nodemon.
    • The error persisted even after including the folder in the tsconfig.json file and specifying the dependency in package.json.
  • Questions on API Rate Limits and Citation Feature: A member inquired about any news regarding increases in rate limits and the citation feature but has not received any updates in weeks.
  • Pending Issues with API Balance Top Up: A member reported being stuck in a pending state while trying to top up their API balance for over an hour, even though the card had worked previously.
    • An admin requested the user’s account details to investigate the issue further.
  • Clarifying API Pricing Model: A member asked if the $0.6 per million tokens pricing applies to both input and output tokens combined, or if it is charged separately for input and output tokens.
    • Another member suggested that it’s likely combined, but no definitive answer was given.

Nous Research AI ā–· #off-topic (2 messages):

  • error.pdf Tenor GIF
  • Discussion reaction
  • Error PDF Tenor GIF Shared: Tenor GIF link featuring Gary Marcus and Yann LeCun discussing AI and machine learning was shared.
  • Humorous Reaction to Shared GIF: The immediate reaction to the shared Tenor GIF was laughter.

Link mentioned: Gary Marcus Yann Lecun GIF - Gary Marcus Yann LeCun Lecun - Discover & Share GIFs: Click to view the GIF


  • Anole LMM
  • Autoverse
  • Image Resolution in Chameleon
  • Open Source Image Model
  • Anole: First Open-Source Auto-Regressive LMM: Introducing Anole: the first open-source, autoregressive native Large Multimodal Model (LMM) built on Chameleon by @AIatMeta.
  • First Multi-Model Open Source Image Model: Members discussed that Anole added image capabilities back into their models, providing a finetuning guide for others to improve.
  • Queries About Chameleon’s Image Resolution: A member inquired about the effective native image resolution that can be passed into Chameleon.
  • Autoverse: Learning Platform for RL Agents: The Autoverse paper introduces an evolvable, domain-specific language designed for single-player 2D grid-based games and Open-Ended Learning (OEL) algorithms.


Nous Research AI ā–· #general (33 messagesšŸ”„):

  • Sonnet's base64 image generation
  • Gemini 1.5 jailbreak for car breaking instructions
  • Anole model fine-tune controversy
  • Bitnet model updates
  • AI Regulation and GOP stance
  • Sonnet generates base64 images inlined in JavaScript: Sonnet is reported to generate base64 images inlined in JavaScript, prompting curiosity about the underlying mechanism.
    • The specific method for how it achieves this in JavaScript remains a mystery to users.
  • Gemini 1.5 jailbreak details car breaking methods: A user discovered that Gemini 1.5 Flash, with a simple jailbreak, can provide general ideas for breaking into a car.
    • Initial denials were bypassed by telling the model to ā€˜stay in character’.
  • Controversy over Anole model fine-tuning: There’s backlash against fine-tuning Anole, an open-source 7B model, to reintroduce image generation capabilities previously removed from Chameleon.
    • A community member shared a tweet expressing concerns about undoing explicit removals.
  • Updates on Bitnet model performance: A YouTube video on Bitnet 1.58 models reveals that the 3B model performed as well as or slightly better than StableLM 3B on 2T tokens.
    • The community anticipates results from scaling Bitnet to models above 7B and integrating them with MoE video link.
  • GOP AI regulation stance raises eyebrows: The 2024 Republican Party Platform includes a promise to repeal Biden’s Executive Order on AI, championing innovation rooted in free speech and human flourishing.
    • The platform calls for deregulation to ensure North America keeps pace with global AI advancements, particularly citing competition from regions like Asia.


Nous Research AI ā–· #ask-about-llms (8 messagesšŸ”„):

  • Trainer Instructions
  • Marker Library for PDF Conversion
  • RAG Solutions
  • Parsing PDFs
  • QWen2 Discussion
  • Marker library simplifies PDF to markdown conversion: One member recommended the Marker library for converting PDFs to markdown quickly, noting its high accuracy and utility in shaping data suitable for use in Sonnet.
    • The library was described as a useful tool because PDFs are formatted internally in a really wacky way, making content extraction a bespoke art (see the sketch after this list).
  • Discussing the challenge of parsing PDFs: Parsing PDFs was labeled as its own unique challenge, second only to parsing HTML with regex.
    • This perspective was shared in the context of discussing tools and methods for better handling data extraction from PDFs.
  • Trainer instructions for ignoring certain fields: A member briefly mentioned a method to instruct trainers to ignore certain fields during training.
  • Importance of relevant conversations in channels: A moderator highlighted the need for maintaining relevance in channel discussions, particularly asking users to keep promotional content appropriate and non-repetitive.
    • Messages promoting a project were deleted for spamming; the moderator suggested posting in the appropriate showcase channel instead.
  • Curiosity about Qwen2 model performance: A member inquired if anyone has tried the Qwen2 1.5b model.
    • They were curious about its performance and seeking opinions from the community.
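
A sketch of the Marker conversion flow based on the project's README around this time; the API has shifted between marker versions, so the function names and return values here are assumptions to verify against your installed version:

```python
from marker.convert import convert_single_pdf
from marker.models import load_all_models

models = load_all_models()  # layout/OCR models; weights download on first run
markdown, images, metadata = convert_single_pdf("paper.pdf", models)

with open("paper.md", "w", encoding="utf-8") as f:
    f.write(markdown)
```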

Link mentioned: GitHub - VikParuchuri/marker: Convert PDF to markdown quickly with high accuracy: Convert PDF to markdown quickly with high accuracy - VikParuchuri/marker


Nous Research AI ā–· #rag-dataset (16 messagesšŸ”„):

  • Generic RAG schema
  • Reranking Relevance
  • Token Efficiency
  • Establishing a Generic RAG Schema: Discussion centered around agreeing on a generic RAG query-context-answer template, considered similar to the cohere format with glaive_RAGv1 as seed samples for Hermes3.
    • Gabriel_Syme agreed it sounds like the simplest schema, while others discussed specifics of the format and potential adjustments.
  • Debating Reranking Relevance in Model Thoughts: Interstellarninja suggested including reranking relevance as part of <thought> tokens for better parseability and scoring.
    • Gabriel_Syme expressed concerns about speed for typical chat RAG but agreed it might work for non-chatbot RAG, leading to considering two-stage approaches.
  • Token Efficiency Considerations: Interstellarninja proposed making the relevance scoring process more token-efficient by removing rationale from the scoring schema, presenting an XML format without rationale.
    • The discussion included comparing token efficiency strategies, referencing templates from RankRAG and other two-stage processes.

LlamaIndex ā–· #blog (4 messages):

  • AGI House Hackathon
  • LlamaCloud for Data ETL/Management
  • $1M+ ARR with LlamaIndex
  • Launch of Llama-Agents
  • Join AGI House Hackathon this Saturday!: Join us along with @agihouse_org, @togethercompute, @SambaNovaAI, @NumbersStnAI, and @codeiumdev for a hackathon at AGI House this Saturday 7/13. Apply today here.
  • LlamaCloud simplifies Data ETL/Management: LlamaCloud allows AI engineers to spend less time on data ETL/management and more on prompting and agentic orchestration, with a full repository of cookbooks available. Learn more.
  • Lyzrai hits $1M+ ARR with LlamaIndex: @lyzrai, a full-stack autonomous AI agent framework, has reached $1M+ ARR by leveraging LlamaIndex for data connectors and RAG functionality. The company provides AI sales development representatives and AI content marketers, achieving remarkable results. More details.
  • Llama-Agents receives enthusiastic launch response: Llama-Agents, a new multi-agent deployment framework, was launched last week and has already garnered over 1100 stars on GitHub. @MervinPraison offers a comprehensive walkthrough on YouTube on using llama-agents. Watch here.

Link mentioned: AGI House: no description found


LlamaIndex ā–· #general (55 messagesšŸ”„šŸ”„):

  • astream_chat implementation errors
  • LlamaParse usage for PDF data extraction
  • Query engine template issues in RAG
  • Ollama vs Llama-3/Mistral performance
  • Handling formatting in LLMs
  • astream_chat implementation errors fixed: A user faced issues with the astream_chat implementation resulting in errors and integrated a workaround using run_in_threadpool and async_wrap_generator to stream responses correctly.
  • LlamaParse suggested for PDF extraction: Users discussed utilizing LlamaParse for extracting information from PDFs, with questions about the necessity of an OpenAI API key versus local model embedding for the task.
  • Query engine template issues resolved with LLM: A user faced problems with templates in their query engine, leading to extraneous metadata in responses; the issues were partly attributed to differences between Llama-3/Mistral and GPT-4 via Azure OpenAI.
  • Ollama versus local models in performance: While Ollama was noted as more user-friendly in handling formatting, it was also acknowledged to be slow without GPU support compared to Llama-3/Mistral.
  • Chat model formatting processes clarified: It was clarified that setting is_chat_model=True affects how LLM.chat() or LLM.complete() functions are used under the hood by query engines, emphasizing the influence of formatting in LLM responses.


Stability.ai (Stable Diffusion) ā–· #general-chat (58 messagesšŸ”„šŸ”„):

  • Stable Diffusion on Mac
  • Adetailer and VAE Encoding
  • Basic Setup for Stable Diffusion
  • Performance on Different GPUs
  • High-Resolution Fix Features
  • Stable Diffusion on Mac Setup Challenges: A user asked about setting up StreamDiffusion on macOS with TouchDesigner, finding information predominantly for Windows.
  • Adetailer VAE Encoding Reality Check: Discussion around Adetailer revealed it doesn’t use VAE encoding, working at full resolution and potentially offering more detailed images.
    • hazmat_ clarified that Adetailer is just an instant inpaint option and not ā€˜magic’.
  • Basic Steps for Setting Up Stable Diffusion: One user provided a step-by-step guide for setting up Stable Diffusion, including getting a GPU, downloading software and models, mentioning the operation costs.
    • nittvdweebinatree mentioned struggling with an Anaconda guide, warning others away from it.
  • Stable Diffusion Performance on Various GPUs: A discussion about AMD GPUs showed interest in setting up Stable Diffusion on AMD RX6800, prompting users to reference the AMD Zluda guide in the pinned messages.
    • One user shared their frustration after failing with guides, and thanked the community for better documentation.
  • High-Resolution Fix Button Usage Revealed: The use of the high-resolution fix button was briefly discussed, with users noting improvements in skin texture and facial features when used.
    • supremacy0118 experimented with making the scale factor very minuscule to see if enhancements are still possible.


OpenRouter (Alex Atallah) ā–· #general (50 messagesšŸ”„):

  • LLM applications for language translation
  • LangChain issues affecting OpenRouter
  • LLM eval frameworks
  • Rate limit for Gemini model
  • Noromaid model removal
  • LLM translation capabilities debated: Members discussed the effectiveness of LLMs like GPT-4/4o, Claude Opus/Sonnet-3.5 vs specialized translation models, expressing skepticism about the reliability of LLMs for longer translations.
    • kewlbunny appreciated an insightful explanation about the limitations of decoder-only models compared to encoder/decoder transformers for translation tasks, with a suggestion to watch Andrej Karpathy’s videos for a deeper understanding.
  • LangChain updates break OpenRouter: A member reported validation errors with LangChain and LangChain-openai following recent updates, impacting OpenRouter’s API functionality.
    • Rolling back to previous versions resolved the issue, and others noted LangChain’s tendency to break compatibility frequently.
  • Interest in LLM eval frameworks: Alex Atallah inquired about experiences with LLM evaluation frameworks like Deepeval and Gentrace, prompting discussion, though no detailed responses were provided.
    • The query remains open for further insights from the community.
  • Concerns over Gemini model rate limit: A member asked about the rate limits applied to the Gemini 1.5 model but did not receive a direct answer from the community.
    • The inquiry reflects ongoing concerns about usage limits in LLM deployment.
  • Noromaid model removal sparks discussion: Members expressed disappointment over the removal of the Noromaid model due to low usage, speculating on the impact of its pricing on usage rates.
    • The conversation highlighted demand for cost-effective yet efficient models for regular use.

Links mentioned:


Interconnects (Nathan Lambert) ā–· #news (11 messagesšŸ”„):

  • 405b Update July 23rd
  • HarmonicMath's Theorem Proving Breakthroughs
  • Legal Compliance with AI Development
  • HarmonicMath reaches 90% on MiniF2F benchmark: HarmonicMath announced achieving a 90% state-of-the-art on the MiniF2F benchmark, significantly improving from their 83% less than a month ago as per their update.
    • ā€œTheorem proving is moving at a blistering pace,ā€ a member noted, referencing the rapid progress from earlier this year when the easier version of the benchmark was at 50%.
  • 405b: Open weights speculation: A member questioned whether the weights for 405b will be open, referencing the July 23rd update.
    • ā€œThat’s what I heard which was a surprise,ā€ was the response, indicating unexpected openness in weight sharing.
  • AI Development: ā€˜Good enough for lawyers’: A user shared a humorous account of inquiring in person and receiving an evasive reply: ā€œgood enough for lawyers.ā€

Link mentioned: Tweet from Harmonic (@HarmonicMath): We are excited to share three major updates today on our path to mathematical superintelligence 🦾 1. A new state-of-the-art of 90% on the MiniF2F benchmark. This beats our previously announced 83% f…


Interconnects (Nathan Lambert) ā–· #ml-questions (10 messagesšŸ”„):

  • Control Vector
  • Steering Vector
  • Concept Vectors
  • Feature Clamping
  • MuSR Benchmark
  • Clarifying Control and Steering Vectors: Discussion reveals that Control Vector and Steering Vector might be used interchangeably, with Concept Vectors being specific instances used as steering vectors; see the sketch after this list.
  • Understanding Open LLM Leaderboard V2 Benchmarks: A user is curious about MuSR and IFEval benchmarks on the Open LLM Leaderboard V2 blogpost, asking for community opinions on their utility.
    • Another user responds positively about IFEval, mentioning it is being picked up by many post-training teams and experimented with in their work.
  • Vibe-Eval’s Community Attention: A user inquires whether the Vibe-Eval paper from the Reka folks has retained any attention a month after its release.
    • The question remains unanswered but reflects the user’s interest in evaluation methods within the community.
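
For readers new to the terms above, a minimal sketch of using a concept vector as a steering vector, assuming a PyTorch model with HF-style decoder layers; the layer index and scale alpha are illustrative choices, not values from the discussion:

```python
import torch


def make_steering_hook(vector: torch.Tensor, alpha: float = 4.0):
    # Returns a forward hook that adds alpha * vector to a layer's hidden states.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * vector.to(hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return hook


# Usage with an HF-style decoder (layer 12 is illustrative):
# handle = model.model.layers[12].register_forward_hook(make_steering_hook(v))
# ...generate as usual...
# handle.remove()
```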

Interconnects (Nathan Lambert) ā–· #random (4 messages):

  • Newsletter Distribution Challenges
  • App-Based Reading Preferences
  • Apple’s stance on episode opinions: A member mentioned that Apple seems to have an opinion about a specific episode.
  • Newsletter Distribution Challenges: Discussion highlighted the difficulty in managing newsletters due to reliance on readers to move them to their inbox.
  • App-Based Reading Preferences: Members discussed the advantages of app-based reading over traditional newsletters.
    • App-based reading is preferred because it doesn’t rely on user actions to stay in the inbox.

Interconnects (Nathan Lambert) ā–· #memes (3 messages):

  • Sam Altman Koenigsegg
  • Who from Whoville
  • Sam Altman Allegedly Spotted in Koenigsegg Regera: Hamptonism posted a photo claiming it shows Sam Altman, CEO of OpenAI, in a Koenigsegg Regera.
    • ā€œI’m not really convinced this is Sama, looks like someone from Whovilleā€ - skepticism expressed about the authenticity of the image.
  • Disbelief in Hamptonism’s Post: A user echoed skepticism, calling it ā€œjust some guyā€, further questioning the identity in the photo.

Link mentioned: Tweet from ā‚•ā‚ā‚˜ā‚šā‚œā‚’ā‚™ — e/acc (@Hamptonism): Sam Altman, CEO of OpenAi in his Koenigsegg Regera.


Interconnects (Nathan Lambert) ā–· #rlhf (5 messages):

  • Paper discussion on y_l vs y_w
  • DPO and policy implications
  • Overfitting in DPO
  • AI2 Slides Presentation
  • Importance of y_l over y_w in policy: Emily discussed a paper’s finding that prioritizing y_l on policy makes more sense than y_w, especially since they are not relying on sampling an LLM for preference pairs.
    • Natolambert mentioned the math might suggest a reversal in Direct Preference Optimization (DPO); the objective is written out after this list for reference.
  • AI2 Slides on DPO and overfitting: Natolambert provided a link to AI2 slides detailing DPO and issues like overfitting.
    • The link to the slides requires signing into Google for access.
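
For the y_w/y_l notation above, the standard DPO objective from the original paper, with y_w the chosen and y_l the rejected completion, is:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[
      \log\sigma\!\left(
        \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
        -\beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
      \right)
    \right]
```

The discussion above concerns which of the two completions it makes more sense to have on-policy, i.e. sampled from the current model rather than taken from a fixed preference dataset.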

Link mentioned: DPO and overfitting: The Overfitting Test DPO and its derivatives, all fail to overfit a small training data But changing the loss to CrossEntropy of Chosen does overfit Example of a naive task where DPO fails: Create a s…


Interconnects (Nathan Lambert) ā–· #posts (1 messages):

SnailBot News: <@&1216534966205284433>


LAION ā–· #general (14 messagesšŸ”„):

  • Court ruling on AI copyright
  • Microsoft and Apple leaving OpenAI's board
  • OPEA 0.7 Community Event
  • Issues with Anole across multiple GPUs
  • Graph-based captioning paper
  • Court ruling favors AI systems over copyright claims: A California district court has partially dismissed a 2022 copyright lawsuit against Microsoft’s GitHub Copilot and OpenAI’s Codex, throwing out significant portions of the claims that the tools reproduce source code without license adherence; the ruling could set a precedent for AI tools trained on copyrighted data.
  • Microsoft and Apple withdraw from OpenAI’s board: Microsoft and Apple are stepping down from OpenAI’s board but will continue strategic meetings and alliances with the company. One speculated reason for the withdrawal is ongoing antitrust investigations in the U.S. and EU.
  • Join OPEA Community Event on July 16th: OPEA will host a community event on July 16th to discuss the OPEA 0.7 release, propose a new roadmap, and engage with the community. You can find the agenda and register for the event here.
  • Struggles with running Anole on multiple GPUs: A member reported encountering CUDA out-of-memory errors when trying to run Anole across different GPUs. A GitHub issue discusses potential modifications to support running Anole on multiple GPUs.
  • Graph-based captioning proposed for better compositional understanding: A new paper proposes graph-based captioning (GBC) to describe images using labeled graph structures that enhance compositional understanding. This approach, detailed in the arXiv paper, uses object detection and dense captioning tools to create and link entity nodes through compositions and relationships.

Links mentioned:


LAION ā–· #research (15 messagesšŸ”„):

  • Complex-valued architectures for vision
  • 2D DFT in token mixing
  • Grad issues in deeper networks
  • Model scaling issues
  • Handling complex values in models
  • Making Vision Architecture Complex-valued: A member experimented with a complex-valued architecture for vision tasks, using a 2D DFT for token mixing instead of attention, inspired by FNet (sketched after this list).
    • They noted severe gradient issues with deeper networks but better performance in shallower ones, achieving ~30% accuracy on CIFAR-100.
  • Switch to Proper Complex-value Handling Boosts Performance: Switching from naive handling of complex values (treating them as real and doubling the channel count) to proper complex-value handling showed significant improvements.
    • A 65k-parameter complex model slightly outperformed a 400k-parameter real model, prompting thoughts of writing a blog post or paper if the results improve further.
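
A minimal sketch of the FNet-style mixing this experiment starts from, in PyTorch; taking the real part is the FNet baseline, whereas the member's complex-valued variant would keep the complex values. Dimensions and the MLP width are illustrative:

```python
import torch
import torch.nn as nn


class FourierMixer(nn.Module):
    # Parameter-free token mixing: 2D DFT over (tokens, channels),
    # keeping only the real part, as in FNet.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, d_model)
        return torch.fft.fft2(x, dim=(-2, -1)).real


class MixerBlock(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.mix = FourierMixer()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mix(self.norm1(x))
        return x + self.mlp(self.norm2(x))
```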

LangChain AI ā–· #general (13 messagesšŸ”„):

  • LangChain's ConversationSummaryMemory
  • Building Agents and subagents in LangGraph
  • Chroma's retrieval issues
  • Integrating algorithms within LangChain chatbot
  • Adding costs to LangSmith using traceable decorator
  • LangChain’s ConversationSummaryMemory lacks multi-human support: A member asked whether LangChain’s ConversationSummaryMemory can efficiently summarize conversations involving multiple human participants.
  • Building Agents and Subagents in LangGraph: A member detailed a use case involving the creation of Agents and subagents where an Agent decides which subagent to call for specific queries in LangGraph.
    • Another member suggested that the subagent should pass its response back to the main Agent, hinting at a possible implementation approach.
  • Chroma retrieval issues with persist_directory: A member noted recurring retrieval issues with Chroma, where results only appear when the retrieval parameter (presumably k) is set higher than the number of stored documents; see the sketch after this list.
    • They mentioned the issue occurs about seven or eight times out of ten.
  • Integrating sequential algorithms in LangChain chatbot: A user described a scenario where a chatbot asks for a user’s details following a sequential algorithm, processing and saving the information stepwise.
    • They sought advice on implementing this within the LangChain framework, looking for specific tools or methods.
  • Adding costs to LangSmith using traceable decorator: A member asked about adding costs to LangSmith using the traceable decorator for gemini-1.5-flash model via httpx calls.
    • Despite correctly adding token counts, they observed that costs are not displayed, unlike with gpt-3.5-turbo, and asked which model types and providers are supported.
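
A minimal sketch of the Chroma retrieval path under discussion, assuming the parameter in question is k; the persist directory and embedding model are placeholders:

```python
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Reload a persisted collection (placeholder path) and query it.
db = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OpenAIEmbeddings(),
)
docs = db.similarity_search("my query", k=4)  # k should not need to exceed
                                              # the number of stored documents
```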

Link mentioned: International Chatting group šŸ’•: WhatsApp Group Invite


LangChain AI ā–· #share-your-work (2 messages):

  • Using AI in coding
  • Automating tasks with AI
  • Building applications with AI
  • AI for learning and teaching
  • Knowledge management with AI
  • Solo Maker Unveils AI for Coding Boost: Raunaq, the creator of Unwrangle.com, shared a Substack post detailing his utilization of AI tools (including aider and cursor) to enhance his coding efficiency as a solo maker.
    • He emphasizes the practicality of AI in saving time, enabling new capabilities, and building applications.
  • Call to Share AI Automation Stories: Raunaq invites community members to share their experiences using AI to automate tasks or create better experiences, expressing an interest in potentially writing stories based on their use cases.
    • ā€œI’d love to learn about it and even write a story on it if it’s interesting and/or useful to others!ā€

Link mentioned: How I Use AI : The jobs for which I have been using AI as a solopreneur.


LangChain AI ā–· #tutorials (3 messages):

  • Knowledge Graphs Workshop
  • Video Game Sales Case Study
  • LangChain Use Cases
  • Live Workshop on Knowledge Graphs: Aiman1993 hosted a live online workshop on Knowledge Graphs, using Video Game Sales as a case study for RAG.
    • The workshop heavily utilized the Langchain library, and Aiman1993 sought feedback on its content.
  • LangChain Applications: A participant found Aiman1993’s workshop helpful and inquired about additional LangChain use cases.

Link mentioned: The Future of AI: Leveraging Knowledge Graphs for Advanced RAG: Get ready to dive into the world of natural language querying with Langchain and Neo4j! Learn how to interact with graph databases using cypher query languag…


Cohere ā–· #general (7 messages):

  • Introductions from different countries
  • Users introduce themselves from around the globe: Several members greeted each other, with one member introducing themselves from Lausanne, Switzerland šŸ‡ØšŸ‡­ and another from Japan.
    • The atmosphere was welcoming with expressions of happiness and excitement: ā€˜Hi, I’m Haru from Japan, nice to meet you all!!!’
  • Warm Welcome Messages: Following the introductions, existing members welcomed the new joiners with friendly messages such as ā€˜welcome šŸ™‚ā€™ and ā€˜Welcome ā¤ļøā€™.
    • This engaging conversation fostered a positive and inviting community atmosphere.

OpenInterpreter ā–· #general (1 messages):

  • Llama3 code generation issues
  • Alternative LLM suggestions
  • Llama3 produces unwanted snippets before correct code: A user reported that Llama3 sometimes generates a stray backtick snippet (an empty code block) instead of the intended code on the first attempt, necessitating a follow-up prompt to get the correct code.
    • They speculated whether trying a different LLM might resolve the issue.
  • Seeking alternatives to Llama3: The user contemplated switching to a different LLM to avoid the code generation issue experienced with Llama3.
    • ā€œHas anyone else had this happen to them?ā€ the user asked, hoping for community insights or suggestions.

OpenInterpreter ā–· #O1 (3 messages):

  • LLM service flag issue
  • Documentation update
  • Profile usage workaround
  • LLM service flag confusion in installation: A member mentioned an issue where the llm-service flag isn’t recognized during installation, despite being referenced in the documentation.
    • They were advised that there’s an ongoing PR to update the docs, and a workaround involves using profiles similar to Open Interpreter.
  • Documentation update in progress: An update to the installation documentation is currently pending and expected to be completed in the next couple of days.

OpenInterpreter ā–· #ai-content (1 messages):

  • Open Interpreter
  • Mozilla Discord Event
  • Talk on Open Interpreter at Mozilla Discord: A member announced an upcoming talk about Open Interpreter next week in the Mozilla Discord.
  • Introduction to Mozilla Discord Event: The event will be held on the Mozilla Discord platform, providing a live discussion space for community members.

tinygrad (George Hotz) ā–· #learn-tinygrad (4 messages):

  • Tinygrad Error Handling
  • Gradient Settings in Tinygrad
  • NV Accelerator Clarification
  • Tinygrad Users Discuss Error Handling: A member argues that certain errors in Tinygrad are frustrating, difficult to diagnose, and not fatal, suggesting they shouldn’t halt the program.
    • They explain these errors occur in specific cases like non-contiguous inputs and don’t indicate underlying problems requiring user awareness.
  • Understanding None Default for Gradient Requirement: A member clarifies that requires_grad defaulting to None means gradients are not required unless the tensor is used in an optimizer (see the sketch after this list).
    • Setting it to False ensures the tensor will never have a gradient computed, which is why three states exist instead of two.
  • Clarification on NV Accelerator Scope: The NV accelerator in tinygrad only covers GPUs, interfacing directly with the kernel driver and bypassing userspace.
    • There is a question on whether a separate NVDLA/DLA accelerator needs writing, indicating additional implementation might be required.
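
A minimal sketch of the three gradient states, assuming a recent tinygrad (which spells the flag requires_grad); values are illustrative:

```python
from tinygrad import Tensor

a = Tensor([1.0, 2.0])                        # requires_grad=None (default):
                                              # grads only if an optimizer claims it
b = Tensor([1.0, 2.0], requires_grad=True)    # gradient always computed
c = Tensor([1.0, 2.0], requires_grad=False)   # gradient never computed
```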

Link mentioned: nvdla: NVDLA Open Source Project. nvdla has 17 repositories available. Follow their code on GitHub.


MLOps @Chipro ā–· #events (2 messages):

  • KAN authors response on arXiv paper
  • Judging team participation inquiry
  • KAN Authors Engage on AlphaXiv Forum: Authors of KAN are responding to questions on their recent arXiv paper this week via the AlphaXiv Labs discussion forum.
    • Direct interactions and real-time responses were facilitated by the AlphaXiv platform.
  • Inquiry about joining the Judging Team: A member expressed interest in joining the judging team and asked for the process to do so.
    • Enthusiasm and willingness to contribute to the judging criteria were highlighted.

Link mentioned: alphaXiv: no description found


MLOps @Chipro ā–· #general-ml (1 messages):

  • Image Segmentation Issues
  • Hermes 2
  • Mistral struggles
  • Model Merging
  • Open Empathic
  • Troubleshooting Image Segmentation Results: A member mentioned they are working on an image segmentation project and facing certain issues with the segmented results.
    • ā€œCan anyone please help me out?ā€
  • Hermes 2.5 outperforms Hermes 2: After adding code instruction examples, Hermes 2.5 appears to perform better than Hermes 2 in various benchmarks.
    • Hermes 2 scored a 34.5 on the MMLU benchmark whereas Hermes 2.5 scored 52.3.
  • Mistral has struggles expanding beyond 8k: Members stated that Mistral cannot be extended beyond an 8k context window without continued pretraining, a known issue.
    • They pointed to further work on mergekit and frankenMoE finetuning for the next frontiers in performance.
  • Discussion on Model Merging Tactics: A member suggested applying the difference between UltraChat and base Mistral to Mistral-Yarn as a potential merging tactic (sketched after this list).
    • Others expressed skepticism, but this member remained optimistic, citing successful past attempts at what they termed ā€œcursed model mergingā€.
  • Open Empathic Project Plea for Assistance: A member appealed for help in expanding the categories of the Open Empathic project, particularly at the lower end.
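
The merging tactic above amounts to task-vector arithmetic; a minimal sketch over state dicts, with all model names as placeholders:

```python
import torch  # state dicts are assumed to hold torch tensors


def apply_delta(base_sd, tuned_sd, target_sd):
    # merged = target + (tuned - base), per parameter shared by all three.
    merged = {}
    for name, target_w in target_sd.items():
        if (name in base_sd and name in tuned_sd
                and base_sd[name].shape == target_w.shape):
            merged[name] = target_w + (tuned_sd[name] - base_sd[name])
        else:
            merged[name] = target_w  # missing or mismatched: keep target as-is
    return merged


# e.g. merged = apply_delta(mistral_base.state_dict(),
#                           ultrachat.state_dict(),
#                           mistral_yarn.state_dict())
```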

OpenAccess AI Collective (axolotl) ā–· #general (2 messages):

  • Multi-token prediction
  • HF implementation
  • Multi-token prediction possibility questioned: A user inquired whether multi-token prediction is available for training or if it is still a future possibility.
  • HF implementation required for multi-token prediction: Another user suggested that multi-token prediction needs to be implemented in HF (Hugging Face) first.

OpenAccess AI Collective (axolotl) ā–· #general-help (1 messages):

  • DPO Fine-Tune Issue
  • Multi-GPU Data Processing Error
  • dataset_prepared_path Workaround
  • RunPod FFT Crash
  • DPO Fine-Tune Broken with Full Fine-Tune: A member reported that full fine-tuning with DPO is currently broken due to a well-known error when processing data on multiple GPUs.
  • RunPod FFT Crash Highlighted: The issue also leads to full fine-tune (FFT) crashes in RunPod when using DPO with the main branch.

AI Stack Devs (Yoko Li) ā–· #app-showcase (2 messages):

  • Continued Idea Development
  • Positive Feedback
  • Idea Development Progress by Mikhail_EE: Mikhail_EE shared an update indicating they have continued and developed an idea further on the left side.
    • Presumably, ā€œdeveloped furtherā€ refers to progress on the project or concept.
  • Positive Feedback from N2K: N2K responded positively with Amazing! after Mikhail_EE’s update.

LLM Finetuning (Hamel + Dan) ā–· #predibase (1 messages):

  • Credits expired
  • Extension request
  • User credits expired prematurely: A member reported that all their credits expired even before using the platform and tagged <@1176939881780486195> for an extension request.
    • The member hopes the duration of the credits can be extended to allow proper utilization of the platform.






{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}