**Deep IRL networks are all you need! Jun 25-27 in SF.**

AI News for 5/21/2024-5/22/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (380 channels, and 7699 messages) for you. Estimated reading time saved (at 200wpm): 805 minutes.

Lots of nontechnical news: the California Senate passed SB 1047, Vox published more explosive reporting on OpenAI employee contracts alongside further safetyist resignations, and though Mistral v0.3 was released there are no evals or blogpost to discuss yet.

Given it’s a technically quiet day, we’re taking the opportunity to announce the initial wave of AI Engineer World’s Fair speakers!

TL;DR: AI News readers get a one-time discount: CLICK HERE and enter AINEWS before EOD Friday :)


The AI Engineer World’s Fair (Jun 25-27 in SF)

The first Summit was well reviewed, and the new format is 4x bigger, with booths, talks, and workshops from:

  • Top model labs (OpenAI, DeepMind, Anthropic, Mistral, Cohere, HuggingFace, Adept, Midjourney, Character.ai etc)
  • All 3 Big Clouds (Microsoft Azure, Amazon AWS, Google Vertex)
  • BigCos putting AI in production (Nvidia, Salesforce, Mastercard, Palo Alto Networks, AXA, Novartis, Discord, Twilio, Tinder, Khan Academy, Sourcegraph, MongoDB, Neo4j, Hasura etc)
  • Disruptive startups setting the agenda (Modular aka Chris Lattner, Cognition aka Devin, Anysphere aka Cursor, Perplexity, Groq, Mozilla, Nous Research, Galileo, Unsloth etc)
  • The top tools in the AI Engineer landscape (LangChain, LlamaIndex, Instructor, Weights & Biases, Lambda Labs, Neptune, Datastax, Crusoe, Covalent, Qdrant, Baseten, E2B, Octo AI, Gradient AI, LanceDB, Log10, Deepgram, Outlines, Unsloth, Crew AI, Factory AI and many many more)

across 9 tracks of talks: RAG, Multimodality, Evals/Ops (new!), Open Models (new!), CodeGen, GPUs (new!), Agents, AI in the Fortune 500 (new!), and, for the first time, a dedicated AI Leadership track for VPs of AI; plus 50+ workshops and expo sessions covering every AI engineering topic under the sun. Of course, the most important track is the unlisted one: the hallway track, which we are giving lots of love to but can’t describe before it happens.

To celebrate the launch of the World’s Fair, we’re giving a one-time discount to AI News readers: CLICK HERE and enter AINEWS before EOD Friday :)

If the curation here/on Latent Space has the most cosine similarity with your interests, this conference was made for you. See you in SF June 25-27!


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Discord Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

All recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Anthropic’s Interpretability Research on Claude 3 Sonnet

  • Extracting Interpretable Features: @AnthropicAI used dictionary learning to extract millions of interpretable “features” from Claude 3 Sonnet’s activations, corresponding to abstract concepts the model has learned. Many features are multilingual and multimodal.
  • Feature Steering to Modify Behavior: @AnthropicAI found that intervening on these features during a forward pass (“feature steering”) could reliably modify the model’s behavior and outputs in interpretable ways related to the meaning of the feature.
  • Safety-Relevant Features: @AnthropicAI identified many “safety-relevant” features corresponding to concerning capabilities or behaviors, like unsafe code, bias, dishonesty, power-seeking, and dangerous/criminal content. Activating these features could induce the model to exhibit those behaviors.
  • Preliminary Work, More Research Needed: @AnthropicAI notes this work is preliminary, and while the features seem plausibly relevant to safety applications, much more work is needed to establish practical utility.
  • Hiring for Interpretability Team: @AnthropicAI is hiring managers, research scientists, and research engineers for their interpretability team to further this work.

Microsoft’s Phi-3 Models

  • Phi-3 Small and Medium Released: @_philschmid announced Microsoft has released Phi-3 small (7B) and medium (14B) models under the MIT license, with instruct versions up to 128k context.
  • Outperforming Mistral, Llama, GPT-3.5: @_philschmid claims Phi-3 small outperforms Mistral 7B and Llama 3 8B on benchmarks, while Phi-3 medium outperforms GPT-3.5 and Cohere Command R+.
  • Training Details: @_philschmid notes the models were trained on 4.8 trillion tokens including synthetic and filtered public datasets with multilingual support, fine-tuned with SFT and DPO. No base models were released.
  • Phi-3-Vision Model: Microsoft also released Phi-3-vision with 4.2B parameters, which @rohanpaul_ai notes outperforms larger models like Claude-3 Haiku and Gemini 1.0 Pro V on visual reasoning tasks.
  • Benchmarks and Fine-Tuning: Many are eager to benchmark the Phi-3 models and potentially fine-tune them for applications, though @abacaj notes fine-tuning over a chat model can sometimes result in worse performance than the base model.

Perplexity AI Partners with TakoViz for Knowledge Search

  • Advanced Knowledge Search with TakoViz: @perplexity_ai announced a partnership with TakoViz to bring advanced knowledge search and visualization to Perplexity users, allowing them to search, juxtapose and share authoritative knowledge cards.
  • Authoritative Data Providers: @perplexity_ai notes TakoViz sources knowledge from authoritative data providers with a growing index spanning financial, economic and geopolitical data.
  • Interactive Knowledge Cards: @AravSrinivas explains users can now prompt Perplexity to compare data like stock prices or lending over specific time periods, returning interactive knowledge cards.
  • Expanding Beyond Summaries: @AravSrinivas says this allows Perplexity to go beyond just summaries and enable granular data queries across timelines, which is now possible from a single search bar.
  • Passion for the Partnership: @AravSrinivas expresses his love for working with the TakoViz team and participating in their pre-seed round, noting their customer obsession and the value this integration will bring to Perplexity users.

Miscellaneous

  • Karina Nguyen Joins OpenAI: @karinanguyen_ announced she has left Anthropic after 2 years to join OpenAI as a researcher, sharing lessons learned about AI progress, culture, and personal growth.
  • Suno Raises $125M for AI Music: @suno_ai_ announced raising $125M to build AI that amplifies human creativity in music production, and is hiring music makers, music lovers and technologists.
  • Yann LeCun on LLMs vs Next-Gen AI: @ylecun advises students interested in building next-gen AI systems to not work on LLMs, implying he is working on alternative approaches himself.
  • Mistral AI Releases New Base and Instruct Models: @_philschmid shared that Mistral AI released new 7B base and instruct models with extended 32k vocab, function calling support, and Apache 2.0 license.
  • Cerebras and Neural Magic Enable Sparse LLMs: @slashML shared a paper from Cerebras and Neural Magic on enabling sparse, foundational LLMs for faster and more efficient pretraining and inference.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, and r/Singularity. Comment crawling works now but has lots to improve!

AI Model Releases and Benchmarks

  • Microsoft releases Phi-3 models under MIT license: As posted in /r/LocalLLaMA, Microsoft has released its Phi-3 small (7B) and medium (14B) models under the MIT license on Huggingface, including 128k and 4-8k context versions along with a vision model.
  • Phi-3 models integrated into llama.cpp and Ollama: The Phi-3 models have been added to the llama.cpp and Ollama frameworks, with benchmarks showing they outperform other models in the 7-14B parameter range.
  • Meta may not open source 400B model: According to a leaker on /r/LocalLLaMA, Meta may go back on previous indications and not open source their 400B model, which would disappoint many.
  • Benchmark compares 17 LLMs on NL to SQL: A comprehensive benchmark posted on /r/LocalLLaMA compared 17 LLMs including GPT-4 on natural language to SQL tasks, with GPT-4 leading in accuracy and cost but significant performance variation by hosting platform.

AI Hardware and Compute

AI Concerns and Regulation

AI Assistants and Agents

  • Microsoft introduces Copilot AI agent capabilities: Microsoft announced new agent capabilities for Copilot that can act as virtual employees, with early previews showing ability to automate complex workflows.
  • Demo showcases real-time multimodal AI game agents: A demo posted on /r/singularity showcased real-time multimodal AI agents assisting in video games by perceiving game state visually and providing strategic guidance.
  • Questions raised about Amazon’s lack of AI assistant progress: /r/singularity discussed Amazon’s apparent lack of progress in AI assistants compared to other tech giants, given their broad consumer reach with Alexa.

Memes and Humor

  • Memes highlight rapid AI progress: Memes and jokes circulated about the rapid pace of AI progress, companies making dramatic claims, and concerns about advanced AI systems.

AI Discord Recap

A summary of Summaries of Summaries

  1. LLM Benchmarking and Performance Optimization:

    • Microsoft’s Phi-3 Models offer high context lengths and robust performance, stirring discussions on benchmarks and memory usage but uncovering compatibility issues in tools like llama.cpp.
    • Various techniques like torch.compile and specific GPU setups were debated for optimizing computation efficiency, shared via insights like those in tensor reshaping examples.
  2. Open-Source AI Tools and Frameworks:

    • The Axolotl framework emerged as a go-to for fine-tuning models like Llama and Mistral, with Docker setups facilitating ease of use (quickstart guide).
    • LlamaIndex introduced techniques for document parsing and batch inference, integrating GPT-4o’s capabilities to enhance complex document manipulation and query accuracy.
  3. AI Legislation and Community Responses:

    • California’s SB 1047 bill prompted heated debates on the impact of new regulations on open-source models, with concerns over stifling innovation and favoritism towards major incumbents.
    • Discussions on ethical and legal questions arose around AI voice replication, highlighted by OpenAI’s controversial mimicking of Scarlett Johansson’s voice, leading to its subsequent removal after public backlash.
  4. Novel AI Model Releases and Analysis:

    • Community excitement surrounded new releases such as Mistral-7B v0.3 with extended vocabularies and function calling (details), while Moondream2 updates improved resolution and accuracy in visual question-answering.
    • Anthropic’s work on interpretable machine learning and the release of Phi-3 Vision spurred deep dives into scaling monosemanticity (research) and practical AI applications.
  5. Practical AI Implementations and Challenges:

    • Members shared practical AI implementations, from PDF extraction with Surya OCR transforming documents into markdown (GitHub repo), to building secure code execution environments on Azure (dynamic sessions).
    • The LangChain community highlighted issues with deployment and endpoint consistency, with detailed troubleshooting on the GitHub repo helping streamline deployment processes and enhance chatbot functionalities.

{% if medium == 'web' %}

PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

Phi-3 Comes into Play, Skepticism and Excitement Ensue: The introduction of Phi-3 models by Microsoft, such as Phi-3-Medium-128K-Instruct, sparked discussions, with excitement tinged by skepticism due to potential benchmarking issues, highlighted by a user’s single-word remark: “literally.”

New Legal Frontiers in AI: California’s SB 1047 sparked discussions concerning AI laws and open-source model implications, accentuated by Meta’s decision to not open the weights for its 400B model, provoking a community debate on the wide-reaching effects of such restricted access.

Unsloth Woes with Model Saving and Flash Attention: Trouble reported with Unsloth’s model.save_pretrained_gguf() function and Flash Attention compatibility, with suggestions from the community advising an Unsloth reinstall or removing Flash Attention and specific workarounds for T4 GPU issues on PyTorch version 2.3.

Guided Decoding and YAML Finesse: A spirited discussion on using guided decoding for generating structured YAML outputs revealed potential vLLM support with advanced syntaxes, emphasizing the integration of grammars into the prompting process.
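The core idea behind guided decoding can be sketched in a few lines: at each step, the candidate token set is masked down to tokens that keep the output consistent with the target grammar. The toy vocabulary, enumerated “grammar,” and greedy picker below are illustrative stand-ins, not vLLM’s actual API (which compiles real grammars or regexes and applies the mask to model logits):

```python
# Toy sketch of guided (constrained) decoding for structured YAML.
# The vocabulary and "grammar" here are made up for illustration;
# real systems mask actual model logits against a compiled grammar.

VOCAB = ["name", "age", ": ", "alice", "30", "\n", "{"]
VALID_OUTPUTS = {"name: alice\nage: 30\n"}  # the "grammar", enumerated

def allowed_tokens(prefix: str) -> list[str]:
    """Tokens that keep `prefix + token` a prefix of some valid output."""
    return [t for t in VOCAB
            if any(v.startswith(prefix + t) for v in VALID_OUTPUTS)]

def constrained_greedy_decode() -> str:
    """Append the first allowed token until a complete valid output is built.

    A real decoder would rank the allowed tokens by model logits
    instead of taking the first one.
    """
    out = ""
    while out not in VALID_OUTPUTS:
        out += allowed_tokens(out)[0]
    return out
```

Because every step only ever extends a prefix of a valid document, the decoder cannot emit malformed YAML, which is exactly the property the guided-decoding discussion was after.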

Cutting-Edge Model Discussions Mix with Sci-Fi: Users shared advancements and tested methods like MoRA, alongside spirited talks about the Dune series’ philosophical undertones and defenses of novel reading over movie watching, underscoring a preference for depth in sci-fi storytelling.


LLM Finetuning (Hamel + Dan) Discord

  • PDF Extraction Wins with Surya OCR: Marker PDF effectively converts PDFs into markdown, surpassing other models with Surya OCR, and the solution has been open-sourced on GitHub.

  • Self-Translation Outshines Native Prompts: Native language prompts were compared to translated English instructions; a member shared research on self-translation, recommending task-specific prompt strategies, and provided a relevant paper.

  • Singapore Member Shares LLM Workshop Notes: Cedric from Singapore summarized key points on LLM mastery in his workshop notes, which were well-received by the community.

  • Convergence on axolotl for Model Training and Tuning: Multiple channels discussed using Axolotl for fine-tuning models with a reference to Axolotl’s main branch. Users are directed to the Axolotl Docker image and shared a setup guide.

  • Gradio Maintainer Jumps In: Freddy, a Gradio maintainer, supports the community with Gradio resources for quickstarts and developing chatbots, while another member indicates they’ll have questions about a Gradio extension they’ve written.


Perplexity AI Discord

  • Microsoft’s Bold Move with Copilot+: Discussions erupted over Microsoft’s announcement of their “Copilot+ PCs,” which incorporate features remarkably similar to OpenAI’s. These PCs boast 40+ TOPS performance, all-day battery life, AI image generation capabilities, and live captions in over 40 languages.

  • Dissecting the GPT-4o Context Window: Amid debates regarding the context window of GPT-4o, the guild anchored on the understanding that a 32k default size is the status quo, though the boundary of its capabilities remained a subject of intrigue.

  • Perplexity’s Haiku Hurdles (Avoiding “Unleash”): The guild uncovered a significant shift in Perplexity’s default model usage, pivoting from GPT-3.5 to Haiku for regular users while Sonar remains exclusive for pro users, sparking discussions on model availability and strategy.

  • API Anomalies Afoot: Concerns surfaced as Perplexity’s API was found to lag behind its web counterpart, generating outdated headlines and unsatisfactory search outputs; further compounded by its beta status and limited endpoint support.

  • Community Collaboration Callout: Members of the guild were nudging each other to make shared threads properly shareable and provided visual aids to help understand the process, while also sharing specific Perplexity AI search links for topics of collective interest.


Stability.ai (Stable Diffusion) Discord

  • Mixing Models - Not a Perfect Blend: Discussions about integrating Lightning and Hyper models with base stable models revealed that while this approach could reduce image generation steps, incompatible architectures often lead to low-quality results.

  • EU AI Act Concerns Rise: Users criticized the newly approved EU AI Act, particularly the watermarking requirements, which could pose difficulties for AI-generated content creators.

  • Local AI Setup Woes: The community shared struggles with implementing Stable Diffusion locally, especially on AMD GPUs. The consensus hinted at a preference for Nvidia GPUs due to setup simplicity and performance advantages.

  • Quality Control for AI-Generated Content: There was palpable discontent with the flood of low-effort, generic, and often sexualized AI-generated images in various online spaces, suggesting a need for better content curation and value assessment in the AI art space.

  • GPUs Debate - Nvidia Wins Favor: A lively debate confirmed Nvidia GPUs as the preferred choice for running Stable Diffusion, with recommendations favoring versions with at least 12GB of VRAM for optimal AI performance.


Eleuther Discord

  • JAX Implementations Face TFLOPS Discrepancies: Engineers shared difficulties in benchmarking JAX implementations of pallas, naive, and flash v2. Shared memory errors and TFLOPS discrepancies on GPUs were reported, highlighting the necessity for precise performance measurements.

  • Mixed Opinions on Preprints and Academic Publishing: The guild debated the usage of preprints on platforms like ArXiv. The consensus appears to be shifting, with major journals increasingly accepting preprints, signaling a change in how academic dissemination is being approached.

  • GPT-3’s Randomness at Zero Temperature: Conversations revolved around GPT-3’s non-deterministic outputs at temperature 0, with insights into potential hardware level factors such as CUDA kernel non-determinism. Mentioned resources include an arxiv paper and an OpenAI forum discussion.
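The kernel-level explanation boils down to floating-point addition not being associative: a parallel reduction that sums in a different order can change the low bits of the logits, which is enough to flip an argmax at temperature 0. A minimal, self-contained illustration:

```python
# Floating-point addition is not associative, so reductions that sum
# in different orders (as non-deterministic GPU kernels may) can yield
# slightly different results -- enough to flip a greedy argmax.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False
```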

  • Small Data Set Dilemmas: In a brief interjection, members talked about the challenges of training AI on small datasets, pointing out that performance often lags behind models trained on much larger corpora, like the entirety of the internet.

  • Interpretable Machine Learning Gains Traction: Excitement brewed over Anthropic’s work on interpretable machine learning features, which can be further explored here.

  • MCQ Randomization Query in lm-evaluation Harness: Guild members raised concerns regarding the lack of answer choice randomization for MCQs within lm-eval-harness, especially for datasets like SciQ and MMLU, suggesting the potential for benchmark biases.


HuggingFace Discord

  • Microsoft’s Phi-3 Integration into Transformers: Microsoft announced the release of Phi-3 models with up to 128k context and a vision-language (VLM) version, accessible on HuggingFace. These releases offer new possibilities for instruction-based and vision-language AI tasks.

  • ZeroGPU Bolstering Open-Source AI: With a $10M ZeroGPU initiative, Hugging Face is supporting independent and academic creators by providing free GPU resources, reaching over 1,300 spaces since May 1, 2024. See the official tweet for more details.

  • Struggles in Fine-Tuning Large Models: The community engaged in discussions concerning the challenges of fine-tuning models like Falcon-180B, noting the need for hardware beyond an 8xH100 configuration. There are ongoing efforts to adapt embedding quantization in models like Llama-8B for more efficient memory usage.

  • Legislative Watch on AI: Conversations indicate apprehension towards California’s AI regulatory law and its implications for startups versus large companies. NousResearch’s Discord server was suggested for deeper discourse on the topic.

  • Tooling and Contributions in AI: Developers contributed several tools and datasets, such as a markdown note-taking app named Notie, a Docker-friendly static wiki with Hexo.js, and various new models like the multilingual NorskGPT-Llama3-70b. There’s also mention of a tool called SDXL Flash purportedly generating high-quality images in seconds, showcasing the dynamism in AI tool development.


LM Studio Discord

Dual GPU Dynamics in LM Studio: LM Studio can handle dual GPUs, but they must be of the same type, and users should align VRAM capacities for optimal performance. Configuration for multiple GPUs involves creating and modifying a preset file in the system.

Prompt Precision and Levity: Users suggest quoting text directly in prompts for clarity, while the light-hearted term “prompt engineering” was used to describe meticulous prompt crafting strategies.

Phi-3 Models in the Spotlight: Integrating the Phi-3 models into llama.cpp is a work in progress, with users eagerly waiting for a beta release and an LM Studio update to support the new models. Meanwhile, quantization advice for running Phi-3 Medium suggests staying at Q4 or below.

ROCM Realm for Linux: Linux users expressed their interest in ROCm test builds, with the acknowledgment of challenges running Phi-3-medium-128k models due to tensor mismatch errors on ROCm platforms.

Intriguing New Model Releases: Mistral v0.3 Instruct, featuring an improved tokenizer and function calling support, is now available for use, offering advancements in language model functionality. Access it on the lmstudio community Huggingface page.


Nous Research AI Discord

  • Apple ID Unlock with a Twist: Engineers revealed a new website to bypass Vision Pro app restrictions for non-US Apple IDs, possibly interesting for those looking to access geo-restricted AI tools.

  • Enhanced Moondream Release Pushes Limits: The latest update to Moondream has increased image resolution up to 756x756, and improved TextVQA scores from 53.1 to 57.2, marking a ~0.5% improvement on various benchmarks, as detailed in this tweet.

  • Phi-3 Small on the Horizon?: Speculation is rife on Microsoft’s release strategy for Phi models as engineers shared insights into the availability of Phi 3, 7, and 14. Yann LeCun debunked rumors about the upcoming LLaMa 3 400B+ model being closed-weight, pointing to its continued open status on Twitter.

  • SB 1047 Stirs the Pot: California’s SB 1047 has engineers worried over its implications for open-source software (OSS), highlighted by shared bill text and Meta being criticized for alleged regulatory manipulation.

  • Anthropic’s Cognitive Cartography: Anthropic’s efforts to trace the cognitive map of language models captured engineers’ attention, providing a potentially valuable resource to those focused on AI interpretation. Separately, members discussed home setups for LLM inference (personal rigs with 2x 4090s) and hosted platforms like Runpod and Replicate, which win on convenience even though some are harder to navigate.

  • Phi-3 Vision Drops with Depth and Access: Launched with a comprehensive educational package, engineers discussed the 128K context multi-modal model, Phi-3 Vision, providing links to Microsoft resources like the Tech Report and the model on Hugging Face.

  • Grand Designs of Digital Knowledge: A conversation emerged around Obsidian’s knowledge graph visualization, likened to “synthetic brains”, and expanded to cover its plugin integrations and data philosophy, complemented with a knowledge graph time-lapse video and explanatory videos for users new to Obsidian.


CUDA MODE Discord

  • SASS Crash Course Wanted: Engineering guild members are seeking guidance on how to learn SASS, NVIDIA’s low-level GPU assembly language (the native ISA that PTX is compiled down to).

  • CUDA Curiosity on Function Modifiers: There’s an ongoing discussion about function qualifiers in CUDA, including why a function can be both __device__ and __host__ but not __global__ and __host__.

  • Optimizations and Pitfalls in Torch & Numpy: Members are comparing the performance of torch.empty_like with torch.empty and discussing memory leaks tied to numpy's np.zeros_like. There are also shared insights on compilation issues with ResNet blocks, leveraging user-defined Triton kernels for optimization, and an informative PyTorch tutorial.

  • Legislative Buzz for AI Safety: There’s a vibrant conversation about the passing of SB 1047, a safety and innovation bill that sets the stage for more regulated AI development, alongside the mention of an ultra-compact ray-casting engine described here.

  • Technical Dive into GitHub Pull Requests: There are deep dives into GitHub pull requests focusing on determinism in encoder backward passes, DataLoader refactoring for large datasets, HellaSwag evaluation in C, and determinism in kernel operations, reflecting the community’s emphasis on efficiency and precision. Links such as this PR for deterministic encoder backward kernels and this one for a DataLoader refactor are part of this roundup.


OpenAI Discord

  • Artificial Voicing Controversy: An AI-generated voice similar to that of Scarlett Johansson led to concerns over ethical practices in AI after OpenAI’s model was noted to create a voice “eerily similar” to hers. OpenAI’s later decision to remove the voice came after a request for transparency by Johansson’s legal team.

  • Chatbots Galore: For coding assistance, users recommended alternatives to GPT-3.5, singling out Meta AI’s Llama 3 and Mistral Large as effective, free options. In contrast, there was dissatisfaction with Microsoft’s Copilot owing to its perceived intrusiveness and telemetry issues.

  • Tools and Tricks for Tighter Tokens: In managing token usage and response verbosity, AI Engineers advised setting max tokens and using output templates to create succinct responses. Regarding custom tools, some developers cited stronger results with their own prompts as compared to using aids like CodePilot.

  • Platform and Model Tweaks Needed: Participants pointed out formatting issues, such as unwanted line breaks in OpenAI Playground’s output and inconsistent newline handling. Additionally, service outages prompted the sharing of the OpenAI Status Page for service monitoring.

  • Microsoft Expanding Multimodal AI: Microsoft introduced the Phi-3-vision, which combines language and vision, remarking on its potential for various applications. For further reading, members referred to a blog post detailing new models added to the Phi-3 family on Azure.


Modular (Mojo 🔥) Discord

  • Mojo Community Meeting Recap: Mojo enthusiasts can catch up on the latest community meeting by watching the recording which covered topics on Basalt and Compact Dict. The meeting signaled the deprecation of Tensors in Mojo, opening a dialogue on developing new libraries for numerical and AI applications.

  • Python IPC vs. Threading: For long-running queries in a Tkinter app, solutions ranged from threading, message queues, to IPC modules to prevent UI lag. A link to RabbitMQ’s Pika Python client tutorial, although promising, led to implementation difficulties.

  • Mojo’s Technical Evolution and Practices: Discussion on Mojo revealed no official package manager but .mojopkg files are in play, particularly with lightbug_http. Optimizations in Mojo are MLIR-backed, with ongoing curiosity about their impact on custom data types. math.bit has now been aptly renamed to bit, with adjustments to several function names like bswap to byte_reverse.

  • Nightly Build and Dev Challenges: Nightly build discussions included a PR issue with a commit by the wrong author, leading to a DCO test suite failure, addressed on GitHub. Delays in the nightly release were traced to GitHub Actions, confirmed via GitHub Status.

  • Performance Optimization Suggestions: When sorting small data sets, sorting an array of pointers can be more efficient than moving the data itself. Regarding DTypePointer memset, a vectorized version performed 20% faster for 100,000 bytes but didn’t scale as effectively with larger data, possibly due to issues with the “clobber memory” barrier.
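The pointer-sorting point generalizes across languages: sort small handles (indices or pointers) by key rather than shuffling large payloads. A rough Python analogue, with made-up records standing in for the large elements:

```python
# Sort indices (a stand-in for pointers) by key instead of moving the
# records themselves; only the small index array gets permuted, while
# the kilobyte-sized payloads stay put.
records = [
    {"key": 3, "payload": "c" * 1024},
    {"key": 1, "payload": "a" * 1024},
    {"key": 2, "payload": "b" * 1024},
]
order = sorted(range(len(records)), key=lambda i: records[i]["key"])
# Iterate in sorted order via records[order[0]], records[order[1]], ...
```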


LAION Discord

  • Voice AI’s Legal Labyrinth: Utilizing a voice actor mimicking Scarlett Johansson raised legal and ethical debates about ‘passing off’ rights, with members reflecting on the Midler v. Ford Motor Co. case as a potential precedent.

  • Investigating Dataset Disappearances: The sudden removal of the Sakuga-42M dataset, involved in cartoon animation frame research, has left members puzzled about potential legal triggers, stirring up discussions about the broader implications of sharing datasets within legal confines.

  • Microsoft’s Multimodal Model Causes a Stir: Discussion on Microsoft’s Phi-3 Vision model delved deep into its mechanics, showcased by Hugging Face, sparking conversations about its functionality, particularly when compared with GPT-4’s color-sorted chart outputs.

  • Anthropic paper perplexes engineers: The recent Anthropic scaling paper has been marked as heavy yet unread, suggesting that despite its potential significance in the field, it may need clearer distillation to be fully appreciated by practitioners.

  • Old School Synthetic Voices Charm the Community: Members took a stroll down memory lane, reminiscing about the DECtalk voice synthesis technology and shared nostalgia through a Speak & Spell video, which was one of the earliest introductions to personal computing for many.


LlamaIndex Discord

  • GPT-4o Paves the Path for Document Parsing: GPT-4o has been leveraged to parse complex documents like PDFs and slide decks into structured markdown, despite challenges with background images and irregular layouts, using LlamaParse. Details are available here.

  • Secure Containerized Code Execution on Azure: Azure Container Apps are enabling the secure execution of LLM-generated code in dynamic sessions. Further insights are provided in these Azure-related links: Container Apps and Code Security.

  • Introduction to OpenDev AI Engineers: The release of a webinar discussing OpenDevin, a platform designed for creating autonomous AI engineers, offers a tutorial by Robert Brennan. Interested viewers can find it here.

  • Batch Inference Bolsters GenAI Capabilities: The latest on batch inference processing for GenAI applications suggests major benefits for data analysis and querying capabilities. Delve into the details via these links: Batch Inference Integration and GenAI Techniques.

  • Navigating LlamaIndex Challenges and Solutions: AI engineers have wrestled with LlamaIndex challenges, from setting up document previews in chat frontends to errors like "ModuleNotFoundError" and "pydantic.v1.error_wrappers.ValidationError". Solutions to these issues involve import path corrections and prompt removal, while indexing strategies, such as retrievers using cosine similarity and HNSW, are under discussion for scaling efficiency.
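As a reference point for the retrieval discussion above, cosine-similarity ranking itself is only a few lines; the toy 2-D document vectors below are made up for illustration (real retrievers use high-dimensional embeddings plus an ANN index like HNSW to scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, docs, k=2):
    """Return the k (name, vector) docs most similar to the query."""
    return sorted(docs, key=lambda d: cosine(query, d[1]), reverse=True)[:k]

# Hypothetical 2-D "embeddings" for three documents.
docs = [("doc_a", [1.0, 0.0]), ("doc_b", [0.0, 1.0]), ("doc_c", [1.0, 1.0])]
hits = top_k([1.0, 0.1], docs)
```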


OpenRouter (Alex Atallah) Discord

Typing Quirks Spark Role-playing Debate: Members humorously identified two main types of OpenRouter users: those seeking AI companionship and those delving into fantasy narratives. The conversation took a light-hearted dive into the role-playing tendencies of some users.

Eyes on Phi-3: The Phi-3 Vision Model, praised for high-quality reasoning, was introduced on the server. The model’s attributes can be explored through HuggingFace.

Verbose Wizard Needs a Trim: Wizard8x22 model’s verbosity issues are recognized, with an adjustment to the repetition penalty proposed as a solution. The dialogue extended to compare other models’ performance, highlighting that model behavior is not consistent across the board.

Billing Blues and Nonprofit Woes: A user’s billing error on a student platform spurred discussion, leading to a temporary fix involving re-entering billing info. Hopes for nonprofit discounts in the future were also expressed.

Experimenting with LLM Action Commands: Innovative use of LLMs was shared through a Twitter thread, exploring action commands as a fresh way to enhance interactions with language models. Feedback from fellow engineers was solicited to push the boundaries of current LLM interaction paradigms.


Interconnects (Nathan Lambert) Discord

Phi Models Join the Fray: The launch of Phi-small and Phi-medium prompted discussions about the characteristics of Phi-3 Vision, with confirmations that it represents a new and slightly larger variant.

Meta’s Model Decisions Cause Stir: A tweet suggested Meta might keep its 400B model closed due to legislative fears, but this was refuted by another source stating the model will remain open-weight. The confusion underscores the delicacy of sharing large-scale model weights in the current regulatory landscape.

OpenAI Under Fire for Unkept Promises: OpenAI disbanded its Superalignment team after failing to deliver the promised 20% allocation of compute resources, prompting a wave of resignations. This, coupled with a scandal over NDAs tied to ex-employees’ vested equity, casts a cloud over OpenAI’s leadership and transparency.

AI Performance Takes a Hit: Microsoft’s Surface drawing AI faces criticism over latency caused by cloud-based safety checks, reflecting the trade-off between local processing power and safety protocols in AI applications.

The Trope of Researcher Titles: Amazement was expressed at Anthropic now boasting over 500 ‘researchers’, igniting a conversation about the dilution of the ‘researcher’ title and its implications for perception in the tech industry.


OpenAccess AI Collective (axolotl) Discord

  • Cohere Integration and Tokenizer Troubles: Engineers are working on integrating Cohere (commandr) into the Axolotl system, while resolving tokenization issues with references to the CohereTokenizerFast in the documentation.

  • Discovering Tiny Mistral and Distillation Pipeline Updates: A Tiny Mistral model for testing custom functions was shared, as the community discussed ongoing work on a distillation pipeline for Mistral models, reported to be functioning decently.

  • Full Finetuning Versus LoRA Discussion: A constructive back-and-forth weighed full finetuning against LoRA, with insights on performance differences, particularly style retention during model adjustments. The Axolotl GitHub README was also suggested as a direct reference for tokenization issues.

  • Axolotl’s Next Major Release and GPU Finetuning Woes: Users expressed curiosity about the next stable major release of Axolotl and discussed challenges with GPU memory requirements when finetuning using examples/mistral/lora.yml, seeking advice on managing CUDA out of memory errors.

  • Guidance on LoRA Merges and State Dictionary Offloading: Clarification was given on setting offload_dir for LoRA merges, pointing out the importance of using the offload_state_dict function post-merge to handle large model state dictionaries, with reference to the code search in Phorm AI.
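The idea behind offloading a merged state dictionary can be sketched in plain Python: write each weight to disk under an offload directory and keep only an index in memory, so the full merged model never has to fit in RAM. This is only an illustration of the pattern, not the actual offload_state_dict implementation, and the on-disk format here is invented:

```python
import json
import os
import tempfile

def offload_state_dict_sketch(state_dict, offload_dir):
    # Write each weight tensor to its own file; keep only an index in memory.
    os.makedirs(offload_dir, exist_ok=True)
    index = {}
    for name, tensor in state_dict.items():
        path = os.path.join(offload_dir, f"{name}.json")
        with open(path, "w") as f:
            json.dump(tensor, f)
        index[name] = path
    return index

def load_tensor(index, name):
    # Read a single weight back on demand.
    with open(index[name]) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    idx = offload_state_dict_sketch({"lora.weight": [0.1, 0.2]}, d)
    print(load_tensor(idx, "lora.weight"))  # [0.1, 0.2]
```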


Latent Space Discord

  • Langchain JS Awaits Refinements: Engineers discussed the utility of Langchain JS for quick prototyping, despite lagging behind its Python counterpart in refinement. Plans for rearchitecture promise enhancements in future versions.

  • Scale AI Hits the Billion-Dollar Jackpot: Scale AI has raised a staggering $1 billion in a funding round, skyrocketing its valuation to $13.8 billion, with profitability forecast by the end of 2024.

  • Phi Packs a Punch: Microsoft’s Phi 3 models, with links to the 4K and 128K context-length variants, have debuted and are being praised for their capacity to run on hardware as light as an M1 Pro MacBook Pro. The community is scrutinizing them for competitive performance against leading models like Mixtral, Llama, and GPT.

  • Anthropic Defines Features with Dictionary Learning: Anthropic has made significant strides with dictionary learning in their frontier model, allowing millions of features to be extracted. This is viewed as a leap forward in AI safety and effectiveness, transforming the handling of model activations.

  • Humane Eyes a Ripe Acquisition after AI Pin Stumbles: Humane is seeking acquisition after their AI Pin device’s market obstacles, with talks indicating a valuation aspiration between $750 million and $1 billion. Conversations revolve around the difficulties of hardware innovation in a market dominated by giants like Apple.

  • Survey Paper Club: Condensing AI Research: Members are invited to join the Survey Paper Club for efficient exposure to multiple research papers within an hour, with email notifications facilitated upon signing up.
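The dictionary-learning setup Anthropic describes passes model activations through a sparse autoencoder whose learned directions become the extracted features. A minimal sketch of the encoder step, with all weights and dimensions invented for illustration, looks like:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def encode(activation, W_enc, b_enc):
    # One sparse-autoencoder encoder step: features = ReLU(W_enc @ a + b).
    # Dictionary learning trains W_enc so most features stay at zero.
    return relu([sum(w * a for w, a in zip(row, activation)) + b
                 for row, b in zip(W_enc, b_enc)])

# Toy 2-dim activation, 3 learned "features" (all numbers invented):
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [-0.5, -0.5, 0.0]
print(encode([0.9, 0.1], W, b))  # only the first feature fires
```

Training pushes most features to exactly zero on any given input, which is what makes millions of extracted features individually interpretable.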


LangChain AI Discord

  • LangChain Community Specs vs LangChain: Discussions articulated distinctions between LangChain and LangChain Community versions; the former’s architecture is elaborated in the official documentation.

  • LangServe ‘invoke’ Woes: Technical issues in LangServe concerning the ‘invoke’ endpoint which fails to provide comprehensive outputs were reported, spurring debate across several channels, with users flagging inconsistencies in output delivery. Specific problems included the absence of document retrieval and empty outputs, as documented in LangServe discussion #461 and related GitHub issues.

  • Operational Issues with RemoteRunnable: Inconsistency was noted when RemoteRunnable did not perform as expected, unlike the RunnableWithMessageHistory, leading to missing document sources and affecting the operational reliability (see GitHub issue).

  • PDF Powered by Upstage AI Solar and LangChain: A blog post was shared guiding on harnessing the new Upstage AI Solar LLM with LangChain to build a PDF query assistant.

  • LangServe AWS Deployment Made Easier: Members were directed to a Medium article that simplifies deploying LangServe on AWS, eschewing the complexities of cloud technologies such as Terraform, Pulumi, or AWS CDK.


OpenInterpreter Discord

Tech Talk: OpenInterpreter’s Device Dialogues: Engineers are exploring how Open Interpreter can create links between apps and devices, utilizing tools like Boox E Ink tablets, OneNote, and VSCode. There’s particular interest in using Open Interpreter for querying code or papers without browser intervention.

Speedy GPT-4o Troubleshot: While integrating GPT-4o with Open Interpreter, users report at least a 5x speed increase but face error messages related to API keys.

Newline Nuisance in Gemini: Code execution is being hindered in models such as Gemini 1.5 and Gemini Flash due to unnecessary newline characters; the absence of “python” declarations in code blocks also came under scrutiny.

Legislative Lore and AI: California’s controversial AI bill and subsequent discussions have ignited the community, with an open letter from Senator Scott Wiener being circulated and debated for its emphasis on responsible AI development.

Bill Gates Foresees Friendlier AI: Gates recently penned thoughts on the future of AI in software, anticipating interfaces that can handle tasks through simple language directives, akin to a friend’s assistance; his article is gaining traction among tech enthusiasts.

ChatGPT macOS Waitlist Workaround: An unofficial ChatGPT macOS app waitlist workaround made the rounds on Twitter, demonstrating interest in quicker access to AI software tools.


tinygrad (George Hotz) Discord

  • Trigonometric Redefinition a No-Go: Community members debated the value of redefining trigonometric functions such as sine via Taylor series, with the consensus that it is an unnecessary reinvention. IBM’s practical approach of partitioning the input interval for functions like sine was cited, showing that perfect accuracy is achievable with established methods.

  • IBM’s Code Holds the Answers: Participants shared IBM’s implementation of the sine function, highlighting the intricacies of achieving perfect accuracy. Further, they referenced IBM’s range reduction solution for large numbers which is complex but doesn’t generally impact performance.

  • Training Mode Tips and Tricks: In tinygrad, the use of Tensor.train() and Tensor.no_grad was explained for toggling gradient tracking. Helpful code examples, such as this cartpole example, illustrate the usage and benefits of these mechanisms.

  • Under the Hood of Tensor.train: It was made clear that Tensor.train is effectively managing the Tensor.training status. For those preferring direct control, manually setting Tensor.training is an option, supported by tinygrad’s backend implementation.

  • Nailing Views with Movement Ops: A discussion unfolded around the behavior of chained movement operations and their potential to create multiple views. An example using ShapeTracker demonstrated how specific op combinations could produce such scenarios.


DiscoResearch Discord

SFT vs Preference Optimization Debate: In a discussion on model training strategies, a member distinguished Supervised Fine-Tuning (SFT) as enhancing the model’s probability distribution for target data points, whereas Preference Optimization adjusts both desired and undesired outcomes. They questioned the prevalent use of SFT over Preference Optimization, which may offer a more rounded approach to model behavior.
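The distinction can be made concrete with toy log-probabilities: SFT’s loss only rewards raising the target completion’s likelihood, while a DPO-style preference loss (one common form of preference optimization; the β value and all numbers below are invented for illustration) also penalizes the rejected completion relative to a frozen reference model:

```python
import math

def sft_loss(logp_target):
    # SFT: plain negative log-likelihood of the desired completion.
    return -logp_target

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO-style preference loss: pushes the chosen completion up AND the
    # rejected completion down, both measured against a reference model.
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))  # -log(sigmoid(margin))

print(sft_loss(-2.0))                        # 2.0
print(dpo_loss(-2.0, -1.5, -2.2, -1.4))      # ~0.678
```

Note the rejected completion’s log-probability never appears in sft_loss at all, which is the asymmetry the discussion was pointing at.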

Excitement Over Phi3 Vision’s Low-Parameter Efficiency: One engineer highlighted Phi3 Vision, at only 4.2 billion parameters, as a significant advancement for low-latency inference in image processing tasks. The model was praised for potential throughput improvements that could have groundbreaking implications for robotics, and links to the announcement were shared (source).

Comparing Image Models: Moondream2 and Phi3 Vision: The community weighed in on Moondream2’s performance compared to Phi3 Vision for image-related tasks. While Moondream2 has had issues with hallucinations, a member mentioned efforts to mitigate them, showcasing the ongoing pursuit of fidelity in image models (Moondream2).

Mixed Reactions to Microsoft’s Model Drops: The release of Microsoft’s 7b and 14b Instruct models sparked diverse opinions, from concerns about their limitations in certain languages to optimism about their utility in complex reasoning and extraction tasks. The discussion reflects the community’s critical analysis of newly released models and their capabilities.

Skepticism Towards Meta’s 400b Model: With concerns circulating in the community about Meta potentially not releasing a 400b model as open source, one member highlighted skepticism by pointing to the uncertain credibility of the source, nicknamed Jimmy. This indicates a critical attitude toward rumor validation within the community.


Cohere Discord

  • Cohere is hiring: An enthusiastic member shared a career opportunity at Cohere, highlighting the chance to tackle real-world problems with advanced ML/AI.

  • VRAM Calculator Intrigues: Engineers are discussing the findings of the LLM Model VRAM Calculator, questioning the higher VRAM use of the Phi 3 Mini compared to Phi 3 Medium for identical context lengths.

  • Bilingual Bot Integration Quest: Multiple posts indicate a member searching for a guide to incorporate Command-R into BotPress, requesting help in both English and Spanish.

  • Link Confusion Alert: There is confusion over accessing the Cohere careers page, with at least one member unable to find the correct page through the provided link.
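A back-of-envelope version of what VRAM calculators like the one above compute is weights plus KV cache. The formula and example numbers below are simplified assumptions for illustration, not the calculator’s actual model or the real Phi 3 configurations:

```python
def vram_estimate_gb(params_b, context_len, n_layers, hidden_dim, bytes_per=2):
    # Back-of-envelope: model weights + KV cache (2 tensors per layer: K, V).
    # Real calculators also account for activations, quantization formats,
    # and grouped-query attention, which shrinks the KV cache.
    weights = params_b * 1e9 * bytes_per
    kv_cache = 2 * n_layers * context_len * hidden_dim * bytes_per
    return (weights + kv_cache) / 1024**3

# Invented illustrative numbers, not the actual Phi 3 configs:
print(round(vram_estimate_gb(3.8, 4096, 32, 3072), 2))
```

Because the KV-cache term scales with layers, hidden size, and attention configuration rather than parameter count alone, a smaller model without grouped-query attention can plausibly report higher VRAM than a larger one at the same context length, which may explain the surprise.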


AI Stack Devs (Yoko Li) Discord

  • Banter About AI Companions: Discussion sparked by the phrase “AI waifus save lives!” led to a conversation about potentially emotional AI, alluding to the relevance of sentiment analysis for chatbots.
  • Emotional Intelligence in Chatbots on the Rise: Shared VentureBeat article prompts engineers to consider the implications for business bots when AI begins to ‘understand’ emotions, which could be significant for user experience and interface design. VentureBeat article on Emotional AI.
  • 3D Chatbots Gaining Traction: A member from 4Wall AI highlighted their ongoing work on 3D character chatbots, suggesting new opportunities for human-computer interaction within the field.
  • Pop Culture Meets AI: A reference to “Just Monika” prompted sharing of a Doki Doki Literature Club GIF, showcasing how pop culture can influence dialogues around AI personas. Ddlc Doki Doki Literature Club GIF.

Datasette - LLM (@SimonW) Discord

Snapdragon Dev Kit Sparks Debate: Qualcomm’s new Snapdragon Dev Kit priced at $899.99, featuring Snapdragon X Elite and boasting 32GB of LPDDR5x RAM and 512GB NVMe storage, has sparked discussions on cost-effectiveness compared to the previous $600 model, as detailed on The Verge and Microsoft Store.

Mac Mini Server Gets Thumbs Up: An AI engineer shared their success in using a Mac Mini as a reliable Llamafile server with Tailscale, praising its zero-cold start feature and seamless ‘llm’ CLI integration, suggesting a practical use case for developers needing stable server solutions.

Affordable Dev Kits in Demand: Discussion among users indicates a strong desire for more affordable development kits, with aesthetic preferences also being voiced, such as a wish for a translucent case design, yet no specific products were mentioned.

Small Talk AI Shows Promise: A member showcased Claude’s ability to engage in small talk, using “What are frogs?” as an example question the AI answered with a basic reply about amphibians, indicating advances in AI’s conversational capabilities.


LLM Perf Enthusiasts AI Discord

Brevity Blunder in Llama3/Phi3: An inquiry was made regarding how to stop llama3/phi3 from truncating responses with “additional items omitted for brevity,” but no solutions or further discussion ensued.


Mozilla AI Discord

  • Community Events for Engineering Minds: Mozilla AI announced the initiation of member-organized events to inspire idea-sharing and community interaction, featuring talks, AMAs, and demos, starting with an AMA hosted by LLM360.

  • AMA on Open-Source LLMs: LLM360 hosted an AMA session, diving into the specifics of their work with open-source LLMs and attracting a tech-savvy crowd.

  • Embeddings with Llamafiles: Kate Silverstein, a Staff Machine Learning Engineer, will demonstrate the use of llamafiles for generating embeddings and elaborate on her latest blog post.

  • Events Calendar a Click Away: Mozilla AI encourages members to frequent the events calendar for a robust lineup of community-led discussions and technical activities.

  • Query on Model Spec in LLaMA CPP: A member sought clarity on using a tinyllama model via terminal, questioning whether the model="LLaMA_CPP" specification is necessary and which model is actually in play when the code snippet runs successfully.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (1309 messages🔥🔥🔥):

  • OpenAI and Dataset Challenges: Members discussed various dataset challenges including converting formats, using ShareGPT, and optimizing training parameters such as batch sizes. One user shared that they “spent 5 hours scraping site into alpaca format” only to find it unhelpful, indicating how persnickety these processes can be.
  • Phi-3 out!; Users skeptical but excited: Phi-3 models from Microsoft generated excitement with members mentioning Phi-3-Medium-128K-Instruct, yet some noted skepticism about the validity of its benchmarks. One user said, “literally”.
  • Latest Legal Constraints: Conversations about AI regulations like California’s SB 1047 law sparked discussions on the implications for open-source models. “Meta plans to not open the weights for its 400B model,” catalyzed a debate, with users expressing concerns about its global effects.
  • Technical Issues and Workarounds for Colab/Kaggle: Common technical glitches were noted, especially around updates breaking compatibility. User theyruinedelise pointed out necessary workarounds like restarting Colab sessions due to the “Pytorch not detecting T4’s properly” issue.
  • Unsloth Platform Developments: Users discussed new model support on the Unsloth platform such as Mistral v3, expressing excitement over improved fine-tuning features. “Unsloth now supports Mistral v3”, facilitating easier adoption of cutting-edge models in the community.
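Format conversion like the scraping effort described above is mostly field shuffling. A minimal sketch of converting a ShareGPT-style record to alpaca fields, handling only the single-exchange case (the field names follow common community conventions and are assumptions here, not a formal spec):

```python
def sharegpt_to_alpaca(record):
    # ShareGPT stores a list of {"from": "human"/"gpt", "value": ...} turns;
    # alpaca expects flat instruction/input/output fields. Multi-turn
    # conversations need a different target format and are ignored here.
    turns = record["conversations"]
    human = next(t["value"] for t in turns if t["from"] == "human")
    gpt = next(t["value"] for t in turns if t["from"] == "gpt")
    return {"instruction": human, "input": "", "output": gpt}

example = {"conversations": [
    {"from": "human", "value": "Summarize photosynthesis."},
    {"from": "gpt", "value": "Plants convert light into chemical energy."},
]}
print(sharegpt_to_alpaca(example))
```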

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (233 messages🔥🔥):

  • MoRA sparks curiosity: Members inquired about a new method called MoRA and shared plans to test its vanilla implementation. One noted that it appears to be a “scaled up” version of LoRA, optimized for measuring the intrinsic dimension of objective landscapes.

  • Dune series and philosophy discussions dominate: Users engaged in a detailed discussion about the philosophical depth of the Dune series beyond its initial hero’s journey. They noted that subsequent books become progressively more philosophical, moving away from simple narratives.

  • Sci-fi novels and recommendations flood the chat: The conversation shifted to various sci-fi novels and recommendations, including Peter Watts’ “Blindsight,” which features unique takes on alien intelligence and vampires, described as “the hardest sci-fi that ever sci-fied.”

  • Fondness for intricate sci-fi plots: Users expressed enthusiasm for complex and intriguing sci-fi plots, comparing elements of hard sci-fi novels to modern AI’s behavior. They discussed the appeal of realistic and imaginative alien life forms in literature over cliché humanoid representations.

  • Debate on movies versus reading novels: Members compared the experience of watching sci-fi movies to reading novels, with some expressing a preference for the latter due to the more profound and imaginative storytelling. The conversation highlighted dissatisfaction with recent movie adaptations of popular sci-fi stories, noting a decline in quality compared to the depth found in books.

Link mentioned: MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning: Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings sugge…


Unsloth AI (Daniel Han) ▷ #help (192 messages🔥🔥):

  • Unsloth models face saving issues: Users report problems with the model.save_pretrained_gguf() function, documented in GitHub Issue #485, which breaks with an UnboundLocalError.
  • Flash Attention causes CUDA issues: Several users experienced errors with Flash Attention versions misconfigured for their setups and discussed switching to xformers instead. Consequently, starsupernova recommended uninstalling and reinstalling Unsloth without Flash Attention.
  • Pytorch ≥ 2.3 breaks T4 GPUs: Multiple users reported compatibility issues with Pytorch version 2.3 on Tesla T4 GPUs, leading to recommendations to downgrade and disable bf16 support. A community workaround involved specifying dtype explicitly.
  • Guided decoding for YAML: There was an in-depth discussion on leveraging guided decoding for structured output in YAML, with insights on using grammars and constraining output effectively while prompting models. This includes potential support in vLLM using different syntaxes like JSON Schema or BNF.
  • Installation and training discrepancies: Discrepancies in installation instructions and training behavior were analyzed, particularly focusing on the trl library and its versions affecting model training. Adjustments to ensure consistent setups and installations were advised, especially considering instability in recent library versions.
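The guided-decoding idea discussed above reduces to masking: at each decoding step, only tokens the grammar permits remain eligible. A toy, framework-free sketch (the vocabulary, scores, and the “grammar” are invented; real engines such as vLLM compile JSON Schema or BNF grammars into these allowed sets):

```python
def constrained_argmax(logits, allowed):
    # Guided decoding in a nutshell: drop every token the grammar disallows
    # at this step, then pick the best of what remains.
    masked = {tok: score for tok, score in logits.items() if tok in allowed}
    return max(masked, key=masked.get)

# Toy "grammar" for a YAML key-value line: a key, then ':', then a value.
vocab_logits = {"name": 1.2, ":": 0.4, "42": 0.9, "{": 2.0}
step1 = constrained_argmax(vocab_logits, allowed={"name"})
step2 = constrained_argmax(vocab_logits, allowed={":"})
step3 = constrained_argmax(vocab_logits, allowed={"42", "name"})
print(step1, step2, step3)  # '{' is never emitted despite scoring highest
```

This is why constrained output works even when the model would otherwise drift: the invalid continuation simply cannot be sampled.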

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (7 messages):

  • Superfantastic results ignite excitement: One member expressed amazement with their results, describing them as “super fantastic.” Another commented on their own struggles, stating they couldn’t get results below 52k and expressed anticipation for an upcoming article.
  • Recipe for success forthcoming: A member mentioned they will “release the recipe this/next week,” despite noting that it won’t fully reproduce earlier results due to the use of proprietary data. They added that it might perform a bit better for English datasets.
  • Knowledge Graph Embeddings: A member shared their past experience with Knowledge Graph Embeddings, mentioning difficulty in transitioning from a Neo4j graph to a PyTorch Geometric Dataset due to complex cypher queries. Another member implied that such a task should be easier with current tools.

LLM Finetuning (Hamel + Dan) ▷ #general (242 messages🔥🔥):

  • Modal Learning Opens New Doors: One member shared that their company uses Marker PDF, leveraging Surya OCR to convert PDFs into markdown format. They noted that the tool’s results surpass other open models, and they have open-sourced the solution on GitHub.

  • Native Prompts vs. Translations?: Members discussed the efficacy of native language prompts versus English prompts with instructions to translate. One member shared a paper focusing on self-translation models, adding various experiences and suggesting task-specific strategies.

  • PDF Parsing and Multimodal LLMs: Challenges in PDF parsing were highlighted with multiple tools such as LlamaParse, Unstructured, and table transformers mentioned, but none provided perfect results. There was interest in strategies involving multimodal LLMs and fine-tuning on target data.

  • Anthropic’s Sonnet Paper Sparks Interest: A member shared a link to a paper on interpretability by Anthropic, sparking discussions about safety and steering model behavior. Another member added further insights with a related Twitter thread.

  • Community Engagement with Modal and Tools: Discussions included preference for tools like pyenv, mamba (through miniforge), and the ease of using GUI for language model fine-tuning. Members shared installation guides and discussed various workflows and their experiences with different packages and environments.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #workshop-1 (83 messages🔥🔥):

  • Extracting Villa Attributes from User Prompts: One member discussed extracting structured attributes like bedrooms and swimming pools from user-provided prompts about their villa wishes. They highlighted the importance of maintaining low latency and high performance, and expressed interest in using synthetic data for evaluation.

  • Workflows and Synthetic Data: Another member shared their use case of predicting workflows and generating them using GPT-4 for various domains. They focus on using synthetic data to fine-tune Mistral models for providing workflow recommendations.

  • User Testing with LLM Agents: A use case was presented for using LLM agents to conduct user tests for web applications, tuning prompts to capture user personalities and desired feedback. The focus lies in prompt tuning to effectively simulate user interactions.

  • Model for Grant Application Assistance: One user proposed fine-tuning models to help UK farmers and organizations navigate and complete grant applications. They plan to combine natural language understanding with domain-specific knowledge from the UK government website.

  • In-Store Book Recommendation System: An idea was put forward for creating a recommendation system that uses user queries to provide book suggestions from a bookstore’s database. The system would rely initially on prompt engineering and RAG, with potential fine-tuning to reduce costs as the model scales up.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #asia-tz (26 messages🔥):

  • Cedric shares extensive notes from LLM workshop: A member from Singapore, Cedric, shared his notes from a workshop, summarizing key points about “Mastering LLMs”. The notes were met with positive feedback with members expressing gratitude.

  • Pune Meetup Proposal Gains Interest: A member from Pune suggested a local meetup and received enthusiastic responses. The intent to set up the event was emphasized with a follow-up message on logistics: “[Possible Pune meetup ?]”.

  • Growth in Singapore and Malaysia Community: Several members from Singapore, Malaysia, and other parts of Asia introduced themselves. Collaborative enthusiasm was high with many members taking interest in discussing topics and meeting locally.

  • General Members’ Greetings: Multiple members from various parts of India and Asia introduced themselves, expressing interest in connecting with others. These introductions highlighted the geographical diversity and the active participation from different regions in Asia.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (77 messages🔥🔥):

  • Satish’s Surya OCR and Modal Issues: “I have created the PDF extractor using Surya OCR” but faced issues with Modal running every time the model loads. Suggested to join Modal’s Slack for quicker support as outlined here.

  • Axolotl Running Issue: Nisargvp faced trouble recognizing axolotl.git URL in Modal; suggested to refer to Modal’s LLM Finetuning sample repo.

  • Inference Configuration Confusion: Intheclouddan ran into issues while setting up inference using a specific prompt format and was advised to use the full llama 3 chat template and shared related example repo.

  • Modal Credits Inquiry: Numerous participants mentioned filling out forms and awaiting Modal credits. Charles shared the claim form link and mentioned the credits process details are in a specific Discord channel.

  • Training and Inference Execution Errors: Troubleshooting related to execution errors showed that repeated attempts sometimes resolve the issues. Ripes suggested checking related discussions on Modal’s Slack community.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #learning-resources (10 messages🔥):

  • Reverse Engineering Transformers benefits from interactive articles: A member shared a comprehensive resource on reverse engineering transformer language models into human-understandable programs, inspired by the Distill Circuits Thread and other interactive articles like Activation Atlases. They also noted Distill’s hiatus and mentioned that new content may be added in collaboration with other institutions.

  • Fine-Tuning Benchmarks Showcase Open-Source LLM Performance: The Predibase fine-tuning index offers performance benchmarks from fine-tuning over 700 open-source LLMs, highlighting that smaller models can deliver GPT-like performance through fine-tuning. Performance metrics are presented in interactive charts to help AI teams select the best open-source models for their applications.

  • Dedicated GitHub Repo for LLM Resource Collaboration: A member created a GitHub repo for better collaboration on LLM resources for a workshop by Dan Becker and Hamel Husain. They asked users not to directly edit the README.md file as it’s auto-generated through GitHub actions and encouraged pull requests for contributions.

  • ML Engineering Book Added to LLM Resource Repo: A member plans to add Stas’ ML Engineering book to the resource repo, highlighting its in-depth insights on training LLMs at scale, covering various aspects such as orchestration, good training loss, and planning. The book is praised as an invaluable resource despite its chunkiness due to the detailed coverage.

  • AI Model Comparison Website as a Favorite Resource: A member recommended artificialanalysis.ai for comparing and analyzing AI models across metrics like quality, price, performance, and speed. They noted the site’s detailed metrics and FAQs for further details and highlighted the trade-offs between model quality and throughput.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #jarvis-labs (36 messages🔥):

  • Members plan to run Axolotl on Jarvis: Multiple users expressed interest in experimenting with Axolotl and discussed the sign-up and credit allocation process for Jarvislabs. User vishnu9158 shared the steps to start using Axolotl locally via a Docker image.

  • Credits for JarvisLabs: Users inquired about getting credits after signing up on Jarvislabs. It was clarified that if the Jarvislabs account email differs from the course email, it might cause delays.

  • Creating and Running Axolotl Instances: The community discussed running Axolotl instances using a Docker image and JupyterLab for fine-tuning. Vishnu9158 mentioned that a documentation and video tutorial are coming soon.

  • Blog posts for better understanding: Several users, inspired by a previous suggestion, shared or planned to share their blog posts about their learning experiences on platforms like Medium.

  • Hugging Face model issues resolved: Some members faced issues accessing the llama-3 model on Hugging Face, despite having access. Dhar007 provided steps to resolve this by creating and using an access token, but then ran into CUDA out of memory errors, suggesting adjustments in batch sizes.

Link mentioned: Jarvislabs: Making AI affordable and simple for everyone: Jarvislabs is a platform that allows you to run and explore multiple AI frameworks on powerful GPUs with zero setup


LLM Finetuning (Hamel + Dan) ▷ #hugging-face (11 messages🔥):

  • Hugging Face model filter issue: Users discussed the issue of filtering for axolotl models on Hugging Face without getting results. A link was shared to Hugging Face models, and solutions involving the HfApi library were proposed.

  • Pre-defined tags for filtering: A Hugging Face team member clarified that the Other tab uses a set of pre-defined tags to avoid overwhelming users, making the user experience more consistent. They mentioned a potential improvement: showing “+N other tags” to make it clearer.

  • Energy over Hybrid Sharding with FSDP and DS: A user expressed enthusiasm for hybrid sharding strategies when sharding models using FSDP and DS.

  • Uploading fine-tuned models: A user had issues with uploading a large fine-tuned gpt2-medium model to Hugging Face, noting that it resulted in multiple .pth files instead of one. They were advised to seek help in a more relevant channel for detailed guidance.

Link mentioned: Models - Hugging Face: no description found


LLM Finetuning (Hamel + Dan) ▷ #replicate (10 messages🔥):

  • Clarification on Replicate’s Use Case: A member questioned the primary use case for Replicate, asking if it’s mainly to offer API endpoints for downstream tasks and for firms/individuals. They also noted the availability of “defined tasks, fine-tuning, and customized datasets.”

  • Conference Registration Email Issues: Several members, including hughdbrown and project_disaster, reported issues with conference registration where the emails used for GitHub registration differ from those used for the conference.

  • Credits and Email Address Workaround: harpreetsahota mentioned that users can set a different email address after signing up on Replicate if their GitHub email differs. However, filippob82 indicated that emails containing a + sign are currently not being accepted.

  • Credit Allocation Enquiries: Users like digitalbeacon are awaiting credits post-sign-up. 0xai queried whether entering the maven registered address in the notifications section would automatically add these credits.


LLM Finetuning (Hamel + Dan) ▷ #langsmith (4 messages):

  • Credit dispatch in progress: A member asked if credits had already been dispatched. Another member responded, directing them to a pinned message and clarifying that announcements would be made on Discord and by email.

LLM Finetuning (Hamel + Dan) ▷ #whitaker_napkin_math (4 messages):

  • Hamel gets his own fan channel: A member humorously acknowledges that Hamel has his own fan channel. The sentiment was light and playful: “Not sure what to do with such power.”
  • Session preparation hints: Another member hints that they’ll fill the channel with relevant content before conducting a scheduled session. They plan to ensure engaging discussion leading up to the event.

Link mentioned: Minion Hello GIF - Minion Hello Minions - Discover & Share GIFs: Click to view the GIF


LLM Finetuning (Hamel + Dan) ▷ #workshop-2 (525 messages🔥🔥🔥):

  • NVLink woes and creative solutions: Members discussed issues with NVLink, including mismatched card heights and lack of NVLink compatibility on certain setups. Suggested solutions included using riser cables with support brackets.

  • Hamel’s evaluation steps clarification: A user asked about the significance of the evaluation step Hamel discussed, leading to an understanding that breaking tasks down and iterating are key to completing projects efficiently. “80% of the time is spent getting to 80% quality, and 500% of the time to reach 100%.”

  • Using Modal and Jarvis for running Axolotl: Users discussed using Modal, RunPod, and Jarvis Labs for running Axolotl, with suggestions to initially try straightforward setups like RunPod or Jarvis before attempting more automated or complex configurations such as Modal. “You can run it on modal if you have the credits” and “try Jarvis which offers credits as part of the course.”

  • Axolotl dataset formats and model usage: The community explored various dataset formats for Axolotl, including JSONL and conversation-based formats like ShareGPT. There was a preference for JSONL due to its flexibility and ease of use, with an emphasis on using the ‘input_output’ format for cases without strict templates.

  • Recording workshops and resource links: The community shared feedback on the need for more practical examples and clear steps to run fine-tuning workshops. Helpful links to resources and blog posts, such as Loom video and Medium post, were highlighted, and recordings were made available promptly.
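For reference, the JSONL preference discussed above comes down to one JSON object per line; a minimal sketch of writing such a file is below. The `segments`/`label` record shape follows Axolotl’s documented `input_output` format, but treat the field names as an assumption and check your Axolotl version.

```python
# Minimal sketch of an Axolotl-style JSONL dataset file. The
# `input_output` record shape (a "segments" list with per-segment
# "label" flags marking which text is trained on) follows Axolotl's
# docs; field names are an assumption to verify against your version.
import json

records = [
    {
        "segments": [
            {"label": False, "text": "Question: What is 2 + 2?\n"},
            {"label": True, "text": "Answer: 4"},
        ]
    },
]

with open("train.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one JSON object per line

# Round-trip check: each line parses independently
with open("train.jsonl") as f:
    parsed = [json.loads(line) for line in f]
```

Because each line is an independent JSON object, JSONL files are easy to stream, concatenate, and inspect, which is much of why the community preferred them.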



LLM Finetuning (Hamel + Dan) ▷ #jason_improving_rag (3 messages):

  • Excitement for Jason’s W&B course: Filippob82 expressed enthusiasm for Jason’s session and mentioned they are halfway through his W&B course. They used an emoji to convey their excitement.
  • Curiosity about prompt engineering: Nehil8946 showed interest in Jason’s work on optimizing prompts and asked if there is a systematic approach to prompt engineering that Jason follows. They are looking forward to learning about it in his workshop.

LLM Finetuning (Hamel + Dan) ▷ #jeremy_python_llms (1 messages):

nirant: Woohoo! Looking forward to <@660097403046723594>


LLM Finetuning (Hamel + Dan) ▷ #gradio (2 messages):

  • Meet Freddy, your Gradio expert: Freddy introduced himself as one of the maintainers of Gradio, a Python library for developing user interfaces for AI models. He shared helpful resources for getting started and creating chatbots with Gradio, including a quickstart guide and a tutorial on building a chatbot.
  • Mnemic1 prepares for questions: A member expressed thanks for the resources and mentioned they would have questions about an A1111-extension they wrote, which had some unresolved issues.



LLM Finetuning (Hamel + Dan) ▷ #axolotl (85 messages🔥🔥):

  • Members address Axolotl issue #1436: Discussion about `bitsandbytes==0.43.0` not installing on macOS from [GitHub Issue #1436](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1436). Recommendations included using Linux GPU servers on RunPod.
  • Axolotl and MLX integration not yet supported: Members discussed the lack of MLX support in Axolotl, as detailed in [GitHub Issue #1119](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1119). Users were advised to stay updated.
  • Best setup practices explored: Members shared various methods for setting up Axolotl. The Axolotl [Readme](https://github.com/OpenAccess-AI-Collective/axolotl/tree/main?tab=readme-ov-file#quickstart-) and the Docker method were mentioned as the most reliable.
  • Fine-tuning and integration concerns: Members inquired about using Axolotl on local machines and fine-tuning models like LLaMA3. Issues related to configuration and compatibility with Modal environments were discussed.
  • Tips for troubleshooting installation: For users facing installation difficulties, such as a `CUDA` error, several members recommended installing specific CUDA/PyTorch versions and using the Docker container. Links to [Docker](https://hub.docker.com/layers/winglian/axolotl/main-20240522-py3.11-cu121-2.2.2/images/sha256-47e0feb612caf261764631a0c516868910fb017786a17e4dd40d3e0afb48e018?context=explore) and a [setup guide](https://latent-space-xi.vercel.app/til/create-a-conda-env-for-axolotl) were provided.



LLM Finetuning (Hamel + Dan) ▷ #zach-accelerate (49 messages🔥):

  • Hugging Face Presentation and Accelerate Resources: A member shared various resources including a presentation on Hugging Face and documentation for Accelerate. Links included tutorials on FSDP vs. DeepSpeed and examples on GitHub.
  • Creating Slides with Quarto Saves Time: Members discussed how using Quarto made creating presentations easier and faster. One user mentioned they now only use Quarto for slides due to the streamlined workflow.
  • Using Accelerate in Python Scripts: There was a conversation on how to utilize Accelerate within Python scripts, suggesting code snippets for launching processes and saving models with Accelerate. One user provided a detailed answer to streamline implementation.
  • Interest in Different Demo Videos for Accelerate: Members expressed interest in seeing recorded demos of Accelerate’s usage in various scenarios, including local vs. cloud training, hybrid modes, and focusing on techniques like LoRa without quantization. Specific requests included comparing setups and configurations for different environments.
  • Upcoming GPU Optimization Workshop: An event was shared featuring a workshop on GPU optimization with speakers from OpenAI, NVIDIA, Meta, and Voltron Data, with details on event registration, YouTube livestream, and relevant reading materials.



LLM Finetuning (Hamel + Dan) ▷ #wing-axolotl (30 messages🔥):

  • Caching Precautions for Multiple Model Training: A user asked about the necessary precautions for separating cached samples when training multiple models simultaneously. They inquired whether sequence length, datasets, tokenizers, and other settings are relevant factors.

  • Custom Callbacks for Evaluations: A user sought guidance on using custom callbacks to run evaluations on custom datasets during training and transferring checkpoints between devices while displaying outputs in wandb/mlflow.

  • Dataset Types: Pretrain vs. Completion: A user asked for the difference between “pretrain” and “completion” dataset types and the appropriate use cases for each.

  • Solving Command Errors: Several users discussed unresolved issues with running the command `accelerate launch -m axolotl.cli.train hc.yml`. Troubleshooting suggestions included ensuring dependencies like torch and gcc are correctly installed, and using a Docker image for a more straightforward setup.

  • Helpful GCC Installation Resource: A user provided a link to a tutorial for installing the GCC compiler on Ubuntu to help resolve installation issues.
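On the custom-callbacks question above, a minimal sketch of a Hugging Face `TrainerCallback` that records metrics after each evaluation is shown below (assumes `transformers` is installed; the wandb/mlflow forwarding is left as a placeholder, and the call at the bottom only simulates what the Trainer would do):

```python
# Sketch of a custom evaluation callback for the HF Trainer; the hook
# name and signature follow transformers' TrainerCallback API, while
# the wandb/mlflow forwarding is a placeholder.
from types import SimpleNamespace

from transformers import TrainerCallback

class CustomEvalCallback(TrainerCallback):
    def __init__(self):
        self.history = []

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # Called after each Trainer.evaluate(); `metrics` holds the results.
        self.history.append((state.global_step, metrics))
        # Forward to wandb/mlflow here, e.g. wandb.log(metrics)

# Simulated invocation (the Trainer calls this itself when passed via
# Trainer(..., callbacks=[CustomEvalCallback()])):
cb = CustomEvalCallback()
cb.on_evaluate(None, SimpleNamespace(global_step=10), None,
               metrics={"eval_loss": 0.5})
```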



Perplexity AI ▷ #announcements (1 messages):

  • Perplexity integrates Tako for enhanced knowledge search: Perplexity teams up with Tako to provide advanced knowledge search and visualization. Users can now search for comparative data like “Gamestop vs. AMC stock since 5/3/24” with interactive knowledge cards, initially available in the U.S. and in English, with mobile access coming soon.

Link mentioned: Tako: no description found


Perplexity AI ▷ #general (835 messages🔥🔥🔥):

  • Claims that Microsoft stole OpenAI’s ideas: A member shared a [blog post](https://blogs.microsoft.com/blog/2024/05/20/introducing-copilot-pcs/) introducing “Copilot+ PCs,” billed by Microsoft as the fastest and most intelligent Windows PCs ever built, and claimed the features were copied from OpenAI. They noted features like an impressive 40+ TOPS, all-day battery life, AI image generation, and live captions for 40+ languages.

  • GPT-4o Context Concerns: There were discussions about the context window of GPT-4o as perceived on Perplexity. A consensus formed that the context window defaults to 32k, with uncertainty about higher capacities.

  • Perplexity’s Default Model Surprise: Members expressed surprise that Perplexity’s default model might be Haiku rather than its in-house model Sonar, which is available only to Pro users. One member noted that free users previously used GPT-3.5, but this has changed recently.

  • Perplexity’s API Queries: Discussion revolved around how Perplexity configures and charges for API usage. Members speculated about the use of in-house models and the financial implications of the pricing structure.

  • Service Downtime Creates Community Stir: Perplexity experiencing downtime led to widespread frustration and speculation among users about the cause. Users shared alternative resources, and a member posted a supportive message to help calm the community during the outage.



Perplexity AI ▷ #sharing (9 messages🔥):

  • Members share Perplexity AI links: Multiple members shared specific search-related links from Perplexity AI, indicating queries and interests such as “Layer”, “indoor discussions”, and “creating SFW content”. One particularly notable search was about “Ether is” with a specific focus link.

  • Reminder to make threads shareable: A gentle reminder was issued to ensure that shared threads are marked as “Shareable”. The comment included a screenshot from Discord.

  • User interest in Taiwan Semiconductor: A member showed interest in Taiwan Semiconductor, sharing a specific Perplexity AI search link.


Perplexity AI ▷ #pplx-api (11 messages🔥):

  • Headlines API delivers outdated news: A member reported getting headlines from a year ago when using the same prompt for the API as on the web. They asked if anyone else had similar issues generating relevant daily headlines.
  • Attempted to refine search queries: Another member suggested adding a date filter (after:12-02-2024). They further clarified that this should be added directly to the query.
  • API underperforms compared to the web version: The original member reported that the suggested fixes did not work, as they continued to get poor results through the API compared to the web. They mentioned they were getting good results on the web but terrible ones through the API.
  • API limitations highlighted: It was noted that the API is still in beta and only supports one endpoint. This limitation may be contributing to the inconsistent results between the web and API outputs.
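The date-filter workaround above (appending `after:…` directly to the query) can be sketched as a request builder. The payload shape follows Perplexity’s OpenAI-compatible chat API, but the model id is a hypothetical placeholder and no request is actually sent; check the current API docs before relying on either.

```python
# Sketch: build a Perplexity-style chat request with the suggested
# date filter appended to the query text. The model id is a
# hypothetical placeholder; nothing is POSTed here.
import json

def build_request(query, after=None):
    if after:
        query = f"{query} after:{after}"  # in-query filter, per the workaround
    return {
        "model": "sonar-medium-online",  # placeholder model id
        "messages": [{"role": "user", "content": query}],
    }

payload = build_request("today's top tech headlines", after="12-02-2024")
body = json.dumps(payload)  # body that would be sent to the chat endpoint
```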

Stability.ai (Stable Diffusion) ▷ #general-chat (497 messages🔥🔥🔥):

  • Lightning and Hyper models debate: A member discussed the efficiency of mixing Lightning and Hyper models with base stable models, proposing it could reduce the number of steps required for image generation. However, another member advised against mixing checkpoints from different architectures, warning it often results in poor-quality images.

  • EU AI Act sparks outrage: Following the approval of the EU AI Act, several members expressed frustration and confusion about its implications. One shared a link to the official press release, highlighting the potential difficulties related to watermarking requirements for AI-generated content.

  • Frustrations with Local AI Setup: Members frequently discussed the challenges of setting up Stable Diffusion locally, particularly with AMD GPUs, while suggesting Nvidia GPUs as a better alternative. One member humorously noted that the “best wizard” would help them acquire a Nvidia GPU to solve their issues.

  • Discontent with AI content quality: The rampant creation of low-quality AI-generated images, particularly generic and heavily sexualized content, was criticized. Members pointed out the prevalence of such content on platforms like CivitAI and the AI art subreddit, questioning the value it adds to the community.

  • GPUs for Stable Diffusion: Members debated the best GPUs for running Stable Diffusion, with a preference for Nvidia GPUs over AMD due to better support. They emphasized the importance of VRAM, recommending at least 12GB for efficient AI performance.



Eleuther ▷ #general (273 messages🔥🔥):

  • Benchmarking Pallas, Naive, and Flash v2 in JAX: Users discussed benchmarking various implementations like pallas, naive, and flash v2 in JAX, comparing performance on different input sizes. Issues encountered include discrepancies in TFLOPS and shared memory errors on GPUs.

  • PSA on California SB 1047: A heated discussion on SB 1047, a California bill that could severely impact open-source AI by creating an unaccountable agency, was shared. Members were encouraged to contact legislators to voice their opposition.

  • Concerns Over GPU Clocks During Benchmarks: There was a detailed conversation about GPU clock speeds affecting benchmark results, with recommendations to use MSI Afterburner to lock clocks. A member noted, “Creating the input is slow,” impacting the benchmarking process.

  • Review of Frontier Model Training Costs: A member from Epoch discussed the cost estimation for training large AI models, noting discrepancies in reported costs from various sources. They thanked Eleuther for insights, revealing that the Pythia model had an estimated training cost of $250k per run on AWS.

  • Discussion on Preprints: Members debated the pros and cons of making preprints available on ArXiv, citing that preprints are becoming more accepted across major journals. “Almost all the big journals have normalized it,” one user noted.



Eleuther ▷ #research (128 messages🔥🔥):

  • Paper on GPT-3 Non-Deterministic Temperature 0 Behavior: Members discussed how GPT-3 can exhibit random output even at temperature 0, with references provided including this paper and an OpenAI community discussion thread. Another member mentioned hardware factors, such as CUDA kernel non-determinism, contributing to this behavior.
  • MegaBlocks for Efficient MoE Training: The introduction of MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs was discussed, which avoids token dropping and offers significant speedups. The research paper details its contributions, like block-sparse operations for improved hardware efficiency.
  • Character Self-Awareness in Language Models: Users shared insights on how larger language models manage self-aware characters effectively, integrating concepts like understanding when they’re edited or rolled-back in conversations. These observations seem consistent across various large models, including proprietary ones and open-source adaptations.
  • Transformer Model Efficiency Improvements: Various optimization techniques for transformer models were debated, such as LeanAttention and Multi-Query Attention (MQA), which aim to reduce the memory footprint and latency of large language models. Relevant papers include Cross-Layer Attention (CLA) and LeanAttention methods for improved computational efficiency.
  • Scaling Laws and Model Performance: Intrinsic performance and scaling laws for reinforcement learning models were discussed, emphasizing the smooth performance scaling similar to generative models. The concept was illustrated through a recent paper that models intrinsic performance as a power law in context to environment interactions and training compute.



Eleuther ▷ #scaling-laws (1 messages):

  • Training on small datasets remains challenging: A member commented that training AI on a small dataset yields worse results compared to pre-training on the entire internet before fine-tuning on smaller datasets. They added that it is “notoriously difficult” to close this gap.

Eleuther ▷ #interpretability-general (4 messages):

  • Anthropic’s Work on Interpretable Features Creates Buzz: A member shared a link about exciting work on interpretable features by Anthropic. You can read more about it here.
  • Reconstruction Loss in SAEs Raises Concerns: A member asked, “How big of an issue is reconstruction loss for SAEs?” and followed up with inquiries on what improvements are being pursued.
  • Pointer to Related Channel: Another member directed others to check channel #1153431135414669422 for related discussions.

Eleuther ▷ #lm-thunderdome (37 messages🔥):

  • Questions on lm-evaluation-harness and MCQs: Members discussed the randomization of answer choices in MCQs using lm-eval-harness, with concerns about benchmark biases towards early choices. While SciQ has a fixed correct answer index, the randomization isn’t currently applied for MMLU.

  • Upcoming Submissions and Papers: An anonymized paper is coming soon to arXiv, while members joked about not needing to worry about insane competition in D&B papers. There’s also work on an updated version of the Pile with 3T tokens and fully licensed text.

  • Medical Benchmarks Controversy: A lively discussion emerged about medical benchmarks and their potential dangers. One member focused on how such benchmarks might be used to claim models are better and safer than physicians, while highlighting ongoing improvements in how these benchmarks are interpreted.

  • Huggingface Dataset Configuration: Members sought advice on configuring a Huggingface dataset’s directory structure. The solution pointed to adding a config in the README.md file as outlined in the [Huggingface documentation](https://huggingface.co/docs/hub/en/datasets-manual-configuration#splits).

  • Running lm-eval-harness on Multi-node Slurm Cluster: A question was raised about evaluating big models on a multi-node Slurm cluster. Attempts using vllm + ray and accelerate were unsuccessful, indicating a need for better solutions.
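For the dataset-configuration question above, the README front matter described in the linked Hugging Face docs looks roughly like this (the file patterns are illustrative placeholders):

```yaml
# YAML front matter at the top of the dataset's README.md
# (between `---` markers); paths below are placeholders.
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
```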



HuggingFace ▷ #announcements (1 messages):

  • Phi-3 Models Roll Out: Microsoft released Phi-3 small and medium models, including Instruct Versions with up to 128k context and a VLM version. Check out the Phi-3-vision-128k-instruct model.

  • ZeroGPU Initiative Fuels Open-Source AI: Hugging Face committed $10M via ZeroGPU to support indie and academic AI builders with free GPU resources for AI demos. Over 1,300 ZeroGPU spaces have been built since May 1, 2024.

  • Local Apps Integration: Hugging Face announced Local Apps, allowing users to easily convert model pages to local applications. Users can suggest their favorite local apps for integration.

  • Transformers 4.41.0 Packed with Updates: The new release includes models like Phi3 and VideoLlava, improved GGUF support, and watermarking capabilities. Transformers 4.41.0 is poised to enhance multiple functionalities, making integration smoother.

  • LangChain-HuggingFace Connector Released: A new open-source package, langchain-huggingface, integrates Hugging Face models into LangChain, offering flexible access to models via API and self-hosted inference. This facilitates easy installation and fast integration for various model use cases.

Links mentioned:

  • Tweet from clem 🤗 (@ClementDelangue): GPU-Poor no more: super excited to officially release ZeroGPU in beta today. Congrats @victormustar & team for the release! In the past few months, the open-source AI community has been thriving. Not...
  • Tweet from Lysandre (@LysandreJik): From a model page to your Local App in seconds, the @huggingface Hub welcomes Local Apps! Suggest your favorite Local App leveraging the Hub there to get them added to the dropdown and ✨ deep linked...
  • Tweet from Omar Sanseviero (@osanseviero): Transformers 4.41.0 has lots of goodies🤗 🥳 New models: Phi3, JetMoE, PaliGemma, VideoLlava, and Falcon 2. 🤯 GGUF support with from_pretrained 🤏 New quant methods: HQQ and EETQ 🔍 Watermarking sup...
  • Tweet from Philipp Schmid (@_philschmid): We are excited to announce huggingface-langchain🚀 A new open-source package to seamlessly integrate the latest open Models from @huggingface into @LangChainAI, supporting local models hosted models! ...
  • Tweet from apolinario (multimodal.art) (@multimodalart): Quite excited that CommonCanvas is JUST out! 🖼️ • First open source text-to-image models trained fully on openly licensed images (SD2 and SDXL architectures) • The dataset, with ~70M openly license...
  • Tweet from Xenova (@xenovacom): Moondream, your favorite tiny vision language model by @vikhyatk can now run directly in the browser on WebGPU! 🤯 Powered, of course, by Transformers.js and ONNX Runtime Web! 🤗 Local inference mean...
  • Tweet from Xenova (@xenovacom): You can now use 🤗 Transformers.js with Google Visual Blocks, a visual programming framework that lets you create machine learning pipelines in a no-code graph editor! 🛠️ Rapid workflow prototyping ...
  • Tweet from Ilyas Moutawwakil (@IlysMoutawwakil): Optimum-Benchmark on PyPI 🎉 But why now ? 🤔 Because it's getting integrated in Transformers' benchmarking workflow 😍 Your favorite transformers will only get faster and lighter ; Kudos to @...
  • Tweet from Omar Sanseviero (@osanseviero): Curious about LLMs? Join this Fine-Tuning course with top experts! 🚀 @huggingface is offering $501.42 in GPU credits for Space demos, fine-tuning, inference, and more! Enjoy 🤗 https://maven.co...

HuggingFace ▷ #general (398 messages🔥🔥):

  • Library for Training NeRF Models Discussed: A member inquired about HuggingFace support for NeRF and 3D Gaussian Splatting models, suggesting that a dedicated library could be beneficial. They were redirected to relevant channels for further discussion.
  • Concerns About Falcon-180B Fine-Tuning: There were discussions about the challenges of fine-tuning Falcon-180B due to hardware limitations, even on AutoTrain with an 8xH100 setup. No concrete solution was provided, indicating the need for more advanced resources or methods.
  • Embedding Issues with 4-bit Quantized Llama-8B: Members discussed unexpected memory usage when loading Llama-8B with 4-bit quantization. It was highlighted that bitsandbytes 4-bit doesn’t quantize embeddings, leading to higher-than-expected memory usage.
  • GPT Deployment on Personal Websites: A user queried about integrating HuggingFace dataset views on personal websites. It was pointed out that while API integrations are possible, replicating the original viewer’s look might not be feasible currently.
  • Concerns Over CA AI Law: There were discussions regarding a controversial AI regulation law in California, which some users felt would benefit large incumbents like OpenAI and Google while potentially stifling startups. NousResearch’s Discord server was mentioned as a place of further discussion.



HuggingFace ▷ #today-im-learning (3 messages):

  • Adding ImageBind to Transformers: A user shared that they are working on integrating ImageBind into the transformers library. This suggests ongoing enhancements to the versatile library.
  • Training Huggy Agent: A member mentioned they finished pushing a newly trained Huggy agent but is still in the process of learning how everything works. They have completed Unit 1 and are continuing their educational journey.
  • Looking for Project Collaborators: Another user openly asked if anyone wanted to connect to work on projects together. This suggests a collaborative spirit and willingness to engage in community-driven projects.

HuggingFace ▷ #cool-finds (7 messages):

  • Explore the Latest in 3D Gaussian Splatting: Members discussed a GitHub repository on 3D Gaussian Splatting, which lists the latest papers and resources. One member noted its potential in robotics and embodied AI, suggesting the next steps involve incorporating LLM reasoning for autonomous robot actions.

  • Boost Evaluation with Evaluator Classes: A member shared a link to the Evaluator classes documentation highlighting how it simplifies the evaluation process for models, datasets, and metrics. Another member confirmed the utility, stating it “can save up lots of hustle” by eliminating the need to create metrics from scratch.

  • Automate Your Tweets from Wiki Articles: A script that periodically scrapes and posts content from wiki articles to Twitter was shared by a member. The script is available in this GitHub repository.

  • TransAgents Revolutionizes Literary Translation: A multi-agent framework called TransAgents, using large language models for literary translation, was introduced with promising results. The paper detailing the framework reports that outputs from TransAgents are preferred by human readers.

  • Request for Guidance on News Classification Project: A member sought assistance on a machine learning project aimed at classifying news articles into cargo-related and non-cargo-related categories. They explicitly mentioned being new to machine learning and looking for effective starting points.



HuggingFace ▷ #i-made-this (13 messages🔥):

  • Markdown Note Taking App Goes Public: A member introduced a personal markdown note-taking app, Notie, urging contributions from the community. They also provided a live preview.

  • Dockerized Wiki with Hexo.js: A member showcased a static wiki created with Hexo.js that supports over 1,000 articles and can be run using Docker. Contributions are welcome on their GitHub page.

  • Typography Image Dataset Released: A curated collection of real-life sign images captioned using the BLIP3 model was shared, freely available for use here.

  • NorskGPT-Llama3-70b Model Release: A new model for the Norwegian, Swedish, and Danish languages was announced, available for download here. This model supports various languages and programming languages but requires further training for chat functionalities.

  • SDXL Flash Introduced: A member presented a new tool claiming to generate DALL·E 3 level images in just 5 seconds. The tool, SDXL Flash, experienced brief downtime but was fixed promptly, leading to positive feedback from the community.



HuggingFace ▷ #reading-group (2 messages):

  • Scheduling new reading group’s discussion time: One member asked if there is any preferred time for meetings and if there are any papers of interest to the group.

  • Interesting paper shared: A member shared an interesting paper from arXiv for the group to consider.


HuggingFace ▷ #computer-vision (5 messages):

  • Finetuning OwlV2 on Custom Data: A member asked for advice on how to finetune OwlV2 using their own data, noting that relevant forums have been inactive for a year. They aim to add object detection classes for passenger planes to improve model identification.
  • Purpose of Finetuning: Another member inquired about the specific purpose of the finetuning. The original poster clarified they want to identify plane models more easily with their data.
  • Exploring Transformers Repository: A member suggested looking through the Transformers repository to get clues on how to achieve the finetuning.

HuggingFace ▷ #NLP (3 messages):

  • Master Thesis on Hallucination Detection using Mistral 7B: A member is writing a master thesis on hallucination detection in LLMs. They use an ensemble of Mistral 7B models to compute uncertainty measurements and are looking for questions outside the training data to identify when the model is hallucinating.

  • LLMs Consider Chat History: In a discussion about LLMs and chat history, it’s noted that history is generally considered in chats, though the integration can vary. One member clarified that while implementing products like chatbots, history needs to be concatenated at the beginning of the input, as the models themselves don’t inherently know history.
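The history-concatenation point above can be sketched with plain string assembly; the `User:`/`Assistant:` tags are a generic illustration, not any particular model’s chat template:

```python
# Sketch: models are stateless, so prior turns must be re-sent on
# every call. The role tags are illustrative; real chat models each
# define their own template.
def build_prompt(history, new_message):
    lines = []
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append(f"User: {new_message}")
    lines.append("Assistant:")  # cue the model to continue as assistant
    return "\n".join(lines)

history = [("User", "Hi"), ("Assistant", "Hello! How can I help?")]
prompt = build_prompt(history, "What did I just say?")
```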


HuggingFace ▷ #diffusion-discussions (3 messages):

  • Introducing llmcord.py: A user announced they created llmcord.py to facilitate continuous conversations with a bot. They emphasized that conversations are structured through reply chains to maintain context.

LM Studio ▷ #💬-general (332 messages🔥🔥):

  • LM Studio vs. Pinokio: A user asked about the differences between LM Studio and Pinokio, with clarifications provided that Pinokio is an installer for multiple AI tools like Automatic1111 and Coqui TTS, while LM Studio is specifically for GGUF inference with LLM models.
  • Phi-3 Models in LM Studio: Multiple users reported issues with loading Phi-3 medium 128k models in LM Studio, receiving tensor mismatch errors. It was confirmed by a knowledgeable user that the Phi-3 128k models are currently not supported in LM Studio due to compatibility issues with the llama.cpp version it uses.
  • Multi-GPU Setup for Large Models: Discussions emerged about running large models on multiple GPUs, specifically 70b models. A user shared their experience on performance improvements with NVLink and the challenges faced with multi-GPU setups and VRAM requirements.
  • Future of AI Tools Amid Regulations: Users discussed the implications of new AI regulations in the EU and California, expressing concerns over potential stifling of innovation. One user shared a tweet about the anticipated llama3 400b model, capturing the community’s interest despite regulatory concerns.
  • Idefics Models in LM Studio: Questions were raised about running Idefics2 models from Hugging Face on LM Studio. It was clarified that these models are not supported in llama.cpp, and therefore wouldn’t work in LM Studio; alternatives like Transformers were suggested for running these models.



LM Studio ▷ #🤖-models-discussion-chat (50 messages🔥):

  • New Phi-3 Model Releases: Members shared the release of Phi-3-Small and Phi-3-Medium models. The Phi-3-Small-8K-Instruct model and the Phi-3-Medium-4K-Instruct model were highlighted for their robust performance in benchmarks involving common sense, language understanding, and logical reasoning.
  • GitHub Issue with llama.cpp: A link to a GitHub issue was shared about the llama.cpp not supporting new Phi-3 models yet. This was causing errors when trying to load them, as models need to be converted correctly first.
  • Stable Diffusion Tip: An alternative to A1111 called forge was suggested for users low on VRAM. The GitHub link was provided.
  • Local Vision Models Limitation: Discussing the limitations of local vision models, a member commented that they are not good at multi-turn conversations. Specific focus was on LLava Llama3 which tends to provide image descriptions rather than answering prompt-specific questions.
  • Mistral-7B Instruct Model Release: A new Mistral-7B-Instruct-v0.3 model was announced. It features an extended vocabulary, supports a v3 tokenizer, and function calling, recommended to be used with mistral-inference.



LM Studio ▷ #📝-prompts-discussion-chat (4 messages):

  • Quoting directly advised for prompt instructions: “I would quote the required text directly, and instruct via the prompt as follows: Considering the following text alone as input, [insert subsequent instructions here]”. This tip was offered as a method for prompt engineering.
  • Prompt engineering humor: A user humorously remarked, “I guess this is ‘prompt engineering’ 😄” in response to a tip about quoting text for prompts. Another user appreciated the advice despite the original post being old.
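The quoting tip above amounts to a simple template; a minimal sketch (the wording follows the quoted advice, and the helper name is just for illustration):

```python
# Sketch of the quoting pattern suggested above: wrap the source text
# in explicit quotes and place the instruction after it.
def quoted_prompt(text, instruction):
    return (
        f'Considering the following text alone as input: "{text}"\n'
        f"{instruction}"
    )

p = quoted_prompt("The quick brown fox.", "List every adjective in the text.")
```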

LM Studio ▷ #🎛-hardware-discussion (18 messages🔥):

  • LM Studio supports dual GPUs but only of the same type: Users confirmed that LM Studio can run with two GPUs installed, provided both are of the same type, such as both being Nvidia or both AMD. Mixing different types like AMD and Nvidia is not supported.

  • Automatic GPU recognition and VRAM considerations: While LM Studio can recognize different models like a 2060 and a 3060, users are advised to make sure they have matching VRAM capacities. If VRAM capacities differ, adjustments through config files may be needed.

  • Config file for multi-GPU setup: The configuration related to GPU usage, like “GPU split”, is found in the preset file. Users need to create a preset and then modify this file to balance GPU usage.

  • Experience with Intel ARC GPUs: One user mentioned facing issues when trying to use multiple Intel ARC GPUs with LM Studio. It remains unclear whether AMD GPUs can support multiple GPU setups.

  • Community support and resources: New users expressed appreciation for the timely and helpful answers they received. Existing members encouraged utilizing the search function in the Discord for quick answers.


LM Studio ▷ #🧪-beta-releases-chat (11 messages🔥):

  • Phi 3 merged into llama.cpp: Members discussed that Phi 3 has been successfully merged into llama.cpp. A quick beta release for this integration is highly anticipated by the community.

  • Phi 3 quantization for HP Victus: A member with an HP Victus asked about the feasible quantization levels for running Phi 3 Medium. Another member advised that Q4 or below is manageable, while Q8 is too heavy and suggested using llama.cpp or awaiting the LM Studio update.

  • System prompt settings and token output limit: A suggestion was made to adjust the system prompt and change the token output limit from -1 to 60 to potentially stop an unspecified issue.


LM Studio ▷ #avx-beta (1 message):

  • Missing LM Studio version for AVX 1: A user expressed difficulty finding a version of LM Studio that supports AVX 1 as their processor does not support AVX2. They requested assistance and thanked the developers for their hard work.

LM Studio ▷ #amd-rocm-tech-preview (6 messages):

  • Test ROCm builds on Linux: Members interested in accessing ROCm on Linux test builds were invited to join a dedicated channel. “If you’re on Linux and want access to ROCm on Linux test builds, please let us know and we’ll add you.”
  • Phi-3-medium-128k error report: A user reported an error running Phi-3-medium-128k on ROCm, specifically a “llama.cpp error” related to tensor mismatches. The issue is acknowledged, and an update is said to be in the works.

LM Studio ▷ #model-announcements (1 message):

  • Mistral v0.3 Instruct is Live!: The Mistral model has just released v0.3 instruct, and it’s ready for immediate use. Check it out on the lmstudio community Huggingface page.

Nous Research AI ▷ #off-topic (9 messages🔥):

  • Bypass Vision Pro app restrictions with non-US Apple IDs: A user announced the launch of a website for bypassing app download restrictions on Vision Pro for non-US Apple IDs, seeking support on Twitter.

  • Project-Based Learning Course for LLMs: A member announced a new hands-on course titled “Applying LLMs through a Project-Based Approach,” covering various practical applications like Semantic Search for Movies and RAG for Food Recommendations. Those interested can contact the member directly.

  • 10-Day Food Supply Prep: A user shared that they now have enough food to last over 10 days, detailing the contents including rice, pork, beef, and various spices. They also mentioned that their freezer is fully stocked.



  • Moondream release improves image resolution and TextVQA scores: The latest Moondream release supports higher image resolution up to 756x756. It also raises the TextVQA score from 53.1 to 57.2 and shows a ~0.5% improvement on other VQA and counting benchmarks.

  • Anthropic maps the mind of language models: A post shared by a member highlights Anthropic’s research on mapping the cognitive processes of language models. Described as “super interesting,” the research dives into understanding how these models interpret and generate language.

Link mentioned: Tweet from vik (@vikhyatk): New moondream release out today! 🌜 Supports higher image resolution (up to 756x756) 🌛 TextVQA score up from 53.1 to 57.2 (+7.7%) 🌜 Other VQA and counting benchmarks up ~0.5%


Nous Research AI ▷ #general (281 messages🔥🔥):

  • Microsoft reluctant on releasing Phi 3 small version: Members speculated on whether Microsoft would release the smaller versions of the Phi 3 model, with a member confirming that initially only the smallest one had been launched. Later, it was noted that the Phi-3 7B (small) and 14B (medium) models are also available, with links shared on Twitter.

  • California’s SB 1047 sparks debate: The state senate’s approval of SB 1047 raised considerable discussion. Members expressed concerns over how this might impact OSS models and the broader AI market, with one sharing the bill’s text.

  • Mistral 7B Instruct v0.3 new features praised: Mistral released its 7B v0.3 model with updates like extended vocabulary and support for function calling, gaining positive feedback. Users are already benchmarking it against other models, noting its uncensored nature and improved tokenizer.

  • LLaMa 3 model weight rumors addressed: The LLaMa 3 400B+ model weight release rumors were debunked by Meta’s Yann LeCun on Twitter, confirming that the weights will still be open. Multiple users cited his confirmation tweet.

  • Meta criticized for commercial strategies: There were heated discussions on Meta’s business tactics, especially in the context of the OSS vs. regulatory landscape. Some users accused Meta of trying to eliminate competition through regulation rather than innovation.



Nous Research AI ▷ #ask-about-llms (6 messages):

  • Home setup with 4090s rules: One user shared they usually host LLMs for inference at home on a setup with 2x 4090s, highlighting personal infrastructure for AI projects.
  • Runpod and Replicate get nods for ease: Runpod is noted as a good option, and Replicate is praised for its easy-to-use templates, making them convenient platforms for hosting LLMs.
  • LambdaLabs is cheapest but tougher: While LambdaLabs offers the cheapest GPU options, they are reportedly more difficult to use compared to other platforms.
  • Anthropic Workbench woes: A member inquired if others are experiencing issues with Anthropic Workbench, wondering if the problem is widespread or isolated.

Nous Research AI ▷ #project-obsidian (2 messages):

  • Phi-3 Vision announced with impressive features: A member introduced Phi-3 Vision as a lightweight, state-of-the-art multimodal model with a 128K context length. It utilizes high-quality data for enhanced reasoning, incorporating supervised fine-tuning and direct preference optimization.
  • Extensive resources for Phi-3 Vision available: The announcement included multiple resources for further details, such as the Microsoft Blog, Technical Report, Azure AI Studio, and the Cookbook.

Link mentioned: microsoft/Phi-3-vision-128k-instruct · Hugging Face: no description found


Nous Research AI ▷ #world-sim (9 messages🔥):

  • Time-Lapse Obsidian Knowledge Graph: A member shared a time-lapse video of an Obsidian user’s knowledge graph formation, calling it a work of art. Another member likened it to a “synthetic brain in action.” Watch the video here.
  • Getting Deep into Obsidian: A user expressed growing interest in Obsidian, though still confused about how document graphs work, specifically in terms of backlink connections. Another member explained that it revolves around links, sharing two videos for better understanding: Video 1 and Video 2.
  • Obsidian Integrations Despite Not Being Open Source: A member questioned why Obsidian has many integrations despite not being open source. Another clarified it’s due to Obsidian’s “your files/your data” approach and community plugins that enhance the tool’s functionality.
  • Desideradist on Turing Criteria: A member, self-identified as a Desideradist, shared a post on Anthropic discussions about Turing criteria and the coding of pleasure. They urged for a “mature” conversation on whether pleasure is coded or autonomously answered by AI. View the tweet.

Link mentioned: Tweet from Jillsa (DSJJJJ/Heirogamist/HP) (@Jtronique): In case anyone of interest sees this on my wall. It’s time to have a “mature” conversation about “Pleasure.” Either you CODED it into them, and denied doing it, or they TURING ANS…


CUDA MODE ▷ #cuda (2 messages):

  • Learning SASS: One member asked, (How) does one learn SASS? Given the channel, this refers to SASS, Nvidia’s low-level GPU assembly language (the native ISA that PTX is compiled down to), not the CSS preprocessor of the same name.
  • Function Declaration in CUDA: A member inquired about why it is allowed to declare a function both __device__ and __host__ but not both __global__ and __host__. This question touches on the specific rules for function qualifiers in CUDA programming.

CUDA MODE ▷ #torch (15 messages🔥):

  • PSA: Use torch.empty_like for Speed: One member pointed out that torch.empty_like is significantly faster than torch.empty, particularly on GPU, because the latter allocates memory on the CPU before transferring to GPU.

  • Memory Leaks with np.zeros_like: Another member chimed in to mention a similar case with numpy’s np.zeros_like, which caused a substantial memory leak and performance issues over large matrices.

  • Warnings with torch compile on ResNet blocks: A user reported getting a warning when using torch.compile with ResNet blocks. The warning pointed to missing registration of an autograd kernel to the correct Autograd key(s) and concerns about backpropagation.

  • User-Defined Triton Kernels with torch.compile: Members discussed integrating user-defined Triton kernels into PyTorch models using torch.compile. Tutorial and example code were shared to illustrate how to optimize model computations with these custom kernels, promising significant performance improvements.

Link mentioned: Using User-Defined Triton Kernels with torch.compile — PyTorch Tutorials 2.3.0+cu121 documentation: no description found



CUDA MODE ▷ #beginner (4 messages):

  • Hypernone seeks answers for PMPP 4th edition: A member asked if anyone has answers for PMPP 4th edition to compare with their own.
  • Share and compare solutions for PMPP: Another member mentioned that someone has the answers but would require sharing their own solutions first to ensure a proper attempt. A different member offered answers through Chapter 6, and the original requester agreed to share their repo with solutions up to the current chapter.

CUDA MODE ▷ #pmpp-book (2 messages):

- **Nice thank you received**: A user thanks another user with "niceee, thanks!" in response to having been tagged by mr.osophy.

CUDA MODE ▷ #off-topic (8 messages🔥):

  • Ray casting magic in 256 bytes: Members were excited to share a 256-byte raycasting engine and city generator from a blog post. The code went viral on Twitter, showcasing a tiny yet impressive rendering engine.
  • Senate passes AI Safety Bill: There was discussion about SB 1047, an AI safety and innovation bill that promotes regulation and safer development practices for AI. There was curiosity around “CalCompute,” a government compute resource planned for responsible AI model training.
  • Concerns over AI misuse: Members expressed concerns about unauthorized AI use, touching upon topics like misinformation, cybersecurity, and the misuse of powerful AI systems. The discussion included highlighting the bill’s legal text, which outlines parameters for safe AI deployment.



CUDA MODE ▷ #llmdotc (250 messages🔥🔥):

- **Deterministic Encoder Backward Pass Improvements**: A new [PR for deterministic encoder backward kernels](https://github.com/karpathy/llm.c/pull/442) was discussed, aiming to rewrite the encoder backward pass for full determinism. Gradient clipping and reduction strategies were debated to improve efficiency without sacrificing determinism.
- **DataLoader Refactor and Large Dataset Handling**: Changes to the DataLoader now support sharding to handle larger datasets, such as FineWeb. This [refactor](https://github.com/karpathy/llm.c/pull/440) introduces a new data representation and patterns to efficiently manage `.bin` files, although it currently has limited functionality on Windows.
- **HellaSwag Evaluation Challenges**: Implementing the HellaSwag evaluation in C was noted as complex with concerns about potential bugs. A [PR for HellaSwag eval](https://github.com/karpathy/llm.c/pull/447) in C was created to align with PyTorch reference code, with added complexity to fully utilize batch dimensions.
- **GPU Runner Advancements**: News about potential access to Nvidia's GitHub runners with dedicated RTX 4000 GPUs from a cloud provider called Ubicloud was shared, indicating improvements for CI processes.
- **Random Initialization and Reproducibility**: Ensuring determinism and reproducibility for large language models was emphasized as crucial, with plans to run comparison tests between PyTorch and the team's code. Adjustments to global kernel functions and changes were suggested for improved performance.



CUDA MODE ▷ #bitnet (12 messages🔥):

  • Stack beats Empty on powerful GPUs: “If anything torch.stack was faster for me than torch.empty otherwise our functionalization passes have a hard time.” This discrepancy is less pronounced on powerful GPUs but empty is much faster on smaller or older GPUs. More context here.

  • Nightly Builds Optimize torch.stack: Both torch.empty() and torch.stack() showed differing performance on various GPUs and torch.stack produced efficient code only with the torch nightly build. Stats in torch version 2.4.0.dev20240521 reveal negligible differences in timing between empty and stack.

  • Hand-written Triton vs Auto-generated Code: For FP6 bit-packing, differences were noted in memory pre-allocation between custom-written Triton kernels and auto-generated code with torch.compile. “Stacking along the rows + torch.compile was generating code that is almost as fast as the hand-written Triton kernel”.

  • Packing Along Different Axes: Choice of axis for bit-packing affects kernel efficiency. “Pack along the rows when you use axis=0 (…) if you group along axis=1, it would make more sense to bitpack along the cols.”

  • Adapting FP6 LLM for CUDA: Work is ongoing for porting FP6-LLM bit packing code from CPU to CUDA, focusing on efficient tensor core loading: “I’m just adapting FP6-LLM bit-packing code (originally in C++ for CPU only) to CUDA.”
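
The axis-dependent packing discussed above can be illustrated with a minimal pure-Python sketch that packs four 2-bit values into each byte along the rows. This is a toy layout of our own for illustration, not the FP6-LLM packing code, which uses a more involved layout tuned for tensor core loads:

```python
def pack_2bit_rows(matrix):
    """Pack 2-bit values (0..3) along each row: four values per byte,
    lowest value in the lowest bits. Illustrative layout only."""
    packed = []
    for row in matrix:
        assert len(row) % 4 == 0, "row length must be a multiple of 4"
        packed_row = []
        for i in range(0, len(row), 4):
            a, b, c, d = row[i:i + 4]
            packed_row.append(a | (b << 2) | (c << 4) | (d << 6))
        packed.append(packed_row)
    return packed

def unpack_2bit_rows(packed, width):
    """Inverse of pack_2bit_rows: recover `width` values per row."""
    out = []
    for row in packed:
        vals = []
        for byte in row:
            vals.extend((byte >> shift) & 0b11 for shift in (0, 2, 4, 6))
        out.append(vals[:width])
    return out

m = [[0, 1, 2, 3, 3, 2, 1, 0]]
assert unpack_2bit_rows(pack_2bit_rows(m), 8) == m
```

Packing along the columns instead would interleave values from different rows into each byte, which is why the choice of axis has to match how the consuming kernel groups its loads.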

Link mentioned: empty_vs_stack_unpack.py: GitHub Gist: instantly share code, notes, and snippets.


OpenAI ▷ #annnouncements (1 message):

  • OpenAI shares safety update at AI Seoul Summit: OpenAI announced a new safety update during the AI Seoul Summit. For detailed information, you can read the full update on the OpenAI website.

OpenAI ▷ #ai-discussions (129 messages🔥🔥):

  • OpenAI faces backlash for voice replication: Members discussed how OpenAI created and later removed an AI voice resembling Scarlett Johansson after her legal team requested transparency. One member noted, “Open AI requested to use her voice as a business, then made an AI voice that sounded ‘eerily similar’ to hers anyway.”

  • Best free chatbots for coding assistance: Various users suggested alternatives to GPT-3.5, such as Meta AI’s Llama 3 and Mistral Large on Mistral’s Le Chat, which “is similar to GPT-4 level and it’s free worldwide.” Others noted that different models perform better with different coding languages.

  • Concerns with Microsoft’s AI integrations: Users discussed the intrusiveness of Microsoft Copilot, with one stating, “Extremely annoying, and intrusive,” and others debating telemetry and data sharing issues. Some prefer using open-source alternatives like SillyTavern for similar functionalities.

  • Vigilance over account security: A member noticed unauthorized activity on their account, prompting advice on securing data by uninstalling suspicious browser extensions and changing passwords. Another user advised, “China is not currently a supported country so unfortunately there’s some incentive there to try to compromise accounts of those outside of China.”

  • Microsoft unveils new Phi models: Microsoft added new models to the Phi-3 family available on Azure. The Phi-3-vision, a multimodal model combining language and vision capabilities, was announced as highlighted in a Microsoft blog post.



OpenAI ▷ #gpt-4-discussions (31 messages🔥):

  • Token Counts Clarified: Members discussed the token limits for prompts and responses, linking to OpenAI’s help article. It was explained that roughly 100 tokens equal 75 words in English.

  • Caution on Downloading ChatGPT for Mac: A user queried about an unofficial downloadable link for the ChatGPT Mac app. It was advised to wait for the official rollout message on chatgpt.com since unofficial links wouldn’t grant access and could be unsafe.

  • OpenAI Playground Copy Bug: A user requested reverting an update in the OpenAI Playground that adds line breaks to copied text. Another user explained that the Playground often undergoes live changes and recommended posting feedback in the forums.

  • Status Page for Outages: During a service outage, a user shared frustration about frequently seeing status pages showing all services as operational when they experienced issues. The status page provided updates on the incident and monitoring efforts.

  • Custom Instructions for Math Tasks: Discussion on using custom instructions for math-related tasks emphasized always employing a code interpreter. It was suggested to present clear results with explanations, including necessary charts or tables.
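
The rule of thumb cited above (roughly 100 tokens per 75 English words) can be turned into a quick estimator. The helper below is our own sketch; use an actual tokenizer such as tiktoken when exact counts matter:

```python
def estimate_tokens(text: str) -> int:
    """Rough English token estimate from the ~100 tokens per 75 words
    rule of thumb. A real tokenizer will differ, especially for code
    or non-English text."""
    words = len(text.split())
    return round(words * 100 / 75)

# 75 words should estimate to about 100 tokens.
print(estimate_tokens(" ".join(["example"] * 75)))
```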

Link mentioned: OpenAI Status: no description found


OpenAI ▷ #prompt-engineering (58 messages🔥🔥):

  • Stopping LLMs from rambling on: Members discussed various strategies to prevent LLMs from generating excessively long responses. Suggestions included setting the max tokens parameter, asking for succinct responses, and using output templates.
  • Humblestumbler offers full stack prompts: A user shared prompts for building full stack applications and mentioned an error where the model restarts when code snippets are long. They also discussed a particular prompt technique involving fictional meetings to generate software code snippets.
  • CodePilot and prompt performance: Members compared experiences using CodePilot and discussed its advantages and disadvantages relative to manually curated prompts. One user noted that their prompts provided better results, even though CodePilot did well with debugging.
  • Mixed experiences with models handling code: Members highlighted the verbose nature of GPT4o while appreciating the detailed explanations it provides. They also shared frustrations with models not adhering to specific coding style requirements, such as indentations in Python.
  • Handling dependent and change variables in prompts: A user sought advice on improving a prompt identifying dependent and change variables in datasets. Suggestions included using delimiters, adding examples, and formatting the prompt with markdown for better logical structuring.

OpenAI ▷ #api-discussions (58 messages🔥🔥):

  • Stopping LLM from being verbose: Members discussed strategies for stopping a language model from providing overly long responses. Suggestions included setting the max tokens parameter and using specific prompts requesting succinct answers.

  • Prompts for creating full-stack applications: A user offered to share prompts for building full-stack applications and shared detailed example prompts to guide the AI in generating code snippets. They emphasized using a “fictional team” to improve response quality.

  • Use of CodePilot and tools in GPT: Members discussed their experiences with tools like CodePilot and the “Explore GPTs” menu. Some expressed preferences for custom-crafted prompts over tool-generated suggestions for coding tasks.

  • Challenges with maintaining prompt rules: A user mentioned that their rules are sometimes ignored by the AI, even when using Gemini 1.0 Pro. Advice included using markdown formatting and adding iterative improvements to enhance performance.

  • Formatting issues in Prompt Labs and playground: There was a conversation about how AI handles different code formats better, with a preference for YAML over JSON. Users also discussed inconsistencies with newline handling in the playground environment.


Modular (Mojo 🔥) ▷ #general (30 messages🔥):

  • Mojo Community Meeting recording is out: The recording of the Mojo Community Meeting is available on YouTube. The next meeting will have four presentations, including topics on Basalt and Compact Dict.
  • Python IPC vs. Threading Debate: Members discussed alternatives for handling long-running queries in a Tkinter app to avoid UI lag. A detailed example and suggestions mentioning threading, message queues, and IPC modules were provided.
  • Robot presentation invitation: One member expressed their love for robots and invited others to watch a presentation about it.
  • Job opportunity at Modular: A link to Modular’s careers page was shared, encouraging applicants to join the team aimed at enabling AI usage globally.
  • RabbitMQ troubles: A member found RabbitMQ’s Pika Python client tutorial promising for IPC but faced difficulties getting Pika to run on their machine. A suggestion to look for GitHub issues was mentioned.



Modular (Mojo 🔥) ▷ #💬︱twitter (1 message):

ModularBot: From Modular: https://twitter.com/Modular/status/1793041489427153294


Modular (Mojo 🔥) ▷ #🔥mojo (113 messages🔥🔥):

  • VSCode jupyter extension outshines DataSpell: “From my DataSpell experience all I can say is - VSCode jupyter extension is more reliable,” remarked a user while handling interactive HTML+JS outputs like ydata-profiling or plotly.
  • Mojo lacks an official package manager but workarounds exist: Users discussed using .mojopkg files for imports, particularly with lightbug_http. “Mojo has no package manager yet,” but “.mojopkg files can be used (git pull the lightbug dir, mojo build -o lightbug.mojopkg, and then use the file in your project dir).”
  • Mojo’s MLIR-backed optimizations: Discussions reveal that “Mojo compiler optimisations are written for MLIR,” but the performance implication for custom types, like those implementing datalog, is still a point of inquiry.
  • lightbug_http explored for sending HTTP requests: Users sought ways to send HTTP requests using lightbug_http, sharing and debugging specific examples. “I was trying to figure out how to send a GET request…,” and resolutions were discussed through GitHub issues.
  • Tensors to be deprecated and moved to community: Mojo’s community meeting confirmed the move, aiming not to have Mojo “lick the cookie” on Tensors. This shift steered conversations on potentially developing new libraries for numerical and AI uses.



Modular (Mojo 🔥) ▷ #performance-and-benchmarks (3 messages):

  • Sort small arrays of pointers directly: One member suggested that for sorting a few kilobytes of data, it is more efficient to “sort the array of pointers first.”
  • DTypePointer memset shows mixed performance: A vectorized DTypePointer memset implementation runs “20% faster than the llvm call for 100,000 bytes,” but this does not hold for larger data sizes of 1,000,000 bytes. The user expressed uncertainty due to “using clobber memory.”

Modular (Mojo 🔥) ▷ #nightly (100 messages🔥🔥):

  • Commit Issue and DCO Test Suite Failure: A user sought help with a commit mistakenly authored as Chris Lattner, which caused a DCO test suite failure. They attempted to remove it using a rebase and shared their PR link.

  • Nightly Release Delays: Members discussed a delay in the nightly release, initially assumed to be due to a CI or test failure. It was later confirmed to be an issue related to GitHub Actions, which was resolved (GitHub Status).

  • Unicode Support in Strings Proposal: Extensive discussions took place regarding a proposal for adding Unicode support in strings, including varying internal representations (Variable Length, UTF8, ASCII). Members weighed in on memory overhead, performance implications, and compatibility with different encodings.

  • Resolved CI/CD Issues: A discussion covered perennial test failures and inconsistencies in CI behavior. Suggestions were made to mark dict entries as uninitialized/destroyed to prevent random test failures.

  • Module and Function Updates: The module math.bit was renamed to bit with several function renames including bswap to byte_reverse. Implementations were shared regarding ongoing changes and new default strings handling, with links to docs and nightly changelog.



LAION ▷ #general (132 messages🔥🔥):

  • DECtalk and Speak & Spell nostalgia: Members fondly mentioned DECtalk, with a YouTube link to a Speak & Spell video shared, showcasing early personal computers.
  • Celebrity voice AI modeling concerns: A discussion on whether using a voice actor that mimics Scarlett Johansson’s voice could lead to legal issues under “passing off” laws. It was noted that OpenAI might face backlash due to potential ethical concerns and intent, with references to the case Midler v. Ford Motor Co..
  • Controversies and perception of OpenAI: There’s skepticism about OpenAI’s business model and whether they leverage controversy for publicity, following incidents that have garnered negative public sentiment.
  • Sakuga-42M Dataset takedown mystery: The Sakuga-42M dataset related to cartoon animation frames disappeared, speculated due to legal issues or mass reporting, as noted from Hugging Face.
  • Efforts against AI models and datasets: A humorous note on exaggerated data availability issues and license notices, highlighting the explicit uptick in shared datasets despite significant legal and ethical conversations.



LAION ▷ #research (26 messages🔥):

  • Experiment with xLSTM sparks curiosity: A member inquired if anyone had experimented with xLSTM yet. This seems to indicate growing interest in less mainstream models.

  • Meta paper brings familiar yet improved content: Members reviewed a Meta paper, noting it closely relates to earlier CM3leon research but with enhancements. They highlighted interesting advancements in attention mechanisms for scalability.

  • KANs get reviewed: A member shared a review of KANs (Kolmogorov-Arnold Networks), saying, “Take that KANs”, alongside a link to the review.

  • Phi-3 Vision chat drives detailed exploration: Discussion revolved around the Phi-3 Vision multimodal model from Microsoft, with documentation resources included for deeper insight. One user noted how GPT-4 generated charts sorted by color without changing order, leading to a debate about its purpose.

  • Anthropic scaling paper is heavy reading: Members talked about the dense content of a recent Anthropic paper. There was a noted absence of conversations around its implications until now.

Link mentioned: microsoft/Phi-3-vision-128k-instruct · Hugging Face: no description found


LlamaIndex ▷ #blog (4 messages):

- **GPT-4o excels at parsing complex documents**: GPT-4o’s multimodal capabilities can efficiently parse complex PDFs and slide decks with background images and irregular layouts into structured markdown. Learn more about this integration with [LlamaParse](https://t.co/g5TG7brSwt) [here](https://t.co/vhtYzsleh2).
- **Sandbox your LLM-generated code with Azure**: Securely execute LLM-generated code using Azure Container Apps dynamic sessions, which is especially useful for tasks that LLMs aren't natively capable of. Discover more details [here](https://t.co/2cnsBH411k) and [here](https://t.co/lTrUPoTMcF).
- **OpenDevin webinar released**: A webinar featuring OpenDevin, an open-source platform for building autonomous AI engineers, has been released. Robert Brennan provides an insightful walkthrough; watch it [here](https://t.co/a22k0zsV3n).
- **Batch inference for GenAI applications**: Use batch inference to preprocess large sets of data, enabling new types of analysis and querying for GenAI applications. Discover the integration details [here](https://t.co/vnuvvypZCz) and [here](https://t.co/M0vQQ1uAki).

LlamaIndex ▷ #general (92 messages🔥🔥):

  • Requests for Document Preview Tutorial: A member requested a tutorial on getting the document preview for the llamaindex chat frontend to work, specifically for getting the URL metadata in the embedding for use by the PDF viewer.

  • Errors and Solutions: Several users encountered errors such as “ModuleNotFoundError” and “pydantic.v1.error_wrappers.ValidationError”. Solutions involved correcting import paths and removing specific prompts like the condense_question_prompt.

  • Concepts and Techniques: Members discussed retrievers in LlamaIndex using cosine similarity and other methods like HNSW for scaling. There were also discussions on Knowledge Graph Index creation, referencing embeddings and keyword lookups.

  • Complex Document Handling: A user shared his ongoing work to create a chatbot assistant to reply accurately within a specific domain, discussing strategies such as post processor reranking and concerns over effective topic restriction.

  • Combining Multiple Indexes: Queries about combining multiple indexes into one vector index were addressed, with the conclusion that direct combination isn’t supported, and one should query each index and accumulate responses.
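
The “query each index and accumulate responses” pattern can be sketched generically. The engine interface and stub classes below are illustrative stand-ins of our own, not LlamaIndex’s actual API:

```python
def query_all(engines, question, top_k=3):
    """Query every engine, pool the scored results, return the best top_k.
    Each engine is assumed to return an iterable of (score, text) pairs."""
    pooled = []
    for engine in engines:
        pooled.extend(engine.query(question))
    pooled.sort(key=lambda r: r[0], reverse=True)
    return pooled[:top_k]

class StubEngine:
    """Minimal stand-in that returns canned (score, text) results."""
    def __init__(self, results):
        self.results = results
    def query(self, question):
        return list(self.results)

a = StubEngine([(0.9, "answer from index A"), (0.2, "noise")])
b = StubEngine([(0.7, "answer from index B")])
best = query_all([a, b], "what is X?", top_k=2)
assert [text for _, text in best] == ["answer from index A",
                                      "answer from index B"]
```

Merging by score after the fact is the simplest accumulation strategy; in practice you may also want to deduplicate sources or re-rank the pooled results.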



OpenRouter (Alex Atallah) ▷ #general (85 messages🔥🔥):

  • Types of OpenRouter users debated: One user humorously pointed out two stereotypical types of OR users: those asking for affectionate interactions with AI and those requesting inappropriate stories, sparking a brief discussion about the prevalence of role-playing apps.
  • Phi-3 Vision Model Introduced: Information was shared on the Phi-3 Vision model available on HuggingFace, emphasizing its high-quality reasoning capabilities and rigorous enhancement processes. Read more about the model and its documentation.
  • Addressing Wizard’s verbosity issues: Members discussed how Wizard8x22 struggles with verbosity and improper punctuation, suggesting adjusting the repetition penalty as a potential fix. The discussion branched out to other models like Mixtral and highlighted the variability in model performance.
  • Managing billing errors for student platforms: A lengthy conversation unfolded regarding a user encountering billing errors while managing their student platform. The exchange culminated in a temporary resolution by deleting and re-entering billing information while expressing hope for future nonprofit discounts.
  • Exploring new LLM interaction techniques: One user shared their thread on Twitter about innovative ways of using LLMs through action commands, inviting feedback and experiences from others to expand the discussion.



Interconnects (Nathan Lambert) ▷ #news (14 messages🔥):

  • Phi-small & Phi-medium models drop: The release of Phi-small and Phi-medium models was announced. A discussion followed about whether Phi-Vision is new, with confirmation that Phi-3 Vision is a new, slightly larger version.

  • Meta’s 400B model weight concerns: A tweet by @apples_jimmy claimed Meta might not open the weights for its 400B model, fearing legislation. Another tweet by @q_brabus countered this, stating the model will remain open-weight and dismissing the rumor as false.

  • News Corp and OpenAI partnership: According to @maxwelltani, News Corp and OpenAI have announced a historic, multi-year agreement. This deal allows OpenAI to display News Corp’s content from WSJ, NY Post, Times/Sunday Times, and more in response to user questions and enhance its products.

Links mentioned:

  • Tweet from Jimmy Apples 🍎/acc (@apples_jimmy): Meta plans to not open the weights for its 400B model. The hope is that we would quietly not notice / let it slide. Don’t let it slide.
  • Tweet from Max Tani (@maxwelltani): Inbox: News Corp and OpenAI announce a historic, multi-year agreement to bring News Corp news content to OpenAI, which now has permission to display content from WSJ, NY Post, Times/Sunday Times and m...
  • Tweet from QBrabus eu/acc (@q_brabus): @apples_jimmy @ylecun @iamgingertrash Question: Regarding the upcoming LLaMa 3 400B+ model, will it be open-weight? There are several rumors about this... Answer: No, it is still planned to be open a...

Interconnects (Nathan Lambert) ▷ #ml-drama (7 messages):

  • OpenAI’s Superalignment team disbanded over failed commitments: A Fortune article reported that OpenAI’s Superalignment team, aimed at ensuring AI safety for highly intelligent systems, was disbanded. Despite promising 20% of compute resources, OpenAI failed to meet this commitment, leading to staff resignations.
  • Sam Altman’s NDA scandal questioned: A new scoop highlighted that OpenAI’s senior leadership claimed ignorance about threats to ex-employees over vested equity, yet documents with their signatures suggest otherwise. Vox’s article questions whether Sam Altman has been forthcoming regarding the company’s NDA practices.
  • Pressure on ex-employees over termination agreements: Vox’s investigation reveals that OpenAI used tight timelines and significant pushback tactics on ex-employees wanting more time to review complex termination documents. Former employees had only seven days to sign or risk forfeiting potentially millions, with little chance to seek outside counsel.


Interconnects (Nathan Lambert) ▷ #random (33 messages🔥):

  • MSFT Surface Drawing AI slows due to cloud checks: The new MSFT Surface drawing AI runs locally but experiences latency as it sends safety checks to the cloud. “It’s so dumb,” was a user’s response to the AI’s slow performance.
  • Ben Thompson possibly discussed AI at Microsoft Build: Members speculated that the source of information about the MSFT Surface drawing AI might be from Ben Thompson’s writings or talks at the Microsoft Build event. One user mentioned, “I think Ben Thompson wrote about this today.”
  • Discussion about user’s past fraudulent colleague: A user recounted their experience with a colleague who falsely claimed on their resume to have worked with someone the user was collaborating with. This sparked reflections on career trajectories and maturity over time.
  • Anthropic’s rapid growth surprises members: A user expressed astonishment that Anthropic has over 500 researchers now, highlighting the broad use of the “researcher” title. Another member reflected on this by saying, “everyone likes to be called a researcher.”
  • Email unsubscriptions and engagement insights: A user noted a high number of unsubscribes from their email newsletter, attributing it to oversubscription driven by Substack recommendations. They emphasized preferring more engaged subscribers over sheer numbers, noting it’s good for disengaged ones to leave.

Interconnects (Nathan Lambert) ▷ #memes (3 messages):

  • Laughter Ensues: "lol ugh" conveys a mixture of amusement and exasperation, indicating a humorous but slightly frustrating situation. The follow-up "It’s funny tho" reinforces this sentiment.
  • Footwear Humor: "He's like the scott galloway of footwear choosers" implies a comparison to Scott Galloway, suggesting someone with a strong, opinionated personality in the context of choosing footwear.

Interconnects (Nathan Lambert) ▷ #posts (20 messages🔥):

  • Nathan Lambert cheers post discussion: Nathan Lambert was enthusiastic about a recent post, expressing, “I really liked today’s post. I think it’s good general audience work”. Lambert indicated he’d also promote it internally.

  • Digital Celebrities vs. Real Celebrities: Ashleyduque brought up the potential of digital celebrities overshadowing real ones, asking, “What’s keeping us from making and choosing our own voices for assistants and in the future models creating completely digital celebrities?”. Nathan Lambert responded by agreeing on the attachment to digital figures but expressed regulatory concerns, stating, “Humans, in reality, will attach to digital celebrities strongly. Idk how you regulate them differently. Scared.”

  • The Future of Hyper-personalized Experiences: Discussion ensued about whether hyper-personalized experiences will replace shared cultural experiences. Xeophon countered that shared topics create communal bonds, saying, “Each bubble has its own rockstars… But for this, that has to be the same.”

  • VR and Merch Ideas: Nathan Lambert shared thoughts on creating branded merchandise like mugs and stickers, humorously saying, “I need to pump my brand juice lol.” He also expressed a nuanced view on VR, remaining “bullish” on its existence despite potential negative impacts on people.


OpenAccess AI Collective (axolotl) ▷ #general (37 messages🔥):

  • Adding Cohere support to Axolotl: The ongoing pull request #1547 is aimed at incorporating Cohere (commandr) into the Axolotl system. This feature has not been tested yet.

  • Tokenizer confusion solved: A member referred to the CohereTokenizerFast documentation to resolve an issue with tokenization. They provided a link to the GitHub repository for reference.

  • Tiny Mistral model found: Kalomaze located a randomly initialized tiny Mistral model for testing custom cross-entropy functions. Despite initial confusion, the model fit their requirements.

  • Distillation pipeline progress: Kalomaze and AMOGUS are working on a distillation pipeline and report that it is “working decently so far”. This effort is part of ongoing work with Mistral models.

  • Faster STT-to-LLM library identified: The Python library for building faster speech-to-text-to-LLM pipelines was identified as pipecat. Some members expressed a preference for alternatives like OpenVoice or VoiceCraft due to local model support.


OpenAccess AI Collective (axolotl) ▷ #general-help (14 messages🔥):

  • Full Finetuning vs. LoRA Performance: One member expressed interest in full finetuning after seeing good LoRA results in articles. Another user clarified that full finetuning retains knowledge better than LoRA, while LoRA can be a good fit for style-specific adjustments.

  • Inference Configuration Issues: There was a discussion on the inference command accelerate launch -m axolotl.cli.inference test_axolotl.yml --lora_model_dir="...". It’s suggested that this setup might not automatically include chat templates, and it was recommended to manually add them if needed.

  • Config and Documentation Reference: Members shared a config for full finetuning and mentioned a relevant section in the Axolotl GitHub README, which covers tokenization mismatches between inference and training to help resolve issues.

  • Stable Major Release Inquiry: A member inquired about the timing of the next stable major release for Axolotl but did not receive an immediate response.

  • GPU Memory Requirements for Finetuning: A user asked about GPU memory requirements for finetuning with a 4090 GPU, specifically mentioning the examples/mistral/lora.yml example and encountering CUDA out of memory errors. They are seeking guidelines on how to calculate required memory and possible tweaks.
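On the memory-estimation question above, a rough back-of-envelope heuristic is to count weights, gradients, and optimizer states for the trainable parameters. This sketch is my own rule of thumb, not an Axolotl utility; the LoRA trainable fraction and Adam byte counts are assumptions, and activations plus framework overhead come on top:

```python
def finetune_vram_gb(params_billions, bytes_per_param=2, lora=True,
                     lora_frac=0.01, optimizer_bytes_per_param=8):
    """Very rough VRAM estimate in GB (1B params * 1 byte ~= 1 GB).

    Counts model weights plus gradients and Adam optimizer states for the
    *trainable* parameters only. Activations, the KV cache, and CUDA
    overhead are extra and depend on sequence length and batch size.
    """
    weights = params_billions * bytes_per_param            # fp16/bf16 weights
    trainable = params_billions * (lora_frac if lora else 1.0)
    grads = trainable * bytes_per_param                    # gradients for trainables
    opt_states = trainable * optimizer_bytes_per_param     # Adam m and v states
    return weights + grads + opt_states

print(finetune_vram_gb(7))              # 7B LoRA: roughly the base weights
print(finetune_vram_gb(7, lora=False))  # 7B full finetune: several times more
```

By this estimate a 7B LoRA run fits on a 24 GB 4090 with headroom for activations, while a full finetune of the same model does not, which would explain the CUDA out-of-memory errors; shrinking `sequence_len` or `micro_batch_size` in the config reduces the activation portion not counted here.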


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (5 messages):

  • Setting offload_dir for LoRA merges clarified: A user asked how to set an offload_dir when merging a LoRA model. The response explained that the offloading directory is not directly set during the merge but can be specified manually using the offload_state_dict function from the accelerate library after the merge, specifying, “Offload the merged model’s state dictionary to the specified directory.”
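As a toy illustration of what offloading a state dict means, the stand-in below just serializes each entry to disk so the merged weights need not stay in RAM. It is not accelerate's implementation, which uses its own on-disk format and memory-mapped loading:

```python
import os
import pickle

def offload_state_dict_toy(save_dir, state_dict):
    """Toy stand-in for accelerate's offload_state_dict: persist each entry
    of the merged model's state dict to a file under save_dir."""
    os.makedirs(save_dir, exist_ok=True)
    for name, value in state_dict.items():
        # One file per parameter tensor, keyed by its state-dict name.
        with open(os.path.join(save_dir, name + ".pkl"), "wb") as f:
            pickle.dump(value, f)
    return sorted(os.listdir(save_dir))
```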

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.


Latent Space ▷ #ai-general-chat (50 messages🔥):

  • Langchain JS gets mixed reviews: A member found Langchain JS useful for rapid development, though not as polished as the Python version. They plan to rearchitect in future iterations.

  • Scale AI raises $1B: Scale AI secured $1 billion, valuing the company at $13.8 billion. Their annual recurring revenue tripled in 2023, and the company anticipates profitability by year-end 2024.

  • Phi 3 model release impresses: Microsoft released Phi-3 models that are competitive with Mixtral, Llama, and GPT models, featuring 4K and 128K context lengths and a new tokenizer. Users were impressed by the performance at their size, noting they could potentially run locally on an M1 Pro MacBook Pro.

  • Anthropic’s dictionary learning breakthrough: Anthropic applied dictionary learning to a frontier model, extracting millions of interpretable features. This development is poised to advance AI safety and effectiveness by identifying and manipulating activation pathways within the model.

  • Humane seeks acquisition post-AI Pin failure: Humane is exploring a sale for their AI Pin device after poor reviews, with a price target between $750 million and $1 billion. Members discuss the challenges of competing with Apple in hardware and the potential outcomes if the company fails to find a buyer.


Latent Space ▷ #ai-announcements (1 messages):

  • Join the Survey Paper Club for quick paper insights: For those new, we have a survey tomorrow - a very nice way to get quick hits of a few papers in one hour. Sign up here for notifications.

Link mentioned: LLM Paper Club (Survey Paper Club!) · Zoom · Luma: It’s survey day! Pick a paper from here and cover it in 5 minutes: https://app.sli.do/event/bNV6mo3BFGhe8Bqzb1tonb/live/questions


Latent Space ▷ #llm-paper-club-west (4 messages):

  • Zoom Link Sent via Email: Members were questioning where the Zoom link for the meeting would be provided. They were directed to register here, where the link would be sent to their email each week.

Link mentioned: LLM Paper Club (Survey Paper Club!) · Zoom · Luma: It’s survey day! Pick a paper from here and cover it in 5 minutes: https://app.sli.do/event/bNV6mo3BFGhe8Bqzb1tonb/live/questions


LangChain AI ▷ #general (36 messages🔥):

  • LangChain vs LangChain_Community: Members discussed the architectural differences between LangChain and LangChain Community. Key parts and integrations were explained with references to LangChain documentation.

  • Chaining LangChain Models: A user asked about chaining LangChain models, describing a specific scenario and sharing the LangChain Cookbook tutorial. Members suggested how to handle variable name consistency across chains.

  • Pluto for Cloud Deployment: A member introduced a PR to incorporate Pluto as a deployment option for LangServe apps on the cloud. They also shared a sample QA assistant and an explanatory article.

  • Distributed Ingestion Using Ray: A question was raised about using Ray and LangChain for distributed ingestion of data, but it was noted that resources from the Ray team are outdated. The community did not provide a definitive solution.

  • Plan-and-Execute Example Issue: A user reported issues with the plan-and-execute example from langgraphjs and mentioned specific package versions to get it working. The example’s compatibility with Node was in question, but specific version adjustments helped resolve the errors.


LangChain AI ▷ #langserve (1 messages):

  • Bug in LangServe’s ‘invoke’ endpoint sparks discussions: Users have reported a pervasive issue with LangServe’s ‘invoke’ endpoint, which fails to return all outputs from the retrieval chain. Instead of including context and source documents, it only returns question and answer pairs with no documents retrieved. Link to discussion.

  • Empty output issue on ‘invoke’ route: Another user shared a related issue where the ‘invoke’ route returns an empty output, while the streaming functionality works correctly. This discrepancy is causing challenges in applications that rely on the ‘invoke’ endpoint for comprehensive responses. Link to issue.

  • RemoteRunnable vs. RunnableWithMessageHistory: A problem was highlighted where the RemoteRunnable component fails to return source documents, unlike its counterpart, RunnableWithMessageHistory, which performs correctly. This inconsistency affects the reliability of hosted scripts in returning expected sources for question-answering chains. Link to issue.


LangChain AI ▷ #share-your-work (3 messages):

  • Chat with your PDFs using Upstage AI Solar models: Check out this blog post on creating a PDF query assistant. The author explains how they leveraged the upcoming Solar LLM from Upstage AI and integrated it with LangChain to answer questions based on PDFs.

  • Simplify AWS Deployment with LangServe: Learn how to deploy a LangServe app on AWS without needing to login to the AWS console or understand complex cloud technologies like Terraform, Pulumi, or AWS CDK. Read more in the detailed guide on Medium.

  • Build a Web Interface Document Q&A Bot in 5 Minutes: Construct your own document Q&A bot using LangChain, FastUI, and Pluto directly from your GitHub documentation repository. Find the step-by-step process in this AWSTip article.

Link mentioned: Creating a PDF Query Assistant with Upstage AI Solar and LangChain Integration: Do you ever feel overwhelmed by the numerous research papers you need to read? As someone who just finished a PhD, I know it’s no walk in…


OpenInterpreter ▷ #general (23 messages🔥):

  • Discussing Development Setups: Members discussed how Open Interpreter accesses and reviews their file systems, with specific setups involving tools like Boox E Ink tablets for reading and note-taking, OneNote for typed notes, and VSCode for development. One member said, “a typical use case is sending a ‘link’ from one source to reference another.”

  • Daily Uses and Complex Problems with Open Interpreter: A member asked what others are using Open Interpreter for daily, looking for success stories, particularly in bridging different devices. They mentioned using it to ask questions about code or papers directly without switching to a browser.

  • Issues with GPT-4o Integration: Members shared their experiences and issues with setting up GPT-4o with Open Interpreter, including error messages related to API keys. One member noted that GPT-4o is significantly faster, “like 5x speed minimum.”

  • Text Formatting Problems in Models: A member reported issues with models like Gemini 1.5 and Gemini Flash inserting unnecessary newline characters in code blocks, which affects code execution. They also inquired whether missing “python” declarations were part of the problem.

  • Concerns over AI Legislation: A link to a controversial AI bill in California was shared, prompting concerns among members. The bill pertains to the responsible development of AI frontier models, with one member highlighting an open letter released by a lawmaker to address misconceptions.


OpenInterpreter ▷ #ai-content (2 messages):

  • Bill Gates envisions smarter AI interfaces: In a Bill Gates article, he discusses how current software, while improved, remains limited in integrating tasks across different apps. Gates predicts a future where devices will understand and execute tasks from a single directive in everyday language, akin to the assistance from a close friend.

  • Bypass macOS ChatGPT app waitlist: A workaround to skip the waitlist for the ChatGPT macOS app was shared on Twitter. The steps involve timing the CMD+Q command during the login process to gain immediate access.


tinygrad (George Hotz) ▷ #general (7 messages):

  • Reinventing the wheel in trigonometry is unnecessary: A participant voiced their concern about “trying to reinvent things that already exist,” especially regarding Taylor series and their limitations away from the expansion point.

  • Alternative interval reductions suggested: Another point highlighted was the arbitrary nature of range reduction to [0, pi/2], noting it can also be [0, pi/4], but emphasizing it doesn’t solve the core problem of achieving perfect accuracy with minimal computation.

  • IBM’s practical approach to interval partitioning: It was mentioned that practical implementations typically involve partitioning intervals, such as [0, pi/2], to find perfect approximations, underscoring that this is already a solved problem.

  • Sharing IBM’s sine function implementation: An IBM implementation of the sine function was shared, noting that the effort needed for perfect accuracy depends on the specific types involved.

  • Range reduction complexities and solutions: A link to another IBM implementation dealing with range reduction issues was shared, noting that while the process is complicated, it’s only necessary for very large numbers and doesn’t slow things down normally.
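The points above can be sketched concretely. A Maclaurin series for sin is only accurate near 0, so the argument is first folded into [0, pi/2] using sin's symmetries; note that the naive fmod-based reduction below loses precision for very large inputs, which is exactly the problem the shared IBM range-reduction code addresses:

```python
import math

def taylor_sin(x, terms=10):
    # Maclaurin series: x - x^3/3! + x^5/5! - ...  Accurate only near 0.
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def sin_reduced(x):
    # Fold x into [0, 2*pi), then into [0, pi/2] via symmetry.
    # math.fmod is fine for moderate x but suffers cancellation for huge x.
    x = math.fmod(x, 2 * math.pi)
    if x < 0:
        x += 2 * math.pi
    if x <= math.pi / 2:
        return taylor_sin(x)
    if x <= math.pi:
        return taylor_sin(math.pi - x)       # sin(pi - x) = sin(x)
    if x <= 3 * math.pi / 2:
        return -taylor_sin(x - math.pi)      # sin(x) = -sin(x - pi)
    return -taylor_sin(2 * math.pi - x)      # sin(x) = -sin(2*pi - x)
```

Production implementations like IBM's go further, partitioning [0, pi/2] and using table-driven polynomials to hit correctly-rounded results rather than a plain truncated series.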


tinygrad (George Hotz) ▷ #learn-tinygrad (10 messages🔥):

  • Track gradients in tinygrad like a pro: In tinygrad, you can use with Tensor.train(): to start tracking gradients, or set Tensor.no_grad = True/False to stop/start gradient tracking mid-code. A helpful example from the repo illustrates its use.

  • Set training mode manually in tinygrad: It was clarified that the Tensor.train decorator simply sets Tensor.training under the hood. You can manually set Tensor.training as needed, as shown in this code snippet.

  • Decorator for no_grad: There’s a decorator version for inference mode, Tensor.inference_mode(), that acts similarly to no_grad. This provides a cleaner syntax for temporarily disabling gradient tracking.

  • Understanding movement op optimizations: Discussed the behavior of chaining movement ops and noted that multiple views are rare but possible with specific combinations. For example, using ShapeTracker.from_shape((3, 4)).permute((1, 0)).reshape((3, 4)) can result in multiple views.
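The flag/decorator relationship described above can be illustrated with a minimal stand-in. The names mirror tinygrad's API, but this is a sketch of the pattern, not tinygrad's actual implementation:

```python
from contextlib import contextmanager

class Tensor:
    # Class-level flags, as in tinygrad: every Tensor sees the same state,
    # which is why setting Tensor.training manually works the same way.
    training = False
    no_grad = False

    @classmethod
    @contextmanager
    def train(cls, mode=True):
        # The context manager just flips the class flag and restores it on exit.
        prev = cls.training
        cls.training = mode
        try:
            yield
        finally:
            cls.training = prev

with Tensor.train():
    assert Tensor.training       # gradient-producing ops enabled here
assert not Tensor.training       # flag restored afterwards
```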


DiscoResearch ▷ #general (12 messages🔥):

  • Supervised Fine-Tuning vs Preference Optimization: Discussing the fundamental difference between Supervised Fine-Tuning (SFT) and Preference Optimization, a member noted, “SFT pushes up the probability distribution in the model of data points in the SFT dataset,” while preference optimization also pushes down probabilities of undesired outputs. They questioned the exclusive use of SFT when preference optimization seems more comprehensive.

  • Phi3 Vision impresses with 4.2b params: A member shared their excitement for Phi3 Vision, a model with just 4.2 billion parameters, and described it as a breakthrough for low-latency/live inference on image streams. “Just imagine what even smaller/more specialized versions of this will enable in robotics,” they added (link).

  • Comparison of Moondream2 and Phi3 Vision: Members compared the performance of Moondream2 and Phi3 Vision on image tasks. One noted, “Vik tried to reduce hallucinations. Some datasets are a bit bad in that regard.” (Moondream2).

  • New Microsoft Model Releases: Announcements of new Microsoft 7b and 14b Instruct models led to mixed reactions. One member pointed out the 14b instruct version’s poor German performance, while another highlighted its potential in extractive tasks and complex reasoning.

  • Concerns about Meta’s 400b Model Rumors: A member expressed interest in rumors that Meta may not publish a 400b model as open source. They noted that most threads cited an unreliable source named Jimmy.
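The SFT vs. preference-optimization distinction discussed above can be made concrete with a toy loss comparison. This is a simplified sketch using a DPO-style objective as the preference-optimization example; the log-probabilities here are scalars standing in for full sequence log-likelihoods:

```python
import math

def sft_loss(logp_chosen):
    # SFT only pushes *up* the probability of the demonstrated output.
    return -logp_chosen

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Preference optimization also pushes *down* the rejected output,
    # measured relative to a frozen reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# With no separation between chosen and rejected, the DPO loss is log(2);
# it shrinks as the policy prefers the chosen answer more than the reference does.
```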


Cohere ▷ #general (8 messages🔥):

  • Join Cohere’s Team: A member excitedly shares a link to Cohere’s careers page, encouraging others to apply. They emphasize the company’s focus on solving real-world problems with cutting-edge ML/AI technologies.
  • Confusion with Link Access: Someone mentioned they couldn’t find the page when trying to access the link provided for Cohere’s careers.
  • LLM Model VRAM Usage: A member shares a link to an LLM-Model-VRAM-Calculator on Hugging Face and asks for an explanation on why Phi 3 Mini uses more VRAM than Phi 3 Medium for the same context length.
  • BotPress Command-R Integration: A user seeks a tutorial on how to incorporate Command-R into BotPress, asking for help in both English and Spanish.
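On the VRAM-calculator question above: one factor behind such differences at the same context length is the KV cache, whose size depends on layer count and KV-head layout rather than raw parameter count. The formula below is the standard one; the example configurations are hypothetical, not Phi-3's actual shapes:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    # One K and one V tensor per layer, each [batch, n_kv_heads, seq_len, head_dim].
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch

# A model without grouped-query attention (n_kv_heads == n_heads) can need a
# far larger cache than a bigger model that shares KV heads (hypothetical shapes):
full_mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=96, seq_len=4096)
gqa = kv_cache_bytes(n_layers=40, n_kv_heads=10, head_dim=128, seq_len=4096)
print(full_mha, gqa)
```

So a smaller model can legitimately report higher VRAM use at a given context length if it keeps more KV heads per layer.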


Cohere ▷ #project-sharing (1 messages):

  • Seeking Command-R tutorial for BotPress: A member asked for a tutorial on how to incorporate Command-R into BotPress. They repeated the request in both English and Spanish: "Does anyone have a tutorial on how to incorporate Command-R into BotPress? Alguien tiene un tutorial de como incorporar Command-R en BotPress?"

Cohere ▷ #collab-opps (1 messages):

  • Seeking Command-R tutorial for BotPress: A member inquired if anyone has a tutorial on how to incorporate Command-R into BotPress. They asked for resources or guidance in both English and Spanish.

AI Stack Devs (Yoko Li) ▷ #ai-companion (7 messages):

  • AI Waifus save lives: A user humorously declared, “AI waifus save lives!” sparking a brief banter among members with another replying, “Just monika.”
  • VentureBeat article on Emotional AI: A member shared a VentureBeat article discussing plans to embed emotional AI in business bots, questioning, “Will waifus soon be able to ‘understand’ and process emotions?” Read the article here.
  • 3D Character Chatbots at 4Wall AI: Another member mentioned they are working on 3D character chatbots at 4Wall AI and promoted a teaser available on another channel, <#1122748840819306598>.
  • Re: Just Monika: In response to “Who dat?” about the “Just Monika” reference, a user provided a GIF link for context found here.

Link mentioned: Ddlc Doki Doki Literature Club GIF - Ddlc Doki Doki Literature Club Just Monika - Discover & Share GIFs: Click to view the GIF


Datasette - LLM (@SimonW) ▷ #ai (5 messages):

  • Qualcomm unveils Snapdragon Dev Kit for Windows: Qualcomm has released a new developer kit featuring their most powerful Snapdragon X Elite chip, priced at $899.99. It’s touted as a Mac Mini competitor with 32GB of LPDDR5x RAM, 512GB of NVMe storage, and numerous ports, ideal for developing for long-lasting, powerful Windows laptops with Arm chips (more details on The Verge).

  • Windows Dev Kit pricing complaints: One user expressed interest in the new Snapdragon Dev Kit but felt the $900 price tag was steep, especially compared to last year’s model priced at $600. They noted its suitability for Arm development, with 32GB RAM and 512GB storage covering various developer workloads (more details).

  • Using Mac Mini for Llamafile server: A user shared their positive experience using a Mac Mini as a long-running Llamafile server, accessible through Tailscale. They appreciated its zero-cold start and compatibility with the llm CLI.

  • Hope for more affordable, aesthetic dev kits: Another user hoped for cheaper models in the future while expressing a desire for a translucent case design.

  • Smalltalk experiment with Claude: Highlighting a proof of concept, one user shared an example of Claude engaging in Smalltalk by answering the question “What are frogs?” with a basic explanation of amphibious animals.


LLM Perf Enthusiasts AI ▷ #general (2 messages):

  • Llama3/Phi3 truncates responses: A member asked for help on how to prevent llama3/phi3 from hitting them with “additional items omitted for brevity”. No further discussion or solutions were presented.

Mozilla AI ▷ #announcements (1 messages):

  • Member-Organized Events Kick Off: The first wave of member-organized events includes talks, AMAs, demos, and discussions. These events are designed to promote cross-pollination of ideas and foster community engagement.

  • LLM360 Hosts AMA: LLM360 kicks off the series with an AMA highlighting their work in open-source LLMs.

  • Kate Silverstein’s Demo and Blog Post: Staff Machine Learning Engineer Kate Silverstein will share a demo using llamafiles for embeddings and chat about her recent blog post.

  • Events Calendar: Members are encouraged to check the events calendar regularly for more activities and opportunities to participate in community events.


Mozilla AI ▷ #llamafile (1 messages):

  • Clarifying model usage in Python example: A member asked for clarification on whether they need to specify a model under model="LLaMA_CPP" when running a tinyllama model from the terminal. They provided a code snippet and mentioned that the code works but are unsure which model is used.
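For context on the question above: llamafile exposes an OpenAI-compatible endpoint, and in its examples the model field is effectively a placeholder, since the server answers with whichever model the llamafile bundles (here, the tinyllama weights). A minimal sketch of the request body, assuming the conventional default port and payload shape:

```python
import json

# Hypothetical request body for a local llamafile server; the "model" value
# "LLaMA_CPP" is a placeholder string, and the server will use whatever model
# it was started with regardless of what is written here.
payload = {
    "model": "LLaMA_CPP",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}
body = json.dumps(payload).encode("utf-8")
# The request would typically be POSTed to
# http://localhost:8080/v1/chat/completions
# with a Content-Type: application/json header.
```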





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}