**Gemini-in-Google-Slides is all we needed.**

AI News for 5/23/2024-5/24/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (380 channels, and 4467 messages) for you. Estimated reading time saved (at 200wpm): 495 minutes.

Followups: Jason Wei published a nice ā€œ201ā€ supplement to yesterday’s topic on Evals, somewhat on the metagame of making a successful eval, but with some side digressions and anecdotes about specific notable evals like MATH and LMSYS. It’s also the last day to use the AINEWS code for the AI Engineer World’s Fair.

It’s a quiet news day so we went diving for interesting content from the community. Today’s winner is Kyle Corbitt’s talk on Deploying Finetuned Models in Prod:

image.png

In brief the commandments are:

  1. Thou Shalt Not Fine-Tune:Ā Just use prompting! And optionally few-shot examples/RAG. Fine-tuning is expensive, slow, and complex. Only do it if your use case really requires it.
  2. Thou Shalt Write a Freaking Prompt:Ā Create a baseline and prove the task is possible with prompting.
  3. Thou Shalt Review Thy Freaking Data:Ā If you must fine-tune, make sure you understand your data thoroughly.
  4. Thou Shalt Use Thy Actual Freaking Data:Ā Your model will only be as good as the data it's trained on. Make sure your training data is as close as possible to the data your model will see in production.
  5. Thou Shalt Reserve a Test Set:Ā Always reserve a portion of your data for testing to evaluate your model's performance.
  6. Thou Shalt Choose an Appropriate Model:Ā The more parameters a model has, the more expensive and slower it is to train. Choose a model that is appropriate for your task and your budget.
  7. Thou Shalt Write Fast Evals:Ā Write evaluation metrics that are fast to compute so you can quickly iterate on your model.
  8. Also, Thou Shalt Write Slow Evals:Ā Write evaluation metrics that are more comprehensive and take longer to compute, to give you a deeper understanding of your model's performance.
  9. Thou Shalt Not Fire and Forget:Ā Don't just deploy your model and forget about it. Monitor its performance and be prepared to retrain or update it as needed.
  10. Thou Shalt Not Take the Commandments Too Seriously:Ā These commandments are meant to be helpful guidelines, not hard and fast rules. Use your best judgment and adapt them to your specific needs.

Fun fact, we used Gemini to do this summary of the deck. Give it a try.

image.png


{% if medium == ā€˜web’ %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Anthropic’s Claude AI and Interpretability Research

  • Feature alteration in Claude AI: @AnthropicAI demonstrated how altering internal ā€œfeaturesā€ in their AI, Claude, could change its behavior, such as making it intensely focus on the Golden Gate Bridge. They released a limited-time ā€œGolden Gate Claudeā€ to showcase this capability.
  • Understanding how large language models work: @AnthropicAI expressed increased confidence in beginning to understand how large language models really work, based on their ability to find and alter features within Claude.
  • Honesty about Claude’s knowledge and limitations: @alexalbert__ stated that Anthropic is honest with Claude about what they know and don’t know, rather than purposefully making decisions about its ability to speculate on tricky philosophical questions.

Open-Source AI Models and Advancements

  • Open-source models catching up to closed-source: @bindureddy highlighted that on the MMLU benchmark, open-source models like GPT-4o are nearing the performance of closed-source models like GPT-4 for simple consumer use-cases. However, more advanced models are still needed for complex AI agent and automation tasks.
  • New open-source model releases: @osanseviero shared several new open-source model releases this week, including multilingual models (Aya 23), long context models (Yi 1.5, M2-BERT-V2), vision models (Phi 3 small/medium, Falcon VLM), and others (Mistral 7B 0.3).
  • Phi-3 small outperforms GPT-3.5T with fewer parameters: @rohanpaul_ai pointed out that Microsoft’s Phi-3-small model, with only 7B parameters, outperforms GPT-3.5T across language, reasoning, coding, and math benchmarks, demonstrating rapid progress in compressing model capabilities.

AI Agents, Retrieval-Augmented Generation (RAG), and Structured Outputs

  • Shift from RAG for QA to report generation: @jxnlco forecasted that in the next 6-8 months, RAG systems will transition from question-answering to report generation, leveraging well-designed templates and SOPs to unlock business value by targeting people with money.
  • ServiceNow uses RAG to reduce hallucination: @rohanpaul_ai shared a ServiceNow paper showing how RAG can ensure generated JSON objects are plausible and executable for workflow automation by retrieving relevant steps and table names to include in the LLM prompt.
  • RAG adds business value by connecting LLMs with real-world data: @cohere outlined how RAG systems address challenges like hallucinations and rising costs by connecting LLMs with real-world data, highlighting the top 5 reasons enterprises are adopting RAG for their LLM solutions.

AI Benchmarks, Evaluation, and Cultural Inclusivity

  • Standard AI benchmarks may not guide true global cultural understanding: @giffmana suggested that typical ā€œwesternā€ AI benchmarks like ImageNet and COCO may not be indicative of genuine ā€œmulticultural understandingā€. Training models on global data instead of just English can greatly improve performance in non-western cultures.
  • Difficulties in evaluating large language models: @clefourrier and @omarsar0 shared a report discussing the challenges in robustly evaluating LLMs, such as differences between initial benchmark design and actual use, and the need for more discriminative benchmarks as models become more capable.
  • Aya 23 multilingual models expand who technology serves: @sarahookr introduced Cohere’s Aya 23 models, a powerful multilingual family aiming to serve nearly half the world’s population, as part of their mission to change who is seen by technology.

Memes and Humor

  • Nvidia stock and the ā€œpermanent underclassā€: @nearcyan joked about a spouse regretting not buying Nvidia stock and being part of the ā€œpermanent underclass foreverā€.
  • Satire of Anthropic’s Golden Gate Bridge AI: @jeremyphoward satirized Anthropic’s interpretability demo, humorously claiming that ā€œOpenAI has already caught up with the latest feature in Claude, and also has an advanced Golden Gate Bridge mode based on sophisticated mechanistic interpretability research.ā€
  • Poking fun at Google’s AI mistakes: @mark_riedl shared a humorous anecdote about jokingly claiming Google’s AI incorrectly thought he won a DARPA award, leading people to actually believe he didn’t receive the honor.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Progress and Capabilities

  • Impressive transcription and location identification by GPT-4: In /r/OpenAI, GPT4-o demonstrates remarkable abilities to transcribe text from images and identify locations, even without EXIF data, as shown in this video and discussed further.
  • Yi-Large catching up to state-of-the-art models: A comparison posted in /r/singularity shows Yi-Large approaching GPT-4 performance and surpassing Claude 3 Opus and Gemini 1.5 pro on several benchmarks.

AI Ethics and Safety Concerns

  • OpenAI employees leaving over ethical concerns: In /r/singularity, it’s reported that OpenAI employees are departing not just due to ā€œdecelā€ fears but over issues like partnering with News Corp, lobbying against open source, and aggressive tactics against ex-employees.
  • Concerns over OpenAI’s News Corp partnership: An /r/OpenAI post criticizes OpenAI’s partnership with News Corp, a right-wing propaganda company, worried it could lead to ChatGPT legitimizing extreme viewpoints.
  • California AI bill requires safeguards but criticized: A new California AI bill, discussed in /r/singularity, mandates models over 10^26 flops have weapons creation prevention, shutdown buttons, and government reporting. However, the requirements are criticized as not making technical sense.
  • Yann LeCun pushes back on AI doomerism: In a video shared on /r/singularity, AI pioneer Yann LeCun argues the biggest AI dangers are censorship, monitoring, and centralized power, not the doomer scenarios often portrayed.

AI Interpretability and Control

  • Anthropic’s ā€œGolden Gate Claudeā€ maps AI features: Anthropic’s research, detailed in /r/singularity, shows their ā€œGolden Gate Claudeā€ can map and manipulate an AI’s internal features, a potentially major advance in understanding and controlling AI behavior.
  • Anthropic demonstrates feature alteration to shape AI behavior: Another Anthropic paper, shared on /r/singularity, shows interpretable features learned by a sparse autoencoder can represent complex concepts and be altered to control an AI, such as inducing an obsession.

AI Commercialization and Access

  • Meta considers paid version of AI assistant: The Information reports, in a post on /r/singularity, that Meta is working on a premium paid version of its AI assistant.
  • Macron positions Mistral as EU’s top AI company: A CNBC article, shared on /r/singularity, describes French President Macron promoting Mistral as the leading EU AI company, drawing criticism of favoring a French firm over other European contenders.
  • Google Colab offers free GPUs for AI development: An /r/singularity post highlights that Google Colab is providing free GPU access, including A100s, to enable AI development.

Memes and Humor

  • Meme on boomers not letting go: A meme on /r/singularity jokes about boomers refusing to let younger generations take over.
  • Satirical video on Microsoft training GPT5: An /r/singularity video satirizes Microsoft training GPT5 by feeding it data like a whale consuming krill.
  • Meme about Windows Recall AI and privacy: A meme on /r/singularity pokes fun at a hypothetical Windows Recall AI feature and the privacy concerns it would raise.

AI Discord Recap

A summary of Summaries of Summaries

  1. LLM Fine-Tuning Techniques and Best Practices:

    • Ten Commandments for Fine-Tuning: In Kyle Corbitt’s talk, members emphasized meticulous prompt design and template configurations, using ### delimiters and ā€œend of textā€ tokens for efficient model fine-tuning.

    • Hamel’s Latency Optimization Blog: Discussions on reducing overfitting and the effective use of retrieval-augmented generation (RAG) strategies highlighted practical guidance from ongoing fine-tuning experiments on platforms like Axolotl.

  2. Innovations in Quantization and Performance Optimization:

    • Tim Dettmers’ Research on LLM.int8(): His work, highlighted by this blog, demonstrates how advanced quantization methods maintain transformer performance without degradation, revealing insights into emergent features and their implications.

    • CUDA’s Gradient Norm Bug Fixing: Solved issues like exploding gradients and batch size problems significantly improved training stability, as detailed in this PR.

    • Optimized Memory Architecture in Axolotl: Sample packing efficiency improvements showed a 3-4% resource management gain during distributed training.

  3. Open-Source Frameworks and Community Efforts:

    • Axolotl’s Latest Updates: The community discussed integrating observability into LLM applications and resolving cache and configuration issues to streamline workflows in fine-tuning models.

    • PostgresML Integration with LlamaIndex: Andy Singal highlighted the synergy between PostgresML and LlamaIndex in efficiently leveraging AI for database management tasks.

  4. Multimodal AI and New Model Developments:

    • Phi-3 Model Excitement: Unsloth’s Phi-3 models, touted for their longer context lengths and medium support, captured community interest with announcements of rapid optimization and integration.

    • Mobius Model Anticipations: DataPlusEngine’s upcoming release promises efficient base model creation, sparking debates on the implications for foundational diffusion models and their training methodologies.

  5. Challenges in AI Ethics, Governance, and User Experience:

    • SB-1047 Regulatory Concerns: Community outrage over the centralization of AI governance and comparisons to regulatory captures in other industries prompted heated discussions on the bill’s impact on small developers.

    • Ethical Use of AI in Communication Tools: Deployments of GPT-4 and Claude for workplace communication monitoring raised philosophical questions about embedding ethics into AI and their potential for reducing legal vulnerabilities, as highlighted in discussions regarding API integration and usage limits.


{% if medium == ā€˜web’ %}

PART 1: High level Discord summaries

LLM Finetuning (Hamel + Dan) Discord

Fine-Tuning Facts: Discussion on fine-tuning in the general channel revealed a concern about semantic similarity overfitting due to biased data categories. A user struggled with understanding fine-tuning vis-Ć -vis user inputs and initial model training. Changes in the OpenAI platform’s sidebars were also noted with the disappearance of two icons (threads and messages).

Templates Take the Spotlight: In workshop-1, the importance of configuring templates correctly during fine-tuning was highlighted. In particular, the delimiter ### aids in parsing different input sections, and ā€œend of textā€ tokens indicate when to stop token generation.

Maven Mingles with Moderation: In asia-tz, a light-hearted exchange between members referenced a reunion. A request for a conference talk recording was met, with the video being available on Maven.

Modal Mobilization: Modal users in 🟩-modal shared excitement over received credits, training experiences, and provided specific links to Modal documentation and examples for new users. A plan to use Modal for a Kaggle competition was also shared, including setup and execution details.

Jarvis Jots Down Jupyter Jumble: In the jarvis-labs channel, members discussed storing a VSCode repo on Jarvis with a suggestion to use GitHub for saving work. There was a notice of spot instance removal due to instability. The cost and duration of fine-tuning the open-lama-3b model were shared, and a user resolved an Ampere series error by adjusting model parameters.

Hugging Face Huddles on Credits & Spanish Models: The hugging-face channel saw discussions about pending HF credits and models suitable for Spanish text generation—with Mistral 7B and Llama 3 models being recommended.

Credit Countdown Carries On in replicate, where an upcoming announcement related to credit management and distribution was teased.

Corbitt’s Commandments Claim Clout: Enthusiastic attendees in the kylecorbitt_prompt_to_model channel discussed fine-tuning methods and techniques presented in Kyle Corbitt’s talk, including Ten Commandments for Deploying Fine-Tuned Models.

Axolotl Answers the Call in workshop-2, where users discussed datasets, model training, and troubleshooting in Axolotl. A blog post on TinyLLama Fine-Tuning was shared, and there was a push for integrating observability into LLM applications.

Zoom Out, Discord In: Users from workshop-3 migrated their discussions to Discord after the Zoom chat was disabled.

Axolotl’s Cache Conundrum Causes Confusion: Issues with cache in Axolotl frustrating users and confusion with missing files were resolved in axolotl. Discussions on sample packing and a guide on tokenizer gotchas addressed concerns around efficiency and tokenization.

Accelerate to Victory: zach-accelerate saw users work through confusion over float comparisons, resolve Jarvislab training command errors, and exchange resources for learning model acceleration with a focus on fine-tuning best practices.

Winging It with Axolotl: The wing-axolotl channel collaborated on dataset templates, pre-processing issues, Axolotl configurations, and provided a PR merge for the latest Axolotl updates. They delved into debugging tools and the significance of precise templates for training success.


HuggingFace Discord

Protein Data Visuals Reach New Heights: A new protein visualization project now sports 3D rendering and includes examples for human hemoglobin and ribosomal proteins, with the project details found on GitHub.

Enter the TranscriptZone with OpenAI’s Whisper: A new transcription app that leverages OpenAI’s Whisper to transcribe YouTube videos and more is available at Hugging Face Spaces.

Decentralizing the Web - More than a Dream?: A project building infrastructure for a decentralized internet sought community feedback through a survey, raising discussions about the ethics of data collection.

A Vision Transformers Query in Depth: A member sought resources on applying Vision Transformers (ViT) for monocular depth estimation, indicating an intent to develop a model using ViT, but no specific resources were provided in the discussion.

Quantisation Quandary for Mistral Model: The use of bitsandbytes for 8-bit quantisation on Mistral v0.3 Instruct led to slower performance compared to 4-bit and fp16, a baffling outcome that contradicts expected efficiency gains from reduced-bit computation.


Perplexity AI Discord

  • Perplexity Climbs Over ChatGPT in CSV Showdown: Engineers discussed that Perplexity AI outshines ChatGPT in CSV file processing by allowing direct CSV uploads. Also, Julius AI was recommended for data analysis, leveraging Python and integration with LLMs like Claude 3 or GPT-4.

  • Users Snub Claude 3 Opus: Claude 3 Opus is getting the cold shoulder due to increased content restrictions and perceived diminished utility, with GPT-4 posed as a preferable option despite limitations.

  • Querying Pro Search’s True Upgrade: Upgrades to Pro Search raised eyebrows as users discussed whether new multi-step reasoning features and API specs were genuine backend improvements or merely surface-level UI enhancements.

  • API Integration Articulated: Dialogue around API integration for external tools with Claude generated interest along with sharing of custom function calls, serverless backends, and documentation like Tool Use with Claude.

  • Ethics in AI: More Than a Thought Experiment: Discourse on infusing GPTs with ethical monitoring capabilities sparked, casting light on potential applications in workplace communication and legal defensibility, albeit with philosophical wrinkles yet to be ironed out.


Stability.ai (Stable Diffusion) Discord

  • Speculation Peaks on RTX 5090’s VRAM: There’s buzzing debate over whether the rumored RTX 5090 with 32GB VRAM makes practical sense. References were made to potential specs and images on PC Games Hardware, but some members remained skeptical about its authenticity.

  • Stable Diffusion and the AMD Challenge: Users offered guidance on installing Stable Diffusion on an AMD 5700XT GPU, suggesting that starting with web services like Craiyon may circumvent potential compatibility issues.

  • Stable Diffusion 3: Trial Before Commitment: The community contrasted Stable Diffusion 3 with competitor Midjourney, highlighting that while a free trial is available for SD3, ongoing access would require a Stability membership.

  • Anticipation Builds Around Mobius Model: An announcement concerning DataPlusEngine’s novel Mobius model has garnered significant interest for its claim to create efficient base models. The model, teased on Twitter, is neither a straightforward base model nor a tuned version of something pre-existing.

  • 32GB VRAM: Game Changer or Overkill?: The mention of a 32GB VRAM GPU led to conversations about the potential shift in Nvidia’s approach to data center GPU sales, considering how products with substantial memory could impact the market demand for the H100/A100 series.


Unsloth AI (Daniel Han) Discord

  • PEFT Config Snag Solved: An issue where config.json was missing during PEFT training was resolved by copying it from the base model’s configuration, with the user confirming success.

  • Llama Levitates Above Bugs: The Llama 3 model’s base weights were described as ā€œbuggy,ā€ but Unsloth has implemented fixes. To improve training, the use of reserved tokens and updates to the tokenizer and lm_head are recommended.

  • System Prompt Boosts Llama 3: Incorporating a system prompt, even a blank one, was observed to enhance Llama3 finetuning outcomes.

  • Phi 3 Proliferation: Excitement bubbled as Phi 3 models debuted, sporting medium support. Community chatter pointed engineers toward extensive details in blog posts and release notes.

  • Stable Diffusion’s Sinister Side Show: Creepy artifacts and uncanny voice cloning outputs from Stable Diffusion startled users, with discussions and experiences shared via YouTube videos and a Reddit thread.

  • VSCode Copilot Climbing Onboard: Recommendations for a local VSCode ā€œcopilotā€ were sought and met with suggestions and positive responses in the random channel.

  • Inference Inertia with Phi-3: Slower inference times using Unsloth Phi-3 puzzled one user, who provided a Colab notebook to investigate the lag, with community efforts yet to find a fix.

  • Quantization Quandary Unraveled: A member faced challenges quantizing a custom model, hitting walls with llama.cpp and Docker compatibility, sparking a discussion on solutions.

  • VRAM Verdict for Model Might: VRAM requirements were laid out: 12GB for Phi 3 mini is okay, but 16GB is a must for Phi 3 medium. For hefty tasks, considering outside computing resources was proposed.

  • Data Diligence for Training Consistency: The importance of using consistent datasets for training and evaluation was echoed, highlighting Unslothai’s public datasets like the Blackhole Collection.

  • Platform Possibilities and Cautions: Queries regarding Unsloth support for older Macs were addressed, confirming a focus on CUDA and GPU usage, with suggestions for those on CPU-only rigs.

  • Enterprise Expertise Extension: A community member stepped forward to offer enterprise expertise to Unsloth, hailing the joining of accelerators at Build Club and Github, hinting at synergistic potential for Unsloth’s endeavors.


Nous Research AI Discord

Intellectual Debate Ignites Over AI Understanding: In-depth discussions were had about the true understanding of concepts by LLMs, with interpretability research considered important empirical evidence. Skeptics argued that current efforts are lacking, with references to work by Anthropic on mapping large language model minds.

The Creature from the Llama Lagoon: A technical foray into enhancing Llama models centered around crafting a script that could manage function calls, with Hermes Pro 2’s approach serving as inspiration. Another inquiry circled the implementation of Llama3 LoRA techniques on a 3080 GPU.

Reality Quest in Digital Dimensions: Spearheading a conversation on Nous and WorldSim, members explored the possible applications of NightCafe and multi-dimensional AR spaces in mapping complex AI worlds. Dream-like explorations in audio-visualizers and whimsical ASCII art representations highlighted creative uses for AI-driven simulations.

Sifting Through RAG Data: Advocation for models to integrate internal knowledge with Retrieval-Augmented Generation (RAG) was a hot topic, with questions raised about how to handle contradictions and resolve conflicts. Emphasizing user evaluations was seen as essential, particularly for complex query cases.

Precision over Pixie Dust in Fine-Tuning AI: The community’s discourse featured a celebration of the Mobius model for its prowess in image generation, with anticipation for an open-sourced version and elucidating publications. Additionally, Hugging Face was mentioned for their PyTorchModelHubMixin enabling easier model sharing, though limited by a 50GB size constraint without sharding.


Eleuther Discord

  • JAX vs. PyTorch/XLA: The TPU Showdown: The performance comparison of JAX and PyTorch/XLA on TPUs spurred debate over benchmarking nuances such as warmup times and blocking factors. The dramatic decline in GPT-3 training costs from $4.5M to an estimated $125K-$1M by 2024 was highlighted, considering TFLOP rates and GPU-hour pricing from various contributors, linking to a Databricks Blog Post.

  • Scaling and Teaching LLMs: In the research forum, the Chameleon model was noted for its strong performance in multimodal tasks, while Bitune promised improvements in zero-shot performance for LLMs (Bitune Paper). Discussions questioned the scalability of the JEPA model for AGI and critiqued RoPE’s context length limitations, referencing a relevant paper.

  • Emergent Features Puzzle LLM Enthusiasts: Tim Dettmers’ research on advanced quantization methods maintaining performance in transformer inference was linked, including his concept of emergent outliers, and its integration with Hugging Face via the bitsandbytes library. Discourse on emergent features coalescing around ideas of them being the ā€œDNAā€ of a model, driving discussions on its implications for phase transitions.

  • A Brief on Technical Tweaks & LM Evaluation: Within the lm-thunderdome, engineers covered practical tips for setting seeds in vllm models, retrieving the list of tasks with lm_eval --tasks list, and handling changes in BigBench task names that affect harnesses like Accelerate with memory issues. It was suggested to locate tasks by perusing the lm-eval/tasks folder for better organization.

  • A Call for Collaboration: An appeal was made for expanding the Open Empathic project, with a YouTube guide for contributing movie scenes and a link to the project shared. Further collaboration was encouraged, underlining the need for community efforts in enhancement.


LM Studio Discord

GPU Adventures: Engineers discussed challenges when loading small models onto GPUs, with some favoring models like llama3, mistral instruct, and cmdrib. Meanwhile, using lower quantizations, such as llamas q4, reportedly yielded better results than higher ones like q8 for certain applications, refuting the notion that ā€œbigger is always better.ā€

Next-Gen Models Incoming: An update in the model realm informed about the release of a 35B model, with testing to ensure LM Studio compatibility. Optimizations for different scales of models were a topic too, with a focus on Phi-3 small GGUFs and their efficiency.

Servers and Setups: Hardware discussions included leveraging distributed inference with llama.cpp and its recent RPC update, although quantized models aren’t supported yet. Experimental builds using clustered cheap PCs with RTX 4060 Ti 16GB for distributed model setups and possible network constraints were also explored.

Multilingual Cohesion Achieved: Cohere models now extend their prowess to 23 languages, as advertised with aya-23 quants available for download, but ROCm users must await an update to dive in.

Stable Diffusion Left Out: LM Studio clarified that it exclusively handles language models, excluding image generators like Stable Diffusion, alongside dealing with CUDA issues on older GPUs and promoting services like Julius AI to ease user experience woes.


CUDA MODE Discord

  • Gradient Norm Nuisance: Altering the batch size from 32 leads to a sudden spike in gradient norm, disrupting training. A pull request resolved this issue by preventing indexing overflow in the fused classifier.

  • Int4 and Uint4 Types Need Some TLC: A member flagged that many functions lack implementations for int4 and uint4 data types in PyTorch, with a discussion thread indicating limitations on type promotion and tensor operations.

  • Live Code Alert – Scan Algorithm in Spotlight: Izzat El Hajj will lead a live coding session on the Scan algorithm, vital for ML algorithms like Mamba, scheduled for <t:1716663600:F>, promising to be a technical deep dive for enthusiasts.

  • CUB Library Queries and CUDA Nuances: Members tapped into discussions ranging from the functioning of CUDA CUB library code to triggering tensor cores without cuBLAS or cuDNN, highlighting resources like NVIDIA’s CUTLASS GitHub repository and the NVIDIA PTX manual.

  • FineWeb Dataset Conundrum: Processing the FineWeb dataset can be a storage hog, hitting 70 GB on disk and gobbling up to 64 GB of RAM, hinting at a need for better optimization or more robust hardware configurations for data processing tasks.


Modular (Mojo šŸ”„) Discord

Python Libraries Cling to C Over Mojo: There’s a lively conversation about the feasibility and preparedness of porting Python libraries to Mojo, with concerns about pushing maintainers too hard given Mojo’s evolving API. Members discussed whether targeting C libraries might be a more immediate and practical endeavor.

Rust’s Security Appeal Doesn’t Rust Mojo’s Potential: Mojo is not slated to replace C, but the security benefits of Rust are influencing how engineers think about Mojo’s application in different scenarios. Ongoing discussions address concepts from Rust that could benefit Mojo developments.

Blazing Ahead With Nightly Mojo: BlazeSeq performance on MacOS using Night versions of Mojo shows promising similarity to Rust’s Needletail, fueling cross-platform efficiency discussions. Rapid nightly updates, noted in changelog, keep the community engaged with the evolving language.

Curiosity Sparks Over Modular Bot’s Machinery: Queries were raised about the underlying tech of ā€œModularBotā€, and although no specific model was referenced, the bot shared a colorful reply. Separately, the potential for ML model training and inference within Mojo was discussed, with mention of Max Engine as a numpy alternative, though no full-fledged training framework is on the horizon.

Compile-Time Confusion and Alignment Woes: Problems from aligning boolean values in memory to compile-time function issues are causing a stir among users, with workarounds and official bug reports highlighting the importance of community-driven troubleshooting.


OpenAI Discord

  • LaTeX Loyalist LLM: In the realm of formatting, users noted frustration with GPT’s strong inclination to default to LaTeX despite requests for Typst code, revealing preferences in coding syntax that the LLM seems to adhere to.

  • Microsoft Copilot+ vs. Leonardo Rivalry: Conversations in the community centered on the value of Microsoft Copilot+ PCs for creative tasks like ā€œsketch to image,ā€ while some members encouraged checking out Leonardo.ai for analogous capabilities.

  • A Thirst for Efficiency in AI: Concern was voiced over the environmental toll of AI, citing a Gizmodo article on the substantial water usage during the training of AI models, prompting discussions on the need for more eco-friendly AI practices.

  • Iteration Over Innovation: There was active dialogue on enhancing the performance of LLMs through iterative refinement, with references to projects like AutoGPT addressing iterations, despite the associated higher costs.

  • Intelligence Infusion Offer Overstated?: The guild pondered the plausibility and potential of embedding legal knowledge within ChatGPT, enough to consider a valuation at $650 million, though detailed perspectives on this bold assertion were limited.


LangChain AI Discord

LangChain CSV Agent Deep Dive: Engineers explored LangChain’s CSV agent within a SequentialChain and discussed how to customize output keys like csv_response. Challenges with SQL agents handling multi-table queries were mentioned, pointing towards token limits and LLM compatibility issues, with direction to GitHub for issues.

AI Showcases Gather Buzz: OranAITech tweeted their latest AI tech, while everything-ai v2.0.0 announced features including audio and video processing capabilities with a repository and documentation available.

Demystifying VisualAgents: Demonstrations of Visual Agents platform were shared via YouTube, revealing its potential to streamline SQL agent creation and building simple retrieval systems without coding, utilizing LangChain’s capabilities. Two specific videos showcased their workflows: SQL Agent and Simple Retrieval.

EDA GPT Impressions On Display: A demonstration of EDA GPT, including a five-minute overview video showcasing its various functions, was linked to via LOVO AI. The demo highlights the AI tool’s versatility.

Tutorial Teaser: A message in the tutorials channel provided a YouTube link to business24.ai’s content, although the context of its relevance was not disclosed.


LAION Discord

  • Piracy’s Not the Panacea: Despite a humorous suggestion that The Pirate Bay could become a haven for sharing AI model weights, skepticism among members arises, highlighting the potential for friendlier AI policy landscapes in other nations to prevail instead.

  • Japan Takes the AI High Road: Participants noted Japan’s encouraging position on AI development, referencing a paper shared via a tweet about creating new base diffusion models without the need for extensive pretraining, showcasing a strategy involving temporary disruption of model associations.

  • Poisoned Recovery Protocols Probed: A collaborative study, involving a poisoned model recovery method conducted by fal.ai, was mentioned, with findings expected to empirically substantiate the recovery approach. Reservations were expressed regarding the aesthetics of AI-generated imagery, specifically the ā€œhigh contrast lookā€ and artifacts presented by models like Mobius versus predecessors such as MJv6.

  • Claude Mappings Crack the Code: Anthropic’s research paper details the dissection of Claude 3 Sonnet’s neural landscape, which illustrates the manipulation of conceptual activations and can be read at their research page. Debates sparked over the potential commercialization of such activations, with a juxtaposed fear of the commercial implications driving AI practitioners to frustration.

  • A Nostalgic Look at AI’s Visual Visions: A member reminisced about the evolution from early AI visual models like Inception v1 to today’s sophisticated systems, recognizing DeepDream’s role in understanding neural functionality. Furthermore, the benefits of sparsity in neural networks were discussed, describing the use of L1 norm for sparsity and a typical 300 non-zero dimensions in high-dimensional layers.


LlamaIndex Discord

  • Meetup Alert: Limited Seats Available: Few spots remain for the upcoming LlamaIndex meetup scheduled for Tuesday, with enthusiasts encouraged to claim their spots quickly due to limited availability.

  • MultiOn Meets LlamaIndex for Task Automation: LlamaIndex has been coupled with MultiOn, an AI agents platform, facilitating task automation through a Chrome web browser acting on behalf of users; view the demo here.

  • RAGApp Launches for Code-Free RAG Chatbot Setup: The newly introduced RAGApp simplifies the deployment of RAG chatbots via a docker container, making it easily deployable on any cloud infrastructure, and it’s open-source; configure your model provider here.

  • Solving PDF Parsing Puzzles: The community endorses LlamaParse as a viable API for extracting data from PDFs, especially from tables and fields, leveraging the GPT-4o model for enhanced performance; challenges with Knowledge Graph Indexing were also a topic, highlighting the need for both manual and automated (through VectorStoreIndex) strategies.

  • PostgresML Joins Forces with LlamaIndex: Andy Singal shared insights on integrating PostgresML with LlamaIndex, detailing the collaboration in a Medium article, ā€œUnleashing the Power of PostgresML with LlamaIndex Integrationā€, receiving positive remarks from the community.


OpenRouter (Alex Atallah) Discord

  • Phi-3 Medium 128k Instruct Drops: OpenRouter unveiled Phi-3 Medium 128k Instruct, a powerful 14-billion parameter model, and invited users to review both the standard and free variants, and to participate in discussions on its effectiveness.

  • Wizard Model Gets a Magic Boost: The Wizard model has shown improvements, exhibiting more prompt and imaginative responses, yet attention is required to avoid repeated paragraphs.

  • Eyes on Phi-3 Vision and CogVLM2: Enthusiasm surges around Phi-3 Vision, with sharing of testing links like Phi-3 Vision, and suggestions to use CogVLM2 for vision-centric tasks found at CogVLM-CogAgent.

  • Automatic Llama 3 Prompt Transformation: It was clarified that prompts to Llama 3 models are automatically transformed through OpenRouter’s API, streamlining the process, but manual prompting remains as an alternative approach.

  • Gemini API Annoyances: Users reported issues with Gemini FLASH API, such as empty outputs and token drain, recognized as a model-centric problem. The emergence of Google’s daily API usage limits has piqued interest in how this might affect OpenRouter’s Gemini integration.


Latent Space Discord

  • Indexify Ignites Interest: The launch of Indexify, an open-source real-time data framework by Tensorlake, sparked discussions focusing on its ā€œstreaming ETLā€ capabilities and the challenges in creating sustainable open-source models. Concerns were raised about the adequacy of the extractors provided and their potential paths to monetization.

  • LLM Evaluation under the Microscope: A Hugging Face blog post about Large Language Model (LLM) evaluation practices, the importance of leaderboards, and meticulous non-regression testing caught the attention of members, emphasizing the critical role of such evaluations in AI developments.

  • AI’s Answer to Search Engine Manipulations: An incident involving website poisoning affecting Google’s AI-gathered overviews triggered discussions around security and data integrity, including workarounds through custom search engine browser bypasses as reported in a tweet by Mark Riedl.

  • AI Democratizing Development or Raising Reliability Questions?: GitHub CEO Thomas Dohmke’s TED Talk on AI’s role in simplifying coding provoked debates over its reliability despite AI-driven UX improvements that expedite problem-solving in the coding process.

  • Diversity Scholarships to Bridge Gaps: Engineers from diverse backgrounds who face financial barriers to attending the upcoming AI Engineer World’s Fair received a boost with the announcement of diversity scholarships. Interested applicants should furnish concise responses to the essay questions provided in the application form.


Interconnects (Nathan Lambert) Discord

  • Tax Tales Without Plastic: Nathan Lambert deciphered an invoice kerfuffle, realizing the rational behind tax billing sans credit card due to resale certificates.

  • Golden Gate AI Gets Attention: Experimentation by Anthropic AI led to ā€œGolden Gate Claude,ā€ an AI single-mindedly trained on the Golden Gate Bridge, creating buzz for its public interactivity at claude.ai.

  • Google’s AI Missteps: Google’s failure to harness feedback and premature deployment of AI models spurred discussion about the tech giant’s public relations challenges and product development woes.

  • Battling Dataset Misconceptions: Google’s AI team countered claims about using the LAION-5B dataset by putting forth that they utilize superior in-house datasets, as referenced in a recent tweet.

  • Nathan Shares Knowledge Nuggets: For AI aficionados, Nathan Lambert uploaded advanced CS224N lecture slides. Additionally, attendees were tipped off about an upcoming session recording, sans release date details.


OpenAccess AI Collective (axolotl) Discord

  • GQA Gains Traction in CMDR Models: Discussions revealed that Grouped Query Attention (GQA) is present in the ā€œcmdr+ā€ models but not in the basic ā€œcmdrā€ models, indicating an important distinction in their specifications.
  • VRAM Efficiency with Smart Attention: Engineers noted that while GQA doesn’t offer linear scaling, it represents an improved scaling method compared to exponential, affecting VRAM usage favorably.
  • Sample Packing Gets a Boost: A new GitHub pull request showcases a 3-4% efficiency improvement in sample packing, promising better resource management for distributed contexts, linked here.
  • Academic Achievement Acknowledged: A member’s co-authored journal article has been published in the Journal of the American Medical Informatics Association, highlighting the impact of high-quality, mixed-domain data on medical language models, with the article available here.
  • Community Cheers Scholarly Success: The community showed support for the peer’s published work through personal congratulatory messages, fostering a culture of recognition for academic contributions within the AI field.

OpenInterpreter Discord

SB-1047 Sparks Technical Turmoil: Engineers express deep concerns about the implications of SB-1047, dubbing it as detrimental to smaller AI players and likening the situation to regulatory capture observed in other industries.

Perplexity and Arc, Tools of the Trade Showcased: The community spotlighted tools aiding their workflows, sharing a Perplexity AI search on SB-1047 and the new ā€œCall Arcā€ feature of Arc Browser, which simplifies finding relevant answers online, with an informational link.

Install Issues Incite Inquiry: Users face issues with Typer library installation via pip, raising questions about whether steps in the setup process, such as poetry install before poetry run, were followed or if a virtual environment is being used.


Mozilla AI Discord

Twinny Takes Off as Virtual Co-Pilot: Developers are integrating Twinny with LM Studio to serve as a robust local AI code completion tool, with support for multiple llamafiles running on different ports.

Embedding Endpoint Enlightenment: The /v1/embeddings endpoint was clarified not to support image_data; instead, the /embedding endpoint should be used for images, as per pull request #4681.

Mac M2 Meets Its Match in continue.dev: A performance observation noted that continue.dev runs slower on a Mac M2 compared to an older Nvidia GPU when executed with llamafile.

Hugging Your Own LLMs: For those looking to build and train custom LLMs, the community recommended the use of HuggingFace Transformers for training, with the reminder that llamafile is designed for inference, not training.


Cohere Discord

  • Gratitude Echoes in the Server: A user expressed heartfelt thanks to the team, showcasing user appreciation for support or development work done by the team.
  • Curiosity About Upscaled Models: There’s buzz around whether a 104B version of a model will join the family tree, but no clear answers have been outlined yet.
  • Langchain Links Missing: Questions arose regarding the integration of Langchain with Cohere, with users seeking guidance on its current usability and implementation status.
  • Model Size Mysteries: Users are probing for clarity on whether the Aya model in the playground pertains to the 8B or 35B version, indicating importance in understanding model scales for application.
  • Error Troubleshooting Corner: Issues like a ValidationError with ContextualCompressionRetriever and a 403 Forbidden error signal active debugging and technical problem-solving among the engineers, serving as reminders of common challenges in AI development.

AI Stack Devs (Yoko Li) Discord

AI Comedy Night Hits the Right Notes: An AI-generated standup comedy piece shared by a user was met with positive surprise, indicating advancements in AI’s capability to mimic humor and perform entertainment.

Exploratory Queries on AI Applications: Curiosity about the extent of Ud.io’s functions was evident from a user’s query whether its capabilities go beyond generating comedy.

Sound Transformations Showcased: A user displayed the flexible audio alteration features of Suno by sharing an altered, demonic version of an original sound piece.

Eagerness for Audio Engineering Know-How: Interest was expressed in acquiring the skills to craft audio modifications like the ones demonstrated, a skill set valuable for an AI engineer with an interest in sound manipulation.

Concise Communication Preferred: A one-word reply ā€œNoā€ to a question highlighted a preference for succinct responses, perhaps reflecting an engineer’s desire for direct, no-nonsense communication.


MLOps @Chipro Discord

  • In Search of a Unified Event Tracker: A member has highlighted a pressing need for an event calendar compatible with Google Calendar to ensure no community events are overlooked. The absence of such a system is a noted concern within the community.

DiscoResearch Discord

  • New Dataset Announcement: A new dataset has been referenced by user datarevised, with a link to further details: DataPlusEngine Tweet.

The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Datasette - LLM (@SimonW) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

LLM Finetuning (Hamel + Dan) ā–· #general (74 messagesšŸ”„šŸ”„):

  • Semantic similarity overfitting concern: A member pondered if over-represented response categories in data, despite no particular response being over-represented, could lead to bias. They referenced their prior experience in Research Psychology checking for such issues.
  • Fine-tuning model confusion: A user struggled with understanding how much fine-tuning incorporates specific user inputs into a model compared to pre-training. They seek clarity on differences between pre-training, curriculum training, and fine-tuning.
  • OpenAI platform sidebars change: Some participants discussed changes in the OpenAI platform’s sidebars, mentioning that two icons disappeared (one for threads and another for messages).
  • Rasa and conversational complexity: A participant shared insights into Rasa’s approach to conversational AI, emphasizing the difficulty of creating intent classifiers due to complex conversations. They mentioned that treating intents as entities may reduce complexity.
  • Kyle Corbitt’s conference talk recording available: The recording of Kyle Corbitt’s conference talk is now available on the Maven portal, with specific links shared within the discussion.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #workshop-1 (23 messagesšŸ”„):

  • LLM Finetuning and ### usage clarifications: Discussed the use of ### in fine-tuning LLMs for sequence generation, noting that it helps the model understand different parts of the input during inference. Appropriately configuring templates during fine-tuning is necessary, including other structures like ChatML.

  • Template requirements explained: Emphasized that inputs during inference need to match the template used during fine-tuning, not necessarily ### but whatever was set (e.g., Llama 2 chat template). Model hosting services typically manage this templating and structure.

  • Model behavior with and without delimiters: Delimiters can help a model understand distinct sections of input like changing POVs in Reddit; otherwise unnecessary for general stylistic adaptations. Terminating delimiters or tokens ensure models correctly parse and end responses.

  • End of text token usage: The concept of an ā€œend of textā€ token was briefly mentioned as a mechanism for instructing the model to stop generating tokens, indicating efficient input and output management for LLMs.

  • Homework assignments on use cases for LLMs: Members shared and discussed homework projects applying LLMs to tasks like generating recipes and learning apps. Projects emphasized prompt engineering and retrieval-augmented generation (RAG) techniques among others. Links to resources and shared homework details here.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #asia-tz (8 messagesšŸ”„):

  • Reka.ai Jokes About Reunion: A member humorously commented on seeing another member after a long time, joking, ā€œYou’re being kind! I was starting to think I’d never see the light of day again after fast.ai.ā€ They inquired about how they have been and what they’re currently building.
  • Conference Recording Request Fulfilled: A member asked for a recording of the ā€œConference Talk: From prompt to model,ā€ which occurred at 4:30 AM IST. The request was answered affirmatively as the recording is now available on Maven.

LLM Finetuning (Hamel + Dan) ā–· #🟩-modal (18 messagesšŸ”„):

  • Modal Credits Received with Enthusiasm: Multiple users confirmed receiving credits from Modal and expressed eagerness to start fine-tuning models. One user said, ā€œTime to hack something.ā€.
  • Curiosity about Using Modal for Pure PyTorch Code: A user asked about utilizing Modal for fine-tuning LLMs with pure PyTorch code, comparing it to using Jarvis Labs. Another user confirmed it’s possible, sharing their experience training SentenceTransformer models with Modal.
  • Dataset Management in Modal: Discussion included how to upload datasets and use them within Modal, with detailed code examples and steps provided. Steven Merrill walked through setting up a Parquet file, building volumes, and annotating functions with GPU metadata.
  • Modal Documentation and Examples: Users shared useful links to Modal documentation and examples, including volumes documentation and a TensorFlow tutorial, which could be adapted for PyTorch.
  • Using Modal for Kaggle Competitions: One user planned to leverage Modal for a Kaggle competition, involving downloading data, library installations, fine-tuning, and saving models/logs. Another mentioned running Jupyter servers on Modal for up to 24 hours, sharing a link to the Jupyter inside Modal example.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #jarvis-labs (16 messagesšŸ”„):

  • Saving VSCode repo on Jarvis: A member inquired about saving their repo on the VSCode instance on Jarvis without pausing it to save credits. Another suggested publishing the code to GitHub and cloning it back as needed, while paused instances only charge for storage, which is minimal.
  • Removal of spot instances: The platform temporarily removed spot instances due to instability and low utilization issues.
  • Fine-tuning open-lama-3b cost and duration: Fine-tuning the open-lama-3b on gpt4-LLM-cleaned data took 3 hours 44 minutes on an RTX6000Ada, costing roughly $4. A discussion followed about the small size of LORA weights likely explaining the apparent instant upload to Huggingface.
  • Ampere series error with Axolotl: A user encountered an error with preprocessing on an A6000, which was resolved by changing bf16 to false and fp16 to true.
  • Course signup credits issue: A user reported not receiving credits after signing up for a course and joining Jarvis; the admin responded that new lists are processed, and credits will be added once the user’s information is received.

LLM Finetuning (Hamel + Dan) ā–· #hugging-face (9 messagesšŸ”„):

  • HF credits to be distributed soon: Members inquired about the process for obtaining HF credits. Details will be announced soon by email, and credits will be granted to attendees who fill out a form being sent over the weekend.
  • Best model for Spanish text generation: A member asked for recommendations on models for fine-tuning specifically for Spanish text generation tasks. Mistral 7B was suggested as a fluent option, and Llama 3 was mentioned as another model yielding solid results despite not being officially multilingual.

LLM Finetuning (Hamel + Dan) ā–· #replicate (1 messages):

  • Upcoming Announcement on Credits: An announcement regarding the management and distribution of credits will be made soon. ā€<@739531318571958272> is going to be running these credits but we are making an announcement soon about themā€.

LLM Finetuning (Hamel + Dan) ā–· #kylecorbitt_prompt_to_model (164 messagesšŸ”„šŸ”„):

  • High Expectations for the Talk: Members expressed excitement about the talk despite time zone challenges, with a call for recording it. *"I really want to see this but can't make it 😦 will it be recorded?"*
  • Link Overflow: Multiple links were shared including Hamel's [LLM inference notes](https://hamel.dev/notes/llm/inference/03_inference.html), [Argilla](https://argilla.io/), and the [MTEB Benchmark](https://huggingface.co/spaces/mteb/leaderboard). A significant number of resources were gathered from the talk.
  • Interactive and Humorous Session: Members appreciated the interactive vibe with humorous exchanges about fine-tuning and sleep schedules. *"Fine-tuning is not only expensive in GPU compute terms, but also affecting our sleep schedules!"*
  • Discussing Efficient Fine-Tuning Techniques: Various fine-tuning methods such as DoRA, MoRA, and LoRA were discussed, with linked articles like [Answer.AI's efficient fine-tuning](https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html). Exploration of context extension techniques like RoPE for models was also mentioned.
  • Commandments for Fine-Tuning: The "Ten Commandments" for deploying fine-tuned models were discussed with a link to the [slides](https://docs.google.com/presentation/d/1IIRrTED0w716OsU_-PL5bONL0Pq_7E8alewvcJO1BCE/edit#slide=id.g2721fb6713e_0_67). Members found the content very practical and beneficial for their work.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #workshop-2 (117 messagesšŸ”„šŸ”„):

  • Sharing the Jarvis Repo Link: A link to nisargvp’s Jarvis repository on Hugging Face was shared along with a config file for setting up the model in Axolotl.
  • Guide for Running Models on Modal: Users discussed running model training smoothly on Modal, pointing out a quickstart guide from Modal Labs and mentioned seamless operations after initial fixes.
  • TinyLLama Fine-Tuning Blog Post: The blog post documenting the fine-tuning process of TinyLLama on the alpaca_2k_test dataset using Axolotl and Jarvis, which can be found here, was shared and appreciated by the community.
  • Observability in LLM Applications: Discussions revolved around incorporating observability into LLM applications to collect user feedback and LLM input/output pairs, highlighting the need for better tracking methods.
  • Modal Training Error Support: Users encountered and resolved issues during Mistral model training using the Modal Labs repo, with community members offering troubleshooting advice and sharing specific error details to diagnose configuration problems.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #workshop-3 (3 messages):

  • Zoom chat confusion leads to Discord: Members were unsure where to continue their conversation after the Zoom chat was disabled. One member suggested moving their discussion to a specific Discord channel, which made sense to others.

LLM Finetuning (Hamel + Dan) ā–· #axolotl (32 messagesšŸ”„):

  • Cache Issue in Axolotl Frustrates User: A member noted that when re-running experiments in Axolotl, an unexpected cache used old data samples, which is documented here. Renaming the dataset file resolved this, prompting another user to suggest running the pre-process step explicitly.

  • Confusion with Missing Files: Users encountered issues like missing simple.yml or qlora.yml files while running training commands on Jarvislabs and Google Colab, leading to unsuccessful executions. A member shared that their qlora run took around 6 hours on 2x4090s GPUs, confirming the significance of using the correct files and configurations.

  • Inquiries About Sample Packing: One member asked if sample packing in Axolotl concatenates multiple dataset rows to fill the max sequence length. Another member confirmed this, explaining that although they are concatenated, the attention is set so that rows don’t attend to one another.

  • RuntimeError with BFloat16 in Google Colab: A RuntimeError related to BFloat16 not implemented for BFloat16 on T4 GPU led a user to switch from Google Colab to Jarvis-labs. They were advised to check PyTorch and CUDA versions, with a switch to the example configuration solving the issue.

  • Guide on Tokenizer Gotchas Shared: A user shared a link to Hamel’s notes on tokenizer gotchas, addressing intricacies in prompt construction and behavioral differences between training and inference due to tokenization handling.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #zach-accelerate (118 messagesšŸ”„šŸ”„):

  • User confusion over float16 and float32: There was a question about why float16 numbers appear higher than float32 in a displayed table. A link to a past discussion on the topic was provided to clarify the confusion.

  • Configuration issues with Jarvislab resolved: User encountered an error with the Jarvislab training command regarding a missing configuration file. Another user advised changing the command to use accelerate launch -m axolotl.cli.train hc.yml, which resolved the issue.

  • Optimizing Axolotl runs on different GPUs: A member requested advice on adjusting accelerate configs for optimized axolotl runs on varied GPUs. It was suggested to map configs back to the axolotl yaml, avoiding direct acceleration config settings.

  • Resources for learning model Accelerate: Users discussed how to get started with Accelerate for finetuning tasks, with advice to stick with higher-level abstractions like axolotl for simplicity and learning depth.

  • Hyperparameters and Inference precision: Inquiry on optimal learning rates for extended vs. undertrained models and issues with BF16 precision in T4 GPUs. Suggestions included asking in Zoom QA for hardware-compatible solutions or transforming weights for supported datatypes.

Links mentioned:


LLM Finetuning (Hamel + Dan) ā–· #wing-axolotl (192 messagesšŸ”„šŸ”„):

  • PR for latest axolotl and llama 3 demo merged: The Modal LLM fine-tuning repository now includes the latest axolotl updates and a llama 3 fine-tuning demo.
  • Seeking dataset templates and pre-processing issues: Members inquire about chatml.intel dataset templates and encounter issues during pre-processing, particularly with decoding due to dataset structure lacking numeric IDs. Reference: Axolotl Docs.
  • Clarifications on Axolotl configurations: Discussions reveal that default config values like load_in_8bit and load_in_4bit are set to False if not specified, with recommendations to inspect code directly for clarification.
  • Template-free prompt construction confusion: A member found the documentation on template-free prompt construction confusing, while others clarify the importance of template correctness.
  • Office Hours Q&A highlights debugging and stack insights: Members express the importance of debugging tools for understanding inputs and samples during training, advocate for rigorous template validation, and suggest callback functions for logging model predictions, referencing Axolotl Callbacks.

Links mentioned:


HuggingFace ā–· #announcements (1 messages):

  • Visualize Proteins with Proteinviz: Check out Proteinviz for creating custom visuals of proteins. This tool is made by a dedicated community member.

  • Speedy SDXL Results: The SDXL flash space delivers impressive results fast. Credit goes to the creator for this efficient build.

  • Custom Tokenizers Inspired by Karpathy: A community member shared their custom tokenizer, which is inspired by Karpathy’s work. This highlights ongoing innovations within the community.

  • Mistral-7B v0.3 Demo: Experience rapid performance with the Mistral-7B v0.3 chat demo. It’s another example of cutting-edge developments by active contributors.

  • Create Transparent Images with Diffusers: Generate transparent images using Diffusers, a project facilitated by another community member. This feature allows for creative visual outputs using advanced diffusing techniques.

Links mentioned:


HuggingFace ā–· #general (490 messagesšŸ”„šŸ”„šŸ”„):

  • AutoTrain Data Formatting Questions: Members discussed how to format data for finetuning in AutoTrain, with suggestions to reference the AutoTrain documentation. Example CSV formats and nuances of input data types were shared, enhancing clarity on setup.
  • Advanced LLM Fine-Tuning: The difference between DPO and RHLF methods for fine-tuning LLMs was highlighted, suggesting SFT followed by RHLF for teaching text-completion models conversational norms. Links to specific datasets and finer model adjustments were also shared.
  • Pandora Model Excitement: Details about the Pandora model, a new open-source text-to-video model, were shared along with a preview link. Discussions on its smartness and potential applications created significant excitement among members.
  • Mobius Model Controversy: The upcoming Mobius diffusion model faced scrutiny with comments about controlled quality and composition training. Resulting discussions emphasized its potential to significantly reduce the cost and complexity of developing new diffusion models.
  • Learning and Development Resources: Several members including @temeretam discussed educational and professional paths for advancing in AI, while others sought advice on specific coding and data handling problems, referencing both GitHub and Hugging Face documentation links for technical support.

Links mentioned:


HuggingFace ā–· #today-im-learning (8 messagesšŸ”„):

  • Deep RL for Embodied AI sparks interest: A member shared their enthusiasm about learning Deep Reinforcement Learning specifically for Embodied AI applications and invited detailed updates on progress.

  • Fast.ai courses recommended for AI beginners: Suggested Fast.ai’s part 1 & 2 courses which cover practical deep learning tasks using HuggingFace libraries and offer a strong foundation for beginners in deep learning. Course details can be found here.

  • Coursera course on Generative AI with LLMs: Recommended Generative AI with Large Language Models course on Coursera for those interested in gaining foundational knowledge in AI. The course is designed to be completed in 3 weeks, details available here.

  • PixART Diffusion Model Call Event: Announced a call event for an in-depth review of the PixART diffusion model for text-to-image synthesis, scheduled for Friday at 10:00 AM Pacific time. Additional information and community interaction can be found here.

Links mentioned:


HuggingFace ā–· #cool-finds (3 messages):

  • Exciting ChatGPT Applications in Drug Discovery: A link to a study was shared discussing the potential use of ChatGPT and other LLMs in next-generation drug discovery. The article, published in the International Journal of Surgery, highlights contributions from various institutions across India and Bangladesh Read more.

  • PostgresML and LlamaIndex Make Waves: An integration of PostgresML with LlamaIndex was highlighted in a recent Medium post. This integration promises to unlock new potentials in AI advancements, with detailed insights available in the article.

Link mentioned: ChatGPT or LLM in next-generation drug discovery and… : International Journal of Surgery: An abstract is unavailable.


HuggingFace ā–· #i-made-this (22 messagesšŸ”„):

  • Protein Dataset Gets Major Updates: A member shared updates on their protein visualization project, adding examples for human hemoglobin, mouse GTPase, and human ribosomal protein. They also implemented support for 3D rendering and created an in-depth example table on GitHub.

  • Transcription App with OpenAI’s Whisper Rocks!: A member introduced their transcription app for YouTube videos, audio files, and video files, utilizing OpenAI’s Whisper. Check it out on Hugging Face Spaces.

  • Call for Feedback on Decentralized Internet Infra: One member requested feedback and participation in a survey for their project building infrastructure for a decentralized and agent-centric internet: survey link. This sparked a debate about spamming channels and the ethics of data collection through surveys.

  • 3D Model Visualization in Browser Challenges: Despite challenges with 3D model rendering of protein structures in the Gradio browser, there is ongoing effort to find a solution. Helpful resources include a blog post on Hugging Face.

  • SimpleTuner Bug Fixes Improve Training: A member highlighted that fixing some minor bugs in SimpleTuner significantly enhanced its training performance. Now it trains better than ever.

Links mentioned:


HuggingFace ā–· #computer-vision (4 messages):

  • Monthly Computer Vision Hangout Announced: An upcoming monthly Computer Vision Hangout was introduced, aimed at discussing projects, ideas, and problems in CV-related fields. More details and event participation can be found here.

  • Seeking Invoice Processing Solution: A member inquired about an open-source neural network or paid API for extracting structured line-by-line information from scanned invoices. They requested the output to be formatted as JSON, specifying fields like product_id, description, quantity, unit_price, and total_price.

  • Looking for Deep Learning Study Partner: A user expressed interest in finding a deep learning study partner who shares a passion for AI and data science. They emphasized a mutual drive to explore neural networks, complex algorithms, and innovative projects.

  • Request for ViT Resources in Depth Estimation: Another member asked for resources on utilizing Vision Transformers (ViT) for monocular depth estimation. They indicated an interest in building their own model using ViT and are seeking guidance.

Link mentioned: Join the Hugging Face Discord Server!: We’re working to democratize good machine learning šŸ¤—Verify to link your Hub and Discord accounts! | 79727 members


HuggingFace ā–· #NLP (8 messagesšŸ”„):

  • Quantisation Anomalies in Mistral v0.3 Instruct: A member reported unexpected performance issues when comparing Mistral v0.3 Instruct using bitsandbytes 8-bit, 4-bit, and fp16 quantisation levels. They found that while fp16 and 4-bit took around 100 seconds, 8-bit took 500 seconds, despite expectations of 8-bit being faster than 4-bit.
  • Switching from Pipelines to Generate Without Impact: The same user noted that switching from pipelines to the generate() method, per the documentation for text generation with 8-bit models, did not improve the performance as expected.
  • Bitsandbytes Version and Optimization Tips: In response to the performance issue, another member inquired about the version of bitsandbytes being used and suggested trying int8_threshold=0 for potential performance gains. The original user mentioned they are using a batch size of 1 and contexts ranging from 500 to 2000 tokens.

HuggingFace ā–· #diffusion-discussions (6 messages):

  • Seeking NLG Learning Resources: A member asked for recommendations for learning Natural Language Generation (NLG). Responses to this query were not provided in the message history.

  • Query about Training Stable Diffusion on Custom Dataset: Another member asked for official documentation on training Stable Diffusion (SD) to generate images from a custom dataset such as MNIST. They mentioned finding documentation on the site, but it seemed to focus on unconditional generation.

  • Looking for Deep Learning Study Partner: A different member expressed interest in finding a partner to learn deep learning with. They emphasized a desire for someone equally passionate about AI and data science, keen to explore neural networks, complex algorithms, and innovative projects.

  • Help Needed for Converting pth+index File to Hugging Face Link: A member requested assistance in converting a pth+index file into a Hugging Face link RVC model. This technical query did not receive an immediately visible response.


Perplexity AI ā–· #general (493 messagesšŸ”„šŸ”„šŸ”„):

  • Perplexity vs. ChatGPT for Data Processing: Discussion emerged on the capabilities of Perplexity and ChatGPT in processing CSV files, with mentions that Perplexity already supports CSV uploads. Julius AI, an alternative for data analysis, was highlighted for running on Python and leveraging LLMs like Claude 3 or GPT-4.

  • Disappointment with Claude 3 Opus: Users expressed dissatisfaction with Claude 3 Opus due to increased restrictions and lower utility, particularly in handling copyrighted material. Some suggested alternatives like GPT-4o but acknowledged that Claude 3’s usefulness has diminished.

  • Pro Search Features and Enhancements: Users noted new features in Pro Search, with enhancements including multi-step reasoning and updated API specs fetching. However, some users observed that such updates might be part of A/B testing and only involve UI changes rather than backend improvements.

  • Tool Integrations and Custom Function Calls: There were discussions on Claude’s capacity for external tool integration via APIs, and attempts to replicate ChatGPT’s data analysis tool through custom function calls and serverless backend solutions. Links to relevant documentation like Tool Use with Claude were shared.

  • Ethical AI and Communication Analysis Projects: Talks included the creation of GPTs for communication analysis and ethical behavior monitoring, with suggestions that such tools could help improve workplace communication and reduce wrongful termination suits. Users debated the feasibility and philosophical implications of encoding ethics into algorithms.

Links mentioned:


Perplexity AI ā–· #sharing (7 messages):

  • Peran Kepala Sekolah shared: A brief link is shared to Peran Kepala Sekolah without additional context or discussion.
  • What is PB55 explained: A link provided to what is the PB55 for further reading.
  • Origin of ā€˜makura’ explored: A user shares a link to explore the etymology of the Japanese word ā€œęž•ļ¼ˆć¾ćć‚‰ / makuraļ¼‰ā€ here, which means pillow.
  • Ensure thread shareability: A reminder is given with an attachment to ensure threads are shareable with a link to Discord thread.
  • Stuart Hall’s theory discussed: Stuart Hall’s encoding/decoding model is shared.
  • Opus 50 limit queried: A user inquires about the Opus 50 limit.

Perplexity AI ā–· #pplx-api (1 messages):

  • References feature still in beta limbo: A user questioned the status of references being in beta and expressed frustration over not receiving a response after applying three times. They asked if anyone knew when this feature would be released in the API.

Stability.ai (Stable Diffusion) ā–· #general-chat (427 messagesšŸ”„šŸ”„šŸ”„):

  • Rumors of RTX 5090 Specifications Stir Debate: Discussions center around new rumors that the RTX 5090 may feature 32GB VRAM, igniting skepticism about the feasibility and utility. One member shared a link to purported images, but others criticized these as misleading.

  • Stable Diffusion Installation Guidance: A member seeks advice on installing Stable Diffusion with an AMD 5700XT GPU. Recommendations included trying web services like Craiyon initially, due to potential complications with AMD hardware.

  • Pricing and Access of Stable Diffusion 3: Users debated the merits of Stable Diffusion 3 vs. Midjourney, with some noting that SD3 is available for a free trial. However, it appears that a Stability membership is required for continued access.

  • Introduction of Mobius Model Generates Interest: DataPlusEngine announced the upcoming Mobius model on Twitter, claiming it to be the best stable diffusion-based image model. The model is described as ā€œneither a base model nor a fine tuneā€ and touted for its ability to create new base models efficiently.

  • Curiosity Over GPU Performance and Costs: New GPU models, particularly the 5090, sparked discussions about memory and training speeds. Members noted that higher VRAM like 32GB could detract from sales of high-end data center GPUs like the H100/A100, hinting this could influence Nvidia’s strategy.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #general (275 messagesšŸ”„šŸ”„):

  • PEFT Training Question Resolved: A user faced an issue with the config.json not being created during PEFT training and was advised to copy from the base model’s configuration. The user confirmed it worked and thanked the community for the help.

  • Llama 3’s Bugs Noted: Some users discussed that ā€œSome of Llama 3’s base (not instruct) weights are ā€˜buggyā€™ā€ but Unsloth auto-fixes these. It was advised to use reserved tokens during training and ensure the tokenizer and lm_head are trained.

  • System Prompt Improves Llama3: Users mentioned that adding a system prompt improves Llama3 finetuning performance. One user confirmed that even a blank system prompt can positively impact results.

  • Phi 3 Model Support Announced: It was announced that Phi 3 models, including medium support, are now available. The community showed excitement and shared links to relevant blog posts for more details.

  • Creepy Imprint with Stable Diffusion: Users shared eerie experiences with voice cloning and creepy artifacts generated by Stable Diffusion. They posted links to related YouTube video and a Reddit discussion.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #announcements (1 messages):

  • Phi-3 and Mistral v3 now live: Unsloth now supports Phi-3, Mistral v3, and many other new models. Check out the release details.

  • Llama 3 issues resolved: We’ve fixed all Llama 3 issues so finetuning is much better now. For a deeper dive, refer to this Reddit thread.

  • Explore free Colab notebooks: Access our Phi-3 medium notebook, Mistral v3 notebook, and more.

  • New model support and GitHub Accelerator: See our latest model additions on Hugging Face and learn about our participation in the GitHub 2024 Accelerator.

  • Celebration of AI innovation: We’re excited to join 10 other projects in GitHub’s 2024 Accelerator, highlighting the global impact and rapid advancement of AI innovation.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #random (4 messages):

  • Seek Local VSCode Copilot Recommendations: One user asked, ā€œDoes anyone use local vscode ā€˜copilot’? I would like to try some. Looking for recommendation :)ā€. Another responded with, ā€œtry continueā€, followed by the initial user expressing thanks, ā€œThanks, will try:)ā€.

Unsloth AI (Daniel Han) ā–· #help (103 messagesšŸ”„šŸ”„):

  • Sloth Phi-3 Inference Poses Performance Issue: A user reported slower inference times when using the Unsloth Phi-3 model compared to the original. They shared a Colab notebook to diagnose the issue, but even after suggested modifications, the problem persisted.

  • Custom Model Quantization Issue: One member experienced issues quantizing a custom model derived from an Unsloth notebook. They received errors related to unsupported architecture with llama.cpp and Docker.

  • Resource Requirements for Different Models: Queries about VRAM requirements indicated that 12GB is sufficient for Phi 3 mini, while 16GB is needed for Phi 3 medium. It was also noted that for larger tasks like summarization with a bigger context window, renting computing resources might be necessary.

  • Evaluation DataSet Criteria: A discussion highlighted the importance of using consistent datasets for training and evaluation. Specifically, unslothai’s public datasets on Hugging Face, such as those listed in the Blackhole Collection, were recommended for high quality.

  • Compatibility and Custom Model Support: Several users inquired about the compatibility of Unsloth with older Macs and using GPU-less systems, confirmed that Unsloth is optimized for CUDA and GPU usage. Several workarounds and tips were suggested for CPU-only systems and custom model support.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #community-collaboration (2 messages):

  • Engineer offers enterprise experience to Unsloth: A member, higginsconsultingptyltd_39617, congratulated others on joining the accelerators at Build Club and Github and proposed leveraging their enterprise experience to assist Unsloth. Another member responded positively, expressing eagerness to discuss further, ā€œAbsolutely we’d love to!ā€

Nous Research AI ā–· #off-topic (12 messagesšŸ”„):

  • Master of Plain-Speak Talks PixART Diffusion Model: Interested members can ā€œhear a Master of Plain-Speak describe how he fine-tuned the PixART diffusion modelā€ during a call today at 10:00 AM Pacific Time. Join the event and link to Discord for further discussion or view past topics on their blog and YouTube videos.

  • Excitement Over Intel Libraries: A member expressed excitement to ā€œtinker with the Intel librariesā€ while discussing IPEX and BigDL separation. Potential collaboration and exploration of Intel’s improvements were mentioned.

  • Stable Functionality of IPEX-LLM: Although one member hasn’t used IPEX-LLM, they’ve found that it has ā€œrock-solid stableā€ support where it exists. Discussions included improvements in IPEX-LLM’s setup.

  • Tinygrad OpenCL Setup Insights: If performance is not the main concern, ā€œtinygrad OpenCL is trivial to set up and get runningā€, suggested one member. Another member humorously criticized geohot’s lack of interest due to memory bandwidth limitations.

  • Experimental Stint with drm/xe Driver: Currently, a member is running the experimental drm/xe driver without major issues, apart from the known constraints. They expressed hope that Battlemage will perform better.

Link mentioned: Arxiv Dives with Oxen.AI - Fine Tuning Diffusion Transformers (DiT) Ā· Zoom Ā· Luma: Hey Nerd, join the Herd!… for a little book/paper review. WHAT TO EXPECT Each week we pick a topic to cover in depth and have open Q/A and discussion.…


  • TAS Mario Sunshine sparks AI speedrun debate: A member shared a YouTube video showcasing a tool-assisted speedrun of ā€œSuper Mario Sunshineā€ and discussed the potential of AI mastering such techniques. They pondered the intriguing developments AI might bring to speedrunning and game engine manipulation by imposing specific limitations.

  • Pannenkoek2012’s Mario 64 praised: Another YouTube video was shared featuring a zero A-press speedrun of ā€œSuper Mario 64ā€ by Pannenkoek2012. The member appreciated the content, noting its insights into evolving AI and consciousness through rapid thought processes.

  • Prophetic AI’s Halo and Morpheus-1 impress: A link to Prophetic AI was shared, highlighting the Halo, a non-invasive neural device for lucid dreaming, and Morpheus-1, an ultrasonic transformer generating holograms for neurostimulation. The member emphasized the extreme potential of these technologies for exploring the subconscious mind and consciousness enhancement.

Links mentioned:


Nous Research AI ā–· #general (280 messagesšŸ”„šŸ”„):

  • New Paper on Transformer Circuits: A user shared a link to the new paper, Scaling Monosemanticity, suggesting the community check it out.
  • PyTorchModelHubMixin Class by HF: A member highlighted a class called PyTorchModelHubMixin created by Hugging Face, which allows seamless integration of AI models with the HUB using save_pretrained, push_to_hub, and from_pretrained methods. However, AI models need to stay under 50GB as sharding is not supported yet.
  • Mobius Model Impresses Community: Discussion on the Mobius model showcased its high performance in image generation, particularly in Pixar-style renderings and multi-word text generation. It also generated excitement for potential open-sourcing and further papers explaining its training method.
  • Lively Debate on LLM Understanding: A heated discussion unfolded around whether LLMs truly understand concepts, with one user pointing to interpretability research as a major source of empirical evidence, while another argued that current interpretability efforts are insufficient. They referenced recent research including a paper from Anthropic and debates around the significance of interpretability in AI.
  • Technical Repo for RLHF Models Shared: A GitHub repository, Online RLHF, was shared, detailing a workflow for training reward models for Reinforcement Learning from Human Feedback (RLHF), which aims to surpass results from offline learning methods.

Links mentioned:


Nous Research AI ā–· #ask-about-llms (8 messagesšŸ”„):

  • Llama.cpp script handles function calls: A member shared an update about creating a script using llama.cpp that manages function calls and returns answers from the model based on tool responses. They mentioned being inspired by the Hermes Pro 2 GitHub repo and offered to create a pull request to add a notebook.
  • Hermes model praised: The same member described the Hermes model as ā€œa beast.ā€
  • Looking for LoRA resources on a 3080: A member asked for resources to perform Llama3 LoRA on a 3080 GPU with 10GB. The response recommended checking out unsloth or axolotl.
  • New developer introduction: A new member, a developer from torchtune, introduced themselves and mentioned their interest in tool-calling with Mistral v0.3. They sought advice on fine-tuning models for tool-calling and queried experiences with zero-shot new tools.

Nous Research AI ā–· #project-obsidian (6 messages):

  • Kquant criticizes kquant’s reputation: Members expressed skepticism about kquant, with one stating, ā€œI’ve heard it’s not very great.ā€ Another concurred, sharing similar opinions from colleagues.

  • Concerns on LLM Capabilities: There was agreement that kquant’s capabilities, especially on the LLM side, are dubious, though its vision capabilities were not discussed.

  • Disappointment over product removal: A member mentioned the removal of ā€œSkyā€ in a playful manner, which caused amusement and mirrored shared sentiments of disappointment. Another member humorously expressed that they ā€œstol’t our waifus.ā€


Nous Research AI ā–· #rag-dataset (36 messagesšŸ”„):

  • Models should contextually integrate internal and RAG knowledge: Members discussed the idea of training models to ā€œadd context from its own knowledgeā€ or to override RAG data if it contradicts internal knowledge, emphasizing the shortcomings of depending solely on RAG.

  • Concerns about internal vs. RAG knowledge: A debate emerged over whether internal model knowledge, which could avoid obvious errors, should outweigh RAG, which can sometimes include bad data, highlighting a ā€œdamned if you do damned if you don’t situation.ā€

  • Finetuning can resolve conflicts: A member noted that finetuning with models like GPT-4 or Gemini might prevent illogical outcomes from incorrect RAG data.(ā€œI think any LLM of gemini or gpt4 size can reason that its not safe to put glue stick into your pizza.ā€).

  • Function calling as a form of RAG: A query was posed about whether function calling is a type of RAG, indicating not all nuances of RAG integration are universally understood yet.

  • Benchmarking RAG performance: Discussing RAG performance benchmarks, members agreed user evaluation is crucial, especially for complex, multi-hop questions, despite being easier for single-hop queries.

Links mentioned:


Nous Research AI ā–· #world-sim (21 messagesšŸ”„):

  • Jam Session Video Hits A Snag: Teknium reported that the jam session video has been recorded but there are issues with getting it onto YouTube. They promised to inform the group as soon as it’s uploaded.

  • NightCafe Connection to Nous/WorldSim: Rezonaut introduced NightCafe noting its potential key role for solutions in the Nous and worldsim contexts. They suggested it could enhance the interface by integrating multi-dimensional and multi-sensory communications.

  • Creative Brainstorming for AI Worlds: Rezonaut shared intricate ideas for using AR spaces and visual elements to map out and explore interconnected worlds and dimensions in a manner inspired by biological brain functions and mindmaps. This includes the visualization of knowledge and designed immersive spaces connected like neural networks.

  • Vorpal_strikes’ New Visualizer Fascination: Vorpal_strikes shared a link to an immersive audio-visualizer that caught their interest. The visualizer offers a highly dynamic and immersive environment, potentially useful for creative and AI-based applications.

  • Golden Gate Claude Streams Consciousness in ASCII: Teknium shared a whimsical representation of an AI called ā€œGolden Gate Claudeā€ monologuing in ASCII art about consciousness, simulation theory, and classic AI banter, accompanied by an ASCII depiction. This showcases both playful creativity and deep thematic explorations in AI projects.

Links mentioned:

  • worldsim: no description found
  • Tweet from Kiri (@Kyrannio): Is this terrifying, or amazing? You decide. Golden Gate Claude inner monologuing to itself as a merged Omega Claude, complete with ASCII representations. "Haha, an ASCII art representation of my...

Eleuther ā–· #general (53 messagesšŸ”„):

  • JAX vs PyTorch/XLA on TPU Performance: A member raised a query on the performance comparison of PyTorch/XLA and JAX on TPUs, but the discussion quickly shifted to benchmarking concerns such as warmup and blocking factors.

  • Improving LLM Reasoning Through Fine-Tuning: An inquiry made about fine-tuning strategies that improve LLM reasoning pointed toward a search for scholarly papers detailing specific parts of model training that enhance reasoning capabilities. There were no specific papers referenced in this discussion.

  • Compute Cost of Training GPT-3 Over Time: The conversation covered the substantial drop in compute costs for training GPT-3 from around $4.5M in 2020 to an estimate of $125k-$1M in 2024. These costs varied based on assumptions such as TFLOP rates and GPU-hour pricing, with various users contributing different figures and sources, including a Databricks Blog Post.

  • Validating GPU Costs for Training Models: A critical examination revealed that more realistic estimates for well-connected H100 GPUs are between $2.5-$3/hr, suggesting a $1.25-$1.5M range for substantial models like GPT-3 trained on 1.4T tokens. This underscores the variability and complexity in exact cost approximations for large-scale model training.

  • RAG versus Finetuning for Custom Library Extraction: A user asked whether RAG (Retrieval-Augmented Generation) was the best method for enabling LLMs to extract information from a custom library for specific questions, hinting they were considering both finetuning and RAG for their experimentation needs.

Link mentioned: Turbocharged Training: Optimizing the Databricks Mosaic AI Stack With FP8: At Databricks, we be


Eleuther ā–· #research (249 messagesšŸ”„šŸ”„):

  • JEPA vs LLMs Spark Debate: A lengthy discussion unfolded about JEPA and its potential to lead to AGI as proposed in ā€œA Path Towards Autonomous Machine Intelligenceā€. Members criticized the model for being similar to existing models like GPT and DINO but in different domains, with skepticism about its scalability and context handling: ā€œI don’t see how the JEPA/Lecun path scales even 1/1000 in amount of economically important tasks solved compared to LLM.ā€
  • ROPE’s Influence on Long-Term Context: Members discussed a new approach to RoPE, suggesting it has limitations regarding context length capabilities in LLMs. A recently published paper revisits existing theories and proposes a novel understanding of RoPE’s long-term decay properties: View PDF.
  • Modula: A New Training Strategy: An interesting project called Modula was shared, which introduces scalable neural network training through automatic normalization using the modular norm. Skeptical members found the abstract intriguing but uncertain about its practicality: ā€œIt is very, very, very strangely worded if it is legitimate.ā€
  • Chameleon Model Insights: The Chameleon model, capable of multimodal tasks such as text and image generation, was highlighted. This model is noted for its state-of-the-art performance in multiple domains, suggesting potential competition for established models: View PDF.
  • Bitune Enhances LLM Instruction-Tuning: Bitune, a novel approach for improving instruction-tuning in LLMs through both causal and bidirectional attention, was discussed. This method claims significant improvements in zero-shot performance across several types of reasoning tasks: View PDF.

Links mentioned:


Eleuther ā–· #interpretability-general (3 messages):

  • Tim Dettmers’ quantization research: a mixed reaction: A post highlights Tim Dettmers’ quantization methods described in his paper and blog, explaining no performance degradation transformer inference with advanced quantization methods. It also mentions the intriguing concept of emergent outliers in transformers as ā€œsinks of entropy/informationā€, integrated with Hugging Face via bitsandbytes library.
  • Emergent features as ā€œDNAā€ of the model: The concept of emergent features being invariant across layers and behaving like ā€œsinks of entropyā€ was discussed, with a comparison to ā€œDNAā€ from which the rest of the model’s functionality could be reconstructed. The conversation probes into phase transitions around 7B parameter models and possible parallels to phase transitions in 3SAT or spin glass models.
  • Exploring transfer learning and fine-tuning applications: A member speculated about the potential for using ablation of vectors separating in-distribution and out-of-distribution samples to improve out-of-distribution generalization by minimizing shortcut features. However, this approach is acknowledged as being closer to transfer learning than true out-of-distribution generalization.

Link mentioned: LLM.int8() and Emergent Features — Tim Dettmers: When I attended NAACL, I wanted to do a little test. I had two pitches for my LLM.int8() paper. One pitch is about how I use advanced quantization methods to achieve no performance degradation transfo…


Eleuther ā–· #lm-thunderdome (10 messagesšŸ”„):

  • Set a seed in vllm models: Members discuss setting a seed in model_args for vllm models, noting that while it defaults to seed=1234, it might not be the issue. vllm also allows a per-sample seed in gen_kwargs, typically set to 0 during greedy decoding.

  • List all possible tasks using lm_eval: One member asked how to see the list of all possible tasks to test. Another specified that using lm_eval --tasks list gives a list of all task names, highlighting the need for better documentation.

  • BigBench task names have changed: A member is looking for updated BigBench task names as their 8-month-old eval harness no longer aligns. They are frustrated because the old harness isn’t properly utilizing Accelerate, causing memory issues by overloading GPUs.

  • Organize tasks in lm-eval folder: To find tasks, it’s suggested to look in the lm-eval/tasks folder. It’s mentioned that tasks are ā€œpretty nicely organizedā€ there.


LM Studio ā–· #šŸ’¬-general (142 messagesšŸ”„šŸ”„):

  • Challenges with Small Model Loading on GPU: Members discussed issues related to loading small models on GPUs. One noted, ā€œonly load the biggest small models,ā€ while others suggested trying models like llama3, mistral instruct, cmdr.

  • Better Results with Lower Quantizations: A member shared, ā€œI got better results with llamas q4 than I did q8 for my application,ā€ noting ā€œBigger not always better.ā€

  • Finding Uncensored and Specialized Models: The discussion highlighted the challenge of finding appropriate models, with suggestions to try ā€œdeepseek coder, wizardlm, llama3,ā€ and a link to Hermes 2 Pro for JSON and function calling.

  • Vector Search and Context Management in Queries: Topics included using embeddings and vector search to handle full-article context for better responses. Specific prompts were shared, with one noting it ā€œworks much better with full articles,ā€ providing more detailed answers.

  • Disk Utilization and Performance: Conversations touched on how disk utilization might affect performance, with one noting, ā€œrunning models partially offloaded to swap has worked for me,ā€ though ā€œtok/sec becomes sec/tok.ā€

Links mentioned:


LM Studio ā–· #šŸ¤–-models-discussion-chat (70 messagesšŸ”„šŸ”„):

  • Model Updates Announced: A member announced that the 35B model is incoming, followed by a release announcement. They are actively testing to ensure compatibility with the latest LM Studio version.

  • Compatibility Issues and Fixes: Discussion around compatibility issues with ROCm build and new model versions were highlighted. Confirmed issues were related to outdated versions which will be resolved as ROCm version gets updated in the coming days.

  • Recommendations for Conversational Models: Members discussed decent conversational models, with one recommending Wavecoder Ultra as an excellent choice for coding and learning. Another suggestion was to try Mistral-Evolved-11b-v0.1 for uncensored use.

  • Loading Issues with Specific Hardware: A user reported indefinite loading times using a model on their system with a 5800x3d, 32GB DDR4, 4080 16GB VRAM. They later clarified it worked properly without using web search agents.

  • Potential Issues and Future Releases: Some members expressed anticipation for Phi-3 small GGUFs and discussed optimization differences between medium and small models, noting that phi small models provide better optimization.

Links mentioned:


LM Studio ā–· #šŸ“-prompts-discussion-chat (23 messagesšŸ”„):

  • LLMs struggle with precise character prompts: A user noted that Local Language Models (LLMs) often fail to adhere to precise character limits in prompts. They emphasized the difficulty of avoiding unnecessary additions like opinions or comments.

  • Capitalization and model behavior vary: Discussions highlighted that different models respond variably to capitalized instructions. One user pointed out, ā€œGenerally, LLM’s don’t follow capitalized words on order of importance.ā€

  • Specialized model recommended for multilingual tasks: A recommendation was made for using a specialized multilingual model for tasks like grammar and punctuation correction. The suggested model was Aya 23 8B by Cohere For AI.

  • Temperature adjustment considered for output quality: A user contemplated tweaking the temperature setting in Llama 3 to potentially improve its performance, as they observed, ā€œLlama 3 has a much more… Creative way of doing it.ā€

  • GPU vs. CPU processing time discrepancy: One user mistakenly ran a grammar check task on their CPU, which extended the duration from 35 minutes to an estimated 15 hours. They later corrected this by running the task on GPU, significantly reducing the time required.

Link mentioned: lmstudio-community/aya-23-8B-GGUF Ā· Hugging Face: no description found


LM Studio ā–· #āš™-configs-discussion (6 messages):

  • Tried disabling VPN routing for specific traffic types: A suggestion was made to disable VPN routing for specific traffic types and directly download models from Huggingface, possibly injecting them into the Models directory manually. The strategy is commonly recommended, especially when facing regular concerns about VPN-related issues.

  • CUDA versions on older GPUs may be problematic: It was pointed out that CUDA versions on the GTX 950m might be too outdated to function correctly. This could be a limiting factor in running certain models.

  • Recommendation for using Julius AI: Julius.ai was recommended, offering 10 free chats as a promotional feature. This is presented as a useful resource or tool for users encountering issues.

  • Persistent NVIDIA CUDA issues despite driver updates: Attempts to update NVIDIA drivers and configure different CUDA and CuDNN versions (12.4, 12.1, 11.8) on a system with a GTX 950m GPU have not resolved issues. The user continues to run on AMDOpenCL, leaving the potential CUDA capability of their NVIDIA card unused without clear reasons or solutions.

Links mentioned:

  • Julius AI | Your AI Data Analyst: Julius is a powerful AI data analyst that helps you analyze and visualize your data. Chat with your data, create graphs, build forecasting models, and more.
  • Julius AI | Your AI Data Analyst: Julius is a powerful AI data analyst that helps you analyze and visualize your data. Chat with your data, create graphs, build forecasting models, and more.

LM Studio ā–· #šŸŽ›-hardware-discussion (5 messages):

  • Llama.cpp supports distributed inference: Reddit discussion link revealed that llama.cpp now supports distributed inference with recent RPC code updates. Although it doesn’t support quantized models yet, it can still run models across multiple machines by adjusting certain lines in the code.

  • Exploring PC builds for distributed models: Discussion considered the feasibility of clustering cheap used PCs with RTX 4060 Ti 16GB cards for optimal builds. There was curiosity about the network bandwidth requirements and possible constraints when linking these machines.

  • Using rented online PCs for inference: One suggestion was to use services like Maximum Settings or ShadowPC for renting multiple PCs to run larger models. However, concerns about high costs and specific limitations such as ShadowPC’s inactivity timer and limited 6GB system RAM were raised.

  • Considerations for power consumption and networking: It was noted that RTX 4060 Ti cards draw 160W peak power, implying significant power considerations for host machines. Networking expenses and performance benchmarks are also crucial factors in a distributed architecture setup.

Link mentioned: Reddit - Dive into anything: no description found


LM Studio ā–· #amd-rocm-tech-preview (4 messages):

  • 7900 XTX available?: One member inquired, ā€œ7900 xtx here, where can I get it?ā€ indicating interest in acquiring a specific GPU model.
  • 7900m works on Windows, not sure about Stable Diffusion: Another member shared that the 7900m works on Windows but they haven’t figured out Stable Diffusion on LM Studio. They also mentioned not yet trying it on NixOS with a 6800xt.
  • LM Studio doesn’t support Stable Diffusion: A member clarified that Stable Diffusion is not supported in LM Studio, which is dedicated solely to language models, not image generation models.
  • ROCm praised as a game changer: One participant expressed enthusiasm about ROCm, noting, ā€œdamn ROCm really is a game changer huh.ā€

LM Studio ā–· #model-announcements (1 messages):

  • Cohere models go multilingual: Cohere models are now available in 23 different languages including Arabic, Chinese, French, and more. Check out the download links for aya-23 quants on the lmstudio-community page.
  • Update on deployment requirements: To use the aya-23 models, you’ll need version 0.2.23 or newer. ROCm users will have to wait for an upcoming update.

CUDA MODE ā–· #general (23 messagesšŸ”„):

  • Clarification on Sparsity and Pruning: A member asked if sparsity is just pruning, but the discussion did not elaborate further.
  • Quantization of Neural Networks Questioned: There was a query about whether neural net quantization is only scaling down the precision or if it involves non-uniform quantization like remapping weights to quantiles.
  • Workshop Excitement: One member mentioned that the workshop was rad and expressed excitement to be there.
  • Question Posting Guidance: A user asked where to post questions and was directed to a specific Discord channel by another user here.
  • Announcement Channel Adjustment: A member requested an announcement channel for webhooks, and it was promptly adjusted into an announcement channel by another user, who also commented, ā€œLOL doneā€.

CUDA MODE ā–· #triton (4 messages):

  • Minimum Dimension Requirement for Dot Product: A member questioned why the dot product computation in CUDA requires matrices to have at least a dimension of 16. Another user suggested it might be due to tensor cores requirements.

  • Optimizing Matrix-Vector Multiplication: To optimize matrix-vector multiplication K v, a member asked if padding the vector to a shape of n by 16 would be advisable. They also pondered whether running sum(K * v.T, axis=-1) would be cheaper performance-wise.

  • Symmetric Matrix Computation: Discussion on whether performance can be improved by not recomputing already computed parts of a symmetric matrix. The member inquired if there is a special order of computation that could be considered to boost performance.


CUDA MODE ā–· #torch (1 messages):

davidgonmar_: Might be inplace operators?


CUDA MODE ā–· #announcements (1 messages):

  • Exciting live coding session with Izzat El Hajj: A speaker event featuring Izzat El Hajj, co-author of the PMPP book, is scheduled for tomorrow at <t:1716663600:F>. The highlight of the event will be actual live coding of the Scan algorithm, which is crucial for modern ML algorithms like Mamba, promising an engaging session for attendees.

CUDA MODE ā–· #pmpp-book (4 messages):

  • Excitement builds over book purchase: A member announced, ā€œI bought the book,ā€ sparking curiosity from another member who asked how they liked it. The buyer responded that they had just bought it and would see how it is.

  • Upcoming PMPP author events: A member informed the channel about opportunities to meet and discuss with PMPP authors in the upcoming weeks. They mentioned that Prof Izzat El Hajj will present SCAN topics tomorrow and next week, and Prof Wen-mei Hwu will present later this summer. Check out the events calendar for more details.


CUDA MODE ā–· #torchao (5 messages):

  • int4 dtype functions lack implementations: A member noticed a lot of functions aren’t implemented for the int4 dtype, even mentioning that the test script contains a few TODOs. They questioned if this gap is worth addressing (ā€œIs this worth working on?ā€).

  • uint4 extensions and limitations discussed: References were made to uint4 extensions, highlighting specific limitations such as type promotion constrained to uint8 and tensor shape operations like unbind and slice having restrictions. Another member stated that sub-byte dtypes are typically utilized in custom kernels rather than standard eager/compile functions.

  • uint4 needs improvement: A member straightforwardly pointed out that ā€œuint4 indeed does need some loveā€, indicating a recognized need for enhancement in this area.

  • Questioning the value of the task: Another member posed the question of what defines whether the task is ā€œworth working on,ā€ hinting at a need for clarity on the potential benefits versus the required effort.

Link mentioned: Supporting new dtypes in PyTorch: tldr; This post explains what adding a new dtype to PyTorch core means, the criteria of adding a new dtype to PyTorch core and the official recommendation of how to support new ā€œsecondary dtypesā€ use …


CUDA MODE ā–· #llmdotc (115 messagesšŸ”„šŸ”„):

  • Gradient Norm Issues with Batch Size: A bug was identified where changing the batch size from 32 caused the gradient norm to spike significantly, causing failures in the training process. As one member phrased it, ā€œthe gradient norm is suddenly really really large and training failsā€.
  • Exponential Notation Parsing Issue: Members discussed a problem with passing floats in exponential notation to C, noting that -l 3e-4 doesn’t get parsed by atof. It was noted that using 3.0e-4 might work, but this will need to be tested later.
  • Deterministic Kernels for Multi-GPU Runs: Members discussed the importance of getting deterministic kernels before any larger run, pointing out that a 124M model is still relatively small but more extensive runs would need determinism.
  • FineWeb Dataset Storage and RAM Usage: The FineWeb dataset is large, with intermediate disk usage reaching 70 GB and RAM usage up to 64 GB during processing. This has led to performance issues across systems with different configurations.
  • Exploding Gradients Fix: A fix for the exploding gradients issue, especially with large batch sizes, was implemented and tested successfully. This fix prevents indexing overflow in the fused classifier as mentioned in this PR.

Links mentioned:


CUDA MODE ā–· #rocm (2 messages):

  • Dreams of MI300 Gaming Card: One member speculated, ā€œmaybe after the mi300 does well they will ship a gaming card that works XD.ā€ Another humorously replied, ā€œA person can dream at least.ā€

CUDA MODE ā–· #bitnet (1 messages):

mobicham: https://arxiv.org/pdf/2405.14854


Modular (Mojo šŸ”„) ā–· #general (90 messagesšŸ”„šŸ”„):

  • Funding Python Libraries’ Port to Mojo: A user questioned the availability of a budget to incentivize developers of major Python libraries like psycopg3 to port their work to Mojo. It was discussed that the fast-evolving API and lack of stable FFI story could potentially burn out maintainers if pursued prematurely.
  • Debate on Porting Libraries: Some members argued against the practicality of asking existing Python libraries to port to Mojo, pointing out the challenges and potential unwelcome response. Others highlighted that C libraries, specifically those with no dependencies, might be more suited for early porting efforts.
  • Comparison with Rust and Future Prospects: Security benefits of moving to Rust were mentioned favorably, although it was noted that Mojo aims to suit different use cases without fully replacing C. Discussions touched on Rust’s commitment to portability and the potential of Mojo leveraging similar concepts.
  • BlazeSeq on MacOS: A user faced issues running BlazeSeq on MacOS, which was resolved by using the nightly version of Mojo. Feedback on performance was shared, showing similar efficiency between BlazeSeq and Rust’s Needletail, indicating promising results on Mac’s Ventura pro-max M2 arm64.
  • Prospects of HVM for Various Languages: There was a discussion about the HVM being used for running various programming languages like Python and Haskell, similar to JVM. Attention was drawn to an explanation by Victor Taelin about HVM’s potential despite its current performance limitations.

Links mentioned:


Modular (Mojo šŸ”„) ā–· #šŸ’¬ļø±twitter (1 messages):

ModularBot: From Modular: https://twitter.com/Modular/status/1793797622572220431


Modular (Mojo šŸ”„) ā–· #ai (12 messagesšŸ”„):

  • Training ML models and inference in Mojo?: One member inquired about the future of training ML models and running inference natively in Mojo, and if Modular has plans to introduce a PyTorch-alternative written in Mojo. ā€œThey have Max Engine, which can be used in place of numpy for inferenceā€ but no plans for a training framework.
  • Level-Up Celebration with ModularBot: ModularBot congratulated a member for reaching level 16 with a whimsical comparison to a knight’s journey. The bot continued with playful banter about taco preferences but clarified it cannot send funds.
  • Curious about ModularBot’s model: A member asked about the model ModularBot is based on, and the bot responded with a fanciful narrative, stating it is ā€œforged from the fires of ancient forgesā€ and adept at dispensing knowledge, not funds.

Modular (Mojo šŸ”„) ā–· #šŸ”„mojo (31 messagesšŸ”„):

  • Low-bit-depth networks spark debate: Discussions on the utility of low-bit-depth networks for embedded AI systems emphasized the importance of potentially incorporating dedicated support in programming languages. ā€œHaving an easy, language-supported means to specify that you wanted limited bit depth would be a big step to making small embedded AI systems.ā€

  • FFT in Mojo: Scipy vs FFTW: One member sought advice on performing FFTs in Mojo, weighing the use of Scipy’s FFT functions against wrapping FFTW. Another member suggested referring to a discussion on Tensor to NumPy array conversion for more insights.

  • Function-only structs without initialization: A proposal for a decorator to create function-only structs without initialization sparked a discussion on using @staticmethod to achieve similar functionality. ā€œI guess what I want is to be able to call a variation of that once for an entire struct.ā€

  • Mojo function argument handling update: A user highlighted a recent update on how Mojo processes function arguments, shifting from making copies by default to using borrowed conventions unless mutations occur. The update aims to ā€œimprove consistency, performance, and ease of use,ā€ as outlined on GitHub changelog.

  • Compile-time metaprogramming confusion: A user encountered issues with a function designed to build tables at compile time, facing a ā€œrange check issueā€ with list indexing. Another member proposed setting the list size explicitly using table.size, table.resize(256*n, 0), or table.append to resolve the issue.

Links mentioned:


Modular (Mojo šŸ”„) ā–· #performance-and-benchmarks (2 messages):

  • Benchmarking in Jupyter vs Compiling questioned: A member asked about the reliability of benchmarking in a Jupyter notebook versus compiling. Another responded that one should benchmark in an environment similar to production and provided detailed tips to enhance precision, emphasizing compiled benchmarks and CPU isolation techniques.

Link mentioned: CPU Isolation – Introduction – by SUSE Labs (part 1…: This blog post is the first in a technical series by SUSE Labs…


Modular (Mojo šŸ”„) ā–· #šŸ“°ļø±newsletter (1 messages):

Zapier: Modverse Weekly - Issue 35 https://www.modular.com/newsletters/modverse-weekly-35


Modular (Mojo šŸ”„) ā–· #nightly (34 messagesšŸ”„):

  • Mojo 24+ introduces breaking changes: A user experienced a runtime error with mojo parser.mojo Diffusion.bwpreset after updating to Mojo 24+. The culprit was identified as a type mismatch in a method, solved by ensuring read_bytes returns List[SIMD[uint8, 1]] (repo link).

  • Traits to support f-strings proposed: There was a discussion about contributing to f-string support with a Formatable trait in Mojo. One member suggested starting with something akin to Python’s __format__ method handling format_spec.

  • Documenting bug in DTypePointer[bool]: A member discovered inconsistent behavior in DTypePointer[bool] when storing/loading with different widths and filed a bug report. The issue possibly involves bitpacking and alignment, providing code examples to reproduce the behavior.

  • Mojo nightlies released frequently: Users discuss the rapid deployment of nightly builds, now updated to 2024.5.2414. Links were shared to changelogs and community meetings for updates (roadmap, community meeting).

  • Alignment issues with bitpacking: Another alignment-related bug affected storing bool values in memory. Workarounds and multiple implications were discussed, leading to further exploration and bug documentation for community visibility.

Links mentioned:


OpenAI ā–· #ai-discussions (116 messagesšŸ”„šŸ”„):

  • Run an LLM with Nvidia A40: Participants discussed whether it is possible to run Large Language Models (LLMs) using an Nvidia A40 GPU, indicating interest in hardware requirements for AI tasks.
  • Microsoft Copilot+ PC features: There was a detailed discussion on Microsoft Copilot+ PCs, which include features like ā€œsketch to imageā€ in Microsoft Paint. Users debated the capabilities and recommended checking out alternatives like Leonardo.ai for similar functionalities.
  • Water consumption by AI models: Concerns were raised about the water usage of training AI models, with gizmodo article shared to highlight the environmental impact of AI technologies. Participants expressed the need for making AI more energy-efficient.
  • AI empowerment and iterative work: There was a conversation about empowering AI with iterative work to refine outputs. Some users pointed to projects like AutoGPT that attempt to address iterative improvements but acknowledged the cost issues associated with such tasks.
  • GPT-4’s capabilities vs. GPT-3.5: The participants compared GPT-4’s improved ability to handle specific tasks like word counting when compared to GPT-3.5. An example was shared showing GPT-4 completing a word count task correctly by following a detailed process.

Link mentioned: Training ChatGPT Required Enough Water to Fill a Nuclear Cooling Tower: An average user’s conversational exchange with ChatGPT amounts to dumping a large bottle of fresh water out on the ground, new research says.


OpenAI ā–· #gpt-4-discussions (11 messagesšŸ”„):

  • GPT refuses to output Typst code: A user complains that GPT defaults to writing LaTeX instead of Typst code, despite explicit requests. They are frustrated with GPT’s persistent behavior.

  • Inquiry about GPTs running on 4o: A user asked if GPTs are running on GPT-4o. It’s confirmed indirectly that GPT-4 capabilities might include building further advanced models.

  • Clarification on Vision capabilities: Mixed responses on whether Vision is out. One user confirms GPT-4 and GPT-4o can analyze images, while another negates it.

  • Addressing Invalid Request errors: A user reaches out to see if a peer resolved their Invalid Request error from a year ago. They mention currently experiencing the same issue and seek assistance.

  • Discussion on monetizing legal knowledge ChatGPT: A user asks for opinions on selling a company embedding ChatGPT with legal knowledge for $650 million dollars. This remains a provocative inquiry but receives no elaborate response.


OpenAI ā–· #prompt-engineering (8 messagesšŸ”„):

  • Improving Prompt Engineering for Name Selection: A member asked for advice on structuring a prompt to either provide a name if a code is given or vice versa. Another member suggested a solid prompt but did not offer further details.
  • AI Should Verbalize Problem-Solving Steps: One member observed that clarifying the need for the AI to ā€œverbally work out a problem step-by-stepā€ often resolves issues. There was no further elaboration on specific steps or examples.
  • Fun Custom Instruction for Assistant Persona: A member shared a custom instruction called ā€œPONDER,ā€ which directs the AI to engage in a soliloquy-like, self-reflective exploration on a topic, preferably seeking creative insights. This setup involves an autoprompting loop initiated by a user input of ā€.ā€ and showcases innovative patterns through a dynamic ideational network.

OpenAI ā–· #api-discussions (8 messagesšŸ”„):

  • Improving prompt engineering for name selection: A member seeks advice on how to configure a prompt to return a code when a name is expected and vice versa. They received a positive response indicating the prompt was solid.

  • Citation needed: A member asks for a ā€œcitation?ā€ in the middle of a discussion, but no specific context is provided.

  • Clarify AI problem-solving with verbal steps: Noted that prompting the AI to verbally work through a problem step-by-step can enhance its problem-solving capabilities.

  • Fun and useful custom ā€œponderā€ instructions: Shared a detailed custom instruction for making the AI ā€œponderā€ and enter an autoprompting loop using the cue of ’.’ from the user. This method is described as both fun and a tool for exploring connections and generating insights creatively.


LangChain AI ā–· #general (83 messagesšŸ”„šŸ”„):

  • Using CSV Agent in LangChain: Members discussed how to use a CSV agent as part of an LLM chain in LangChain. Documentation links were shared for further details.

  • Sequential Chains with CSV Agent: Instructions were provided on integrating a CSV agent into a SequentialChain along with other chains like wiki_chain and verifier_chain. Specific parameters like output_variables were highlighted for configuring the chain’s behavior.

  • CSV Agent Custom Output Key: Guidance was given on customizing the create_csv_agent to set the output key as csv_response. This involves modifying the output_key parameter in the LLMChain of the agent.

  • Memory in Sequential Chain: There was a request for adding memory to a Sequential Chain, with examples provided on using ConversationBufferMemory and implementing the memory within an agent setup.

  • SQL Agent Issues: Concerns were raised about SQL agents struggling with multi-table queries despite using few-shot prompts, suggesting potential issues with token usage, LLM compatibility, or prompt templates. Specific GitHub issues were mentioned for further context.

Links mentioned:


LangChain AI ā–· #share-your-work (4 messages):

  • OranAITech Showcases on Twitter: A member shared a Twitter link showcasing their latest advancements in AI technology. No additional context was provided.

  • Everything-AI v2.0.0 Launches with New Features: A member announced the release of everything-ai v2.0.0, highlighting its ability to handle tasks such as audio processing, video generation, and 3D protein structure prediction. The project can be accessed on GitHub and comes with detailed documentation.

  • VisualAgents Flow Engineering Demos: Two YouTube videos were shared, showcasing the Visual Agents flow engineering platform built on LangChain: Building a SQL Agent and Building a Simple Retrieval. The platform enables flow creation in a fully browser-based PWA without coding.

  • EDA GPT DEMO by Sounak Roy: A demo for EDA GPT was shared via this link, offering a 5-minute overview of its capabilities.

Links mentioned:


LangChain AI ā–· #tutorials (1 messages):

business24.ai: https://youtu.be/gflsu_6R_8g


LAION ā–· #general (65 messagesšŸ”„šŸ”„):

  • Pirate Bay won’t save AI: A member speculated that ā€œthe pirate bay might eventually end up with a weights category and be the saviour of AI,ā€ but another disagreed, stating it won’t happen due to more AI-friendly policies in other countries.

  • Japan supports AI training: A discussion highlighted Japan’s protective stance on AI training and inference, linking to a tweet discussing a paper on making new base diffusion models without extensive pretraining.

  • Controversy over model technique descriptions: Disputes arose regarding the communication and understanding of methods for creating new base diffusion models. The technique involves ā€œnighshading and other techā€ to disrupt model associations before restoring them, which one user defended against accusations and misunderstandings.

  • Human preference study with Ella-SDXL: A project involving a poisoned model recovery method is under a human preference study in collaboration with fal.ai. The results are forthcoming, and the approach seeks to demonstrate the validity of the method through empirical results.

  • Artifacts in AI-generated images: Critique of the ā€œhigh contrast lookā€ and artifacts in Mobius and other models were discussed, with comparisons to previous AI models like MJv6 and earlier iterations. Members noted issues with latent noise and the visual characteristics of different models.

Links mentioned:


LAION ā–· #research (11 messagesšŸ”„):

  • Anthropic releases research paper on Claude: A member shared a major new research paper from Anthropic about interpreting large language models, where they mapped out the inner workings of Claude 3 Sonnet. The paper highlights the ability to identify and tune specific concept activations, such as the Golden Gate Bridge.
  • Debate on AI as an ad product: A member questioned the potential for companies to leverage AI concept activations as an ad product, sparking a humorous response and a linked example on X. Another member lamented the inevitability of such developments driving them mad.
  • Reflections on AI model progress: A member reminisced about early AI vision work on the Inception v1 model and its evolution to today’s sophisticated models. They commented on the historical importance of hallucinogenic DeepDream for learning about neurons and circuit manipulation.
  • Discussion on sparsity in neural networks: A member explained the architecture and training methodology of a sparse autoencoder, emphasizing the use of L1 norm enforcement to maintain sparsity. They noted that a high-dimensional middle layer typically has only around 300 non-zero dimensions on average.

Links mentioned:

  • Thermodynamic Natural Gradient Descent: Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed ...
  • Tweet from Philip Kung (@PhilipKung5): thank you golden gate claude šŸ˜‚šŸ˜‚šŸ˜‚
  • Golden Gate Claude: When we turn up the strength of the ā€œGolden Gate Bridgeā€ feature, Claude’s responses begin to focus on the Golden Gate Bridge. For a short time, we’re making this model available for everyone to inter...

LlamaIndex ā–· #blog (3 messages):

  • Few spots left for LlamaIndex meetup: ā€œThere’s only a few spots left for Tuesday’s meetup, so grab them while you can!ā€ Stay updated here.
  • Automate tasks using LlamaIndex and MultiOn: ā€œMultiOn is an AI agents platform that works with the web to get real things done by connecting to the Internet through your Chrome web browser and acting on your behalf.ā€ Check out the demo here.
  • Introducing RAGApp - A no-code interface for RAG chatbot: ā€œA docker container that’s easily deployable in any cloud infrastructure and is fully open-source.ā€ Configure your LLM model provider easily here.

LlamaIndex ā–· #general (60 messagesšŸ”„šŸ”„):

  • LlamaParse Emerges as PDF Extraction Solution: Users recommended LlamaParse for extracting data from PDFs with tables and fields, suggesting it’s a suitable out-of-the-box API for the task. LlamaParse supports extraction via GPT-4o.

  • Knowledge Graph Indexing Advice: Discussions addressed challenges with indexing knowledge bases containing links to other pages, suggesting manual triplet creation for KnowledgeGraphIndex while considering VectorStoreIndex for efficiency.

  • LlamaIndex Integration Clarifications: Participants shared confusion over installing LlamaIndex locally with all necessary packages, specifically the LLM OpenAI component, advising to clear cache and ensure proper directory structure.

  • Pydantic Parsing Issues in LLM: User struggled with pydantic model errors during response parsing, with suggestions to add better descriptions to fields and improved input parsing for GPT-4o. The issue pointed to the LLM’s inability to correctly interpret the output class.

  • Better Models for Invoice Processing: Recommendations were made to check HuggingFace MTEB leaderboard for superior embedding models, with specific mentions of BGE, Nomic, and GTE models for tasks like chatting with invoices and PDFs.

Links mentioned:


LlamaIndex ā–· #ai-discussion (4 messages):


OpenRouter (Alex Atallah) ā–· #announcements (1 messages):

  • New AI Model Alert: Phi-3 Medium 128k Instruct: OpenRouter announced the release of Phi-3 Medium 128k Instruct model. Users can check out the standard variant and the free variant, and join the discussion here to share their feedback on its performance and applicability.

Links mentioned:

  • Phi-3 Medium Instruct by microsoft | OpenRouter: Phi-3 Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjust...
  • Phi-3 Medium Instruct by microsoft | OpenRouter: Phi-3 Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjust...

OpenRouter (Alex Atallah) ā–· #general (41 messagesšŸ”„):

  • Wizard Model Shows Improved Performance: Members noticed that wizard model responses have become significantly better, with reduced wait times and more creative answers. ā€œYou still need to babysit it to avoid paragraph repetition, but otherwise, it was quite good,ā€ highlighted one user.
  • Phi-3 Vision Gains Interest: Discussions led to the hype around Phi-3 Vision’s capabilities, with users sharing test links like Phi-3 Vision and mentioning its potential when combined with other models. Another model, CogVLM2, was recommended for vision tasks at CogVLM-CogAgent on Hugging Face.
  • Llama 3 Model Prompt Formatting Clarified: Members clarified that prompts for Llama 3 models get automatically transformed by OpenRouter’s API, eliminating the need for manual formatting. Manual prompt submission is an option, using the prompt parameters and the completions endpoint instead of chat/completions.
  • Llama 3 Parameter Update: Optimal parameters for Llama 3 models are being updated soon due to a recently fixed bug. This update will be pushed within approximately 48 hours, according to a team response.
  • Google’s Gemini API Issues and Limits: Users expressed frustration over Gemini FLASH returning blank outputs despite high token usage. It’s confirmed as a model-side issue, and the discussion highlighted Google’s new daily API usage limits, sparking curiosity about increased OpenRouter Gemini usage.

Links mentioned:


Latent Space ā–· #ai-general-chat (36 messagesšŸ”„):

  • Tensorlake launches Indexify: Members discussed the new open-source product by Tensorlake, called Indexify, which provides a real-time data framework for LLMs. ā€œIt’s like a ā€˜streaming ETL’ layer,ā€ said one member, while another pondered the challenge of sustainability with open source products.

  • Indexify dissected: The design choices behind Indexify sparked interest, partly attributed to its creator’s background with Nomad. There were questions about the sufficiency and monetization of the extractors provided.

  • Hugging Face Leaderboard blogpost shared: A post by Clementine, running the HF OSS Leaderboard, was shared. It delves into LLM evaluation practices and the significance of leaderboards and non-regression testing (Hugging Face blog).

  • Website poisoning works on Google’s AI overviews: A link to a revelation by Mark Riedl about a website poisoning attack that affects Google’s AI overviews (X post). This led to further discussion on using custom search engine browser bypasses to avoid such issues.

  • Thomas Dohmke’s TED Talk on AI in coding: Members discussed Thomas Dohmke’s TED Talk on how AI is lowering the barriers to coding. There were mixed feelings about its current reliability, but acknowledgment that UX improvements allow quicker workarounds for issues.

Links mentioned:


Latent Space ā–· #ai-announcements (1 messages):

  • World’s Fair Diversity Scholarships Available: Those struggling to afford tickets to the AI Engineer World’s Fair can apply for diversity scholarships, which offer either free or discounted tickets for the event from June 25-27 in San Francisco. Applications should include ā€œconcise but specific responses to essay questionsā€ and can be applied for here.

Link mentioned: Diversity Program - AI Engineer World’s Fair June 2024: AI Engineer World’s Fair is committed to assisting underrepresented minorities who want to attend our event. We steadfastly believe in the value of having a wide variety of people attend. We know …


Interconnects (Nathan Lambert) ā–· #random (27 messagesšŸ”„):

  • Tax Invoicing without a Credit Card: Nathan Lambert mentioned an odd situation where a platform sent him an invoice for taxes despite not having a credit card on file. He found the process logical after learning the details about resale certificates.

  • Golden Gate Bridge-Focused AI: The group was intrigued by Anthropic AI’s experiment, which demonstrated altering an AI’s internal features to make it focus on the Golden Gate Bridge. This led to the creation of ā€œGolden Gate Claude,ā€ available for public interaction at claude.ai.

  • Google’s PR Fiasco: Members discussed how Google’s product pipeline issues seem to lead to repeated public failures, such as poorly received AI releases. The conversation highlighted concerns about internal feedback not being heeded and oversights in rolling out substandard models.

  • Response to AI Dataset Claims: A link shared by Philpax refuted claims about Google’s AI datasets, specifically denying reliance on LAION-5B. Google’s AI team emphasized they have superior internal datasets for their research.

Links mentioned:

  • Tweet from Anthropic (@AnthropicAI): This week, we showed how altering internal "features" in our AI, Claude, could change its behavior. We found a feature that can make Claude focus intensely on the Golden Gate Bridge. Now, fo...
  • Tweet from Lucas Beyer (bl16) (@giffmana): Just in case it’s not obvious: the answer is a ridiculous hallucination. Maybe because ā€œGoogle’s ai datasetā€ isn’t even a thing. We’re not touching laion5b, not even for research. We don’t need to, w...

Interconnects (Nathan Lambert) ā–· #lectures-and-projects (2 messages):

  • Advanced CS Lecture Slides Available: Nathan Lambert shared a link to a more advanced version of his CS25N lecture, based on material from CS224N. The slides can be accessed here.

  • Future Recording Announcement: Nathan Lambert mentioned that a recording of the session would be available eventually. No specific dates were provided for the release.

Link mentioned: [21 May 2024] Life after DPO (for alignment): Life after DPO Nathan Lambert || Allen Institute for AI || @natolambert Stanford CS224N: Natural Language Processing with Deep Learning 21 May 2024


OpenAccess AI Collective (axolotl) ā–· #general (17 messagesšŸ”„):

  • GQA confusion with cmdr models: Members were clarifying whether ā€œcmdrā€ and ā€œcmdr+ā€ models have Grouped Query Attention (GQA). One member confirmed, ā€œcmdr+ has gqa. not + doesnt,ā€ showing different specs for each version.
  • VRAM scaling discussion: There was a discussion on how the presence or absence of GQA affects VRAM usage. One user mentioned, ā€œgqa is better than exponential but not linear yeah… it just scales better.ā€
  • Sample packing efficiency improvement: Members highlighted a new PR on GitHub, noting a ā€œ3-4% efficiency improvement with sample packingā€. This was linked to a PR by Dave Sescleifer.

Link mentioned: Switch to parallel FFD bin packing algorithm. by winglian Ā· Pull Request #1619 Ā· OpenAccess-AI-Collective/axolotl: Add support for packing in a distributed context. Add packing efficiency estimate back. See #1516 by @dsesclei. Attempting to rebase the original PR onto the latest main wasn’t terribly clean. I a…


OpenAccess AI Collective (axolotl) ā–· #community-showcase (3 messages):

  • Journal Article Published: A member shared a journal article they co-authored, now published in the Journal of the American Medical Informatics Association. They mentioned their affiliation with UniversitĆ© catholique de Louvain and other contributors to the paper.

  • Congratulations Pour In: Another member congratulated the author on the publication, adding a friendly ā€œcongrats šŸ™‚ā€ note. This shows community support and celebration for the author’s achievement.

Link mentioned: Impact of high-quality, mixed-domain data on the performance of medical language models: AbstractObjective. To optimize the training strategy of large language models for medical applications, focusing on creating clinically relevant systems th


OpenInterpreter ā–· #general (8 messagesšŸ”„):

  • SB-1047 sparks outrage: Members discussed concerns about SB-1047, which they see as an attempt to centralize AI governance among big players like OpenAI. One member called it a ā€œwhimsical, flaming pile of garbageā€ and drew parallels with regulatory capture in Big Pharma and the Energy Sector, arguing it disadvantages smaller developers on tight budgets.
  • Perplexity AI search link shared: A member shared a link to Perplexity AI search regarding SB-1047. No further details or context was provided in the chat about the specifics of the search.
  • Arc Browser’s Call Arc praised: The new ā€œCall Arcā€ feature of Arc Browser was highlighted for its simplicity and usefulness. The member praised it for allowing users to ā€œask your browser to find and collect relevant answers for youā€ effortlessly, sharing a link for more details.

Links mentioned:


OpenInterpreter ā–· #O1 (5 messages):

  • User faces issue with Typer installation: A user stated ā€œqueuelabs: pip install typer does not resolveā€ indicating they are having trouble installing the Typer library using pip.
  • Poetry setup problem troubles users: Another user asked ā€œDid you run poetry install before poetry run 01? Are you running in a virtual environment,ā€ pointing out potential steps missed in the setup process.

Mozilla AI ā–· #llamafile (9 messagesšŸ”„):

  • Twinny + LM Studio blow minds as local co-pilot: A user shared their positive experience using Twinny with LM Studio as a local co-pilot replacement. They asked about running this setup via llamafiles and received confirmation that running two llamafiles at the same time is possible by assigning different ports.

  • Embedding images with llama.cpp endpoint confusion solved: A member asked if the llamafile/llama.cpp server supports images in llava embeddings and shared a command that did not work as expected. They later clarified that the /v1/embeddings endpoint does not accept image_data but using the /embedding endpoint works as expected.

  • Running continue.dev with llamafile performance issues: Another user reported running continue.dev with llamafile, noting it was slow on a Mac M2 but somewhat faster on an older Nvidia GPU.

  • Inquiries on building and training custom LLMs: A member sought advice on building and training a custom LLM using company documentation for internal use. They received a recommendation to use HuggingFace Transformers for training, noting that llamafile only supports inference.

Links mentioned:


Cohere ā–· #general (8 messagesšŸ”„):

  • User Thanks Team: ā€œTHANK YOU!ā€ expressed in response to a previous interaction.

  • Inquiry About 104B Model: A user asked if the team is planning to publish a 104B version of their model family.

  • Langchain Integration Question: A member inquired about the current status and recommendation for using Langchain integration with Cohere.

  • Aya Model Size Clarification: A user asked whether the Aya model on the playground is for the 8B or 35B version.

  • Validation Error with Compressor: An issue was shared regarding a ValidationError with ContextualCompressionRetriever due to an abstract method.

  • ā€œ56 Bananas Equal to 1 Appleā€ Calculation: A calculation problem was explored with CMR+: ā€œ1 apple = 2 pears, 3 pears = 4 oranges, 6 oranges = 7 bananasā€, concluding ā€œ56 bananas are equal to 1 apple.ā€

  • 403 Forbidden Error Troubleshoot: A user reported a 403 Forbidden error despite using the correct production key.


AI Stack Devs (Yoko Li) ā–· #late-night-lounge (6 messages):

  • AI Generated Standup comedy is surprisingly good: A user shared a link expressing surprise at the quality of AI-generated standup comedy. They seemed impressed with its performance.

  • Exploring the Ud.io App: Another user asked if the app mentioned, Ud.io, only does comedy. This inquiry suggests curiosity about the app’s full capabilities.

  • Transforming audio on Suno: A member shared a more ā€œdemonicā€ version of the original audio using Suno. This highlights the versatility of the platform in modifying sound.

  • Interest in Learning Audio Manipulation: One user expressed interest in learning how to create audio modifications similar to the ones shared. This indicates a desire to acquire skills in audio engineering or AI-driven sound manipulation.

  • Dismissive Response: Briefly, a user responded with a curt ā€œNoā€ to a query, indicating either disinterest or negation of a previous statement.

Links mentioned:


MLOps @Chipro ā–· #events (1 messages):

  • Member seeks Google Calendar integration for event tracking: A member inquired about the availability of an event calendar that could be imported into Google Calendar to avoid missing events. They expressed their concern with a sad emoji, indicating a need for a streamlined way to keep track of scheduled activities.

MLOps @Chipro ā–· #general-ml (1 messages):

evelynciara: yess I’m glad this channel exists šŸ˜…


DiscoResearch ā–· #general (1 messages):

datarevised: https://x.com/DataPlusEngine/status/1793803117642854732


{% else %}

Part 2

LLM Finetuning (Hamel + Dan) Discord

Fine-Tuning Facts: Discussion on fine-tuning in the general channel revealed a concern about semantic similarity overfitting due to biased data categories. A user struggled with understanding fine-tuning vis-Ć -vis user inputs and initial model training. Changes in the OpenAI platform’s sidebars were also noted with the disappearance of two icons (threads and messages).

Templates Take the Spotlight: In workshop-1, the importance of configuring templates correctly during fine-tuning was highlighted. In particular, the delimiter ### aids in parsing different input sections, and ā€œend of textā€ tokens indicate when to stop token generation.

Maven Mingles with Moderation: In asia-tz, a light-hearted exchange between members referenced a reunion. A request for a conference talk recording was met, with the video being available on Maven.

Modal Mobilization: Modal users in 🟩-modal shared excitement over received credits, training experiences, and provided specific links to Modal documentation and examples for new users. A plan to use Modal for a Kaggle competition was also shared, including setup and execution details.

Jarvis Jots Down Jupyter Jumble: In the jarvis-labs channel, members discussed storing a VSCode repo on Jarvis with a suggestion to use GitHub for saving work. There was a notice of spot instance removal due to instability. The cost and duration of fine-tuning the open-lama-3b model were shared, and a user resolved an Ampere series error by adjusting model parameters.

Hugging Face Huddles on Credits & Spanish Models: The hugging-face channel saw discussions about pending HF credits and models suitable for Spanish text generation—with Mistral 7B and Llama 3 models being recommended.

Credit Countdown Carries On in replicate, where an upcoming announcement related to credit management and distribution was teased.

Corbitt’s Commandments Claim Clout: Enthusiastic attendees in the kylecorbitt_prompt_to_model channel discussed fine-tuning methods and techniques presented in Kyle Corbitt’s talk, including Ten Commandments for Deploying Fine-Tuned Models.

Axolotl Answers the Call in workshop-2, where users discussed datasets, model training, and troubleshooting in Axolotl. A blog post on TinyLLama Fine-Tuning was shared, and there was a push for integrating observability into LLM applications.

Zoom Out, Discord In: Users from workshop-3 migrated their discussions to Discord after the Zoom chat was disabled.

Axolotl’s Cache Conundrum Causes Confusion: Issues with cache in Axolotl frustrating users and confusion with missing files were resolved in axolotl. Discussions on sample packing and a guide on tokenizer gotchas addressed concerns around efficiency and tokenization.

Accelerate to Victory: zach-accelerate saw users work through confusion over float comparisons, resolve Jarvislab training command errors, and exchange resources for learning model acceleration with a focus on fine-tuning best practices.

Winging It with Axolotl: The wing-axolotl channel collaborated on dataset templates, pre-processing issues, Axolotl configurations, and provided a PR merge for the latest Axolotl updates. They delved into debugging tools and the significance of precise templates for training success.


HuggingFace Discord

Protein Data Visuals Reach New Heights: A new protein visualization project now sports 3D rendering and includes examples for human hemoglobin and ribosomal proteins, with the project details found on GitHub.

Enter the TranscriptZone with OpenAI’s Whisper: A new transcription app that leverages OpenAI’s Whisper to transcribe YouTube videos and more is available at Hugging Face Spaces.

Decentralizing the Web - More than a Dream?: A project building infrastructure for a decentralized internet sought community feedback through a survey, raising discussions about the ethics of data collection.

A Vision Transformers Query in Depth: A member sought resources on applying Vision Transformers (ViT) for monocular depth estimation, indicating an intent to develop a model using ViT, but no specific resources were provided in the discussion.

Quantisation Quandary for Mistral Model: The use of bitsandbytes for 8-bit quantisation on Mistral v0.3 Instruct led to slower performance compared to 4-bit and fp16, a baffling outcome that contradicts expected efficiency gains from reduced-bit computation.


Perplexity AI Discord

  • Perplexity Climbs Over ChatGPT in CSV Showdown: Engineers discussed that Perplexity AI outshines ChatGPT in CSV file processing by allowing direct CSV uploads. Also, Julius AI was recommended for data analysis, leveraging Python and integration with LLMs like Claude 3 or GPT-4.

  • Users Snub Claude 3 Opus: Claude 3 Opus is getting the cold shoulder due to increased content restrictions and perceived diminished utility, with GPT-4 posed as a preferable option despite limitations.

  • Querying Pro Search’s True Upgrade: Upgrades to Pro Search raised eyebrows as users discussed whether new multi-step reasoning features and API specs were genuine backend improvements or merely surface-level UI enhancements.

  • API Integration Articulated: Dialogue around API integration for external tools with Claude generated interest along with sharing of custom function calls, serverless backends, and documentation like Tool Use with Claude.

  • Ethics in AI: More Than a Thought Experiment: Discourse on infusing GPTs with ethical monitoring capabilities sparked, casting light on potential applications in workplace communication and legal defensibility, albeit with philosophical wrinkles yet to be ironed out.


Stability.ai (Stable Diffusion) Discord

  • Speculation Peaks on RTX 5090’s VRAM: There’s buzzing debate over whether the rumored RTX 5090 with 32GB VRAM makes practical sense. References were made to potential specs and images on PC Games Hardware, but some members remained skeptical about its authenticity.

  • Stable Diffusion and the AMD Challenge: Users offered guidance on installing Stable Diffusion on an AMD 5700XT GPU, suggesting that starting with web services like Craiyon may circumvent potential compatibility issues.

  • Stable Diffusion 3: Trial Before Commitment: The community contrasted Stable Diffusion 3 with competitor Midjourney, highlighting that while a free trial is available for SD3, ongoing access would require a Stability membership.

  • Anticipation Builds Around Mobius Model: An announcement concerning DataPlusEngine’s novel Mobius model has garnered significant interest for its claim to create efficient base models. The model, teased on Twitter, is neither a straightforward base model nor a tuned version of something pre-existing.

  • 32GB VRAM: Game Changer or Overkill?: The mention of a 32GB VRAM GPU led to conversations about the potential shift in Nvidia’s approach to data center GPU sales, considering how products with substantial memory could impact the market demand for the H100/A100 series.


Unsloth AI (Daniel Han) Discord

  • PEFT Config Snag Solved: An issue where config.json was missing during PEFT training was resolved by copying it from the base model’s configuration, with the user confirming success.

  • Llama Levitates Above Bugs: The Llama 3 model’s base weights were described as ā€œbuggy,ā€ but Unsloth has implemented fixes. To improve training, the use of reserved tokens and updates to the tokenizer and lm_head are recommended.

  • System Prompt Boosts Llama 3: Incorporating a system prompt, even a blank one, was observed to enhance Llama3 finetuning outcomes.

  • Phi 3 Proliferation: Excitement bubbled as Phi 3 models debuted, sporting medium support. Community chatter pointed engineers toward extensive details in blog posts and release notes.

  • Stable Diffusion’s Sinister Side Show: Creepy artifacts and uncanny voice cloning outputs from Stable Diffusion startled users, with discussions and experiences shared via YouTube videos and a Reddit thread.

  • VSCode Copilot Climbing Onboard: Recommendations for a local VSCode ā€œcopilotā€ were sought and met with suggestions and positive responses in the random channel.

  • Inference Inertia with Phi-3: Slower inference times using Unsloth Phi-3 puzzled one user, who provided a Colab notebook to investigate the lag, with community efforts yet to find a fix.

  • Quantization Quandary Unraveled: A member faced challenges quantizing a custom model, hitting walls with llama.cpp and Docker compatibility, sparking a discussion on solutions.

  • VRAM Verdict for Model Might: VRAM requirements were laid out: 12GB for Phi 3 mini is okay, but 16GB is a must for Phi 3 medium. For hefty tasks, considering outside computing resources was proposed.

  • Data Diligence for Training Consistency: The importance of using consistent datasets for training and evaluation was echoed, highlighting Unslothai’s public datasets like the Blackhole Collection.

  • Platform Possibilities and Cautions: Queries regarding Unsloth support for older Macs were addressed, confirming a focus on CUDA and GPU usage, with suggestions for those on CPU-only rigs.

  • Enterprise Expertise Extension: A community member stepped forward to offer enterprise expertise to Unsloth, hailing the joining of accelerators at Build Club and Github, hinting at synergistic potential for Unsloth’s endeavors.


Nous Research AI Discord

Intellectual Debate Ignites Over AI Understanding: In-depth discussions were had about the true understanding of concepts by LLMs, with interpretability research considered important empirical evidence. Skeptics argued that current efforts are lacking, with references to work by Anthropic on mapping large language model minds.

The Creature from the Llama Lagoon: A technical foray into enhancing Llama models centered around crafting a script that could manage function calls, with Hermes Pro 2’s approach serving as inspiration. Another inquiry circled the implementation of Llama3 LoRA techniques on a 3080 GPU.

Reality Quest in Digital Dimensions: Spearheading a conversation on Nous and WorldSim, members explored the possible applications of NightCafe and multi-dimensional AR spaces in mapping complex AI worlds. Dream-like explorations in audio-visualizers and whimsical ASCII art representations highlighted creative uses for AI-driven simulations.

Sifting Through RAG Data: Advocation for models to integrate internal knowledge with Retrieval-Augmented Generation (RAG) was a hot topic, with questions raised about how to handle contradictions and resolve conflicts. Emphasizing user evaluations was seen as essential, particularly for complex query cases.

Precision over Pixie Dust in Fine-Tuning AI: The community’s discourse featured a celebration of the Mobius model for its prowess in image generation, with anticipation for an open-sourced version and elucidating publications. Additionally, Hugging Face was mentioned for their PyTorchModelHubMixin enabling easier model sharing, though limited by a 50GB size constraint without sharding.


Eleuther Discord

  • JAX vs. PyTorch/XLA: The TPU Showdown: The performance comparison of JAX and PyTorch/XLA on TPUs spurred debate over benchmarking nuances such as warmup times and blocking factors. The dramatic decline in GPT-3 training costs from $4.5M to an estimated $125K-$1M by 2024 was highlighted, considering TFLOP rates and GPU-hour pricing from various contributors, linking to a Databricks Blog Post.

  • Scaling and Teaching LLMs: In the research forum, the Chameleon model was noted for its strong performance in multimodal tasks, while Bitune promised improvements in zero-shot performance for LLMs (Bitune Paper). Discussions questioned the scalability of the JEPA model for AGI and critiqued RoPE’s context length limitations, referencing a relevant paper.

  • Emergent Features Puzzle LLM Enthusiasts: Tim Dettmers’ research on advanced quantization methods maintaining performance in transformer inference was linked, including his concept of emergent outliers, and its integration with Hugging Face via the bitsandbytes library. Discourse on emergent features coalescing around ideas of them being the ā€œDNAā€ of a model, driving discussions on its implications for phase transitions.

  • A Brief on Technical Tweaks & LM Evaluation: Within the lm-thunderdome, engineers covered practical tips for setting seeds in vllm models, retrieving the list of tasks with lm_eval --tasks list, and handling changes in BigBench task names that affect harnesses like Accelerate with memory issues. It was suggested to locate tasks by perusing the lm-eval/tasks folder for better organization.

  • A Call for Collaboration: An appeal was made for expanding the Open Empathic project, with a YouTube guide for contributing movie scenes and a link to the project shared. Further collaboration was encouraged, underlining the need for community efforts in enhancement.


LM Studio Discord

GPU Adventures: Engineers discussed challenges when loading small models onto GPUs, with some favoring models like llama3, mistral instruct, and cmdrib. Meanwhile, using lower quantizations, such as llamas q4, reportedly yielded better results than higher ones like q8 for certain applications, refuting the notion that ā€œbigger is always better.ā€

Next-Gen Models Incoming: An update in the model realm informed about the release of a 35B model, with testing to ensure LM Studio compatibility. Optimizations for different scales of models were a topic too, with a focus on Phi-3 small GGUFs and their efficiency.

Servers and Setups: Hardware discussions included leveraging distributed inference with llama.cpp and its recent RPC update, although quantized models aren’t supported yet. Experimental builds using clustered cheap PCs with RTX 4060 Ti 16GB for distributed model setups and possible network constraints were also explored.

Multilingual Cohesion Achieved: Cohere models now extend their prowess to 23 languages, as advertised with aya-23 quants available for download, but ROCm users must await an update to dive in.

Stable Diffusion Left Out: LM Studio clarified that it exclusively handles language models, excluding image generators like Stable Diffusion, alongside dealing with CUDA issues on older GPUs and promoting services like Julius AI to ease user experience woes.


CUDA MODE Discord

  • Gradient Norm Nuisance: Altering the batch size from 32 leads to a sudden spike in gradient norm, disrupting training. A pull request resolved this issue by preventing indexing overflow in the fused classifier.

  • Int4 and Uint4 Types Need Some TLC: A member flagged that many functions lack implementations for int4 and uint4 data types in PyTorch, with a discussion thread indicating limitations on type promotion and tensor operations.

  • Live Code Alert – Scan Algorithm in Spotlight: Izzat El Hajj will lead a live coding session on the Scan algorithm, vital for ML algorithms like Mamba, scheduled for <t:1716663600:F>, promising to be a technical deep dive for enthusiasts.

  • CUB Library Queries and CUDA Nuances: Members tapped into discussions ranging from the functioning of CUDA CUB library code to triggering tensor cores without cuBLAS or cuDNN, highlighting resources like NVIDIA’s CUTLASS GitHub repository and the NVIDIA PTX manual.

  • FineWeb Dataset Conundrum: Processing the FineWeb dataset can be a storage hog, hitting 70 GB on disk and gobbling up to 64 GB of RAM, hinting at a need for better optimization or more robust hardware configurations for data processing tasks.


Modular (Mojo šŸ”„) Discord

Python Libraries Cling to C Over Mojo: There’s a lively conversation about the feasibility and preparedness of porting Python libraries to Mojo, with concerns about pushing maintainers too hard given Mojo’s evolving API. Members discussed whether targeting C libraries might be a more immediate and practical endeavor.

Rust’s Security Appeal Doesn’t Rust Mojo’s Potential: Mojo is not slated to replace C, but the security benefits of Rust are influencing how engineers think about Mojo’s application in different scenarios. Ongoing discussions address concepts from Rust that could benefit Mojo developments.

Blazing Ahead With Nightly Mojo: BlazeSeq performance on MacOS using Night versions of Mojo shows promising similarity to Rust’s Needletail, fueling cross-platform efficiency discussions. Rapid nightly updates, noted in changelog, keep the community engaged with the evolving language.

Curiosity Sparks Over Modular Bot’s Machinery: Queries were raised about the underlying tech of ā€œModularBotā€, and although no specific model was referenced, the bot shared a colorful reply. Separately, the potential for ML model training and inference within Mojo was discussed, with mention of Max Engine as a numpy alternative, though no full-fledged training framework is on the horizon.

Compile-Time Confusion and Alignment Woes: Problems from aligning boolean values in memory to compile-time function issues are causing a stir among users, with workarounds and official bug reports highlighting the importance of community-driven troubleshooting.


OpenAI Discord

  • LaTeX Loyalist LLM: In the realm of formatting, users noted frustration with GPT’s strong inclination to default to LaTeX despite requests for Typst code, revealing preferences in coding syntax that the LLM seems to adhere to.

  • Microsoft Copilot+ vs. Leonardo Rivalry: Conversations in the community centered on the value of Microsoft Copilot+ PCs for creative tasks like ā€œsketch to image,ā€ while some members encouraged checking out Leonardo.ai for analogous capabilities.

  • A Thirst for Efficiency in AI: Concern was voiced over the environmental toll of AI, citing a Gizmodo article on the substantial water usage during the training of AI models, prompting discussions on the need for more eco-friendly AI practices.

  • Iteration Over Innovation: There was active dialogue on enhancing the performance of LLMs through iterative refinement, with references to projects like AutoGPT addressing iterations, despite the associated higher costs.

  • Intelligence Infusion Offer Overstated?: The guild pondered the plausibility and potential of embedding legal knowledge within ChatGPT, enough to consider a valuation at $650 million, though detailed perspectives on this bold assertion were limited.


LangChain AI Discord

LangChain CSV Agent Deep Dive: Engineers explored LangChain’s CSV agent within a SequentialChain and discussed how to customize output keys like csv_response. Challenges with SQL agents handling multi-table queries were mentioned, pointing towards token limits and LLM compatibility issues, with direction to GitHub for issues.

AI Showcases Gather Buzz: OranAITech tweeted their latest AI tech, while everything-ai v2.0.0 announced features including audio and video processing capabilities with a repository and documentation available.

Demystifying VisualAgents: Demonstrations of Visual Agents platform were shared via YouTube, revealing its potential to streamline SQL agent creation and building simple retrieval systems without coding, utilizing LangChain’s capabilities. Two specific videos showcased their workflows: SQL Agent and Simple Retrieval.

EDA GPT Impressions On Display: A demonstration of EDA GPT, including a five-minute overview video showcasing its various functions, was linked to via LOVO AI. The demo highlights the AI tool’s versatility.

Tutorial Teaser: A message in the tutorials channel provided a YouTube link to business24.ai’s content, although the context of its relevance was not disclosed.


LAION Discord

  • Piracy’s Not the Panacea: Despite a humorous suggestion that The Pirate Bay could become a haven for sharing AI model weights, skepticism among members arises, highlighting the potential for friendlier AI policy landscapes in other nations to prevail instead.

  • Japan Takes the AI High Road: Participants noted Japan’s encouraging position on AI development, referencing a paper shared via a tweet about creating new base diffusion models without the need for extensive pretraining, showcasing a strategy involving temporary disruption of model associations.

  • Poisoned Recovery Protocols Probed: A collaborative study, involving a poisoned model recovery method conducted by fal.ai, was mentioned, with findings expected to empirically substantiate the recovery approach. Reservations were expressed regarding the aesthetics of AI-generated imagery, specifically the ā€œhigh contrast lookā€ and artifacts presented by models like Mobius versus predecessors such as MJv6.

  • Claude Mappings Crack the Code: Anthropic’s research paper details the dissection of Claude 3 Sonnet’s neural landscape, which illustrates the manipulation of conceptual activations and can be read at their research page. Debates sparked over the potential commercialization of such activations, with a juxtaposed fear of the commercial implications driving AI practitioners to frustration.

  • A Nostalgic Look at AI’s Visual Visions: A member reminisced about the evolution from early AI visual models like Inception v1 to today’s sophisticated systems, recognizing DeepDream’s role in understanding neural functionality. Furthermore, the benefits of sparsity in neural networks were discussed, describing the use of L1 norm for sparsity and a typical 300 non-zero dimensions in high-dimensional layers.


LlamaIndex Discord

  • Meetup Alert: Limited Seats Available: Few spots remain for the upcoming LlamaIndex meetup scheduled for Tuesday, with enthusiasts encouraged to claim their spots quickly due to limited availability.

  • MultiOn Meets LlamaIndex for Task Automation: LlamaIndex has been coupled with MultiOn, an AI agents platform, facilitating task automation through a Chrome web browser acting on behalf of users; view the demo here.

  • RAGApp Launches for Code-Free RAG Chatbot Setup: The newly introduced RAGApp simplifies the deployment of RAG chatbots via a docker container, making it easily deployable on any cloud infrastructure, and it’s open-source; configure your model provider here.

  • Solving PDF Parsing Puzzles: The community endorses LlamaParse as a viable API for extracting data from PDFs, especially from tables and fields, leveraging the GPT-4o model for enhanced performance; challenges with Knowledge Graph Indexing were also a topic, highlighting the need for both manual and automated (through VectorStoreIndex) strategies.

  • PostgresML Joins Forces with LlamaIndex: Andy Singal shared insights on integrating PostgresML with LlamaIndex, detailing the collaboration in a Medium article, ā€œUnleashing the Power of PostgresML with LlamaIndex Integrationā€, receiving positive remarks from the community.


OpenRouter (Alex Atallah) Discord

  • Phi-3 Medium 128k Instruct Drops: OpenRouter unveiled Phi-3 Medium 128k Instruct, a powerful 14-billion parameter model, and invited users to review both the standard and free variants, and to participate in discussions on its effectiveness.

  • Wizard Model Gets a Magic Boost: The Wizard model has shown improvements, exhibiting more prompt and imaginative responses, yet attention is required to avoid repeated paragraphs.

  • Eyes on Phi-3 Vision and CogVLM2: Enthusiasm surges around Phi-3 Vision, with sharing of testing links like Phi-3 Vision, and suggestions to use CogVLM2 for vision-centric tasks found at CogVLM-CogAgent.

  • Automatic Llama 3 Prompt Transformation: It was clarified that prompts to Llama 3 models are automatically transformed through OpenRouter’s API, streamlining the process, but manual prompting remains as an alternative approach.

  • Gemini API Annoyances: Users reported issues with Gemini FLASH API, such as empty outputs and token drain, recognized as a model-centric problem. The emergence of Google’s daily API usage limits has piqued interest in how this might affect OpenRouter’s Gemini integration.


Latent Space Discord

  • Indexify Ignites Interest: The launch of Indexify, an open-source real-time data framework by Tensorlake, sparked discussions focusing on its ā€œstreaming ETLā€ capabilities and the challenges in creating sustainable open-source models. Concerns were raised about the adequacy of the extractors provided and their potential paths to monetization.

  • LLM Evaluation under the Microscope: A Hugging Face blog post about Large Language Model (LLM) evaluation practices, the importance of leaderboards, and meticulous non-regression testing caught the attention of members, emphasizing the critical role of such evaluations in AI developments.

  • AI’s Answer to Search Engine Manipulations: An incident involving website poisoning affecting Google’s AI-gathered overviews triggered discussions around security and data integrity, including workarounds through custom search engine browser bypasses as reported in a tweet by Mark Riedl.

  • AI Democratizing Development or Raising Reliability Questions?: GitHub CEO Thomas Dohmke’s TED Talk on AI’s role in simplifying coding provoked debates over its reliability despite AI-driven UX improvements that expedite problem-solving in the coding process.

  • Diversity Scholarships to Bridge Gaps: Engineers from diverse backgrounds who face financial barriers to attending the upcoming AI Engineer World’s Fair received a boost with the announcement of diversity scholarships. Interested applicants should furnish concise responses to the essay questions provided in the application form.


Interconnects (Nathan Lambert) Discord

  • Tax Tales Without Plastic: Nathan Lambert deciphered an invoice kerfuffle, realizing the rational behind tax billing sans credit card due to resale certificates.

  • Golden Gate AI Gets Attention: Experimentation by Anthropic AI led to ā€œGolden Gate Claude,ā€ an AI single-mindedly trained on the Golden Gate Bridge, creating buzz for its public interactivity at claude.ai.

  • Google’s AI Missteps: Google’s failure to harness feedback and premature deployment of AI models spurred discussion about the tech giant’s public relations challenges and product development woes.

  • Battling Dataset Misconceptions: Google’s AI team countered claims about using the LAION-5B dataset by putting forth that they utilize superior in-house datasets, as referenced in a recent tweet.

  • Nathan Shares Knowledge Nuggets: For AI aficionados, Nathan Lambert uploaded advanced CS224N lecture slides. Additionally, attendees were tipped off about an upcoming session recording, sans release date details.


OpenAccess AI Collective (axolotl) Discord

  • GQA Gains Traction in CMDR Models: Discussions revealed that Grouped Query Attention (GQA) is present in the ā€œcmdr+ā€ models but not in the basic ā€œcmdrā€ models, indicating an important distinction in their specifications.
  • VRAM Efficiency with Smart Attention: Engineers noted that while GQA doesn’t offer linear scaling, it represents an improved scaling method compared to exponential, affecting VRAM usage favorably.
  • Sample Packing Gets a Boost: A new GitHub pull request showcases a 3-4% efficiency improvement in sample packing, promising better resource management for distributed contexts, linked here.
  • Academic Achievement Acknowledged: A member’s co-authored journal article has been published in the Journal of the American Medical Informatics Association, highlighting the impact of high-quality, mixed-domain data on medical language models, with the article available here.
  • Community Cheers Scholarly Success: The community showed support for the peer’s published work through personal congratulatory messages, fostering a culture of recognition for academic contributions within the AI field.

OpenInterpreter Discord

SB-1047 Sparks Technical Turmoil: Engineers express deep concerns about the implications of SB-1047, dubbing it as detrimental to smaller AI players and likening the situation to regulatory capture observed in other industries.

Perplexity and Arc, Tools of the Trade Showcased: The community spotlighted tools aiding their workflows, sharing a Perplexity AI search on SB-1047 and the new ā€œCall Arcā€ feature of Arc Browser, which simplifies finding relevant answers online, with an informational link.

Install Issues Incite Inquiry: Users face issues with Typer library installation via pip, raising questions about whether steps in the setup process, such as poetry install before poetry run, were followed or if a virtual environment is being used.


Mozilla AI Discord

Twinny Takes Off as Virtual Co-Pilot: Developers are integrating Twinny with LM Studio to serve as a robust local AI code completion tool, with support for multiple llamafiles running on different ports.

Embedding Endpoint Enlightenment: The /v1/embeddings endpoint was clarified not to support image_data; instead, the /embedding endpoint should be used for images, as per pull request #4681.

Mac M2 Meets Its Match in continue.dev: A performance observation noted that continue.dev runs slower on a Mac M2 compared to an older Nvidia GPU when executed with llamafile.

Hugging Your Own LLMs: For those looking to build and train custom LLMs, the community recommended the use of HuggingFace Transformers for training, with the reminder that llamafile is designed for inference, not training.


Cohere Discord

  • Gratitude Echoes in the Server: A user expressed heartfelt thanks to the team, showcasing user appreciation for support or development work done by the team.
  • Curiosity About Upscaled Models: There’s buzz around whether a 104B version of a model will join the family tree, but no clear answers have been outlined yet.
  • Langchain Links Missing: Questions arose regarding the integration of Langchain with Cohere, with users seeking guidance on its current usability and implementation status.
  • Model Size Mysteries: Users are probing for clarity on whether the Aya model in the playground pertains to the 8B or 35B version, indicating importance in understanding model scales for application.
  • Error Troubleshooting Corner: Issues like a ValidationError with ContextualCompressionRetriever and a 403 Forbidden error signal active debugging and technical problem-solving among the engineers, serving as reminders of common challenges in AI development.

AI Stack Devs (Yoko Li) Discord

AI Comedy Night Hits the Right Notes: An AI-generated standup comedy piece shared by a user was met with positive surprise, indicating advancements in AI’s capability to mimic humor and perform entertainment.

Exploratory Queries on AI Applications: Curiosity about the extent of Ud.io’s functions was evident from a user’s query whether its capabilities go beyond generating comedy.

Sound Transformations Showcased: A user displayed the flexible audio alteration features of Suno by sharing an altered, demonic version of an original sound piece.

Eagerness for Audio Engineering Know-How: Interest was expressed in acquiring the skills to craft audio modifications like the ones demonstrated, a skill set valuable for an AI engineer with an interest in sound manipulation.

Concise Communication Preferred: A one-word reply ā€œNoā€ to a question highlighted a preference for succinct responses, perhaps reflecting an engineer’s desire for direct, no-nonsense communication.


MLOps @Chipro Discord

  • In Search of a Unified Event Tracker: A member has highlighted a pressing need for an event calendar compatible with Google Calendar to ensure no community events are overlooked. The absence of such a system is a noted concern within the community.

DiscoResearch Discord

  • New Dataset Announcement: A new dataset has been referenced by user datarevised, with a link to further details: DataPlusEngine Tweet.

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}