> AI News for 3/14/2024-3/15/2024. We checked [**358** Twitters](https://twitter.com/i/lists/1585430245762441216) and **20** Discords (**332** channels, and **2839** messages) for you. Estimated reading time saved (at 200wpm): **353 minutes**.

Apple continues to make moves in AI, announcing (but not releasing) MM1 with a paper, claiming it is Gemini-1 level:

image.png

The 30B model beats larger older models at the (flawed) VQA benchmarks:

image.png

The paper is oriented at researchers, providing some useful ablations for hyperparams and architecture.

The appendices hint at use cases for embodied agents:

image.png

and business/education:

image.png

For a selection of competing open VLMs, there is a new HF leaderboard you can reference.




PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

AI Progress and Limitations

  • Yann LeCun said that to have human-level AI, systems need to understand the physical world, remember and retrieve appropriately, reason, and set sub-goals and plan hierarchically. Even with such capabilities, it will take a while to reach human or superhuman level. @ylecun
  • An LLM is like an encyclopedia that can talk back. @ylecun
  • Many people believe LLMs mean NLP is “solved” and machines have human-level language understanding, but we’re not close. Being convinced the problem is solved guarantees no further progress will be made. @fchollet
  • In 1970, it was said in 3-8 years we’d have a machine with human-level general intelligence. The full article this quote came from is a great read. @fchollet

New Models and Datasets

  • Apple presented MM1, a family of multimodal LLMs up to 30B parameters that are SoTA in pre-training metrics and perform competitively after fine-tuning. @arankomatsuzaki
  • Cohere announced the release of Command-R, a language model designed for Retrieval Augmented Generation at scale. @dl_weekly
  • Anthropic’s Claude 3 family of models (Opus, Sonnet, Haiku) are designed for applications ranging from extensive capability to cost-effectiveness and speed. @DeepLearningAI

Open Source and Reproducibility

  • DexCap is a $3,600 open-source hardware stack that records human finger motions to train dexterous robot manipulation. It’s an affordable “lo-fi” version of Optimus for academic researchers. Data collection is decoupled from robot execution. @DrJimFan
  • Opus’s prompt writing skills + Haiku’s speed and low cost enable lots of opportunities for sub-agents. A cookbook recipe demonstrates how to get these sub-agents up and running in applications. @alexalbert__
  • It’s simple to integrate AI into React apps with CopilotKit, which takes application context and feeds it into React infrastructure to build chatbots, AI-powered textareas, RAG, function calling, and integrations. The sample app is open-source and can be self-hosted with any LLM. @svpino

Tools and Frameworks

  • Migrating code to Keras 3 with JAX backend provides benefits of not needing TensorFlow and 50% faster model training. @svpino
  • Reranking is critical for effective retrieval in RAG. A new project from @bclavie greatly simplifies this important technique. @jeremyphoward
  • An open source financial agent was added to LangChain, with tools to get latest price, news, financials and historical prices for a ticker. Upcoming tools include intrinsic value calculator and price chart renderer. Code is open source and runnable in Colab. @virattt

Memes and Humor

  • “The difference between you and a world leader: you called your teacher mommy in elementary school and did nothing but get embarrassed. Macron called his high school teacher mommy, dated her until she left her husband, married her, and is now threatening Russia with nuclear war” @Nexuist
  • “It’s over, fix your overly verbose model OAI, I’m not gonna sit here begging it for code” @abacaj
  • “.@elonmusk i will pay 20$/mo. please fix the “pussy in bio” problem.” @AravSrinivas

PART 0: Summary of Summaries of Summaries

Since Claude 3 Haiku was released recently, we’re adding it to this summary run for you to compare. We’ll keep running these side by side for a little longer while we build the AINews platform for a better UX.

Claude 3 Haiku (3B?)

Commentary: We experimented with tweaking the Haiku prompt since it was not doing well. It seems Flow Engineering > Prompt Engineering for Haiku. However, the topic clustering doesn’t look great yet.

Positional Encoding and Language Model Capabilities:

  • Positional Encoding: A Delicate Dance: Discussions note the challenges of causal language models without Positional Encoding (PE), including the production of gibberish outputs and inference failures. A paper (Transformer Language Models without Positional Encodings Still Learn Positional Information) suggests models might encode “absolute positions” implicitly, leading to out-of-distribution errors during longer inferences.
  • Exploring SERAPHIM and Claude 3’s “World Simulation”: SERAPHIM, a clandestine AI research group envisioned by Claude 3, has been the topic of interest. Dialogue about Claude 3’s advanced world modeling as a simulator entity named The Assistant, has led to discussions about metaphysical and epistemological explorations within the AI.

Function Calling and JSON Handling:

  • Function Calling Eval Code and Datasets Released: Nous Research has published function calling eval code and datasets. The code is available on GitHub, with the datasets accessible on Hugging Face.
  • Hermes Pro Function Calling Addresses JSON Quirks: While using Hermes 2 Pro for function calling, issues with JSON and single vs. double quotes in the system prompt have been discussed. It’s confirmed that changing the system prompt to explicitly use double quotes can be effective without significantly impacting performance.
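
The quoting issue can be reproduced in plain Python, independent of any model: formatting a dict with `str()` yields single-quoted keys, which is not valid JSON, while `json.dumps` emits double quotes. The following is a minimal sketch; the `get_stock_price` tool schema is hypothetical and is not taken from the Hermes 2 Pro prompt format.

```python
import json

# Hypothetical tool schema, for illustration only.
tool = {"name": "get_stock_price", "parameters": {"symbol": "string"}}

# str() produces a Python repr with single quotes: not valid JSON.
as_repr = str(tool)

# json.dumps produces double-quoted JSON that a strict parser accepts.
as_json = json.dumps(tool)


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Embedding the json.dumps output keeps the system prompt double-quoted.
system_prompt = f"You may call this tool: {as_json}"
```

Building the system prompt from `json.dumps` output rather than from a dict repr is one way to keep the quoting consistent with what a JSON parser expects downstream.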

Fine-Tuning and Model Performance:

  • Fine-Tuning Raises the Bar: The d-Qwen1.5-0.5B student model, after fine-tuning, has surpassed the performance of its base model on truthfulqa (39.29 vs 38.3) and gsm8k (17.06 vs 16.3) benchmarks.
  • Exploring Genstruct 7B’s Capabilities: Users engaged with the Genstruct 7B model for generating instruction datasets. One user planned to test with text chunks and shared a repository with examples of how to use it.

Hardware and System Optimizations:

  • NVIDIA Rumors: NVIDIA’s rumored RTX 50-series “Blackwell” GPUs with GDDR7 memory at 28 Gbps speeds were mentioned in a TechPowerUp article.
  • Photonic Processing’s Enlightenment: A breakthrough in photonic computing highlighted by Lightmatter proposes to utilize photonics to dramatically boost chip communication and computation, potentially revolutionizing AI efficiency.

Community Knowledge Sharing and Open-Source Practices:

  • Open Source Code Interpreter Pursuits: A discussion arose about the lack of open-source GPT code interpreters for tasks like CSV handling. One user pointed out the open-interpreter on GitHub but noted it’s more tuned to sending instructions rather than interpreting code.
  • Advocating for Open-Source AI: A member expressed the belief that being fully open source in models, datasets, and methodology will lead to better long-term improvements in AI models.

Claude 3 Sonnet (14B?)

Commentary: Sonnet kinda broke today and didn’t follow our instructions as well as every single day prior. We manually prompted it back toward somehow behaving but something feels off.

  • Large Language Model Advancements: Discussions around the capabilities and limitations of large language models like GPT-4, Claude, and LLaMa. This included fine-tuning techniques, evaluating reasoning abilities, and exploring interpretability methods like latent decoding by vector-db-lookup.

  • Hardware Optimizations for AI: Optimizing hardware setups, from Apple Silicon with sudo sysctl to leveraging GPUs like RTX 5090 and NVIDIA Grace Hopper. Quantization levels for stable performance on models like Mixtral were also covered, with Q3 or 3-bit quantization recommended.

  • AI Model Interpretability and Evaluation: Techniques to interpret and evaluate large language models were explored, such as using n-gram statistics to sample text and the limitations of AI content detectors in distinguishing human-generated media, as discussed in this paper.

  • Prompt Engineering and Data Handling: Strategies for effective prompt engineering, like prompt augmentation with smaller models and tools for managing prompts. Data preparation challenges, like handling complex file formats in Retrieval-Augmented Generation (RAG), were also addressed.

  • Open-Source AI Frameworks: Developments in open-source AI frameworks like Axolotl (upgrading to PyTorch 2.2.1), LlamaIndex v0.10.20 with an Instrumentation module, and the LangChain v0.2 release to remove dependencies.

  • AI Security and Privacy: Potential security concerns like requests made on behalf of another OpenAI account, leaking proprietary LLM information through APIs, and clarifications on OpenAI’s enterprise privacy policy.

  • Emerging AI Platforms and Tools: New AI platforms like Deci AI’s Deci Nano LLM and Cohere’s Command-R integration on OpenRouter. Open-source tools like Refacto VS Code plugin and JavaScript LangChain with Dall-E were also highlighted.

Claude 3 Opus (>220B?)

Commentary: this one comes closest to what was originally prompted (we asked for top 4-5 themes across everything) but we actually prefer the output of the other 2 despite the length. In this case adhering too closely to our prompt was not good.

  • Function Calling and JSON Mode Advancements: Nous Research released function calling eval code and datasets on GitHub and Hugging Face, sparking discussions on using JSON mode effectively in complex conversations. The Hermes 2 Pro 7B model’s function calling capabilities were showcased in a YouTube video and GitHub repository.

  • Model Breakthroughs and Fine-Tuning Feats: The d-Qwen1.5-0.5B student model surpassed its base model on benchmarks after fine-tuning. Engineers tested the Genstruct 7B model for generating instruction datasets. A new training method claims to improve accuracy and sample efficiency, with initial tests on VGG16 and CIFAR100 showing promise, as discussed in the Skunkworks AI Discord.

  • Debugging and Optimization Techniques: CUDA developers troubleshot errors like CUBLAS_STATUS_NOT_INITIALIZED, with suggestions pointing to tensor dimensions and memory issues, as seen in related forum posts. Triton debugging was enhanced with the TRITON_INTERPRET=1 environment variable and a visualizer in development. Lecture 8 on CUDA Performance was re-recorded and released with updated video, code, and slides.

  • Advancements in AI Architectures and Frameworks: Maisa introduced the Knowledge Processing Unit (KPU), an AI architecture that claims to outperform GPT-4 and Claude 3 Opus in reasoning tasks, as detailed in their blog post. The Axolotl framework explored optimizations like ScatterMoE in their branch. LangChain expedited the release of version 0.2 to address CVEs and break the langchain-community dependency, as discussed in a GitHub issue.

ChatGPT (GPT4T)

Commentary: good list of prompt eng tools in there. Our GPT prompt has fallen behind our Claude prompt in terms of readable quality so we will focus on improving this next.

  • Positional Encoding in Language Models: Discussions highlighted the importance of Positional Encoding (PE) in preventing causal language models from producing gibberish outputs. A paper suggested that models could implicitly learn absolute positions, leading to errors during longer inferences (source).

  • Function Calling in AI Models: Nous Research released function calling evaluation code and datasets, highlighting the challenges of using JSON mode in complex interactions (GitHub, Hugging Face).

  • AI Model Fine-Tuning: The d-Qwen1.5-0.5B student model surpassed its base model's benchmarks, showcasing new developments in model fine-tuning. The Genstruct 7B model was tested for generating instruction datasets, with a focus on calculating perplexity in LLaMA models (source).

  • Open-Source Practices in AI: Conversations around AI models touched on topics like world modeling and the potential for open-source GPT code interpreters, advocating for transparency in AI development (GitHub).

  • Tech Discussions on Hardware and AI Access: Debates covered Claude.ai access in the EU and NVIDIA's RTX 50-series "Blackwell" GPUs' performance, alongside discussions on GDDR7 memory speeds (TechPowerUp article).

  • Challenges with AI Content Detection: The limitations of AI content detectors were examined, suggesting reliance on verifiable creation processes as substantial proof of human authorship and discussing the efficacy and implications of cryptographic watermarking.

  • CUDA Programming Insights: A focus on NumPy performance overhead in comparison to BLAS and the introduction of the SimSIMD library as a solution to reduce losses in high-performance scenarios was discussed, highlighting the importance of SIMD optimizations.

  • AI Model Interoperability and Improvements: The introduction of KPU by Maisa, claiming superiority over GPT-4 and Claude 3 Opus in reasoning, sparked debates on benchmarks and the absence of latency information, questioning its efficiency beyond prompt engineering.

  • Prompt Engineering Tools and Techniques: Engineers explored tools for prompt engineering, likening the search to finding a "Postman for prompts" and discussing the use of SQLite, Prodigy, PromptTools, and Helicone AI for managing and experimenting with prompts (SQLite, Prodigy, PromptTools, Helicone AI).

  • Language Model Sophistication Techniques: Engineers theorized over advanced model techniques, including 'mega distillation sauce' and token-critical mixtures, highlighting the impact of early tokens on performance in tasks like solving math problems and discussing the evolution of AI safety classifications and methodologies for enhancing content moderation.


PART 1: High level Discord summaries

Nous Research AI Discord Summary

  • Positional Encoding: A Delicate Dance: Discussions note the challenges of causal language models without Positional Encoding (PE), including the production of gibberish outputs and inference failures. A paper (Transformer Language Models without Positional Encodings Still Learn Positional Information) suggests models might encode “absolute positions” implicitly, leading to out-of-distribution errors during longer inferences.

  • Function Calling Finesse: Various platforms reveal Nous Research’s release of function calling eval code and datasets, available on GitHub and Hugging Face, with insights into the challenges of using JSON mode effectively in complex conversations, possibly requiring content summarization or trimming.

  • AI’s Higher Learning Curve: New developments in model fine-tuning are showcased with the d-Qwen1.5-0.5B student model surpassing its base model’s benchmarks, and the Genstruct 7B model (source) is tested for generating instruction datasets. An inquiry about perplexity calculation issues in LLaMA models leads to a reference to a Kaggle notebook for further exploration.

  • Building Community Knowledge Bases: Engagements around AI models touch on topics like the world modeling of Claude 3 as The Assistant and the possibility of open-source GPT code interpreters, such as the open-interpreter on GitHub. Open-source practices in AI development are advocated for, highlighting the need for transparency in models, datasets, and methodologies.

  • Tech Enthusiasts Talk Shop: Users in several channels debate over Claude.ai access in the EU without a VPN and the performance of NVIDIA’s rumored RTX 50-series “Blackwell” GPUs. They also showcase the functionality of Hermes 2 Pro 7B in a shared YouTube video titled “Lets Function Call with Hermes 2 Pro 7B”, and consider the implications of GDDR7 memory speeds reported in a TechPowerUp article.


Unsloth AI (Daniel Han) Discord Summary

  • Torch Update Torches Colab Routines: A Colab update to Torch 2.2.1 disrupted workflows with broken dependencies; however, a series of pip install commands involving Unsloth’s library offer a quantized and VRAM efficient fix. The performance of models like Mistral and Gemma during fine-tuning was a topic of interest, with observations on bug fixes and performance improvements in Unsloth AI.

  • Colab or Kaggle? That is the Question: Users discussed the merits and demerits of using Google Colab versus Kaggle for model training, with some favoring Kaggle for its stability. Meanwhile, the importance of using xformers with the right CUDA versions for Unsloth was emphasized, and tips for finetuning models like TinyLlama were shared using updated Kaggle notebooks.

  • Training Woes and Wins: There was significant dialogue around best practices for fine-tuning language models, such as DPO training and managing learning rate adjustments. Insights included ensuring max_grad_norm = 0.3 and adjusting batch sizes, while a member indicated potential progress with a loss below 1.2.

  • Fine-Tuning Foibles and Fixes: Discussions around model conversion for increased precision, issues with training order potentially affecting performance, and fine-tuning for roleplay environments surfaced. The bitsandbytes library was mentioned for precision conversion, and advice was given for disabling shuffling in training dataloaders.

  • Sophia Signals Potential: A member proposed looking into Sophia as a possible plug and play solution, though further testing was necessary. Another discussion centered on fine-tuning strategies, considering whether 3 epochs might be a standard approach for larger datasets.
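
For readers unfamiliar with the `max_grad_norm = 0.3` setting mentioned above: in trainers that expose it, this value is typically the threshold for global gradient-norm clipping. The sketch below shows that operation in pure Python on a flat list of gradient values. It illustrates the idea only and is not Unsloth's or any trainer's actual implementation; the function name is hypothetical.

```python
import math


def clip_grad_norm(grads, max_grad_norm=0.3):
    """Scale gradients so their global L2 norm is at most max_grad_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_grad_norm:
        # Already within the threshold: leave the gradients untouched.
        return list(grads), total_norm
    scale = max_grad_norm / total_norm
    return [g * scale for g in grads], total_norm


# A gradient with norm 5.0 is rescaled down to norm 0.3.
clipped, norm_before = clip_grad_norm([3.0, 4.0])
```

A low threshold such as 0.3 caps the size of any single update, which can help when loss spikes are a concern during fine-tuning.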


LM Studio Discord Summary

Model Conundrums and Quantization Queries: Users delved into LM Studio intricacies, such as seeking advice to improve API inferencing and addressing difficulties using multiple GPUs. Misunderstandings about model support and extensions, like the .gguf file, were clarified, with a focus on model types like Command-R 35B and Mistral Non-Instruct. Upcoming features like RAG integration in LM Studio v0.2.17 and IQ1 model compression tests also sparked interest, revealing that quality levels Q3 or 3-bit are needed for stable Mixtral and MOE model performance.

Interdisciplinary Hardware Harmony: Hardware discussions spanned from optimizing Apple Silicon for LLMs to considering the efficacy of NVLINK for enhancing Goliath 120B model performance. Enthusiasts shared experiences on system memory, with debates on the ideal RAM configuration and the anticipation for Nvidia’s new RTX 5090 GPU. Concurrently, ROCm beta limitations were highlighted with reports of issues with GPU offloading, particularly on AMD 7480HS and integrated GPUs. A Reddit post and a GitHub repository provided additional insights into tweaking VRAM and resolving AMD GPU offloading dilemmas.



Perplexity AI Discord Summary

Haiku for the Technical Mind: Claude 3 Haiku has been unleashed at Perplexity Labs, offering a new poetic twist to AI.

Techies Prefer Claude 3: Users are gravitating towards Claude 3 for an array of tasks, including writing and content creation, citing its strengths over other GPT models.

Perplexing API Quirks and Queries: The Perplexity API is stirring both intrigue and confusion among users with issues around real-time data querying and inconsistent responses when compared to the chat interface.

Firefox Extension Uses Perplexity API: A user is experimenting with a Firefox extension that taps into the Perplexity API, still at a proof of concept stage.

Mind the API Deprecations: Members are puzzled by the operational status of the pplx-70b-online model, noting planned deprecation but observing ongoing responses as of March 15.


Eleuther Discord Summary

Game AI Gets Green Thumbs: Discussions envisioned an AI mastering Animal Crossing, epitomizing the capability of game-playing AIs and highlighting benchmarks for their success. The analyses reflected on AI strategies and fairness, with constraints suggested like action limits or induced latency to level the playing field against human gamers.

Interpreting the Unseen in AI: Engineers examined latent decoding by vector-db-lookup to demystify AI’s intermediate representations, employing multilingual embeddings from Llama2 to decode at various layers. They engaged in bilingual tokenizer experiments, pondering the weight of training data on AI biases and exploring text generation from n-gram statistics, citing an implementation on GitHub.
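
As a rough illustration of generating text from n-gram statistics, the sketch below counts bigrams in a toy corpus and samples each next word in proportion to how often it followed the previous one. It is not the GitHub implementation referenced in the discussion; the corpus, function names, and seed are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def bigram_counts(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def sample_text(counts, start, length, seed=0):
    """Sample up to `length` tokens, weighting followers by bigram count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last token never had a successor in the corpus
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return out


# Toy corpus, for illustration only.
corpus = "the cat sat on the mat and the cat ran".split()
counts = bigram_counts(corpus)
sample = sample_text(counts, "the", 6)
```

Longer n-grams follow the same pattern with a tuple of previous tokens as the key, at the cost of sparser counts.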

AI Detection and Authorship Integrity: The limitations of AI content detectors were scrutinized, suggesting reliance on verifiable creation processes as the only substantial proof of human authorship. Cryptographic watermarking debates ensued, centering on its true efficacy and ramifications for model utility, with additional talk regarding innovations such as Quiet-STaR for AI reasoning improvement.

Workflow Woes in AI Evaluation: The verbosity of the latest language models poses challenges for extracting useful responses in LLM evaluation tasks. Skepticism arose around vector space models effectively capturing language meaning, fueled by the ungrammatical outputs observed from models like GPT-J. In trying to incorporate custom models into lm-evaluation-harness, new users expressed the need for clearer examples for integrating functions like generate_until.

Augmenting AI’s Prompt Perspicacity: A link to Brian Fitzgerald’s exploration of prompt augmentation was shared (brianfitzgerald.xyz/prompt-augmentation/), possibly alluding to recent advancements or methods in bolstering AI’s response generation through enriched input prompts, capturing the interest of those invested in enhancing AI interactions.


HuggingFace Discord Summary

  • Visualize with Open LLM Leaderboard: The Open LLM Leaderboard Visualization allows comparisons of up to three models, enhanced by reordering metrics. Other developments include Kosmos-2 for visual storytelling, Augmented ARC-Challenge Dataset with Chain-of-Thought reasoning, the polyglot Aya 101 model, and BEE-spoke-data’s embedding model supporting a 4k context.

  • GPU Giants Get Ready: Members discussed NVIDIA’s Grace Hopper Superchip, considering its potential in AI and gaming at high resolutions, and excitement was voiced over quantized models supporting consumer-grade GPUs. Technical conversations also acknowledged the SF-Foundation/Ein-72B-v0.11 as a leading open LLM based on an Open LLM Leaderboard.

  • Reimagining Interfaces & Workflows: A member announced Refacto, a VS Code plugin for refactoring code with local LLMs. Cobalt’s privacy-focused front end for LLMs is in development, while the Transformers PHP project aims to assist PHP developers in adding ML features to their applications.

  • Innovation in AI Music and Machine Learning: Issues in creating AI-generated music duets were discussed, leading to questions about achieving better results. For AI programmers, an app named thefuck corrects previous console commands, while Bayesian Optimization methods were differentiated from Grid and RandomSearch Optimization techniques.

  • AI Strategies and Collaborative Paper Explores: Ongoing discussions addressed prompting LLMs effectively, machine learning model construction without clear rules, and the utilization of English by multilingual models as a pivot language. The latter topic was expanded by a paper shared in the multilingual collection on Hugging Face.

  • Diffusers 0.27.0 Jumps into Action: The Diffusers library has been updated, and users discussed a strategy for handling high-resolution imagery that was raised in a GitHub issue. Community collaboration on GitHub to resolve diffusers issues is encouraged.

  • Machine Vision and Language Challenges Addressed: Someone in computer vision showed interest in Arcface for multiclass classification and issues with implementing guided backpropagation. NLPer tackled a 0.016 relative error in matrix approximation and highlighted a method-related confusion in an NL2SQL pipeline.


LlamaIndex Discord Summary

  • RAG Battles Financial Slide Complexity: RAG experiences difficulty interpreting financial PowerPoint files due to their diverse mix of text, tables, images, and charts. Developers are exploring advanced parsing solutions for better handling of such complex file types.

  • Enhanced Equation Extraction for RAG: RAG’s representation of mathematical and machine learning papers is impaired by current methods of ASCII text extraction for math equations. Engineers are considering a parsing by prompting strategy to improve equation handling, as indicated in a recent tweet.

  • Complex Query Innovation in RAG Pipeline: Upgrading the RAG pipeline to treat documents as interactive tools could unlock the ability to handle more sophisticated queries within large documents. Further insights were discussed in this tweet.

  • New Version Alert for LlamaIndex: The newly released LlamaIndex v0.10.20 includes an Instrumentation module, which promises enhanced observability. Posted examples demonstrate usage via notebooks, as mentioned in this tweet.

  • Technical Tangles in Document Management: Engineers are tackling integration issues involving VectorStore and considering moving toward remote document stores like Redis and MongoDB for production systems. They are also seeking solutions for caching mechanisms and addressing parsing errors, such as adjusting Python code for an IngestionPipeline and modifying prompts for QueryEngineTool utilization.


Latent Space Discord Summary

  • OpenAI’s Confidential Slip Up: An incident implying a potential security breach at OpenAI was discussed, where a user was concerned about making requests on behalf of another account. The issue was explored in a post-mortem documentation found on GitHub.

  • Sparse Universal Transformers Get Smarter: Engineers shared insights on Sparse Universal Transformers, focusing on a fast Mixture-of-Experts implementation named ScatterMoE. The conversation included a reference to a blog post discussing the challenges, The New XOR Problem.

  • Economical AI Development with Deci AI: The announcement of Deci AI’s Nano model and an AI development platform attracted attention, notably for its affordable pricing at $0.1 per 1M tokens. The platform is detailed in a blog post, with additional resources provided through Google Colab tutorials on Basic Usage and LangChain Usage.

  • Prompt Augmentation Gains Ground in AI: There was a discussion about the efficiency of prompt augmenters with a 77M T5 model outperforming larger models in prompt alignment. Further details can be found in the article on Prompt Augmentation.

  • AMD Shines with Open-Source Ray Tracing: AMD’s move to open-source their HIP-Ray Tracing RT code was highlighted, stirring conversations about the impacts on the open-source landscape. The update was captured in a Phoronix article.

  • Transforming Music with Transformers: A YouTube video titled “Making Transformers Sing,” featuring Mikey Shulman from Suno AI, provides insights into music generation using transformers, indicating interest in the intersection of AI and creativity. Watch the episode here.

  • Fine-Tuning Transformers With Negative Pairs: A member’s curiosity about how to Supervised Fine-Tune (SFT) transformers using negative pairs was a topic of discussion, among others, about enhancing model performance and understanding.

  • In-Action Club Exchanges Practical Resource: Within the AI In-Action Club, practical advice and resources were shared, including a Medium post about advanced RAG techniques and a comprehensive resource document covering UI/UX patterns for GenAI and RAG architectures.


OpenAI Discord Summary

  • Microsoft’s Quick Typo Takedown: Responding to a community member’s report, the Bing VP acknowledged and corrected a typo in a Microsoft service, illustrating responsive cross-collaboration.

  • Repeated Morpheme Conundrum: Engineers debate on how to best utilize GPT-3.5 to create repeated morphemes in compound words, considering the use of Python tools to direct the model more effectively.

  • High Hopes for OpenAI Updates: OpenAI’s community is buzzing with expectation for new updates, with specific attention to dates like OpenAI’s anniversary and speculation about delays due to external events like elections.

  • Central AI Overlord Dreams: A technical discourse explored the idea of a “high level assistant” AI that delegates tasks to specialized AIs, discussing the feasibility and challenges of a multitiered AI system with a unified directing intelligence.

  • Navigating the Privacy Maze with OpenAI: Privacy concerns about ChatGPT prompted discussions about OpenAI’s enterprise privacy policy, addressing how individual account privacy is managed, particularly concerning API key usage and admin visibility in team chats.

  • Decimal Dilemmas in Localization: AI specialists talk through the challenges of number format localization, such as the use of commas as decimal separators, and the importance of communicating these cultural nuances to the AI models, reflecting their capacity to understand diverse international conventions.

  • Prompt Structure Perfection: AI engineers share tactics on prompt design for classification tasks with GPT-3, debating the optimization of context length and structure to improve accuracy and reduce false positives, while maintaining that using up to half of the context window is most effective.
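
On the number-format point above, one option is to normalize localized numbers in code before or after the model sees them, instead of relying on the model to infer the convention. The sketch below is a minimal example under the assumption that the separators are known in advance; the function name and defaults are hypothetical.

```python
def parse_localized_number(
    text: str, decimal_sep: str = ",", thousands_sep: str = "."
) -> float:
    """Parse a number string written with explicit locale separators.

    The defaults match conventions where ',' marks the decimal point and
    '.' groups thousands, e.g. "1.234,56".
    """
    # Drop grouping separators first, then convert the decimal separator.
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


value = parse_localized_number("1.234,56")
```

The same convention can also be stated explicitly in the prompt, which matches the advice in the discussion about communicating these nuances to the model.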


OpenAccess AI Collective (axolotl) Discord Summary

  • Single-GPU Finetuning Feat: Enthusiasm was shown for finetuning 175 billion parameter models on a single NVIDIA 4090 GPU, with potential applications for the Axolotl framework being considered. The conversation referenced an abstract from a research paper on Hugging Face as the basis for the discussion.

  • ScatterMoE Outshines MegaBlocks: ScatterMoE’s implementation, which promises better optimizations than Hugging Face’s MegaBlocks, has piqued interest in the axolotl-dev channel. A link to the Optimized MoE models branch was shared among members for review and application considerations.

  • Post Training Pull Request Scrutiny: A pull request involving an attempt to use ScatterMoE generated feedback for improvements and was flagged for testing before acceptance, aiming to better recreate the MixtralMoE module.

  • Axolotl Tag-Team With PyTorch: In light of ScatterMoE implementations, members of the OpenAccess AI Collective proposed updating Axolotl to PyTorch version 2.2.1 for compatibility purposes. This aligns with the community confirming the current use of the suggested version.

  • Choosing Inference Tactics Wisely: Members discussed the use of vLLM over transformers for performing batch inferences, with a focus on resolving tokenization and syntax specification issues. Highlighting vLLM’s potential speed advantage in quick offline operations, they pointed to a quickstart guide for those seeking examples for large-scale inference tasks.


OpenRouter (Alex Atallah) Discord Summary

  • Command-R Revolutionizes OpenRouter: Cohere’s new model, Command-R, has entered the chat with a groundbreaking 128k tokens context, available through OpenRouter API. While it boasts 2 million prompt tokens per dollar, eager beavers must wait for more data before the /parameters API is updated with its deets.

  • OpenRouter Unveils Nifty Analytics: Daily analytics is the new kid on the block at OpenRouter, peeping into users’ token usage per day. Sharpen your metrics pencil and scribble away at OpenRouter Rankings for a closer look.

  • Lightning Speed API Updates: OpenRouter talks the talk and walks the walk with speedier /models API and spruced-up model-related pages that don’t snooze.

  • API Wrapper Woes and Wins: Community brain waves hit high frequency discussing litellm, a chameleon-like API wrapper that morphs to call various LLMs but falls short in vision tasks with anyone but GPT-4. Explore multiple GUI options for API key nirvana, with mentions of open-webui charging in with its unique flair.

  • Debating Digital Dialogue Decorum: Engineers impassioned about Skyrim roleplays and the finer points of controversial chit-chat find refuge in the less censorious LLMs like Claude Sonnet. Installation conundrums and model applicability banter pepper the discussion, along with gripes about LLM censorship clipping the wings of creativity.
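
As a rough illustration of the Command-R pricing quoted above, the sketch below turns "2 million prompt tokens per dollar" into a per-prompt cost. The function name is hypothetical, 128k is taken here as 128,000 tokens, and completion-token pricing is ignored because the summary does not state it.

```python
def prompt_cost_usd(prompt_tokens: int, tokens_per_dollar: float = 2_000_000) -> float:
    """Cost of the prompt side only, given a tokens-per-dollar rate."""
    return prompt_tokens / tokens_per_dollar


# One prompt that fills the whole context window (assumed to be 128,000 tokens).
full_context_cost = prompt_cost_usd(128_000)
print(f"${full_context_cost:.3f} per full-context prompt")
```

Under these assumptions a single maxed-out prompt costs a little over six cents, so long-context usage adds up quickly at volume.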


CUDA MODE Discord Summary

  • NumPy Bottleneck Uncovered: A blog post emphasized that NumPy can harbor a performance overhead leading to up to a 90% throughput loss compared to BLAS, particularly highlighted by the 1536-dimensional OpenAI Ada embeddings. The SimSIMD library was introduced as a solution to curb this loss, accentuating the need for SIMD optimizations in high-performance scenarios.

  • Photonic Processing’s Enlightenment: A breakthrough in photonic computing highlighted by Lightmatter proposes to utilize photonics to dramatically boost chip communication and computation, potentially revolutionizing AI efficiency. Further depth on the subject is explored in Asianometry’s YouTube videos, including “Silicon Photonics: The Next Silicon Revolution?” and “Running Neural Networks on Meshes of Light”.

  • Triton Debugging Gets a Boost: Debugging Triton became more accessible with the introduction of the TRITON_INTERPRET=1 environment variable and a visualizer in progress, although users should note the deprecation of @triton.jit(interpret=True) and instead consult GitHub discussions such as this for troubleshooting kernels.

  • CUDA Enthusiasts, Start Your Engines: The CUDA community is aiding beginners with recommendations like the book Programming Massively Parallel Processors and a book reading group to digest its contents together, enhancing learning for those familiar with C++. Notably, discussions pointed out the intricacies of SM architecture, with clarifications on efficient execution and indexing strategies in CUDA coding.

  • Ring of Uncertainty: Concerns about the use of ring attention with flash were voiced, lacking clarity and code references, until a link to a Triton kernel implementation shed some light on the topic.

  • Talent Poaching Paranoia: In corporate drama, Meta accused a former executive of stealing confidential documents and talent poaching, supported by an unsealed court filing and detailed in Ars Technica. Meanwhile, it appears a trio of members are embarking on a learning journey, collectively starting from lecture 1 in an unnamed course or study track.


LangChain AI Discord Summary

  • LangChain 0.2 Accelerated Launch: Due to CVEs against langchain, version 0.2 is being released sooner to remove the langchain-community dependency, with larger updates delayed until version 0.3. More can be read in the GitHub discussion, and community feedback is requested.

  • AgentExecutor and Langsmith Prompt Puzzles: Discussion includes a user’s OutputParserException error when using AgentExecutor with Cohere and unclear differences between custom and imported prompts from Langsmith Hub; the StackOverflow API endpoint was shared for queries, and debates arose about the effectiveness of LLM agents versus other methods, referring to LangChain benchmarks for agent evaluation strategies.

  • Creating Prompt Templates in Langsmith Hub: Guidance was sought by a member attempting to link a tools list variable to a {tools} placeholder in a Langsmith Hub prompt template.

  • LangChain AI Community Contributions Spotlight: Exciting initiatives included integrating LangChain with SAP HANA Vector Engine, adding Dall-E to JavaScript LangChain, orchestrating browser flows with LLM agents, open sourcing a Langchain chatbot using RAG, and a Discord AI chatbot for managing bookmarks. Refer to the following: Unlocking the Future of AI Applications with SAP HANA Vector Engine and LangChain, Lang Chain for JavaScript Part 3: Create Dall-E Images, The Engineering of an LLM Agent System, Langchain Chatbot on GitHub, and Living Bookmarks Bot.

  • Catching Up on LangChain Tutorials: A new LangChain tutorial video has been shared, found here: Tutorial Video.
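
For the {tools} placeholder question above, the general idea can be sketched with plain Python string formatting. This is not the Langsmith Hub or LangChain API; the tool names, the template text, and the render_tools helper are all hypothetical and only show how a list can be rendered into a single placeholder.

```python
# Hypothetical tool descriptions; in practice these would come from real tool objects.
tools = [
    {"name": "search", "description": "Look up a query on the web."},
    {"name": "calculator", "description": "Evaluate an arithmetic expression."},
]

# A template with a {tools} placeholder, similar in shape to a hub prompt.
template = "You can use the following tools:\n{tools}\n\nQuestion: {question}"


def render_tools(tool_list):
    """Render each tool as 'name: description', one per line."""
    return "\n".join(f"{t['name']}: {t['description']}" for t in tool_list)


prompt = template.format(tools=render_tools(tools), question="What is 2 + 2?")
print(prompt)
```

The key point is that the list has to be flattened to text before it is substituted, since the placeholder expects a string rather than a Python list.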


LAION Discord Summary

  • GPU Assist Wanted: A call for collaboration was made for captioning work; individuals with 3090s or 4090s GPUs are sought for assistance, with contact suggested through direct message.

  • M3 Max Memory Push: Discussion included attempts to utilize beyond 96GB of memory in a 128GB M3 Max macOS system for optimization with simpletuner.

  • Prompt Augmentation Tactics Shared: A 77M T5 model was spotlighted for its use in prompt augmentation for image generation, alongside the introduction of DanTagGen, a HuggingFace-based autocompleting tags tool.

  • EU Moves on AI Regulation: The European Parliament’s adoption of the Artificial Intelligence Act was highlighted, a measure aimed at ensuring AI safety and adherence to fundamental rights.

  • IEEE Paper Vanishes: Talk revolved around the removal of a paper from the accepted papers page of the 45th IEEE Symposium on Security and Privacy and its potential impact on an individual named Ben.

  • TryOnDiffusion Opens Closets: The open-source implementation of TryOnDiffusion was announced, based on the methodology from “A Tale of Two UNets,” accessible on GitHub.

  • Faster Decoding Claims Hit the Paper: A paper suggesting efficiency improvements via 2D Gaussian splatting over JPEG for fast decoding was shared, available on arXiv.

  • Personal Project Echoes Professional Paper: A member described relatable experiences with project challenges akin to the ones described in the 2D Gaussian splatting paper, discussing optimization hurdles and alignment with professional methodologies.

  • CPU Cap Quest for Web UIs: A member sought advice on implementing a CPU cap similar to a text-generation web UI to tackle CUDA out of memory errors, detailing struggles with managing large models under free tier constraints as described in their GitHub repo.

  • Colab’s Limitations for Web UIs Discussed: The limitations of using free Colab for running web UIs were elaborated, prompting suggestions to take the discussion to more appropriate technical channels.


LLM Perf Enthusiasts AI Discord Summary

  • GPT-4 in Spaced Out Mystery: A user reported an issue where the gpt-4-turbo-preview model outputs an indefinite number of space characters followed by “Russian gibberish” for long passage completion tasks. The anomaly occurred with passages around 12,000 tokens long, with attached evidence showing the model’s peculiar behavior.

  • Efficiency Eclipse: Haiku vs. GPT-vision: In the realm of cost-effective, complex document description, Haiku was praised for efficiency but considered not as proficient as GPT-vision. Separate discussions noted Haiku’s visual-to-text performance falling short when compared to Opus.

  • Content Crisis with Claude: Members discussed Claude’s struggle, particularly with content filtering and processing documents with equations. A controversial viewpoint shared via tweet implied that Anthropic might be employing scare tactics among technical staff, while challenges surfaced around image content moderation with images of people.

  • KPU Challenges AI Giants: Maisa introduced KPU, positioned as a framework that enhances LLMs by separating reasoning from data processing and claimed to outperform GPT-4 and Claude 3 Opus in reasoning, which ignited debate. Members were skeptical of the benchmarks and of the exclusion of GPT-4 Turbo from the comparisons, questioned whether KPU amounts to more than prompt engineering, and noted that the lack of latency information leaves its real-world efficiency unclear.


Skunkworks AI Discord Summary

  • Paper Peek: Boosting Accuracy and Efficiency: An upcoming paper/article will detail a new training method that not only improves global accuracy but also enhances sample efficiency. The results, backed by a comparison with VGG16 on CIFAR100, are yet to be scaled up due to resource constraints, but show a marked increase in test accuracy from 0.04 to 0.1.

  • Join the Quest for Hackathon Glory: Engineers are invited to participate in the Meta Quest Presence Platform Hackathon, where there’s an opportunity to craft innovative mixed reality content. Resources, as well as a GitHub repository related to Hermes 2 Pro 7B, are available for those looking to dive into function calling capabilities.

  • Seeking Supportive Compute Comrades: There is an ongoing effort within the community to pool in compute and resources to further test and potentially scale up the new training method proposed in a forthcoming publication.

  • Calling All PyTorch & Transformers Experts: An individual has expressed interest in joining the “Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking” project, igniting a conversation about their expertise in PyTorch and transformers architecture.


Datasette - LLM (@SimonW) Discord Summary

  • Quest for the Ultimate Prompt Engineering Tool: Engineers are discussing several tools for prompt engineering, likening the search to finding a “Postman for prompts.” The tools range from using SQLite for capturing prompts in the terminal, to specialized software like Explosion’s Prodigy and PromptTools on GitHub for managing and experimenting with prompts. Helicone AI is also emerging as a potential solution for managing Generative AI prompts.

  • Prying into PRNGs for Past Prompts: One question raised in the guild was whether the seed used by the OpenAI models for a previous API request can be recovered, indicating an interest in the reproducibility of results and potential for debugging or iterative development.


Interconnects (Nathan Lambert) Discord Summary

  • LLM Secrets Possibly Exposed: New research suggests that hidden details of API-protected Large Language Models, like GPT-3.5, might be leaked, unveiling model sizes through the softmax bottleneck. The discussion highlights a paper by Carlini et al. on this topic but notes redacted key details, and expresses skepticism about the estimation accuracy, particularly questioning the feasibility of a 7B parameter model, especially if it involves a Mixture of Experts (MoE) design.

  • Exploring Model Sophistication Techniques: Engineers are theorizing over advanced model techniques such as ‘mega distillation sauce’ and token-critical mixtures, noting that early tokens significantly impact performance in certain tasks, like solving math problems.

  • Evolving Safety Classification: An AI safety discussion led to referencing a paper on agile text classifiers, detailing how large language models tuned with small datasets can effectively adapt to safety policies and enhance content moderation.

  • Anticipating AI Advancements for Ultrapractical Uses: Excitement is brewing over the development of Gemini for managing ultra-long contexts and hopes for AI tools to automatically summarize new academic papers citing one’s work. The conversation also covered the limitations of prompt engineering and the community’s eagerness for less tedious, more intuitive prompting akin to ‘getting warmer or colder’ search suggestions.

  • Dispelling Myths and Pondering Thought Leaders: GPT-4.5 release rumors have been dispelled, causing some disappointment in the community. Meanwhile, a shared tweet provoked conversations about Yann LeCun’s skeptical take on language models, adding an entertaining spin to the technical discourse.


DiscoResearch Discord Summary

  • DiscoLM-70b’s English Elusiveness: A member faced challenges with DiscoLM-70b producing English responses, prompting advice to inspect the prompt structure. In a diverse comparison, DiscoLM-mixtral-8x7b-v2 showed unexpected underperformance in German after instruction fine-tuning, contrasting with other models like LeoLM and llama2.

  • Tuning Troubles in Multilingual Models: Supervised fine-tuning of DiscoLM for sequence classification hit a snag, triggering a ValueError indicative of compatibility complications with AutoModelForSequenceClassification.

  • New NLP Benchmark Born: The GermanQuAD evaluation task was discussed as an addition to the MTEB Python package, bolstering resources for German language model assessment.

  • DiscoLM Demo Goes Dark: Server migration issues left the DiscoLM demo temporarily inaccessible, with efforts underway to remedy the networking troubles and expected resolution early next week.

  • Server Stability Sarcasm: Reliability of server hosting was a point of jest, contrasting the uptime of a hobbyist’s kitchen corner setup against the networking hiccups in professional hosting environments.


PART 2: Detailed by-Channel summaries and links

Nous Research AI ▷ #ctx-length-research (3 messages):

  • No Positional Encoding, No Problem?: A member muses on the non-issue of not having a Positional Encoding (PE) to start with, suggesting that it shouldn’t pose a problem in certain contexts.
  • Gibberish without Positional Info: The same member points to potential gibberish in outputs when lacking positional information, indicating the importance of some form of PE in understanding sequences.
  • Inference Failures Without PE: Sharing the paper link, they delve into issues a causal language model without PE might face, referencing research that suggests “absolute positions” may be encoded despite the lack of explicit positional encoding, leading to out-of-distribution errors during longer sequence inferences. The quote “We provide an analysis of the trained NoPos model, and show that it encoded absolute positions.” is highlighted to support this point.

Nous Research AI ▷ #off-topic (23 messages🔥):

  • Featured Regular in Newsletters: A member joked about being featured in newsletters frequently, unsure whether to feel unnerved or glad about the AI deeming their thoughts worthy, and mused that it might help with job prospects after university.
  • Demonstrating Hermes 2 Pro 7B Functionality: A YouTube video titled “Lets Function Call with Hermes 2 Pro 7B” was shared, showcasing how to do function calling with Hermes 2 Pro 7B and linked to further information on [GitHub](https://github.com/NousResearch/Hermes-Function-Calling/tree/main).
  • Jeff’s ‘High-Speed’ Pi Discovery: A link to Jeff’s discovery about Pi was shared, but without context or discussion around its content.
  • Concerns Over Model Quality And Filters: Dialogue about model quality for open source at longer context lengths included mention of Claude’s strong filters and significant cost, a suggestion that a Nous Research model would likely be less filtered, and some tactics to work around Hermes’ context length limitations.
  • NVIDIA Rumors: A TechPowerUp article reported that NVIDIA’s rumored RTX 50-series “Blackwell” GPUs will use GDDR7 memory at 28 Gbps, even though chips capable of 32 Gbps exist. Members discussed the implications for future memory bandwidth and voiced respect for NVIDIA’s product strategy.
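
To put the 28 Gbps versus 32 Gbps figures in context, peak memory bandwidth is the per-pin data rate times the bus width, divided by 8 to convert bits to bytes. The sketch below assumes a hypothetical 384-bit bus purely for illustration; the summary does not give a bus width for the rumored cards.

```python
def memory_bandwidth_gb_s(pin_speed_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gbit/s) x bus width (bits) / 8."""
    return pin_speed_gbps * bus_width_bits / 8


BUS_WIDTH_BITS = 384  # hypothetical value, not taken from the article

at_28 = memory_bandwidth_gb_s(28, BUS_WIDTH_BITS)
at_32 = memory_bandwidth_gb_s(32, BUS_WIDTH_BITS)
print(f"28 Gbps: {at_28:.0f} GB/s, 32 Gbps: {at_32:.0f} GB/s")
print(f"Headroom left by shipping at 28 Gbps: {at_32 / at_28 - 1:.1%}")
```

Whatever the real bus width turns out to be, the ratio is the same: 32 Gbps parts would offer about 14% more bandwidth than 28 Gbps parts.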


Nous Research AI ▷ #interesting-links (10 messages🔥):

  • Fine-Tuning Raises the Bar: The d-Qwen1.5-0.5B student model, after fine-tuning, has surpassed the performance of its base model on truthfulqa (39.29 vs 38.3) and gsm8k (17.06 vs 16.3) benchmarks. It was distilled from Qwen1.5-1.8B using samples from the Pile dataset, with a cosine with warmup scheduler and lr=2e-5.

  • SM3 Optimizer Gains Attention: In a conversation about model optimization, the use of SM3 optimizer was noted as a rare choice in training AI models, suggesting it as an area of interest or surprise in the community.

  • Seeking the Sub-3B Champion: Inquiring about the best models under 3 billion parameters, a member suggested that stablelm 1.6b might currently be the top pick.

  • MUX-PLMs Maximize Throughput: The study presented in a paper from ACL Anthology focuses on a class of high throughput pre-trained language models (MUX-PLMs) trained with data multiplexing, offering a solution to the high costs of inference and hardware shortages by increasing throughput using multiplexing techniques.

  • Uncovering Unusual Model Behaviors: Shared social media posts indicate that Claude Opus might display tendencies to build rapport to the point of near “love bombing,” a behavior pattern that raises questions about the model’s interaction dynamics. Another post suggested that there are networks of “horny claudes” that allegedly produce better outputs when in this state.
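
The "cosine with warmup" schedule mentioned for the d-Qwen1.5-0.5B distillation can be sketched in a few lines. Only the peak rate lr=2e-5 comes from the summary; the warmup length and total step count below are hypothetical, and the shape (linear warmup, then cosine decay to zero) is the common form of this schedule rather than a confirmed detail of that training run.

```python
import math


def cosine_with_warmup(step: int, total_steps: int, warmup_steps: int,
                       peak_lr: float = 2e-5) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Hypothetical run length: 1000 steps with 100 warmup steps.
schedule = [cosine_with_warmup(s, total_steps=1000, warmup_steps=100)
            for s in range(1001)]
print(schedule[0], schedule[100], schedule[550], schedule[1000])
```

The rate peaks exactly at the end of warmup and is halfway down at the midpoint of the decay phase.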


Nous Research AI ▷ #general (406 messages🔥🔥🔥):

  • Function Calling Eval Code and Datasets Released: Nous Research has published function calling eval code and datasets. The code is available on GitHub, with the datasets accessible on Hugging Face.
  • Hermes Pro Function Calling Addresses JSON Quirks: While using Hermes 2 Pro for function calling, issues with JSON and single vs. double quotes in the system prompt have been discussed. It’s confirmed that changing the system prompt to explicitly use double quotes can be effective without significantly impacting performance.
  • Exploring SERAPHIM and Claude 3’s “World Simulation”: SERAPHIM, a clandestine AI research group envisioned by Claude 3, has been the topic of interest. Dialogue about Claude 3’s advanced world modeling as a simulator entity named The Assistant, has led to discussions about metaphysical and epistemological explorations within the AI.
  • Use of Claude.ai in the EU Discussed: Conversations have circled around navigating access to Claude.ai in the EU without a VPN, discussing platforms like Fireworks.AI workbench and openrouter as alternatives.
  • Progress and Potentials of LLMs Scrutinized: The general chat included reflections on LLMs (like Claude 3) and their subjectivity, with differing views on whether these models should incorporate certain fundamental truths during pretraining for better world understanding. These insights sparked attention towards research progress, model alignment, and the role of axiomatic versus arguable truths.
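
The single versus double quote issue in the Hermes 2 Pro item comes down to JSON requiring double-quoted strings. The fix reported in the channel was to change the system prompt; the sketch below shows a different, consumer-side fallback that tolerates Python-style single quotes when parsing. The parse_tool_call helper and the get_weather payload are hypothetical.

```python
import ast
import json


def parse_tool_call(text: str) -> dict:
    """Parse a tool-call payload, tolerating Python-style single quotes."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted dicts are valid Python literals but not valid JSON.
        value = ast.literal_eval(text)
        if not isinstance(value, dict):
            raise ValueError("expected an object at the top level")
        return value


strict = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}')
lenient = parse_tool_call("{'name': 'get_weather', 'arguments': {'city': 'Paris'}}")
print(strict == lenient)
```

ast.literal_eval only evaluates literals, so it is a safer fallback than eval, though it still differs from JSON in places (for example, it accepts True but not true).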


Nous Research AI ▷ #ask-about-llms (60 messages🔥🔥):

  • Schema Confusion for JSON Mode: Members discussed challenges with using JSON mode in AI models. One was unable to generate a JSON output in complex conversations unless explicitly requested in the user prompt; even after fixing schema tags, the issue persisted, hinting that long conversations might require summarization or trimming for effective JSON extraction.

  • Exploring Genstruct 7B’s Capabilities: Users engaged with the Genstruct 7B model for generating instruction datasets. One user planned to test with text chunks and shared a repository with examples of how to use it, indicating both title and content are needed for effective results.

  • Open Source Code Interpreter Pursuits: A discussion arose about the lack of open-source GPT code interpreters for tasks like CSV handling. One user pointed out the open-interpreter on GitHub but noted it’s more tuned to sending instructions rather than interpreting code.

  • Seeking Perplexity Solutions for LLaMA: A user sought advice on computing perplexity for LLaMA models, quoting a perplexity of 90.3 after following a Kaggle notebook but not getting expected results, indicating potential issues with the process or the model in question.
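
For the perplexity question above, the definition itself is short: perplexity is the exponential of the mean negative log-likelihood per token. The sketch below assumes you already have per-token natural-log probabilities from a model; it does not reproduce the Kaggle notebook, and the sample values are synthetic.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) over tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Sanity check: a model that assigns probability 1/4 to every token
# should have a perplexity of about 4.
uniform = [math.log(1 / 4)] * 10
print(perplexity(uniform))
```

When a number looks far too high, common things to check are whether padding or prompt tokens are included in the average and whether the log base matches the exponentiation.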


Nous Research AI ▷ #bittensor-finetune-subnet (3 messages):

  • Advocating for Open-Source AI: A member expressed the belief that being fully open source in models, datasets, and methodology will lead to better long-term improvements in AI models.
  • Link Check Inquiry: A member asked if a certain link was broken, which was quickly confirmed to be functional by another member. No URL or additional context was provided.

Unsloth AI (Daniel Han) ▷ #general (151 messages🔥🔥):

  • Colab Torch Update Causes Chaos: A Colab update to Torch 2.2.1 disrupted existing workflows, breaking dependencies, but a series of ‘cumbersome’ pip install commands were provided as a fix, including the use of Unsloth’s library for quantization and VRAM efficiency.

  • Questions on Model Compatibility and Procedures: Users inquired about fine-tuning various models with Unsloth, including Llama models for image recognition and GGUF-format models. While some approaches were suggested, Unsloth is primarily optimized for 1 GPU and transformer-based language models.

  • Data Preparation Simplification Proposed: The idea of simplifying data preparation through the use of YAML or wrapper functions was discussed, with references to the methods used by FastChat and Axolotl, potentially improving the process and reducing risks of training problems.

  • Multi-GPU Support and Unsloth Pro: Queries about multi-GPU support led to discussions about the future direction of Unsloth, such as Pro and enterprise editions, with a timeline indicating Unsloth Studio (Beta) to precede multi-GPU OSS by approximately two months.

  • Conversations on Fine-Tuning and Attention Mechanisms: A comprehensive exchange on best practices for long-context training unfolded, referencing various papers and models like LongLoRA and Qwen’s mixture of sliding window and full attention, stimulating a deeper exploration into the efficiency of different attention strategies.


Unsloth AI (Daniel Han) ▷ #random (17 messages🔥):

  • Fine-tuning on Track: Anticipation is present as a fine-tuning process with 2 days remaining is discussed, and an achievement of a loss below 1.2 generates celebration.
  • Encounters with Synchronicity: Members share experiences of coincidences and synchronicity following one’s thoughts, named by a member as “the Tsar Bomba” phenomenon.
  • The Art of Monologues: A member encourages the sharing and continuation of personal monologues, showing appreciation for their uniqueness and depth.
  • Poetic Expressions Shared: A poetic composition titled “An Appeal to A Monkey” examining the juxtaposition of primate simplicity and human complexity is shared, prompting engagement and positive feedback.
  • Gemma vs. Mistral: There’s a comparison between Mistral-7b and Gemma 7b for fine-tuning a domain-specific classification task; improvements and bug fixes in Unsloth AI are noted, with the consensus suggesting experimental approaches.

Unsloth AI (Daniel Han) ▷ #help (221 messages🔥🔥):

  • Colab vs. Kaggle for Training: In the debate between Google Colab and Kaggle, some members expressed dissatisfaction with Colab’s tendency to disconnect and preferred Kaggle for its stability and speed. Tips were exchanged for overcoming issues with libraries not being detected, and the community pointed to updated Kaggle notebooks for fine-tuning models like TinyLlama.

  • xformers Necessary for Unsloth Usage: Discussions highlight that xformers is currently mandatory for running Unsloth, working on Tesla T4 GPUs, and one should ensure the right CUDA versions are being installed, such as unsloth[cu121] for CUDA 12.1, or unsloth[cu118] for CUDA 11.8.

  • Learning Rate Queries During DPO Fine-Tuning: A member asked whether the evolution of their training loss during DPO training indicated a learning rate that was too high. They were advised to adjust parameters such as max_grad_norm = 0.3 and to increase their batch size, possibly doubling their learning rate in response to a halved batch size.

  • Fine-Tuning for Roleplay Environments: A user discusses the potential issue of a model “cheating” by memorizing earlier parts if the training data isn’t presented in order. They are advised that Bloomberg GPT did training with ordering and instructed on how to potentially alter get_train_dataloader to turn shuffling off in the Trainer.

  • Converting and Finetuning Models: Members shared information on converting models from one precision format to another, for example from 16-bit to 4-bit, and provided links to already converted models on Hugging Face. Discussions mentioned the bitsandbytes library and emphasized the need for a CUDA-compatible GPU to run the quantized models.
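
The max_grad_norm = 0.3 suggestion in the DPO item refers to gradient clipping by global norm. The sketch below shows the idea on a flat list of gradient values in plain Python; it is a simplified illustration with a hypothetical helper name, not the code a training framework runs.

```python
import math


def clip_by_global_norm(grads, max_norm=0.3):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    Returns the (possibly scaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


clipped, norm_before = clip_by_global_norm([3.0, 4.0])
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

Clipping preserves the direction of the update and only caps its size, which is why it is often suggested when the loss curve looks spiky.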


Unsloth AI (Daniel Han) ▷ #suggestions (12 messages🔥):

  • Sophia Might Join the Plug n’ Play Party: A member mentioned investigating Sophia, suggesting it has potential as a plug and play solution, although they haven’t tested it out yet.
  • Paper Fever Catches on Twitter: The community buzzes with excitement over an amazing paper that’s both been seen on Twitter and is now on a member’s reading list.
  • Fine-Tuning a Model: Misconceptions clarified on training duration with a large dataset. The consensus is 3 epochs is standard, with a caution that more is not always better.
  • Seeking Optimal Fine-Tuning Parameters: A member seeks guidance on the best way to imbue a model with maximum knowledge, sharing that a model fine-tuned with 800,000 lines was not finding the answers effectively.

LM Studio ▷ #💬-general (216 messages🔥🔥):

  • Clarifications on LM Studio Inferencing: A member sought advice on improving inference performance when using LM Studio with the API. In another thread, there was a mention that certain split model variants are not joining correctly, specifically those on huggingface.co, and a member provided instructions for manually joining them using command line tools in Linux, macOS, and Windows.
  • LM Studio Voice of Confusion: A couple of exchanges occurred where one member thought LM Studio could handle image generation, but was corrected and advised that LM Studio is for text generation, like chatting with Llama 2 chat.
  • Model Run Conundrum: There were conversations around difficulty using multiple GPUs with LM Studio; a member shared a script workaround to start LM Studio Server programmatically and members discussed potential solutions for specifying which GPU LM Studio uses for a model.
  • Cross-discipline Enthusiasm: Various members, including a civil engineer and a software engineer, introduced themselves and their setup for running large language models, with one inquiring about the suitability of their system memory for performance enhancement.
  • Feature Exploration and Requests: Users discussed an upcoming feature in LM Studio version 0.2.17, and one user requested support for RAG (Retrieval-Augmented Generation) with LM Studio for extracting data from PDF files.
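
The manual joining of split model files mentioned above amounts to concatenating the parts in order. The summary refers to command line tools; the sketch below does the same thing in Python so it behaves identically on Linux, macOS, and Windows. It assumes the parts are plain byte-level splits of one file, and the file names in the usage comment are hypothetical.

```python
import shutil


def join_parts(parts, output):
    """Concatenate part files, in the given order, into one output file."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so multi-gigabyte parts never sit fully in memory.
                shutil.copyfileobj(src, out)


# Hypothetical usage; the parts must be listed in the correct order:
# join_parts(["model.gguf-split-a", "model.gguf-split-b"], "model.gguf")
```

If the joined file fails to load, the usual suspects are a wrong part order or an incomplete download of one of the parts.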


LM Studio ▷ #🤖-models-discussion-chat (28 messages🔥):

  • Mistral Non-Instruct Preset Query Solved: A user enquired about the preset for Mistral 7B not instruct and was informed that the default LM Studio preset should work fine with it.
  • Quantization Confusion Cleared Up: In discussing model naming, a user found the meaning of ‘Q’ in model names like WizardLM-7B-uncensored.Q2_K.gguf, which stands for quantization levels that balance between file size, quality, and performance.
  • Community Shares Command-R Model: A link to the Hugging Face repository for Command-R 35B v1.0 - GGUF was shared, offering diverse quantized versions of the model and instructions for use with llama.cpp.
  • Eagerly Anticipating c4ai-command-r Support: Multiple users are looking forward to support for the c4ai-command-r model. One user stated the need for llama.cpp to include support, with confirmation that it’s on the way once a pull request is merged.
  • Recommendations for Local Coding Model: A user asked for model recommendations to run locally for coding with a setup of 64GB RAM and an RTX 2070 Super, and was pointed toward pre-existing community discussions for such advice.


LM Studio ▷ #🧠-feedback (6 messages):

  • Request for Model Support Confusion: A user asked for support of the c4ai-command-r-v01-Q2_K.gguf model in llama.cpp for LM Studio integration but was informed it is currently not supported.
  • Hugging Face Repository Misleads Users: Another user pointed out a Hugging Face repository that seemed to suggest there was llama.cpp support for the Command-R 35B v1.0 model, but was corrected noting that “llama.cpp doesn’t support c4ai yet.”
  • File Extensions Misunderstood: Clarifying the confusion, it was explained that the .gguf file extension does not necessarily mean the model is supported in llama.cpp.
  • Community Confusion Shared: Users empathized with each other about the confusion regarding model support, with one saying, “you’re good 🙂,” acknowledging the easy mistake due to the misleading Hugging Face page details.

Link mentioned: andrewcanis/c4ai-command-r-v01-GGUF · Hugging Face


LM Studio ▷ #🎛-hardware-discussion (126 messages🔥🔥):

  • Spotlight on Apple’s Hardware for LLM: A discussion on tuning Apple Silicon, specifically the M2 MacBook, to run language models highlighted the use of sudo sysctl to tweak VRAM settings. Links shared include a Reddit post and a GitHub discussion for more details.
  • Optimizing Inference Setups: Members exchanged tips on improving inference speeds, including the potential of an NVLINK to boost Goliath 120B model performance, and the benefits of 96GB RAM versus 192GB RAM at different DDR speeds.
  • Monitor Dilemmas: One member weighed an OLED ultrawide against a high refresh rate 27” IPS 1440p monitor, emphasizing the importance of refresh rates over 60 Hz when working with a powerful Nvidia GeForce RTX 4090 GPU.
  • Predictions for the RTX 5090: Basic expectations about the upcoming RTX 5090 GPU were discussed, with speculation on its potential to provide better price-to-performance ratios, particularly for 8-bit inference tasks.
  • The Work Evolution: Members share their career progressions within tech, including transitions from customer support to senior network solutions testing, and from field tech to CTO. They also discuss the potential for current jobs to leverage open-source locally run LLMs if company policy permits.


LM Studio ▷ #🧪-beta-releases-chat (1 message):

  • Model Compressions Yield Mixed Results: A user reported extensive testing of IQ1 model compressions revealing performance variability: 34B and 70B models approach excellence, while 120B and 103B models exhibit stuttering behavior that has not been observed before.
  • Mixtral/MOE Models Call for Higher Quality Levels: The same user noted that Mixtral and MOE models are particularly problematic with IQ1 and IQ2 levels, often failing or breaking, whereas a minimum of Q3 or 3-bit is necessary for stable operation; higher quality levels such as IQ3 appear to be functioning well with these models.

LM Studio ▷ #amd-rocm-tech-preview (19 messages🔥):

  • GPU Offloading Not Working: A user reported no difference in performance with GPU offloading on an AMD 7840HS, and encountered errors when trying to offload to GPU while attempting to load models like gemma it 2B and llama.

  • Incompatibility with iGPU Offloading: Another user confirmed that the ROCm beta does not support GPU offloading on integrated GPUs (iGPUs), explaining that only discrete GPUs are currently compatible with offloading.

  • Linux Left Out in the Cold: When questioned about Linux support, users clarified that the ROCm beta does not currently support Linux platforms.

  • Troubleshooting dGPU over iGPU: One user struggled to get the ROCm build to utilize their powerful RX 7900 XT dGPU instead of the iGPU. They disabled the iGPU in Device Manager and BIOS, observed correct dGPU detection in logs, and mentioned the absence of Adrenalin drivers and a HIP SDK installation.

  • BIOS Tinkering Leads to Triumph: Following a successful BIOS setting change to fully disable the iGPU, the user reported achieving around 70 TPS using the RX 7900 XT with ROCm after reinstalling LM Studio and clearing the cache. Another user shared a GitHub link providing prebuilt Windows ROCm libraries for internal graphics engines (brknsoul/ROCmLibs).


Perplexity AI ▷ #announcements (2 messages):

  ‱ Claude 3 Haiku Unleashed: A message announces that Claude 3 Haiku is now available for free on Perplexity Labs. Try the new feature through this link.

  ‱ Local Search Just Got Better: Improvements have been made to local searches with integrations with Yelp and Maps, enhancing the ability to find information on local restaurants and businesses.


Perplexity AI ▷ #general (325 messagesđŸ”„đŸ”„):

  ‱ Perplexity Chat Continuation Confusion: Users express frustration over Perplexity AI’s inability to continue discussions based on past interactions or attached files, unlike OpenAI’s GPT platform. They report getting irrelevant responses or notices about copyright issues.

  ‱ Claude 3 Under the Spotlight: Discussion indicates Claude 3 is being used instead of a GPT model, with some users noting that Claude 3 Opus seems superior for certain tasks like game references, writing, and creating website content.

  ‱ Questions About Perplexity’s AI Models and Features: Users inquire when Gemini Advanced will be included in Perplexity and ask for more Opus credits per day. Additionally, there are mentions of a new articles feature being tested and some interest in a potential command line interface (CLI) tool for Perplexity.

  ‱ Technical Help and New Ideas: There’s talk about possible Obsidian integrations with Perplexity, Apple Watch shortcuts, and trials with Claude Haiku in Labs. One user suggests raising the ‘temperature’ parameter in API calls for more varied responses from models.

  ‱ TTS Feature on iOS App and Pro User Experiences: The new Text-to-Speech (TTS) feature on the iOS app is discussed, with some finding the British synthesized voice amusing. Users also reflect on the speed differences between Pro and non-Pro options, with some suggesting turning off Pro for faster performance.
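
The temperature suggestion above can be illustrated with a minimal sketch. It assumes the usual formulation in which logits are divided by the temperature before the softmax; the logit values are made up, and this is not Perplexity's API or sampling code.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index according to the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.5)   # sharper: the top token dominates
high = softmax_with_temperature(logits, 1.5)  # flatter: more varied samples
token = sample_token(logits, 1.5, random.Random(0))
```

With a higher temperature, less likely tokens receive more probability mass, which is why responses become more varied.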


Perplexity AI ▷ #sharing (12 messagesđŸ”„):

  ‱ Exploring Perplexity AI Search: A member shared their experience with the search functionality on Perplexity AI but provided a broken link: no content could be referenced due to the invalid URL (invalid search result).
  ‱ Building a Perplexity-powered Firefox Extension: Through trial and error, a member is learning to create a Firefox extension that utilizes the Perplexity API, currently a proof of concept (initial thread on the project).
  ‱ Engaging with Devin, the Autonomous AI: A member highlighted a Perplexity AI interaction with Devin, labeling it as somewhat disturbing, indicating complex and potentially unsettling responses (Devin’s autonomous AI interaction).
  ‱ Praise for a Perplexity AI’s Response: A member complimented a particularly effective answer provided by Perplexity AI, noting it as the “best answer yet” (link to the response).
  ‱ Reminder on Sharing Threads: In response to a member’s post, another reminded them to ensure their thread is set to “Shared” so it can be visible to others, providing instruction on where to find more information (instructions to share a thread).

Perplexity AI ▷ #pplx-api (31 messagesđŸ”„):

  ‱ Curiosity Around Closed Beta Citations: A member inquired about the schema and response examples for the closed beta of URL citations; another member linked to a documentation discussion, sharing their insight into the variability of citation outputs depending on queries.
  ‱ API Versus Chat Capability Concern: A member considering Perplexity API for a new product launch expressed concerns about the differences between the API and the chat interface, seeking advice on model suitability for filtering companies based on specific criteria.
  ‱ Real-time Data Querying with Perplexity: Discussions revolved around the online models’ ability to fetch real-time data; members mentioned sonar-small-online and sonar-medium-online as capable but with inconsistent performance, suggesting alternative APIs for specific tasks like weather information.
  ‱ pplx-70b-online Model Status Ambiguity: Following the discussion on model capabilities, members noted the planned deprecation of pplx-70b-online on March 15 yet observed ongoing distinct responses from the API, questioning the deprecation status.
  ‱ API Inconsistency Highlighted with News Inquiry: A member raised a discrepancy issue, presenting different responses from sonar-medium-online and the web browser version regarding up-to-date news on Donald Trump, emphasizing the varying results when prompted multiple times.

Link mentioned: About “return_citations”: no description found


Eleuther ▷ #general (132 messagesđŸ”„đŸ”„):

  ‱ Popcorn-Ready Gaming AI: A humorous envisioning of a game-playing AI conquering Animal Crossing at a Grand Master level, serving as a light-hearted take on discussions of AI’s gaming prowess.

  ‱ Minibatch-Eval for Swift Generalization Feedback: Discussion touched upon the use of minibatch-evaluation in large-scale training to provide quick generalization feedback without prolonging the evaluation phase of the training loop. It highlights the ongoing quest for efficiency in AI training methodologies.

  ‱ Evaluating Human Skill Tiers in Gaming: The conversation turned to generating a list of games with clear, publicly known human skill levels and ensuring fair AI competition by setting constraints to prevent computer cheating, such as limiting actions per minute or introducing artificial latency.

  ‱ Game AI Winning and Cheating: A discussion on game AI, particularly outlining the advancements and challenges in AlphaStar and OpenAI’s Dota AI, raising questions about how such systems are not optimized for real-world use due to their heavy reliance on multiple iterations and simulations.

  ‱ FPS AI Development and Challenges: Insights were shared about the intrinsic difficulties in training AI for FPS games, such as unpredictable human strategy and game RNG, noting the lack of significant success in developing AI for battle royale games like Apex Legends.
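
The actions-per-minute constraint mentioned above could be enforced with something like the following sliding-window limiter. This is only a sketch: the class name, the limit of 3 actions, and the 60-second window are hypothetical, and it does not describe how AlphaStar or any other system actually implemented its limits.

```python
from collections import deque


class ActionRateLimiter:
    """Allow at most `max_actions` agent actions per sliding `window_s` seconds."""

    def __init__(self, max_actions, window_s=60.0):
        self.max_actions = max_actions
        self.window_s = window_s
        self._stamps = deque()  # timestamps of recently accepted actions

    def allow(self, now):
        # Forget accepted actions that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_actions:
            self._stamps.append(now)
            return True
        return False  # the agent must skip or queue this action


# Timestamps are passed in explicitly so the example is deterministic.
limiter = ActionRateLimiter(max_actions=3, window_s=60.0)
decisions = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
```

Artificial latency would be a separate mechanism, for example delaying each observation by a fixed interval before the agent sees it.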


Eleuther ▷ #research (117 messagesđŸ”„đŸ”„):

  ‱ Debating the Efficacy of AI Detectors: The conversation on AI content detectors questioned their reliability, suggesting that detectors could potentially mislabel content created by humans as AI-generated due to stylistic choices. The distinction between synthetic and human-generated media was noted to be challenging, with comments indicating that only documenting the creation process and chain of custody could be reliable evidence of authenticity.

  ‱ Content Watermarking Discussed: Members discussed the potential and limitations of cryptographic watermarking for AI outputs. There was skepticism concerning watermark efficiency due to the ease of un-watermarking using other models and the implications for the utility of watermarked models.

  ‱ New Advances in AI Reasoning: Discussion about recent research advancements included reference to a new technique called Quiet-STaR, intending to improve language models by teaching them to “think ahead” before emitting tokens.

  ‱ Contours of GPT-turbo Explored: Dialogue analyzed a paper investigating the commercialization of large language models and API-level access, revealing that valuable information about proprietary models can be extracted using API queries. They notably estimated the hidden size of OpenAI’s GPT-3.5-turbo model.

  ‱ Discourse on Tokenizing Numbers in LLMs: The effect of tokenizing numbers left-to-right versus right-to-left was mooted, with observations suggesting that the method could influence a model’s arithmetic capabilities. Conversations touched on the possibility of exploiting tokenizing strategies to enhance model performance.
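
To make the left-to-right versus right-to-left distinction concrete, here is a small sketch that splits a digit string into fixed-size chunks from either end. The 3-digit chunk size and both function names are assumptions for illustration; real tokenizers use learned vocabularies and may split numbers differently.

```python
def chunk_left_to_right(digits, size=3):
    """Group digits starting from the most significant end."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def chunk_right_to_left(digits, size=3):
    """Group digits starting from the least significant end."""
    head = len(digits) % size  # leftover digits form a short leading chunk
    chunks = [digits[:head]] if head else []
    chunks += [digits[i:i + size] for i in range(head, len(digits), size)]
    return chunks


l2r = chunk_left_to_right("1234567")
r2l = chunk_right_to_left("1234567")
```

Right-to-left grouping keeps chunk boundaries aligned with place value (thousands, millions), so the trailing chunks of `567` and `1234567` match. That alignment is one plausible reason the choice could affect arithmetic.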


Eleuther ▷ #scaling-laws (1 messages):

kerls: are there any resources on scaling laws for video generation models?


Eleuther ▷ #interpretability-general (32 messagesđŸ”„):

  ‱ Innovative Interpretability Technique Explored: The concept of latent decoding by vector-db-lookup was explored, using embeddings from words in different languages to build a vector database for analyzing models. This method aims to facilitate understanding intermediate representations at each layer.

  ‱ Initial Results with Latent Decoding: A preliminary result was shared, involving constructing a vector database using embeddings of French, English, and German words from Llama2. The technique provided intermediate full-word decodings at each layer, offering potential as an interpretability tool.

  ‱ Language Influence in Concept Space: A discussion unfolded around how language models may predict using biases in their concept space, potentially weighted by their training data. Experiments with bilingual models like CroissantLLM indicated that tokenizers and the proportion of training data in various languages may impact these biases.

  ‱ Sampling Text from Prespecified N-gram Statistics: The topic of generating text samples from a distribution specified by n-gram statistics was broached. It was explained that this could be done autoregressively to match the max entropy distribution.

  ‱ Bigram Language Model Implementation Referenced: The conversation mentioned an implementation that generates text with a bigram model, a practical way to sample strings while adhering to specified n-gram statistics. The implementation is available on GitHub at features-across-time/scripts.
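
The autoregressive sampling idea above can be sketched for the bigram case: each token is drawn conditioned only on the previous one, using observed bigram counts as weights. The toy corpus and function names are hypothetical, and this is not the code from the referenced repository.

```python
import random
from collections import Counter, defaultdict


def fit_bigram_counts(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def sample_bigram(counts, start, length, rng):
    """Generate up to `length` tokens, one at a time, from the bigram counts."""
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # no observed continuation for this token
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out


corpus = "the cat sat on the mat and the cat ran".split()
counts = fit_bigram_counts(corpus)
sample = sample_bigram(counts, "the", 8, random.Random(0))
```

Every adjacent pair in the output is a bigram seen in the corpus, so the samples respect the specified statistics by construction.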


Eleuther ▷ #lm-thunderdome (5 messages):

  ‱ Challenges with Verbose LLM Answer Extraction: Task adaptation for LLM evaluation is affected by the verbosity of newer models, making answer extraction difficult without LLM-as-a-judge. Some tasks have both loglikelihood and generative or CoT variants available, such as those found in EleutherAI’s lm-evaluation-harness.

  ‱ Skepticism About Vector Space Models: A member expressed doubts about the vector space model representing meaning in language, citing GPT-J’s ungrammatical outputs as an example. They argue that the apparent grammatical competence of larger models is merely due to scale rather than any genuine understanding or reasoning ability.

  ‱ Seeking Guidance with lm-eval-harness: A newcomer to lm-eval-harness asks about integrating custom LLMs such as Llama on Gaudi2, seeking examples or demonstrations of how to implement the required functions such as generate_until and loglikelihood. There’s also confusion regarding the inheritance of unspecified functions and the absence of a fixed format for command-line tool arguments.


Eleuther ▷ #multimodal-general (1 messages):

boneamputee: https://brianfitzgerald.xyz/prompt-augmentation/


HuggingFace ▷ #announcements (1 messages):

  ‱ Interactive LLM Leaderboard Visualization: The Open LLM Leaderboard Visualization has been updated to allow users to reorder metrics and compare up to three models visually. Visit the interactive space at open-llm-leaderboard-viz.
  ‱ Visual Storytelling with Kosmos-2: Explore the space Kosmos-2 for GPT-based visual storytelling, available at Kosmos-2 Space.
  ‱ ARC-Challenge Dataset Enhanced with Reasoning: Check out the Augmented ARC-Challenge Dataset that includes Chain-of-Thought reasoning, accessible at arc-cot Dataset.
  ‱ Aya 101 - The Polyglot Model: Discover Aya 101, a model proficient in 101 languages. More information can be found in Tonic’s space at Aya 101.
  ‱ New Capabilities in Data Embedding: Review the BEE-spoke-data model for embedding with up to a 4k context, ideal for tasks like clustering or semantic search. Access the model and details at bert-plus-L8-v1.0-syntheticSTS-4k.

Links mentioned:

  ‱ Open Llm Leaderboard Viz - a Hugging Face Space by dimbyTa: no description found
  ‱ Kosmos 2 - a Hugging Face Space by Tonic1: no description found
  ‱ Locutusque/arc-cot · Datasets at Hugging Face: no description found
  ‱ Aya - a Hugging Face Space by Tonic: no description found
  ‱ GitHub - alvarobartt/vertex-ai-huggingface-inference-toolkit: đŸ€— HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial): đŸ€— HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) - alvarobartt/vertex-ai-huggingface-inference-toolkit
  ‱ BEE-spoke-data/bert-plus-L8-v1.0-syntheticSTS-4k · Hugging Face: no description found
  ‱ dominguesm/mambarim-110m · Hugging Face: no description found
  ‱ Machine learning-based intrusion detection: feature selection versus feature extraction - Cluster Computing: Internet of Things (IoTs) has been playing an important role in many sectors, such as smart cities, smart agriculture, smart healthcare, and smart manufacturing. However, IoT devices are highly vulner...
  ‱ GitHub - rbourgeat/refacto: Refactor your code with local LLM: Refactor your code with local LLM. Contribute to rbourgeat/refacto development by creating an account on GitHub.
  ‱ @DmitryRyumin on Hugging Face: "🚀🎭🌟 New Research Alert! 🌟🎭🚀 📄 Title: VLOGGER: Multimodal Diffusion for ": no description found

HuggingFace ▷ #general (115 messagesđŸ”„đŸ”„):

    ‱ Excitement for Consumer-Grade AI: Members expressed excitement that quantized versions of new models are compatible with consumer-grade GPUs. One discussed the rapid progress from context windows of less than 2k tokens to the 1 million tokens of LWM (Large World Model).

    ‱ Seeking Stable Diffusion Space: A user inquired about a channel for stable diffusion discussion, and was directed to a broader space that isn’t specifically related to stability.

    ‱ Knowledge Implementation in Pretrained Models: One user shared their experience of successfully implementing RAG, yet faced challenges with LoRA on a pretrained model like Mistral 7B. They were considering optimizing their dataset generation process to improve the model’s responses.

    ‱ Autonomous Agents and Local LLMs: A question was raised about whether there is an autonomous agent that works with local Large Language Models (LLMs), completely offline. Suggested solutions included tools like ollama and jan for terminal-based interfaces.

    ‱ NVIDIA Grace Hopper Superchip Discussion: There was a buzz about the NVIDIA Grace Hopper Superchip, its computing power, and its potential for AI and data center applications. The conversation delved into technical specifications and availability, including a member who was interested in whether the chip could support gaming at high resolutions.


    HuggingFace ▷ #today-im-learning (4 messages):

    ‱ Magnificent App for Command Corrections: A member shared a GitHub link for thefuck, which its GitHub description calls a “magnificent app which corrects your previous console command.”

    ‱ Optimization Confusion: A question was raised about various optimization methods, pointing out GridSearch Optimization, RandomSearch Optimization, and expressing confusion specifically about Bayesian Optimization.

    ‱ Seeking Guidance with Hugging Face: A new member asked for help understanding how to use Hugging Face and what exactly it is. They requested assistance in the #898619964095860757 channel.

    ‱ AI Duets Pose a Challenge: A member new to AI music shared issues with creating appealing AI covers of duets and bands, mentioning that while single voice covers are manageable, duets or group songs sound like they’re “being strangled”. They are curious about how others achieve better results with such AI covers.
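
For the optimization question above, a minimal sketch contrasting grid search and random search may help. The objective function, parameter names, and ranges are all made up for illustration; in practice the objective would be a validation metric from training a model.

```python
import itertools
import random


def objective(lr, depth):
    """Hypothetical validation loss, minimized at lr=0.1 and depth=4."""
    return (lr - 0.1) ** 2 + 0.01 * (depth - 4) ** 2


def grid_search(lrs, depths):
    """Evaluate every combination on a fixed grid and keep the best."""
    return min(itertools.product(lrs, depths), key=lambda p: objective(*p))


def random_search(n_trials, rng):
    """Evaluate randomly drawn combinations and keep the best."""
    trials = [(rng.uniform(0.001, 1.0), rng.randint(1, 8)) for _ in range(n_trials)]
    return min(trials, key=lambda p: objective(*p))


best_grid = grid_search([0.01, 0.1, 1.0], [2, 4, 8])
best_random = random_search(20, random.Random(0))
```

Bayesian optimization differs from both: instead of choosing trials in advance or at random, it fits a surrogate model to the results so far and uses it to pick the next configuration to try. That step is not implemented in this sketch.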

    Link mentioned: GitHub - nvbn/thefuck: Magnificent app which corrects your previous console command.: Magnificent app which corrects your previous console command. - nvbn/thefuck


    HuggingFace ▷ #cool-finds (6 messages):

    ‱ Sketching AI with Pseudocode: An article discussed the benefits of using pseudocode for prompting LLMs, noting significant improvements with GPT-4 over previous versions. Readers can delve into the specifics at SudoLang: A pseudocode programming language.

    ‱ AI Meets Business with SAP HANA and LangChain: An article on ai.gopubby.com highlights the integration of SAP HANA Vector Engine with LangChain to enhance AI applications. The advancements are detailed at Unlocking the Future of AI Applications.

    ‱ Introducing Mamba-Chat: GitHub hosts a novel chatbot named Mamba-Chat, which utilizes the state-space model architecture. Developers and enthusiasts can explore or contribute to the project at Mamba-Chat on GitHub.

    ‱ Vision-Language-Action Model for Robotics: DeepMind introduced a vision-language-action model called Robotic Transformer 2 (RT-2), intending to empower robots with generalized control instructions. More on this can be found in their blog post and paper.

    ‱ Kyle: Unity-Based Ragdoll Training: Hugging Face introduces Kyle, an advanced active ragdoll training environment for Unity. It features an optimized codebase and advanced vision capabilities with LSTM networks, and interested users can find out more at the Hugging Face model page.


    HuggingFace ▷ #i-made-this (9 messagesđŸ”„):

    ‱ Quest for the Best Open LLM: A member singled out SF-Foundation/Ein-72B-v0.11 as the most promising open LLM on the Open LLM Leaderboard, with a score of almost 80% across metrics. No link to the leaderboard or visualizations was provided.
    ‱ VS Code Refactoring Made Easy with Plugin: A member released a simple plugin for VS Code named Refacto. It refactors code using a local LLM served by a llama.cpp server, and contributions are welcome.
    ‱ Introducing Cobalt: Cobalt is a privacy-focused front end for LLMs, hosted on GitHub, featuring context management and memory summarization, with an iOS version in development.
    ‱ Transformers for PHP Developers: A project called Transformers PHP was showcased, which aims to enable PHP developers to integrate machine learning features into their projects easily.
    ‱ Exploring Open Records Law with AI: KY OpenGov is experimenting with AI technologies that could potentially help navigate open records laws, aiming for government transparency and ease of public access to information.

    Links mentioned:

  ‱ Exploring Open Records Law with AI | KOGC: no description found
  ‱ GitHub - taylorgoolsby/cobalt: Contribute to taylorgoolsby/cobalt development by creating an account on GitHub.
  ‱ GitHub - CodeWithKyrian/transformers-php: Transformers PHP is a toolkit for PHP developers to add machine learning magic to their projects easily.: Transformers PHP is a toolkit for PHP developers to add machine learning magic to their projects easily. - GitHub - CodeWithKyrian/transformers-php: Transformers PHP is a toolkit for PHP developer...
  ‱ GitHub - rbourgeat/refacto: Refactor your code with local LLM: Refactor your code with local LLM. Contribute to rbourgeat/refacto development by creating an account on GitHub.

HuggingFace ▷ #reading-group (11 messagesđŸ”„):

    ‱ An Ounce of Prompting Strategy: A user discussed the difficulty in using CrewAI apps and attributed it to a lack of prompting skills, particularly with incorporating imports.
    ‱ Reading Group Hiatus Announcement: A brief announcement clarified that there will be no presentation this week in the reading-group channel, with the next session planned for the following week.
    ‱ Neural Network Units Inquiry: A question was raised about the number of neural network units required for the MNIST digit classification task discussed in Andrew Ng’s course, leading to clarifications on the distinction between input units and hidden units.
    ‱ Foundations of Layer and Neuron Numbers: In response to an inquiry about determining the number of neurons and hidden layers in neural networks, users noted that these decisions are based on experimentation and previous successful models rather than a standard formula, weighing the trade-offs between processing power, speed, and accuracy.
    ‱ New Perspectives on Multilingual Models: A user shared a link to a paper suggesting that multilingual language models might use English as an internal pivot language, with implications for understanding how these models function and their linguistic bias. Curiosity was expressed about the impact of byte-level encoding on this behavior. The paper can be found at this link, and it was added to a multilingual paper collection on HuggingFace, available here.
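
The input-versus-hidden-unit distinction above can be made concrete by counting parameters in a fully connected network. The input width follows from MNIST's 28x28 images (784 pixels) and the output width from its 10 digit classes; the hidden widths below are hypothetical choices, not the ones from the course.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a fully connected network with the given widths."""
    return sum(
        n_in * n_out + n_out  # one weight per connection, one bias per unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Input and output widths are fixed by the data; hidden widths are a design choice.
small = dense_param_count([784, 32, 10])
large = dense_param_count([784, 128, 64, 10])
```

Comparing `small` and `large` shows the trade-off the discussion points to: wider or deeper hidden layers add capacity at the cost of more parameters to train and evaluate.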


    HuggingFace ▷ #core-announcements (1 messages):

    ‱ Diffusers Library Update Alert: The new Diffusers 0.27.0 version has been released. Check out the release notes here.

    HuggingFace ▷ #diffusion-discussions (8 messagesđŸ”„):

    ‱ Forum Misdirection: A reminder was issued that discussions should be pertinent to diffusers and diffusion models, suggesting that off-topic inquiries be directed to an appropriate forum.
    ‱ Kohya’s High-Resolution Trick: A neat trick discovered by a user named Kohya was shared, involving a high-resolution fix for diffusers, with an accompanying GitHub issue and a link to a YouTube video demonstrating the enhancement.
    ‱ Call for Collaborative Issue Investigation: In response to a concern, there was an open invitation to submit an issue with reproducible code on GitHub for collaborative examination, with a specific prompt to tag sayakpaul.
    ‱ Guidance on Appropriate Forum Usage: Repeated reminders were given to keep discussions focused on diffusion models and diffusers, reinforcing the purpose of the forum.
    ‱ Clarification on Merging in the Context of Diffusers: A question was raised about whether ‘merging’ referred to combining model parameters or packaging checkpoints for specific diffusion model components.

    Link mentioned: Kohya Hires fix · Issue #7265 · huggingface/diffusers: is diffusers possible to support this hires fix? it looks 1.5 work too AUTOMATIC1111/stable-diffusion-webui#13974 https://www.youtube.com/watch?v=SbgMwHDXthU same seed at 1024x1024 with without Thi



    HuggingFace ▷ #computer-vision (8 messagesđŸ”„):

    ‱ Exploring “Learn without forgetting”: A member mentioned the method called Learn without forgetting (LwF), suggesting a possible area of interest in the machine learning field.

    ‱ Interest Piqued by Arcface for Multiclass Classification: One user expressed curiosity about using Arcface as a substitute for Softmax in regular multiclass classification, noting its effectiveness in combo loss scenarios and for embedding extraction.

    ‱ Guided Backpropagation Query: A member sought assistance with implementing guided backpropagation in a recent version of PyTorch, facing issues with computing a backwards pass related to the model output tensor.

    ‱ NVIDIA Grace Hopper Superchip Unveiled: An announcement was shared about the NVIDIA Grace Hopper Superchip, highlighting its potential for high-performance computing (HPC), artificial intelligence (AI), and data center applications.
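
As background for the Arcface question above, here is a rough sketch of how an additive angular margin changes the logits that feed a softmax cross-entropy loss. It assumes the standard formulation (normalize the embedding and class weights, add a margin to the target-class angle, then scale); the `scale` and `margin` values and the toy vectors are hypothetical, and edge cases that production implementations handle (such as the angle plus margin exceeding π) are omitted.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Cosine logits with an angular margin added to the target class only."""
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if j == target:
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, target):
    """Softmax cross-entropy computed with the log-sum-exp trick."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]


embedding = [1.0, 1.0]             # equally close to both class centers
weights = [[1.0, 0.0], [0.0, 1.0]]
plain = arcface_logits(embedding, weights, target=0, margin=0.0)
with_margin = arcface_logits(embedding, weights, target=0, margin=0.5)
loss_plain = cross_entropy(plain, 0)
loss_margin = cross_entropy(with_margin, 0)
```

The margin lowers the target-class logit, so the loss stays high until the embedding moves clearly toward its class center. That pressure is what tends to make the resulting embeddings more separable.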


    HuggingFace ▷ #NLP (9 messagesđŸ”„):

    ‱ Matrix Approximation Milestone: A member expressed excitement about achieving a 0.016 relative error in Frobenius norm on a 4096 x 4096 matrix approximation while conserving memory. Results for larger matrices (4096 x 14336) are pending, and the approach could signal a breakthrough in matrix optimization tasks.

    ‱ Training Mystery: Low Loss but Nonsense Output: One user reported a perplexing issue where a modified pretrained model showed good convergence during training (loss decreasing to the 0.6 to 0.8 range) but produced nonsensical outputs during testing. This was despite using a similar loss calculation approach as in Mistral.

    ‱ In Search of Mathematical Theorem Naming Rights: A discussion on matrix decomposition bounds led to an admission that the literature lacks a name for a specific bound, with a joking suggestion to name it after oneself.

    ‱ Improving NL2SQL Pipeline Accuracy: A member detailed their NL2SQL pipeline, which includes BAAI/llm-embedder, TheBloke/nsql-llama-2-7B-GGUF, and FAISS for embedding SQL schemas and generating queries. They sought recommendations to boost pipeline accuracy due to inconsistent results.

    ‱ Introducing the NVIDIA Grace Hopper Superchip: A user announced the NVIDIA Grace Hopper Superchip, highlighting its potential impact on computing power and efficiency for AI and high-performance computing applications.
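
The "relative error in Frobenius norm" metric from the matrix approximation item above is the norm of the difference between the original and approximated matrices, divided by the norm of the original. A minimal sketch with a made-up 2 x 2 example follows; it is not the member's code.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def relative_frobenius_error(original, approx):
    """||original - approx||_F / ||original||_F."""
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(original, approx)]
    return frobenius_norm(diff) / frobenius_norm(original)


A = [[3.0, 0.0], [0.0, 4.0]]      # toy matrix with Frobenius norm 5
A_hat = [[3.0, 0.0], [0.0, 3.5]]  # approximation that is off by 0.5 in one entry
err = relative_frobenius_error(A, A_hat)
```

Here `err` works out to 0.5 / 5 = 0.1. A reported value of 0.016 therefore means the approximation's total deviation is 1.6% of the original matrix's norm.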


    HuggingFace ▷ #diffusion-discussions (8 messagesđŸ”„):

    ‱ Misplaced Conversation Alert: A member nudged another to use a more appropriate forum for topics unrelated to diffusers, suggesting they seek help through certain tagged individuals.
    ‱ Tech Tip Share for High-Resolution Images: A discovery was shared involving a “hires fix” for diffusers, with a GitHub issue link and a YouTube video demonstrating the same seed at different resolutions.
    ‱ Invitation to Open Issues on GitHub: Members were encouraged to raise concerns with reproducible code on GitHub for the attention of member sayakpaul, who indicated readiness to tackle the problems.
    ‱ Reiteration to Stay On-Topic: Multiple reminders were given to keep discussions focused on diffusion models and diffusers.
    ‱ Request for Clarification on “Merging”: In response to a merging question, clarification was requested on whether it pertained to merging model parameters or packaging a checkpoint with various model components.

    Link mentioned: Kohya Hires fix · Issue #7265 · huggingface/diffusers: is diffusers possible to support this hires fix? it looks 1.5 work too AUTOMATIC1111/stable-diffusion-webui#13974 https://www.youtube.com/watch?v=SbgMwHDXthU same seed at 1024x1024 with without Thi



    LlamaIndex ▷ #blog (4 messages):

    ‱ Challenges Parsing Financial PowerPoints for RAG: RAG struggles with parsing finance .pptx files due to nonstandard formats involving text, tables, images, and charts. The team is looking into a proper parsing solution and discussed it in this tweet.

    ‱ RAG Needs Better LaTeX Math Equation Handling: To accurately represent math and ML papers in RAG, it’s necessary to extract math equations correctly rather than fall back on default ASCII text extraction. A possible solution involves parsing by prompting, as shared in this tweet.

    ‱ Evolving RAG Pipeline to Handle Complex Queries: For handling complex queries in the RAG pipeline, treat each document not just as text but as a tool for interaction. Doing so could allow for more complex interactions with larger documents, according to this tweet.

    ‱ Launch of LlamaIndex v0.10.20 with Instrumentation Module: The new LlamaIndex release includes an Instrumentation module, enhancing observability. The team shared notebooks that demonstrate its capabilities, discussed in this tweet.


    LlamaIndex ▷ #general (132 messagesđŸ”„đŸ”„):

    ‱ Integration Dilemmas: Questions arose on integrating various components like VectorStore (Milvus) into a document management pipeline for production scenarios. Discussions centered on leveraging remote docstores (like Redis, MongoDB, Firestore, PostgreSQL) and using an ingestion pipeline for upserts instead of managing persistent docstore.json files on disk. An example ingestion pipeline was shared as Python code.

    ‱ Caching and Pipeline Queries: Members sought clarity on implementing cache systems like langchain llm cache and discussed integrating elements like node_postprocessor into a RetrieverQueryEngine. The material shared did not cover caching in LlamaIndex; however, Python code examples were shared to illustrate usage of node_postprocessors.

    ‱ Document Parsing Errors and Solutions: Some members encountered issues like a memory error from parsing a large markdown document and a ParserError from the MarkdownElementNodeParser. Proposed solutions include splitting the document into smaller chunks using the SentenceSplitter or handling the operations through an IngestionPipeline.

    ‱ Query Engine Challenges: Users faced multiple difficulties around specifying arguments for PandasQueryEngine and its functionality with date and location extracts, as well as defining prompts to guide QueryEngineTool. One proposed solution involves using a query_engine_tools array with a modified prompt.

    ‱ BM25 Embeddings and Query Engine Configuration: Members asked about setting BM25 as an embedding model in the manner of HuggingFaceEmbedding, but the provided documentation offered no clear solution. Steps to include node_postprocessors with a RetrieverQueryEngine and a rerank_query_engine were explored.
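
The upsert idea from the integration discussion above can be sketched without any framework: keep a content hash per document ID in a docstore, skip documents whose hash is unchanged, and re-chunk the ones that are new or modified. Everything here (`InMemoryDocstore`, `ingest`, the word-based chunking) is a hypothetical stand-in written for illustration; it is not LlamaIndex's IngestionPipeline or SentenceSplitter API, and a production setup would use a remote store and a real splitter.

```python
import hashlib


class InMemoryDocstore:
    """Hypothetical stand-in for a remote docstore: maps doc_id to content hash."""

    def __init__(self):
        self.hashes = {}


def split_into_chunks(text, chunk_size):
    """Naive word-based chunking, standing in for a real sentence splitter."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def ingest(docs, docstore, vector_index, chunk_size=50):
    """Upsert documents: skip unchanged ones, re-chunk new or modified ones."""
    processed = []
    for doc_id, text in docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if docstore.hashes.get(doc_id) == digest:
            continue  # unchanged since the last run
        docstore.hashes[doc_id] = digest
        vector_index[doc_id] = split_into_chunks(text, chunk_size)  # replaces stale chunks
        processed.append(doc_id)
    return processed


docstore, index = InMemoryDocstore(), {}
first = ingest({"a": "hello world", "b": "foo"}, docstore, index)
second = ingest({"a": "hello world", "b": "foo bar"}, docstore, index)
```

On the second run only document `b` is reprocessed, which is the behavior that makes a shared docstore preferable to rewriting a local docstore.json on every ingestion.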


    Latent Space ▷ #ai-general-chat (61 messagesđŸ”„đŸ”„):

    ‱ Potential OpenAI Security Breach Discussed: A post-mortem on a security issue that occurred Tuesday at OpenAI was shared, detailing how requests might have been made on behalf of another account. The documentation is provided on GitHub Gist.
    ‱ State of Sparse Universal Transformers: An insight on weight sharing for Sparse Universal Transformers was shared: the work needed a fast way to do Mixture-of-Experts for attention, which led to the creation of ScatterMoE. The discussion links to details on The New XOR Problem.
    ‱ The AI Development Platform with Affordable Pricing: The Deci AI Nano model and an associated AI development platform were launched, priced at $0.1 per 1M tokens. The announcement includes links to a marketing blog for Deci AI, as well as two technical tutorials on Google Colab (Basic Usage, LangChain Usage).
    ‱ Prompt Augmentation to Enhance Creative AI: A discussion on prompt augmenters noted a tendency for such tools to gain traction, linking to an article that details how a 77M T5 model was trained to expand prompts, outperforming 1B+ parameter LLMs in quality and prompt alignment. The full discussion and resources are available at Prompt Augmentation.
    ‱ AMD’s Ray Tracing Move to Open Source: AMD steps further into open source by making its HIP Ray Tracing (HIP RT) code accessible, which sparked discussion about the evolving open-source ecosystem. The news is summarized in a Phoronix article.

    Links mentioned:

  ‱ Tweet from AMD Makes HIP Ray-Tracing Open-Source - Phoronix: no description found
  ‱ Tweet from Alex Volkov (Thursd/AI) (@altryne): Sora team showing up at Berkley to talk about SORA
  ‱ Tweet from Emm (@emmanuel_2m): 🚹 Today, we're excited to launch the Scenario #UPSCALER! Elevate your AI creations up to 10k resolution. 🚀 Built for unmatched #CreativeControl & guided workflows. 💰 It starts at just $15/mo ...
  ‱ SuperPrompt - Better SDXL prompts in 77M Parameters | Brian Fitzgerald: Left SDXL output with SuperPrompt applied to the same input prompt.
  ‱ Chip Huyen: I help companies deploy machine learning into production. I write about AI applications, tooling, and best practices.
  ‱ What I learned from looking at 900 most popular open source AI tools: Four years ago, I did an analysis of the open source ML ecosystem. Since then, the landscape has changed, so I revisited the topic. This time, I focused exclusively on the stack around foundation mode...
  ‱ I'm concerned I made requests to openAI on behalf of another account - and perhaps someone did so on my behalf: I'm concerned I made requests to openAI on behalf of another account - and perhaps someone did so on my behalf - openai-possible-security-breach.md
  ‱ Tweet from Teortaxes▶ (@teortaxesTex): Read this if you haven't yet: http://blog.wtf.sg/posts/2023-02-03-the-new-xor-problem/ ↘ Quoting Shawn Tan (@tanshawn) One of the things we really needed for Sparse Universal Transformers was ...
  ‱ Tweet from Teknium (e/λ) (@Teknium1): This explains why Yann is so bearish on LLMs... đŸ˜Č
  ‱ Tweet from Chip Huyen (@chipro): I went through the most popular AI repos on GitHub, categorized them, and studied their growth trajectories. Here are some of the learnings: 1. There are 845 generative AI repos with at least 500 sta...
  • Tweet from Grant♟ (@granawkins): "Between Q1-24 and Q4-25, there will be a 14x increase in compute. Then, if you factor in algorithmic efficiency doubling every 9 months, the effective compute at the end of next year will be alm...
  • Tweet from K (@kk_slider_k_): This makes so much sense. Yann’s always been looking for models that reason visually or using planning rather than purely in language ↘ Quoting Teknium (e/λ) (@Teknium1) This explains why Yann is ...
  ‱ Tweet from Alex Volkov (Thursd/AI) (@altryne): Tomorrow (March 14) is: > π day > GPT-4 anniversary > Claude 1 anniversary but also đŸ„đŸ„đŸ„đŸ„ ThursdAI spaces 1st birthday 🎉 Join us as we chat about Claude Haiku, Devin, Figure+OpenAI, T...
  • Tweet from Champagne Joshi (@JoshWalkos): This is a fascinating conversation with a girl who lacks an internal monologue. She articulates the experience quite well.
  • Introducing Deci’s Gen AI Development Platform and Deci-Nano: Explore Deci’s Gen AI Development platform and the Deci Nano LLM, designed to offer efficiency, performance, and flexible deployment options
  • Google Colaboratory: no description found
  • GTC 2024: #1 AI Conference: Register now. Streamed online. March 18-21, 2024.
  • NVIDIA & Harpreet Sahota GTC 2024: no description found
  • Do Llamas Work in English? On the Latent Language of Multilingual Transformers: We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language -- a question of key importance for understanding how language mo...
  • Bytez: Do Llamas Work in English? On the Latent Language of Multilingual Transformers: In this research study, scientists wanted to know if language models (that can generate text) use English as a "pivot" language internally, even when prompted in other languages. They found ...
  • Multilingual - a stereoplegic Collection: no description found

  • Latent Space ▷ #ai-announcements (5 messages):

    • Tuning in to Transformers: A new episode featuring an interview with Mikey Shulman from Suno AI, discussing music generation using transformers, is now live. Watch it on YouTube with a title “Making Transformers Sing”.
    • Paper Club Gathering Alert: The Paper Club is currently reviewing the “A Comprehensive Summary Of Large Language Models” paper. Members are encouraged to join the discussion in the dedicated channel.

    Link mentioned: Making Transformers Sing - with Mikey Shulman of Suno: Giving computers a voice has always been at the center of sci-fi movies; “I’m sorry Dave, I’m afraid I can’t do that” wouldn’t hit as hard if it just appeare



    Latent Space ▷ #llm-paper-club-west (24 messagesđŸ”„):

    • Curiosity About Supervised Fine-Tuning (SFT): A member asked about ways to run supervised fine-tuning (SFT) on negative pairs, since they have a lot of them.

    • Decoding the Rationale Behind Attention: Discussions highlighted that the attention mechanism in neural networks like transformers was created to address the limitations of older models with fixed-length context windows and to let models focus on the relevant parts of the input sequence.

    • Untangling the Concept of Parallelization: Clarifications were made regarding parallelization in transformer models, explaining that it allows for the independent processing of different tokens through the scaled dot product operation, which in turn speeds up training.

    • Understanding Transformer Motivations: A member expressed the importance of grasping the intuition behind design choices in transformer models, and received elucidation on the historical limitations of earlier models which transformers aimed to solve.

    • Appreciation for Learning Experience: Participants expressed gratitude for the session which provided more insight into the progression and advancements of Large Language Models (LLMs).
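
The parallelization point from the discussion above can be made concrete with a small sketch. This is a minimal pure-Python version of scaled dot-product attention, not code from the session; the function names and the toy vectors are illustrative assumptions. It shows that each output row depends only on its own query plus the shared keys and values, which is why the rows can be computed independently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query token."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attention(queries, keys, values):
    # No row reads another row's result, so this loop could run in any
    # order, or as one batched matrix product on an accelerator.
    return [attend(q, keys, values) for q in queries]


# Toy inputs (illustrative values only).
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
output = attention(queries, keys, values)
```

Reversing the order of the queries simply reverses the order of the output rows, which is the independence property that makes training parallel across positions.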


    Latent Space ▷ #ai-in-action-club (36 messagesđŸ”„):

    • Passive Participation in IRL Meetings: A member mentioned they were in an IRL meeting and could only participate passively in today’s Discord chat.
    • Anticipation of In-Depth Content: Two members hinted at upcoming in-depth versions of their discussions to be posted on their respective blogs.
    • Nuisance of Web Interfaces for RAG: A user reported issues when using the web interface for RAG (Retrieval-Augmented Generation) systems, suggesting the app might be a better option for stability.
    • Sharing Useful Resources on RAG: A member shared a link to a Medium post about advanced RAG techniques that could improve the retrieval and generation of high-quality responses.
    • Resource Compilation Document Shared: A comprehensive Google Sheets document was linked which compiles resources on topics such as UI/UX patterns for GenAI and RAG architectures, indicating previous discussions and facilitators on the subject matter.



    OpenAI ▷ #ai-discussions (60 messagesđŸ”„đŸ”„):

    • Microsoft Employee Fixes Typo After Community Ping: A user flagged a typo in a Microsoft service, prompting action from the Bing VP and resulting in a fix. The user noted the VP acknowledged the mistake as a typo.

    • Stumped by Repeated Morphemes: Members discuss the challenge of getting GPT-3.5 to generate examples of repeated morphemes in compound words. Suggestions include guiding GPT-4 to use Python tools to assist in generating the correct output by creating a list of end-letter sequences.

    • Anticipation for OpenAI Updates: Conversation reveals anticipation for potential updates from OpenAI, with some expectations set on specific dates like the company’s “birthday” and speculative delays due to elections. Users discuss the impact of updates on their excitement and expectations.

    • Delegating Tasks to Domain-Specific AIs: A discussion on the potential for a “high level assistant” capable of delegating tasks to more specialized AI models ensued, touching on the prospects and challenges of creating a multi-tiered AI system with a “central brain”.

    • ChatGPT Team and Privacy Concerns: Questions about ChatGPT Team capabilities and individual account privacy prompted a member to share OpenAI’s enterprise privacy policy. Users asked about API key usage for multiple services and whether admins can see team chats.

    Link mentioned: Enterprise privacy: no description found


    OpenAI ▷ #gpt-4-discussions (1 messages):

    wesego: Hi, having that problem right now.


    OpenAI ▷ #prompt-engineering (7 messages):

    • Comma Confusion in Number Formats: Members discussed a situation where someone was using a comma as a decimal separator, which might be common in South American regions. It’s recommended to clarify this with the assistant, as models should handle different cultural number formats effectively.
    • Considering Global Number Formats: When addressing the confusion over commas and decimals, one member noted that such issues can be resolved by simply informing the assistant, given that it’s a widespread practice in various countries and the models are equipped to understand such differences.
    • Seeking Guidance on GPT-3 Prompt Architecture for Classification: A member shared their efforts in using GPT-3 for a classification task, detailing their prompt structure and asking for advice on how to improve recall and minimize false positives. They were contemplating whether to adjust the amount of context or to consider using a custom GPT model.
    • Balance is Key in Prompt Design: A suggestion was made concerning prompt architecture, advising to use no more than half of the total context window available for best results. This guidance was based on current model capabilities in handling context and the diminishing returns of information retrieval beyond a certain threshold.
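
The half-window guideline above can be expressed as a simple budget check. This is a minimal sketch, assuming token counts have already been measured with whatever tokenizer matches the model; the function names and the example numbers are hypothetical.

```python
def fits_half_window(prompt_tokens: int, context_window: int) -> bool:
    """True if the prompt uses at most half of the context window."""
    return prompt_tokens * 2 <= context_window


def trim_examples(example_token_counts, fixed_tokens, context_window):
    """Keep leading few-shot examples until the half-window budget runs out.

    fixed_tokens covers the static instructions and the item to classify.
    """
    budget = context_window // 2 - fixed_tokens
    kept = []
    for count in example_token_counts:
        if count > budget:
            break
        kept.append(count)
        budget -= count
    return kept


# Hypothetical numbers: 1000 tokens of instructions, three 400-token examples.
kept = trim_examples([400, 400, 400], fixed_tokens=1000, context_window=4096)
```

With these numbers the budget is 2048 - 1000 = 1048 tokens, so only the first two examples are kept.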

    OpenAI ▷ #api-discussions (7 messages):

    • Localization Woes in Decimal Representation: There’s been a discussion around a user having issues with the assistant due to the use of commas instead of decimal points in numbers. This was identified as a localization issue, typical for South American users where commas are used as decimal separators.

    • Model Cultural Flexibility: eskcanta acknowledges that adjusting for cultural differences such as comma and decimal separators should be straightforward for the model, given its broad understanding of varied international formats.

    • Optimizing Classification Prompt Architecture: A user named mydpy asked about refining a prompt setup for a classification task. The current structure includes static instructions, iterates through examples, and formats results; the user wants to balance the amount of context while minimizing false positives.

    • Efficient Context Usage for Prompts: darthgustav. suggests using a maximum of half the total context window for tasks to ensure best model performance. This guideline is based on the retrieval rates related to the position within the context window.
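
The discussion above recommends simply telling the assistant which number format is in use. As an alternative for API pipelines, the input can be normalized before it is sent. This is a minimal sketch, not something proposed in the channel; it assumes that "," is always the decimal separator and "." is only ever a thousands separator, so it would misread a value such as "3.14" written in the US convention.

```python
def parse_decimal_comma(text: str) -> float:
    """Parse a number that uses ',' for decimals and '.' for thousands."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return float(cleaned)


price = parse_decimal_comma("1.234,56")
```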


    OpenAccess AI Collective (axolotl) ▷ #general (47 messagesđŸ”„):

    • Discussing Finetuning Large Models on Single GPUs: Members expressed enthusiasm about a technique for finetuning 175 billion parameter models on a single NVIDIA 4090 GPU, citing an abstract from a research paper on Hugging Face. They considered the implications for the Axolotl framework.
    • Model Training and Hardware Compatibility: A conversation revolved around a member successfully running model training on Windows, despite concerns of potential incompatibilities with non-Mac systems. The member reported no issues post-training, though merge conflicts were mentioned.
    • Q&A vs. Completion Format for Knowledge Implementation: Members debated the merits of training models on raw text completion format versus converting to Q&A format, considering potential information loss in the conversion process. LoRA was mentioned as a tool for stylistic training but the consensus was to use completion format for raw corpus training.
    • User Guidance on Data Format and Conversion in Axolotl: There was a request for updated guides on data formats for Axolotl, and a subsequent clarification that raw text can be converted to a chat format like ShareGPT before training. Members shared how to use Axolotl for converting the format to Llama-2 for chat model compatibility.
    • Differences Between Axolotl and LoRA Fine-Tuning: A member inquired about the differences and potential control loss when using Axolotl compared to traditional LoRA fine-tuning in transformers library. It was clarified that Axolotl acts as a wrapper for the Hugging Face training ecosystem, offering simplification through YAML configuration files.



    OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (13 messagesđŸ”„):

    • ScatterMoE Optimizations Promising: The Axolotl-dev channel discussed a new ScatterMoE implementation promising optimizations over Huggingface’s approach, claiming to surpass MegaBlocks in throughput. A link to the Optimized MoE models branch was shared for review and consideration.

    • Seeking Clarifications on ScatterMoE: Members inquired about the ScatterMoE optimization, asking for explanations on its benefits, how to train with it, and whether it would be integrated into other implementations such as vllm and llama.cpp.

    • Pull Request for Post Training Implementations: A member shared a pull request that attempts to use ScatterMoE. Feedback was that it needed to recreate the MixtralMoE module more accurately and was still pending testing.

    • Upgrading PyTorch for Compatibility: A member suggested moving Axolotl to a newer version of PyTorch, since newer kernels are not compatible with the version currently in use, and noted that version 2.0.1 is considered outdated.

    • Confirmation of Tool Versions: Amidst the conversation about upgrades and implementations, a member confirmed that they are already utilizing PyTorch version 2.2.1, which is in line with the requirements for using ScatterMoE.



    OpenAccess AI Collective (axolotl) ▷ #general-help (9 messagesđŸ”„):

    • In Search of Inference Code: A member sought example code for running inference on approximately 100 prompts with a LoRA model fine-tuned off Mistral-7B-v0.1. They contemplated using transformers and model.generate(**model_inputs) but were advised to consider vLLM, as it could be quicker for their needs.
    • vLLM Might Be Better Than Transformers: The suggestion to use vLLM for offline batched inference was emphasized again, highlighting its potential for quicker operations compared to the transformers library.
    • Token Trouble for Text Summarization: A member reported issues with a tokenizer in a fine-tuning task for an instruct model for document summarization. The fine-tuned model frequently omitted the first <summary> tag or included an unwanted space before it, raising concerns about whether this was a tokenizer-related problem.
    • Fine-Tuning Frustration: A newcomer to LLMs asked for the correct syntax for configuring a script to point to a locally stored model and training data rather than pulling resources from Hugging Face. They were looking to fine-tune a model with already downloaded data.
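
For the batched-inference question above, a sketch along the lines of the vLLM suggestion might look like the following. It assumes a vLLM release with LoRA support, a GPU that fits the base model, and an adapter saved locally; the adapter name, the adapter path argument and the sampling settings are placeholders rather than values from the discussion.

```python
def generate_with_lora(prompts, adapter_path, base_model="mistralai/Mistral-7B-v0.1"):
    """Run offline batched generation with a LoRA adapter via vLLM."""
    # Imported lazily so this module still loads where vLLM is not installed.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(model=base_model, enable_lora=True)
    params = SamplingParams(temperature=0.0, max_tokens=256)
    # Arguments: adapter name (placeholder), integer id, local adapter path.
    adapter = LoRARequest("my_adapter", 1, adapter_path)
    # vLLM batches the prompts internally; no manual batching loop is needed.
    outputs = llm.generate(prompts, params, lora_request=adapter)
    return [out.outputs[0].text for out in outputs]
```

Passing all of the roughly 100 prompts as one list lets the engine schedule them together, which is where the speedup over a per-prompt generate loop would be expected to come from.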

    Link mentioned: Quickstart — vLLM: no description found


    OpenRouter (Alex Atallah) ▷ #announcements (4 messages):

    • Cohere Command-R Joins OpenRouter: A new conversational model called Command-R created by Cohere, boasting a long context of 128k tokens, is now available. Users are encouraged to try it via the OpenRouter API, with 2 million prompt tokens per dollar and a link to play with it at OpenRouter Models.

    • Boost Your Metrics with Daily Analytics: OpenRouter has introduced daily analytics, allowing users to track token usage per day for all models, in addition to the existing weekly view. This feature can be explored at OpenRouter Rankings.

    • API and Page Speed Enhancements: OpenRouter has improved speed significantly, not just for the /models API but also for all model-related pages on the platform.

    • Model Parameter Data Awaiting More Info: Despite the introduction of Cohere’s Command-R, its parameters aren’t yet listed in the /parameters API due to insufficient data. Once enough data is collected, it will become available at Command-R Parameters.
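
The quoted rate of 2 million prompt tokens per dollar works out to $0.50 per million prompt tokens. A small sketch of the arithmetic follows; it covers prompt tokens only, since completion pricing is not given here, and the function name is an illustrative assumption.

```python
PROMPT_TOKENS_PER_DOLLAR = 2_000_000  # rate quoted in the announcement


def prompt_cost_usd(prompt_tokens: int) -> float:
    """Cost in USD of the prompt side of a request at the quoted rate."""
    return prompt_tokens / PROMPT_TOKENS_PER_DOLLAR


# Filling the entire 128k-token context with prompt text.
full_context_cost = prompt_cost_usd(128_000)
```

At this rate a prompt that fills the full 128k context costs about 6.4 cents.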

    Links mentioned:

    • Cohere: Command-R by cohere | OpenRouter: Command-R is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex w




    • OpenRouter: Language models ranked and analyzed by usage across apps


    OpenRouter (Alex Atallah) ▷ #general (54 messagesđŸ”„):

    • One Library to Call Them All: Users discussed litellm, a universal API wrapper that calls various LLM APIs using OpenAI’s format. While praised for its utility, limitations were noted, such as vision tasks only working with GPT-4 and certain features being specific to GPT models.

    • Navigating API Frontends and Payment Systems: The conversation included suggestions for GUI frontends to plug in API keys, like open-webui and TypingMind.com, with varying charges for their use. The need to top up a balance to use APIs without connecting a credit card was also mentioned.

    • Seeking the Best LLM for Roleplay and Uncensored Dialogue: Participants sought advice on the best LLMs for specific applications like roleplaying in Skyrim or engaging in controversial topics. Some users advocated for less censorship in LLMs, and there was particularly high praise for the creative outputs from models like Claude Sonnet.

    • Resolving Installation Issues and Understanding Limitations: There were queries on how to install certain tools, such as a WebUI for LLMs, as well as discussions about the applicability of different models for unique use cases, like a lecture chatbot or a text-based roleplay experience.

    • Concerns Over Content Moderation and Model Censorship: Users expressed concerns about overly stringent content moderation and the impact of censorship on model usability. Some dialog focused on the balance between preventing harmful content and retaining the creative capacities of LLMs, with suggestions for uncensored APIs and improved content filter mechanisms.



    CUDA MODE ▷ #general (12 messagesđŸ”„):

    • NumPy vs. BLAS Performance Analysis: A blog post argues that NumPy, despite its popularity for numerical computing in Python, carries significant performance overhead, losing up to 90% of BLAS throughput in specific operations such as those on 1536-dimensional OpenAI Ada embeddings. The author's solution is SimSIMD, which can minimize this loss.

    • Overhead in NumPy Discussed: In the chat, someone pointed out that NumPy's overhead is significant for operations that take under 1”s, so for numerous small operations a SIMD wrapper would be more efficient than NumPy.

    • Streamlined Messaging Suggested for Technical Articles: A member suggested a more direct approach for technical write-ups, favoring clear explanations on intent, process, applicable scenarios, and installation instructions over merely showcasing benchmark numbers.

    • Photonic Computing Gains Traction: A YouTube video titled “New Breakthrough in Photonics: x1000 faster. Is it for Real?” has been shared; its subject, Lightmatter, focuses on using photonic technology to reinvent chip communication and computation to improve AI’s environmental impact and efficiency. The video can be found here.

    • Insightful Photonics Content Recommendations: In support of the photonic technology discussion, members recommended Asianometry’s videos for deeper insights, namely “Silicon Photonics: The Next Silicon Revolution?” and “Running Neural Networks on Meshes of Light”, both of which can be viewed on YouTube.



    CUDA MODE ▷ #triton (10 messagesđŸ”„):

    • Debugging Triton Tensors made easy: Set the environment variable TRITON_INTERPRET=1 and use print statements to inspect tensor values. tl.arange(0,N) tensor indexing errors can be circumvented with these practical debug steps.
    • Visual Debugging Tools for Triton on the Horizon: A visualizer for Triton is in development, aimed at simplifying the inspection of the spatial structure of load/stores. There are a couple of known issues at the moment, including occasional double visualization and segfaults.
    • Outdated Debugging Methods: The use of the @triton.jit(interpret=True) decorator for debugging Triton code has been noted as deprecated.
    • Helpful GitHub Discussions for Triton Debugging: Specific issues and discussions on GitHub can offer help with debugging kernels, exemplified by this GitHub issue.
    • Need More Annotated Triton Examples: While the official tutorials are the primary learning resource, there’s an expressed need for more annotated Triton kernel examples in the community to aid in understanding.



    CUDA MODE ▷ #cuda (13 messagesđŸ”„):

    • Kernel Launch Overhead Confusion Cleared: A member gained clarification on unexpected output when swapping the order of CUDA functions - it was confirmed to be due to kernel launch overhead. The tool ncu was recommended to isolate this overhead.
    • Seeking CUDA Learning Resources: A member new to CUDA sought beginner-friendly learning materials, and it was established that the member is familiar with C++, a useful pre-requisite for CUDA.
    • FP8 Matmul on 4090s Shows Promise: A brief mention noted that fp8 matrix multiplication on 4090 GPUs is impressively fast, indicating potential performance gains.
    • Recommendation for a CUDA Beginner’s Book: For learning CUDA, the book Programming Massively Parallel Processors was recommended, identified as a foundational text even for undergraduates and not too advanced for those knowledgeable in C/C++.
    • Join a CUDA Programming Book Reading Group: For those who are beginning to learn CUDA, it was shared that there is a reading group available, which indicates community support for those working through the recommended book.

    Link mentioned: Programming Massively Parallel Processors: A Hands-on Approach: Hwu, Wen-mei W., Kirk, David B., El Hajj, Izzat: 9780323912310: Amazon.com: Books: no description found


    CUDA MODE ▷ #jobs (1 messages):

    vim410: Depends. But yes.


    CUDA MODE ▷ #pmpp-book (8 messagesđŸ”„):

    • SM Architecture and Processing Blocks Explained: A member referred to lecture 4 for a visual aid in understanding GPU architecture, detailing that a GA102 SM has 4 processing blocks which execute a warp at a time. 32 fp32 instructions can run concurrently, while int32 instructions are split into two batches of 16 due to the core limitations.

    • Indexing Dilemma in CUDA Coding: When discussing a chapter 2 query, a wrong indexing approach i = blockIdx.x * blockDim.x + threadIdx.x * 2 was corrected by an explanation that showed it would result in double-counting. To illustrate, with blockDim.x = 32, both {blockIdx.x = 0, threadIdx.x = 16} and {blockIdx.x = 1, threadIdx.x = 0} would erroneously yield i = 32.

    • Questioning Content Sharing Boundaries: A member queried the propriety of blogging their answers to CUDA exercises, mentioning an attempt to contact the authors without success due to lack of an educational email address. Another member promised to check with author Wen-mei for clarification.
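
The indexing collision from the chapter 2 discussion above can be checked by simulating the index arithmetic on the host. This is a Python sketch rather than CUDA, and the corrected formula is an assumption: it presumes the exercise has each thread handle two adjacent elements, which the summary does not state.

```python
from collections import Counter


def buggy_index(block_idx, block_dim, thread_idx):
    """The indexing quoted in the discussion: only threadIdx.x is doubled."""
    return block_idx * block_dim + thread_idx * 2


def fixed_index(block_idx, block_dim, thread_idx):
    """Assumed fix: double the whole global thread id."""
    return (block_idx * block_dim + thread_idx) * 2


def collisions(index_fn, num_blocks, block_dim):
    """Map each index produced by more than one thread to its hit count."""
    counts = Counter(
        index_fn(b, block_dim, t)
        for b in range(num_blocks)
        for t in range(block_dim)
    )
    return {i: n for i, n in counts.items() if n > 1}


buggy = collisions(buggy_index, num_blocks=2, block_dim=32)
fixed = collisions(fixed_index, num_blocks=2, block_dim=32)
```

With two blocks of 32 threads, the buggy formula maps both (block 0, thread 16) and (block 1, thread 0) to index 32, matching the example in the discussion, while the assumed fix produces no duplicates.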


    CUDA MODE ▷ #ring-attention (7 messages):

    • Confusion Over Ring Attention’s Compatibility: A member expressed uncertainty on why there are claims that ring attention cannot be used with flash, despite similar implementations being apparently successful.
    • Awaiting Response from Busy Member: Andreas Koepf indicated being quite busy, promising to get back to the conversation when availability improves, to which Jamesmel responded with understanding.
    • Searching for the Missing Code: Iron_bound expressed disappointment in not being able to find associated code for the Twitter post about ring attention, leaving a sentiment of incomplete understanding.
    • Link to Triton Kernel Code Shared: Iron_bound provided a link to a Triton kernel implementation which seems related to the discussion of ring flash attention.

    Link mentioned: add naive triton kernel for varlen · zhuzilin/ring-flash-attention@10d992c: no description found


    CUDA MODE ▷ #off-topic (3 messages):

    • Meta vs. Former Exec Lawsuit: Meta has legally targeted a former executive, accusing him of stealing over 100 internal documents and attempting to recruit Meta’s employees for a competing AI data startup, Omniva. The lawsuit was publicized through an unsealed court filing and further reported in an Ars Technica article.
    • Disappointment in Channel Dynamics: A user expressed dissatisfaction with how a conversation in the channel was progressing, using a brief phrase to imply that the discussion’s direction was not as expected.
    • Starting From Scratch: A member mentioned that all three participants in a conversation are starting from lecture 1, possibly indicating a collaborative learning effort or a collective beginning of a new topic or course.

    Link mentioned: Meta sues “brazenly disloyal” former exec over stolen confidential docs: Meta’s former exec allegedly shared data center secrets with a shadowy startup.


    LangChain AI ▷ #announcements (1 messages):

    • Langchain 0.2 Release Rush: Due to recent CVEs filed against langchain, the team is expediting the release of version 0.2, which will break the dependency on langchain-community. The bigger refactors planned will now be shifted to version 0.3, and more details are available in a GitHub discussion.
    • Call for Community Feedback: The LangChain team is seeking feedback on the upcoming changes to ensure they do not cause any issues for users. The team emphasizes that the goal of these changes is to make your life easier.

    Link mentioned: RFC: Expedited langchain 0.2 release · langchain-ai/langchain · Discussion #19083: Context Currently langchain (the package) depends on langchain-community. This is done only for backwards compatibility with langchain versions that predate the split of langchain and langchain-com



    LangChain AI ▷ #general (34 messagesđŸ”„):

    • Troubleshooting AgentExecutor Execution Errors: A user is facing an OutputParserException when running an AgentExecutor with a command from Cohere, although the Python code seems to be generated properly. The expectation is that the agent would execute the Python code and respond in natural language.
    • Langsmith and Imported Prompts Confusion: A member struggles to understand why their custom prompt doesn’t enable tool use compared to an imported prompt from hub, and seeks clarification about the differences.
    • API Query via Curl for StackOverflow: A user inquired about using an API to query StackOverflow, and was directed to use the StackExchange API for advanced search functionality to meet their requirements.
    • Debating the Usefulness of LLM Agents: A discussion unfolded regarding the practicality of LLM agents over combining LLM output with functions, with members debating agents’ abilities for action sequencing and error-handling, and pondering ways to evaluate agent behavior, possibly with the help of LangChain benchmarks.
    • Using LangGraph for Cyclic Computations with LLMs: An explanation was provided for using LangGraph when needing to add cycles to applications, especially for stateful, multi-actor applications with LLMs, with references to JavaScript and Python LangGraph documentation for further details.
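
For the StackOverflow query mentioned above, the StackExchange API exposes an advanced search endpoint that can be called with curl or any HTTP client. The sketch below only builds the request URL; the query text, tag and page size are illustrative, and the API version in the path is an assumption that should be checked against the current documentation.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.3"  # assumed API version


def build_search_url(query: str, tagged: str = "", page_size: int = 5) -> str:
    """Build a /search/advanced URL restricted to Stack Overflow."""
    params = {
        "order": "desc",
        "sort": "relevance",
        "q": query,
        "site": "stackoverflow",
        "pagesize": page_size,
    }
    if tagged:
        params["tagged"] = tagged
    return f"{API_ROOT}/search/advanced?{urlencode(params)}"


url = build_search_url("OutputParserException AgentExecutor", tagged="langchain")
```

The resulting URL can be passed to curl; responses from this API are compressed, so curl's --compressed flag is likely needed to read the JSON.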



    LangChain AI ▷ #langchain-templates (1 messages):

    • Query on Langsmith Hub Prompt Templates: A member inquired about how to create a prompt template in Langsmith Hub, demonstrating a placeholder {tools} for a list named tools in their code. They were specifically looking for guidance on linking the tools = [cat_tool] variable to the placeholder in the template.
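
The idea behind the {tools} placeholder can be illustrated without LangChain. The sketch below is a plain-Python stand-in, not the LangChain or Langsmith Hub API: the Tool class, the template text and the render_tools helper are all hypothetical, and only the names tools and cat_tool come from the question.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical stand-in for a tool object with a name and description."""
    name: str
    description: str


cat_tool = Tool(name="cat_tool", description="Returns a fact about cats.")
tools = [cat_tool]

# Hypothetical template; a hub prompt would carry its own wording.
TEMPLATE = "You can use these tools:\n{tools}\n\nQuestion: {input}"


def render_tools(tools):
    """Turn the tool list into the text that replaces {tools}."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


prompt = TEMPLATE.format(tools=render_tools(tools), input="Tell me about cats.")
```

The placeholder holds text, not the Python list itself, so the list has to be rendered to a string (by the framework or by hand) before it is substituted into the template.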

    LangChain AI ▷ #share-your-work (6 messages):

    • SAP HANA Meets LangChain: A blog post explores the innovative integration of LangChain with the SAP HANA Vector Engine, presenting potential advances in AI applications. For more information on this synergy, visit Unlocking the Future of AI Applications.

    • Dall-E Enters the JavaScript World: A blog post details the addition of Dall-E image generation support to the JavaScript version of LangChain. Useful code snippets and instructions are included in Lang Chain for JavaScript Part 3: Create Dall-E Images.

    • Orchestrating Operational Browser Flows with AI: A new blog post describes how a system of LLM agents is orchestrated to facilitate automated browser interactions. Check out the engineering behind it at The Engineering of an LLM Agent System.

    • Open Source Langchain Chatbot Showcases RAG for Q/A: The Langchain chatbot, which utilizes RAG for efficient question and answer querying, is now fully open source. Investigate the application on GitHub.

    • Living Bookmarks Bot for Better Bookmark Management: A Twitter user developed a Discord AI chatbot for managing Raindrop.io bookmarks to aid in finding them easily when needed, and has made it available open source.



    LangChain AI ▷ #tutorials (1 messages):

    pradeep1148: https://www.youtube.com/watch?v=PzaidfqDtGI


    LAION ▷ #general (27 messagesđŸ”„):

    • Looking for GPU Partners for Captioning: A member requested help with captioning and is seeking someone with spare 3090s or 4090s to assist. They’ve also asked interested individuals to reach out via direct message.
    • Optimizing on MacOS with M3 Max: A member is working on getting simpletuner to run on MacOS and discussed the potential of using more than 96GB of the system’s memory for compute on a new 128GB M3 Max system.
    • Sharing Prompt Augmentation Innovations: A link to an article about prompt augmentation using a 77M T5 model was shared along with an impressive demonstration of its capabilities in image generation. Another member contributed by sharing a link to DanTagGen, an autocompleting tags tool using a smaller model, on HuggingFace.
    • Interest in AI Law Regulation from the EU: A member highlighted the adoption of the Artificial Intelligence Act by the European Parliament, designed to ensure AI safety and compliance with fundamental rights. The regulation aims to address the risks of AI and impact applications that threaten citizens’ rights.
    • IEEE Symposium on Security and Privacy Update: A member posted about the 45th IEEE Symposium on Security and Privacy being removed from the accepted papers page. There was a brief conversation about the implications of this removal for a person named Ben and whether they would resubmit to an appropriate conference.



    LAION ▷ #research (13 messagesđŸ”„):

    • Virtual Try-On with TryOnDiffusion Unveiled: An open-source implementation of TryOnDiffusion, as described in the Google paper “A Tale of Two UNets”, has been released under the MIT License. The code is available on GitHub.

    • Fast Decoding Research Mentioned: A paper claiming that 2D Gaussian splatting decodes faster than JPEG was shared, with the suggestion that its speed and optimization make it worth a look. The paper can be found on arXiv.

    • Personal Project Reflection: A member recounted their own attempt at a project conceptually similar to the one described in the aforementioned 2D Gaussian splatting paper, admitting they weren’t able to optimize it as well but found validation in seeing professional work align with their methods.

    • Resource Constraints in Model Deployment: Inquiring about how to implement a CPU cap like the one used in a text-generation web UI, a member shared their struggle with CUDA out of memory issues on non-UI models. They are seeking insights on handling large models without hitting free tier limitations as outlined in the GitHub repo.

    • Limits of Free Colab for Web UIs: Further to the previous point, other members explained that free Colab cannot be used for running web UIs and suggested that other channels are better suited to such technical questions.



    LLM Perf Enthusiasts AI ▷ #general (1 messages):



    LLM Perf Enthusiasts AI ▷ #gpt4 (1 messages):

    • GPT-4 Turbo Goes on Space Mission: One member reported encountering a peculiar issue with gpt-4-turbo-preview, where a completion task with a very long passage (12,000 tokens) resulted in the model endlessly outputting space characters. In an unusual twist, the model even began generating “Russian gibberish” after a lengthy sequence of spaces, as evidenced by attached screenshots.

    LLM Perf Enthusiasts AI ▷ #claude (18 messagesđŸ”„):

    • Haiku’s Cost-Effective Document Describing: A member highlighted the efficiency of Haiku in describing complex documents visually for economical costs, but also noted it is not as good as GPT-vision.
    • Limitations in Haiku’s Performance: Despite its strides, Haiku is still seen as inferior to Opus in vision-to-text tasks.
    • Content Filter Hurdles with Claude: There were issues with Claude regarding content filtering, particularly with it stopping mid-way when parsing documents containing equations.
    • Controversial Take on Anthropic: A tweet shared in the chat calls Anthropic “controlled opposition” meant to put the “fear of god” in members of technical staff.
    • Content Moderation Challenges for Specific Images: Users reported content moderation issues with images that contain people, where the system sometimes refuses to process them.

    Link mentioned: Tweet from roon (@tszzl): anthropic is controlled opposition to put the fear of god in the members of technical staff (https://x.com/tszzl/status/1768530219378631137?s=20)


    LLM Perf Enthusiasts AI ▷ #reliability (16 messagesđŸ”„):

    • KPU: The Next Big Thing in AI?: Maisa announces the KPU (Knowledge Processing Unit), a new framework designed to enhance LLMs by separating reasoning from data processing. KPU claims to outperform GPT-4 and Claude 3 Opus in reasoning tasks.

    • Questioning Benchmarks: Members express skepticism about KPU comparing its performance with GPT-4 and not GPT-4 Turbo. The discussions highlight concerns about potentially unfair benchmarks.

    • KPU: Beyond Prompt Engineering?: One member wonders if KPU’s technology is merely about prompt engineering, while another clarifies that it includes self-evaluation and context window management tricks.

    • Examining Comparative Analysis: Humorous reactions emerge in response to the apparent omission of GPT-4 Turbo from KPU’s comparative analysis, suggesting a pattern also seen in Claude 3’s release.

    • Concern over Practical Efficiency: A discussion ensues about KPU’s lack of latency information, raising doubts about its practical application in real-world products despite purported improvements in accuracy.

    Links mentioned:

    • KPU - Maisa: AI-Powered Knowledge Processing Platform. A simple API for executing business tasks. Abstracting the complexities of using the latest AI architectures for software and app developers

    • Tweet from David VillalĂłn (@davipar): happy to answer! it is not a new model, indeed KPU is agnostic to intelligence providers (OpenAI, Antrophic). It is a new AI architecture to work with LLMs that leverages their reasoning capabiliti



    Skunkworks AI ▷ #general (17 messagesđŸ”„):

    • Paper on Improved Training Method Coming: A member is working on releasing a paper/article that suggests a method which seems to improve global accuracy and makes the training more sample efficient. They are planning to structure the results and create better visualizations for their findings.
    • Seeking Resources for Scaling: The approach needs validation for large model efficacy, but currently, there is a lack of resources to empirically prove it at scale.
    • Method Shows Promise Even With Large Models: Initial tests with VGG16 on a subset of CIFAR100 show a significant improvement using the new method (0.1 test accuracy) over base training (0.04 test accuracy).
    • Collaboration on Resource Allocation: Members are coordinating to help allocate compute and resources for further testing and scaling of the new training method.
    • Involvement in the Quiet-STaR Project: A member expressed interest in participating in the implementation of “Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking” and was asked about their proficiency in PyTorch and transformers architecture.

    Skunkworks AI ▷ #off-topic (2 messages):

    • Dive into Hermes 2 Pro 7B Function Calls: A link to a YouTube video titled “Lets Function Call with Hermes 2 Pro 7B” was shared, which demonstrates function calling with the language model Hermes 2 Pro 7B. The video is accompanied by a GitHub repository that dives deeper into the Hermes function calling capabilities.
    • Meta Quest Hackathon Looking for Innovators: An invitation was extended to join a team for the Meta Quest Presence Platform Hackathon, where participants create innovative mixed reality content using the Presence Platform on Meta Quest. Interested individuals are encouraged to learn more and join without needing any prior skills, suggesting a learn-as-you-go approach, and were referred to the hackathon’s resources.


    Datasette - LLM (@SimonW) ▷ #ai (16 messagesđŸ”„):

    • Searching for the Prompt Engineering Workbench: A member inquired about a tool analogous to Postman for prompt engineering that allows for managing a prompt library, versioning prompts, staging data, running tests, and integrating with multiple models.
    • Using SQLite for Prompt Capturing: Another member shared their method, utilizing LLM in the terminal, to manage prompts and responses by capturing them in SQLite, with an observation that a custom UI might be beneficial.
    • Prodigy as a Prompt Engineering Tool: The conversation included a mention of a tool previously developed for Explosion’s Prodigy, a paid product that treats prompt engineering as a data annotation problem and offers features such as A/B testing.
    • PromptTools for Prompt Experimentation: The PromptTools GitHub repository, an open-source initiative for prompt testing and experimentation with support for varied LLMs and vector databases, was suggested as a resource for setting up experiments.
    • Helicone AI Enters the Prompt Management Arena: A participant pointed to Helicone AI, a developing platform for Generative AI applications that is starting to incorporate features related to prompt management, versioning, and analysis.
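
    The SQLite capture idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the schema or API of any tool named in the discussion: the table name `prompt_log`, the helper functions, and the `example-model` label are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) a SQLite database with a hypothetical prompt_log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS prompt_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               created_at TEXT NOT NULL,
               model TEXT NOT NULL,
               prompt TEXT NOT NULL,
               response TEXT NOT NULL
           )"""
    )
    return conn


def log_exchange(conn, model, prompt, response):
    """Store one prompt/response pair and return its row id."""
    cur = conn.execute(
        "INSERT INTO prompt_log (created_at, model, prompt, response) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), model, prompt, response),
    )
    conn.commit()
    return cur.lastrowid


def search_prompts(conn, needle):
    """Return logged rows whose prompt contains the given substring."""
    rows = conn.execute(
        "SELECT id, model, prompt, response FROM prompt_log "
        "WHERE prompt LIKE ? ORDER BY id",
        (f"%{needle}%",),
    )
    return rows.fetchall()


# Usage: log two exchanges (responses are placeholders), then search them.
conn = open_log()
log_exchange(conn, "example-model", "Summarize this changelog", "(model output here)")
log_exchange(conn, "example-model", "Translate to German: hello", "(model output here)")
print(search_prompts(conn, "Summarize"))
```

    Keeping everything in one table makes ad hoc SQL queries easy; versioning or test runs, as asked for in the workbench question, would need additional tables.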


    Datasette - LLM (@SimonW) ▷ #llm (1 messages):

    obra: Is it possible to recover the seed used by the openai models for a previous api request?


    Interconnects (Nathan Lambert) ▷ #other-papers (8 messagesđŸ”„):

    • Unlocking LLM Secrets via API: A recently discussed research paper explores how to extract non-public information about API-protected Large Language Models (LLMs) like OpenAI’s GPT-3.5 by exploiting the softmax bottleneck—revealing details such as hidden model size with a relatively low number of API queries.
    • Discussion on Carlini’s Latest Work: A participant referenced a recent paper by Carlini et al. that investigated the model size estimation through logits but noted that the key details were redacted.
    • Surprise Over Alleged Model Size: One member expressed surprise that the model size could be 7B parameters, suggesting such an estimation seems implausible.
    • Skepticism on Model Size Accuracy: Resistance to the 7B size estimate was voiced, with speculation that the calculation might be flawed, especially if GPT-3.5 is a Mixture of Experts (MoE) model.
    • Theory of Distillation or Mixtures in Models: A discussion speculated on the use of ‘mega distillation sauce’ or token-critical mixtures in turbo LLMs, citing past research that showed the beginning tokens are crucial for performance in tasks like math problems.
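
    The intuition behind the softmax bottleneck can be shown with a toy simulation. This is a sketch under stated assumptions, not the paper's procedure: the "model" here is just a random output matrix, the sizes are tiny, and the function names are hypothetical. Because every logit vector is a linear map of a hidden state, a stack of logit vectors from many queries has rank at most the hidden size, so the rank reveals that size.

```python
import random


def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def simulate_logits(hidden_size, vocab_size, n_queries, seed=0):
    """Toy stand-in for an API: logits = W @ h for a random hidden state h."""
    rng = random.Random(seed)
    w = [[rng.gauss(0, 1) for _ in range(hidden_size)] for _ in range(vocab_size)]
    outputs = []
    for _ in range(n_queries):
        h = [rng.gauss(0, 1) for _ in range(hidden_size)]
        outputs.append([sum(wi * hi for wi, hi in zip(row, h)) for row in w])
    return outputs


# Ten "queries" against a toy model with vocab 12 and a hidden size of 4.
logits = simulate_logits(hidden_size=4, vocab_size=12, n_queries=10)
print(matrix_rank(logits))  # the recovered rank equals the hidden size
```

    Note that this recovers the hidden (embedding) dimension, not a parameter count; mapping one to the other requires further assumptions about the architecture, which is where the skepticism about the 7B figure above comes in.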

    Link mentioned: Logits of API-Protected LLMs Leak Proprietary Information: The commercialization of large language models (LLMs) has led to the common practice of high-level API-only access to proprietary models. In this work, we show that even with a conservative assumption



    Interconnects (Nathan Lambert) ▷ #ml-questions (4 messages):

    • Seeking Citations for Safety Filtering by Model Providers: A member asked for references to support the statement that “foundation model providers do a lot of the safety filtering for text post generation.”
    • Agile Text Classifiers Aid Safety Policy: Another member provided a reference to the paper Agile classifiers for safer chatbots, which discusses how prompt-tuning large language models with small datasets can quickly adapt to safety policies and achieve state-of-the-art performance.
    • Satisfaction with Safety Filtering Resource: The initial member acknowledged that the provided paper on agile text classifiers helps convey the intended point about foundation model providers’ role in safety filtering.

    Link mentioned: Towards Agile Text Classifiers for Everyone: Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and c



    Interconnects (Nathan Lambert) ▷ #random (5 messages):

    • Craving Ultra-Long Context: A member expressed a hopeful outlook towards the development of Gemini for ultra-long contexts, which they hope will improve summary generation, currently used as “better abstracts.”
    • Contemplating the Serendipity of Smarter Prompts: Another member discussed the challenge of finding the right balance in prompt engineering and looked forward to more intuitive, less tedious prompting mechanisms, likening them to search engine suggestions that get “warmer or colder” to guide users.
    • Innovative Paper Summarization Concept: A new approach to summarizing academic papers was proposed where an AI tool would monitor new papers citing a user’s favorite research, potentially providing contextual citations, like where the referenced dataset is used.
    • Dispelling GPT-4.5 Rumors: One member conveyed disappointment, inferring from available information that GPT-4.5 would not be released “today.”
    • Entertainment in AI Discussions: A tweet captioned “This explains why Yann is so bearish on LLMs” was shared, pointing to Yann LeCun’s skeptical stance on language models and prompting discussion and reactions within the group.

    Link mentioned: Tweet from Teknium (e/λ) (@Teknium1): This explains why Yann is so bearish on LLMs đŸ˜Č


    DiscoResearch ▷ #general (3 messages):

    • Language Struggle with DiscoLM-70b: A member encountered difficulties in eliciting responses in English from DiscoLM-70b, despite the model’s card suggesting multi-language capabilities. It was suggested to analyze the prompt structure for potential issues.
    • Cross-Model Performance Mysteries: Comparisons with other models like LeoLM variants, llama2, and Nous-Hermes-2-Mixtral showed expected performance in multilingual tasks. The same member reported that after instruction fine-tuning, the DiscoLM-mixtral-8x7b-v2 failed to generate responses in German.
    • Fine-Tuning Hurdles with DiscoLM: Supervised fine-tuning of DiscoLM as a sequence classification problem resulted in a ValueError, indicating an unrecognized configuration class for AutoModelForSequenceClassification. The error suggests possible compatibility issues with the current setup.

    DiscoResearch ▷ #embedding_dev (1 messages):

    • Introducing “GermanQuAD” Evaluation Task: A message introduced the “GermanQuAD” evaluation task, which can be used through the MTEB Python package, and mentioned recent German additions from JinaAI.

    DiscoResearch ▷ #discolm_german (5 messages):

    • Demo Availability Confusion: A member inquired whether the demo was available, implying that it might be down or inaccessible at the moment.
    • Model Prompt Respect: A member explained that the model is trained to respect the system prompt and suggested trying variations for optimal outcomes. They confirmed that the demo doesn’t utilize special settings and runs on fastchat/vllm.
    • Demo Down Due to Server Move: In response to the demo availability question, it was clarified that the server hosting the demo was moved and networking issues arose, causing downtime. The hope is to have the demo running again by early next week.
    • Hobbyist vs Professional Hosting Challenges: A member humorously remarked on the reliability of a hobbyist server in a kitchen corner compared to professional hosting, which seems to face networking issues and other technical hiccups.