> AI News for 4/9/2024-4/10/2024. We checked 5 subreddits and [**364** Twitters](https://twitter.com/i/lists/1585430245762441216) and **26** Discords (**388** channels, and **5893** messages) for you. Estimated reading time saved (at 200wpm): **600 minutes**.

While people are still processing the big Gemini audio and GPT4T and Mixtral news from yesterday, today was Udio’s big launch:


You’ll have to listen to the samples in the thread to compare it with Suno, which of course has its own fandom. Udio has leaked like a sieve over the last few days, so its launch is no surprise; more surprising was Sonauto, which also launched today going after the music generation game, though far less polished. This feels like an idea whose time has finally come, though unlike with Latent Diffusion, it is unclear what breakthroughs enabled Suno/Udio/Sonauto to arrive all around the same time. You can hear some hints on Suno’s Latent Space pod, but that’s all you’ll get until we release the next music episode.


Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence. Comment crawling still not implemented but coming soon.

Here is a summary of the key themes and topics from the given Reddit posts, organized into categories with the most relevant posts linked:

AI Models and Architectures

Open Source Efforts

Benchmarks and Comparisons

Multimodal AI


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

GPT-4 Turbo Model Improvements

  • Improved reasoning and coding capabilities: @gdb, @polynoamial and @BorisMPower noted GPT-4 Turbo’s significantly improved reasoning and coding performance compared to previous versions.
  • Generally available: @gdb, @miramurati, and @owencm announced GPT-4 Turbo is now out of preview and generally available.
  • Comparisons to previous versions: @gdb, @nearcyan and @AravSrinivas shared comparisons and called the update a significant step up.

Mistral AI’s New 8x22B Model Release

  • 176B parameter MoE model: @sophiamyang and @_philschmid detailed Mistral AI’s release of Mixtral 8x22B, a 176B parameter MoE model with 65K context length and Apache 2.0 license.
  • Evaluation results: @_philschmid shared Mixtral 8x22B achieved 77% on MMLU. More positive results in @_philschmid.
  • Community excitement and access: Many like @jeremyphoward and @ClementDelangue expressed excitement. It’s available on Hugging Face and Perplexity AI per @perplexity_ai.

Google’s New Model Releases and Announcements

  • Gemini 1.5 Pro public preview: @GoogleDeepMind announced Gemini 1.5 Pro, with a long context window, is in public preview on Vertex AI. Available via API in 180+ countries per @GoogleDeepMind.
  • Imagen 2 updates: Imagen 2 can now create 4-second live images and includes a watermarking tool called SynthID, shared by @GoogleDeepMind and @GoogleDeepMind.
  • CodeGemma and RecurrentGemma models: @GoogleDeepMind announced CodeGemma for coding and RecurrentGemma for memory efficiency, in collaboration with Google Cloud, detailed in @GoogleDeepMind and @GoogleDeepMind.

Anthropic’s Research on Model Persuasiveness

  • Measuring persuasiveness of language models: @AnthropicAI developed a way to test persuasiveness and analyzed scaling across model generations.
  • Scaling trend across model generations: @AnthropicAI found newer models were rated more persuasive. Claude 3 Opus was statistically similar to human arguments.
  • Experiment details: Anthropic measured agreement level shifts after reading LM or human arguments on less polarized issues, explained in @AnthropicAI, @AnthropicAI, @AnthropicAI.

Cohere’s Command R+ Model Performance

Meta’s New AI Infrastructure and Chip Announcements

  • Next-gen MTIA inference chip: @soumithchintala and @AIatMeta announced MTIAv2, Meta’s 2nd-gen inference chip with 708 TF/s Int8, 256MB SRAM, 128GB memory on TSMC 5nm. 3.5x dense and 7x sparse compute vs v1 per @AIatMeta.
  • Balancing compute, memory, bandwidth: @AIatMeta noted MTIA’s architecture optimizes compute, memory bandwidth and capacity balance for ranking and recommendation models. Full-stack control enables greater efficiency over time than GPUs per @AIatMeta.
  • Growing AI infrastructure investment: Part of Meta’s increasing AI infrastructure investment to power new experiences, complementing existing and future AI hardware, emphasized by @AIatMeta.

Humor and Memes

  • Pitching to associates: @adcock_brett humorously advised against ever pitching to VC associates, calling it detrimental based on a decade of unhelpful experience, expanded on in @adcock_brett.
  • Moats and open-source: @abacaj joked "There are no moats" referencing a GPT-4 wrapper raising millions. @bindureddy predicted open-source leading the AGI race by year-end.
  • Anthropic reacting to GPT-4: @nearcyan posted a meme speculating Anthropic's reaction to OpenAI's "majorly improved" GPT-4 update.

AI Discord Recap

A summary of Summaries of Summaries

1) New and Upcoming AI Model Releases and Benchmarks

  • Excitement around the release of Mixtral 8x22B, a 176B parameter model outperforming other open-source models on benchmarks like AGIEval (tweet). A magnet link was shared.

  • Google quietly launched Griffin, a 2B recurrent linear attention model (paper), and CodeGemma, new code models.

  • OpenAI’s GPT-4 Turbo model has been released with vision capabilities, JSON mode, and function calling, showing notable performance improvements over previous versions. Discussions revolved around its speed, reasoning capabilities, and potential for building advanced applications (OpenAI Pricing, OpenAI’s Official Tweet). Its gains were discussed alongside models like Sonnet and Haiku in benchmark comparisons.

  • Anticipation for releases like Llama 3, Cohere, and Gemini 2.0, with speculation about their potential impact.

2) Quantization, Efficiency, and Hardware Considerations

  • Discussions on quantization techniques like HQQ (code) and Marlin to improve efficiency, with concerns about maintaining perplexity.

  • Meta’s study on LLM knowledge capacity scaling laws (paper) found that int8 quantization preserves knowledge capacity and that MoE models are knowledge-efficient (a toy sketch of int8 quantization follows this list).

  • Hardware limitations for running large models like Mixtral 8x22B locally, with interests in solutions like multi-GPU support.

  • Comparisons of AI acceleration hardware from Meta, Nvidia, and Intel (Habana Gaudi3).
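For intuition on why int8 is so forgiving, here is a minimal, hypothetical sketch of per-tensor symmetric int8 quantization (our illustration, not the paper's method); the round-trip error stays small relative to the weight scale:

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Map the tensor onto the signed 8-bit range with one shared scale.
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 256)
q, scale = quantize_int8(w)
error = (w - dequantize(q, scale)).abs().max()
print(f"max abs round-trip error: {error:.4f}")  # small vs. unit-scale weights
```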

3) Open-Source Developments and Community Engagement

  • LlamaIndex showcased for enterprise-grade Retrieval Augmented Generation (RAG) (blog), with the MetaGPT framework at ICLR 2024 leveraging RAG (link).

  • New tools like mergoo for merging LLM experts (GitHub) and PiSSA for LoRA layer initialization (paper, repo); a sketch of the PiSSA idea follows this list.

  • Community projects: everything-rag chatbot (HuggingFace), TinderGPT dating app (GitHub), and more.

  • Rapid open-sourcing of new models like Mixtral 8x22B by community members on HuggingFace.
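As a rough illustration of the PiSSA idea (our reading of the abstract, not the authors' code): initialize the LoRA factors from the top singular directions of the frozen weight and keep the residual as the new base, instead of starting the adapters near zero:

```python
import torch

def pissa_init(w: torch.Tensor, rank: int):
    # Take the top-`rank` singular directions as the trainable LoRA factors.
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    s_root = s[:rank].sqrt()
    a = u[:, :rank] * s_root          # (out, rank)
    b = s_root[:, None] * vh[:rank]   # (rank, in)
    residual = w - a @ b              # frozen base weight
    return a, b, residual

w = torch.randn(512, 512)
a, b, residual = pissa_init(w, rank=16)
print(torch.allclose(a @ b + residual, w, atol=1e-4))  # exact split at init
```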

4) Prompt Engineering, Instruction Tuning, and Benchmarking Debates

  • Extensive discussions on prompt engineering strategies like meta-prompting and iterative refinement using AI-generated instructions.

  • Comparisons of instruction tuning approaches: RLHF vs Direct Preference Optimization (DPO) used in StableLM 2 (model); a sketch of the DPO objective follows this list.

  • Skepticism towards benchmarks being "gamed", with recommendations for human-ranked leaderboards like arena.lmsys.org.

  • Debates around LLM2Vec for using LLMs as text encoders (paper, repo) and its practical utility.
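For reference, a minimal sketch of the DPO objective (per the Rafailov et al. paper, not StableLM's training code), operating on summed log-probs of chosen/rejected responses under the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Implied reward margin is the gap in log-prob ratios vs. the reference.
    margin = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

loss = dpo_loss(
    policy_chosen=torch.tensor([-12.3]), policy_rejected=torch.tensor([-15.1]),
    ref_chosen=torch.tensor([-13.0]), ref_rejected=torch.tensor([-14.2]),
)
print(loss.item())  # lower when the policy prefers the chosen response
```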


PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord

Super-Resolution Squads Deploy Techniques: Engineers discussed enhancing image quality from video screenshots using super-resolution. They referenced RealBasicVSR, with many looking forward to more advanced video upscalers.

Stirring Stable Diffusion Creativity: Newcomers inquired about creating original content with Stable Diffusion, receiving guidance toward tools and repositories on GitHub. Contributions of demo URLs from experienced users further supported these explorations.

Custom Control Debates Heat Up: Participants debated the customizations within Stable Diffusion, including specific dataset construction, project enhancements, and stylized 'loras' to reflect distinct art styles, indicating a trend toward highly personalized model outputs.

Navigating the AI Legal Labyrinth: Conversations also hinged on the legal and ethical implications of AI-generated content, addressing copyright concerns, lawful generation practices, and potential impacts of legislative developments on the field.

Eager Anticipation for Stable Diffusion 3: There was significant buzz around the anticipated release of Stable Diffusion 3, with special attention to its hand-generation abilities and the question of whether newer models will need negative prompts to avoid undesirable outputs.


LM Studio Discord

  • Calculator GUI Achievement: Mistral-7b-instruct-v0.1 Q4_0 stood out in a performance test for effortlessly creating a basic calculator with a GUI, while Command R Plus was noted to require significant VRAM, leading to discussions around local server API requests and possible VRAM bottlenecks.

  • AutoGen vs. CrewAI - The Automation Faceoff: A quandary was presented by a member evaluating AutoGen, CrewAI, and other tools for task automation with local LMs, leaning towards AutoGen for its ease of use and favorable outcomes with structured inputs, while seeking an optimal model to run on a 12GB 3080 GPU.

  • Command R Plus Beta Excitement: LM Studio’s 0.2.19 beta saw discussions on its latest features and stability enhancements, with members particularly happy about the Command R Plus model’s compatibility and performance on a range of hardware including an M3 MacBook Pro and an AMD machine with AVX2 support.

  • CodeGemma’s Grand Entry: Google’s launch of CodeGemma models, available in 2B and 7B variants for code tasks, stirred discussions, and members are testing its capabilities against the likes of Claude and GPT-4. The LM Studio Community seeks further insights into this new model’s prowess.

  • ROCm and Compatibility Blues: The recent 0.2.19 ROCm Preview Beta-3's support for Command R Plus prompted dialogues on ROCm utilization issues, with comfort found in anticipation of the pending Linux release; questions over the 7800XT's compatibility remain unresolved.


Unsloth AI (Daniel Han) Discord

  • Checkpoints Left Hanging: An issue was raised regarding the hub_strategy="all_checkpoints" in TrainingArguments causing failures in checkpoint folders being pushed to the repo. Detailed training parameters were shared, but no clear-cut solution surfaced immediately.

  • Longer Context, Trimmer VRAM: Unsloth AI's new release enables context windows 4 times longer with a 30% reduction in VRAM use and only a 1.9% increase in runtime. They're also working on a one-click solution for an even smoother fine-tuning and model-optimization experience (Long-Context Support Detailed).

  • Merch Ideas, a Steal or a Mug’s Game?: Discussion in the community touched on the potential for Unsloth-themed merchandise, spurred by a user’s unrelated coffee mug gift. Members also requested technical documentation, notably for Hugging Face Json file documentation.

  • Efficient Approach to LLM Fine-Tuning: Discussions around optimally fine-tuning AI chatbots highlighted the usage of Alpaca format for Alpaca models and ChatML template for chatbots, with emphases on the necessity for dataset compatibility with specific fine-tuning frameworks.

  • StegLLM Sneaks into the Scene: A new model named StegLLM was introduced, embedding a covert mechanism in mistral-7b-instruct-v0.2 and initiated by a specific "key" phrase. The model maker also shared the safetensors and credited inspiration from Anthropic's Sleeper Agents research (StegLLM on Hugging Face).

  • Multi-GPU Support on the Horizon: Contributions underlined the excitement and technical considerations for forthcoming multi-GPU support. An AdaLomo optimizer is under scrutiny for potentially low memory usage, as suggested by an arXiv paper, expected to go hand-in-hand with Unsloth AI’s future updates.


Perplexity AI Discord

Perplexity Pro Stirs Debate: Community members are dissecting the pros and cons of Perplexity Pro, particularly for learning tools like Blender and Unreal Engine, yet some users note limitations in context length compared to other services, with Gemini 1.5 standing out due to its video and audio support.

Model Comparisons and Speculations: Conversations are buzzing around Mistral 8x22b, an open-source model believed to slot between GPT-4 and Sonnet, though its heavy compute requirements limit accessibility. There's also light-hearted banter about future models like "GPT-5" and "Gemini 2.0", paralleled with quips about the anticipated release of "GTA 6".

Tech Mashup: Raycast Meets Perplexity: An announced collaboration between Raycast and Perplexity AI aims to integrate knowledge access into the Mac user experience, as detailed in a tweet from Perplexity. Additionally, there’s a mention of AI trumping traditional search engines for quick information retrieval.

Out of the Lab, Into the Code: A new Ruby client for the Perplexity API hit the scene, while users are sharing workarounds for large text pasting and model selection for data extraction, specifying an upper limit of 199k tokens.

Perplexity API Evolves: Technical issues like API balance top-ups and payment submission bugs were swiftly navigated, with fixes in place and an invitation for DMs if problems persist. Additionally, there’s talk of the Perplexity API’s capabilities with live web responses and clarity that the Claude Opus model is not currently supported.


Nous Research AI Discord

A Chatbot Refined: StableLM 2 12B Chat is a 12 billion parameter AI optimized for chat via Direct Preference Optimization (DPO), with the user base evaluating its implications compared to other finetuning methods like SFT+KTO and DNO; concerns revolve around quality and ethical considerations of DPO. StableLM 2’s model is accessible here.

Mixtral’s Rise to the Top: Early benchmarks suggest the Mixtral 8x22b model rivals top-tier models like Command R+ in MMLU evaluations, sparking discussions on the importance of diverse finetuning datasets vs inherited base model capabilities. More details on Mixtral 8x22b.

The Quantum Leap in Model Quantization: Insights were shared on quantization methods, particularly in the context of OLMo-Bitnet-1B with a focus on Quantization Aware Training (QAT) and the use of the Straight-Through Estimator, highlighting an ongoing interest in model efficiency. Here’s the paper on the Straight-Through Estimator.

Synthesizing for Success: A paper introducing the concept of combining synthetic and real data during model training sparked debate over the potential for 'inbreeding' of synthetic data, its impact on the diversity of models' knowledge bases, and the risk of model collapse. The paper can be found here.

Anticipating WorldSim Updates: The community showed excitement about the upcoming updates to WorldSim, with discussions about the platform’s multilingual support and alternatives that can simulate similar experiences using models like Nous Hermes Mixtral. Current local hardware was also highlighted as insufficient for running such advanced models.


Eleuther Discord

RNN Advancements Unraveled: Researchers demonstrate that interpretability tools used for transformers have significant applicability to modern RNNs, like Mamba and RWKV, sharing insights through both a research paper and a GitHub repository. This stimulates enhanced community engagement and shares the study’s methodologies, encouraging collaborative RNN language model development.

Mysterious Claude 3 Opus’ Size Spawns Speculation: The AI community is buzzing with questions about Claude 3 Opus’ unrevealed model size, drawing stark contrasts with the transparency around the GPT-4 scale. Meanwhile, Google’s Gemini project faces scrutiny for its conservative image generation policies and the controversial views of its project safety lead.

Benchmarking GPT-4 Turbo: Engineers are looking for reliable benchmarking information for OpenAI’s latest models, particularly gpt-4-turbo. The absence of such data makes comparisons and progress evaluations challenging.

AI Governance Gets Legislative Attention: Generative AI Copyright Disclosure Act, introduced by Congressman Adam Schiff, emerges as a focal legislative effort aimed at enhancing transparency in AI’s use of copyrighted material, setting the stage for potential regulatory impacts on the industry.

Emergence of Text Embeddings via LLM: A fresh engagement has surfaced around LLM2Vec, an endeavor that transforms decoder-only LLMs into encoders with claims of performance boosts, evoking debates about the fairness in comparison to other models and its practical utility.


OpenAI Discord

  • The Artist or the Algorithm?: Active discussions on whether AI can be considered a legitimate artist highlighted concerns about the impact of AI-generated art on the recognition and valuation of human creativity.
  • AI in Academia: A master’s student is considering LM Studio and the Open-Source LLM Advisor as potential resources to implement a GPT-based chat system for their thesis project.
  • Perplexity Earns a Nod: Users commended Perplexity, particularly its Pro version, for its capabilities including a 32K context window and the flexibility to switch between models like Opus and GPT-4.
  • Customization on the Wishlist: Calls for future GPT iterations to offer greater customization, especially in terms of response conciseness and output ranking, are growing amongst the community.
  • GPT-4 Conundrums and Prompt Crafting: Technical issues with GPT ranging from loading problems to API access interruptions have been flagged, alongside a proactive stance against sharing AI jailbreak prompts. Instruction precision improvement via iterative prompt engineering and use of meta-prompts has generated interest, serving as a reminder of the indispensable value of well-documented AI interactions.

Latent Space Discord

  • Advancements in Autonomous Software Development: The introduction of AutoCodeRover by researchers in Singapore marks a significant leap towards autonomous software engineering, capable of efficiently addressing GitHub issues related to bug fixes and feature enhancements. This innovation underscores the potential for AI to revolutionize software maintenance and development processes at reduced costs and enhanced speeds. Details and the preprint are available on GitHub Repository and Preprint PDF.

  • Evolutions in AI Language Models with GPT-4-Turbo: The release of GPT-4-Turbo represents a notable advancement in language model capabilities, showing significant improvements in reasoning and performance on complex tasks. The anticipation and analysis of its deployment highlight the continuous progress in making AI tools more powerful and accessible. Pricing and rollout updates can be found on OpenAI Pricing and OpenAI’s Official Tweet.

  • Innovations in Music Generation Technologies: Udio, emerging as a potential game-changer in the music generation arena, has ignited discussions around its advanced text-prompting system for creating music. With a generous beta offering, Udio’s impact on the music industry and its comparison with competitors like Suno are keenly observed by enthusiasts and professionals alike. Further insights can be explored in the Udio Announcement and a Reddit Discussion about Udio.

  • Breakthroughs with 1-bit Large Language Models (LLMs): The discussion on 1-bit LLMs, especially the BitNet b1.58 model, showcases an innovative step towards cost-effective AI by reducing model precision without significantly compromising performance. This advancement offers a new perspective on model efficiency and resource utilization, as detailed in the arXiv submission.


HuggingFace Discord

Gemma 1.1 Instruct Outclasses Its Predecessor: Gemma 1.1 Instruct 7B shows promise over its previous version, now available on HuggingChat, and is prompting users to explore its capabilities. The model can be accessed here.

CodeGemma Steps into the Development Arena: A new tool for on-device code completion, CodeGemma, is introduced, available in 2B and 7B variants with an 8192-token context, and can be found alongside the recent non-transformer model RecurrentGemma here.

Cost-cutting Operations at HuggingFace: HuggingFace announces a 50% reduction in compute prices for Spaces and Inference Endpoints, making them more cost-effective than AWS EC2 on-demand from April.

Community Blog Makeover: A revamp of community blogs to "articles" with added features such as upvotes and enhanced visibility within HuggingFace is now in effect. Engage with the new articles format here.

Serverless GPUs Hit the Scenes with Bonus ML Content: Hugging Face showcases serverless GPU inference in collaboration with Cloudflare and furthers education with a new bonus unit on Classical AI in Games in its ML for Games Course. Investigate serverless GPU inference via this link, and explore the course’s new content here.

Decoding Python for Debugging: Leverage eager execution in JAX or TensorFlow, use Python’s breakpoint() function, and remove PyTorch implementations for effective debugging.
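As a minimal illustration of the breakpoint() workflow (our example, not from the discussion; in JAX or TensorFlow the same trick only helps once you are running eagerly rather than inside a traced graph):

```python
# Pause a numeric pipeline and poke at its state in pdb:
# type `p scores`, `p total`, `n`, or `c` at the prompt.
def normalize(scores: list[float]) -> list[float]:
    total = sum(scores)
    breakpoint()  # inspect `scores` and `total` before the division
    return [s / total for s in scores]

if __name__ == "__main__":
    print(normalize([1.0, 2.0, 3.0]))
```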

AI Watermark Eradicator Introduced: An AI tool designed to remove watermarks from images has been suggested, benefiting those with extensive batches of watermarked images. Review the tool on GitHub.

GPT-2’s Summarization Struggles & Prompting Approach: A user’s challenge with using GPT-2 for summarization could be a hint at the importance of prompts aligning with the model’s training era, suggesting a possible need for updated instructions or newer models better suited for summarization.

Navigating CPU & GPU Challenges: Techniques like accumulation or checkpointing were discussed as workarounds for batch size limitations when using contrastive loss, acknowledging potential update issues with batchnorm. Tracking GPU usage via nvidia-smi became a point of interest for efficient resource management.
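A minimal gradient-accumulation sketch of the workaround (our illustration; a plain classification loss stands in for the contrastive one):

```python
import torch
from torch import nn

model = nn.Linear(16, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4  # effective batch = 4 x micro-batch, at micro-batch VRAM cost

optimizer.zero_grad()
for step in range(16):
    x = torch.randn(8, 16)                     # micro-batch
    y = torch.randint(0, 8, (8,))
    loss = loss_fn(model(x), y) / accum_steps  # scale so grads match one big batch
    loss.backward()                            # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
# Caveats flagged in the discussion: contrastive in-batch negatives only span
# each micro-batch, and batchnorm statistics are likewise per-micro-batch.
```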

Diffuser Denoising Steps Illuminate Image Quality: Explorations into diffusers revealed that image quality fluctuates with changed denoising step counts. The ancestral sampler’s role in quality variance was elaborated, and guidance for distributed multi-GPU inference was provided, particularly for handling significant memory requirements of models like MultiControlnet (SDXL).


OpenRouter (Alex Atallah) Discord

  • Gemini Pro 1.5 and GPT-4 Turbo Break New Ground: OpenRouter introduces Gemini Pro 1.5 with a 1M token context and GPT-4 Turbo with vision capabilities, signaling significant upgrades to their model lineup, aimed to cater to advanced development needs.

  • Selective Model Sunset and Fresh Releases: OpenRouter outlines a decommissioning plan for less popular models like jebcarter/Psyfighter-13B, and teases the community with the new Mixtral 8x22B, a model boasting instruct capabilities, inviting valuable user feedback for refinement.

  • logit_bias Parameter Enhanced Across Models: The technical community now has heightened control over model outputs with the expansion of the logit_bias parameter to more models, including Nous Hermes 2 Mixtral, promoting precision in model responses (a usage sketch follows this list).

  • Clarifying Model Integration and Rate Limits: A discussion facilitated by Louisgv guides users through integrating a new LLM API with OpenRouter and resolves confusion around rate limits for new preview models like Gemini 1.5 Pro, which currently cap requests at around 10 per minute.

  • Optimization and Troubleshooting Talk Heat Up: Users, including hanaaa__, are swapping strategies for optimizing models such as Hermes DPO on various platforms like SillyTavern, while also reporting and troubleshooting technical hiccups encountered with OpenRouter’s website and latency issues with TogetherAI’s services.
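Since OpenRouter exposes an OpenAI-compatible API, logit_bias can be passed as a map from token ID to a bias in [-100, 100]; a hedged sketch (the model slug and token ID 1234 are placeholders, and IDs are tokenizer-specific):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

resp = client.chat.completions.create(
    model="nousresearch/nous-hermes-2-mixtral-8x7b-dpo",  # placeholder slug
    messages=[{"role": "user", "content": "Name a color."}],
    logit_bias={"1234": -100},  # strongly suppress one token id
)
print(resp.choices[0].message.content)
```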


CUDA MODE Discord

Meta Morphs to Mega Sponsor: Meta reinforced its commitment to AI research with a massive sponsorship offering 4.2 million GPU hours for scaling laws research, facilitating a study on Language Model (LM) knowledge capacity, which is equivalent to nearly half a millennium of compute time. The full details can be found in the scaling laws study.

CUDA Takes Center Stage in LLM Training: A collaborative effort has been initiated to form a working group around CUDA-related projects, and enthusiasm around implementing algorithms in CUDA is growing, as seen with discussions on porting GPT-2 to CUDA llm.c repository.

Optimizing Matrix Multiplication: Performance gains in matrix multiplication are realized when matrix shapes respect tile sizes and memory layouts; shapes such as A: M=2047, K=N=2048 were reported to produce unaligned memory layouts and slower kernels, as elaborated in the blog post titled "What Shapes Do Matrix Multiplications Like?".
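A quick way to see the effect yourself (our micro-benchmark, assuming a CUDA GPU; magnitudes vary by hardware): time the same matmul with an aligned and an off-by-one leading dimension:

```python
import torch

def time_matmul(m: int, k: int = 2048, n: int = 2048, iters: int = 50) -> float:
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    a @ b  # warmup so kernel selection is not timed
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per matmul

for m in (2048, 2047):  # aligned vs. misaligned row count
    print(m, f"{time_matmul(m):.3f} ms")
```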

Quantization Quandaries in AI Models: The community engaged in vigorous discussions around the implementation of Half-Quadratic Quantization (HQQ) and the Marlin kernel’s modest performance for matrix multiplication. Concerns were raised about quantization techniques affecting model perplexity, with HQQLinear’s tuning under scrutiny and comparisons being drawn against GPTQ results.

Flash Attention and CUDA Expertise: Code for 'flash' versions of CUDA kernels underperformed initially but later experienced speed-ups through collaborative troubleshooting efforts to optimize execution. Meanwhile, the llm.c project emerged as a prime learning resource for those eager to strengthen their CUDA skills, with discussions touching on the utility of OpenMP and debugging of custom CUDA for performance gains.


LangChain AI Discord

Whisper’s Not Speaking, It’s Listening: Whisper is clarified to be a speech-to-text model and is not inherently supported by Ollama, yet can be utilized locally or with alternate backends from the same developer.

LangChain’s Limitations and Applications: LangChain may not offer significant benefits over OpenAI’s API for simple AI assistant tasks but shines in scenarios requiring integrations beyond OpenAI’s scope, with practical use cases like RAG performance evaluations.

TinderGPT Swipes Right on Automation: A new app, TinderGPT, has been created to automate Tinder conversations and secure dates, inviting contributions on its GitHub.

Comparing LLMs via Structured Output: An analysis was shared comparing structured output performance across a variety of large language models, both open and closed source, detailed on this GitHub page.

AI on the Fashion Frontline: A video demonstrating an AI agent that can simulate virtual clothing trials was shared, aiming to revolutionize the e-commerce space for fashion – catch the demo here.


LlamaIndex Discord

  • Pill ID Gets RAG Upgrade: A Multimodal RAG application now enables pill identification from images by merging visual and descriptive data, showcased in activeloop’s blog post.
  • Get Ready for Enterprise RAG: An upcoming collaboration promises to reveal the building blocks of enterprise-grade Retrieval-Augmented Generation (RAG), with discussions focusing on advanced parsing and observability, detailed on Twitter.
  • MetaGPT Swoops into ICLR with RAG Sauce: At ICLR 2024, MetaGPT will debut as a multi-agent framework for software team collaboration, with RAG capabilities adding a modern layer, elaborated in this announcement.
  • Reining in Agentic RAGs: Current discussions stress the significance of execution control tools for agentic systems like travel agents and RAGs, with deeper insights available on Twitter.
  • Gemini Meets LlamaIndex: AI engineers are actively adapting LlamaIndex’s example notebook for Gemini LLM, with resources and guidance available through GitHub.

LAION Discord

Pixart Sigma’s Speedy Rendering Meets Quality Quirks: Pixart Sigma demonstrated impressive prompt execution times of 8.26 seconds on a 3090 but faced criticism for "mangled" output images, hinting at issues with open models' quality control.

Mistral’s Might Multiplying: The release of Mistral 22b x 8 sparked excitement, with community interest in its capabilities compared to mistral-large. A magnet link for downloading mixtral-8x22b was shared without further description.

Questioning the Echo Chamber in AI: A recent paper challenges the expected "zero-shot" generalization in multimodal models like CLIP and highlights the dependence of performance on data seen during pretraining.

Google’s Griffin Grabs Attention: Google’s newly introduced Griffin model architecture, demonstrated at the billion-parameter scale, promises enhanced performance, according to a Reddit discussion.

Direct Nash Optimization Outperforms RLHF: A new study poses a sophisticated alternative to Reinforcement Learning from Human Feedback (RLHF) for large language models, employing "pair-wise" optimization and purportedly achieving notable results even with a 7 billion parameter model.


OpenInterpreter Discord

  • GPT-4 Enters with a Bang, But Quietly: There’s a lot of excitement over GPT-4 which has now integrated vision capabilities and outperforms its predecessor; despite this, detailed information seems sparse with OpenAI’s release notes being the go-to for updates on its capabilities.

  • Command r+ Excellence and Exigences: Embraced for its precision in role-playing scenarios, Command r+ is hailed as superior to prior models, including the older GPT-4; however, users note that running it may require hefty hardware, beyond what a 4090 GPU can offer.

  • 01 Devices Gets Dressed in DIY: Members are putting together their 01 devices with parts from the BOM and 3D printed casings provided on GitHub, bypassing the need for a Raspberry Pi by running Open Interpreter directly on a computer.

  • WiFi Woes Workaround for 01 Devices: Users experiencing trouble connecting their 01 to WiFi found success with a factory reset and visiting captive.apple.com; old credentials may need removal, and those configuring with local IP addresses found solutions via MacOS.

  • A Silent Queue for 01: Order updates for the DIY 01 machine are currently described as "still cooking," with email updates promised once there's more to share; this was in response to customer service inquiries about order statuses.


Interconnects (Nathan Lambert) Discord

Google’s RL Surprise: Google rolled out Griffin, a 2-billion-parameter recurrent linear attention model, released alongside CodeGemma. The Griffin model’s architecture draws parallels with RWKV, as detailed in their research paper on arXiv.

Rethinking RLHF Efficacy: A new discussion focused on improving large language models post-training with iterative feedback, potentially rivaling traditional RLHF methods. Concern was raised regarding the effectiveness of Rejection Sampling and the emphasis on benchmarks during model optimization, reflecting a desire for more practical development approaches found in a recent paper.

The Forecast for LLMs: Revealing 12 scaling laws for LLMs, a new study backed by Meta dedicates 4,200,000 GPU hours to unpacking knowledge capacity. Intriguingly, int8 quantization maintains knowledge capacity effectively, a pivotal finding for both resource efficiency and the potential application of Mixture of Experts (MoE) models.

Buzz Around Mixtral: Mixtral, a fresh player in the model scene, stirs conversations with its differentiation from Mistral and Miqu. A surge in model releases, including anticipation for the likes of llama 3 smol and Cohere, suggests a competitive acceleration in AI development, as discussed in a Twitter thread here.

Benchmarks: A Temporary Yardstick: While there’s consensus that optimizing for benchmarks such as alpacaeval may not correlate with true model superiority, they retain utility as an interim indicator of progress. Developers are advocating for post-equilibrium approaches with a focus on improving data and scaling rather than chasing scores.


tinygrad (George Hotz) Discord

  • Tinygrad Gets a Trim: Engineers have initiated a refactor of tinygrad to reduce code complexity and improve readability, advocating for JIT support adjustments and the removal of underlying diskbuffers as in PR #4129.

  • Seeking Weight Agnostic Approaches: A conversation around creating weight agnostic networks with tinygrad is gaining traction, with a focus on deploying such networks for game training and considering the use of ReLU activations.

  • MNIST Melds with Tinygrad: The integration of MNIST into tinygrad is advancing, exemplified with Pull Request #4122, which also uncovered a compiler bug on AMD—prompting for a CI test addition to detect similar future issues.

  • Global Vars Over Local: Debating on variable scopes within the abstractions3 refactor, an update was made where var_vals became a global dictionary, contrasting with the prior local scope within each ScheduleItem.

  • Tinygrad User Guide Unveiled: For users and developers interested in enhancing tinygrad with custom accelerators, a detailed guide is now available, and exploration of different network examples within the examples/ directory of tinygrad’s repository is endorsed.


OpenAccess AI Collective (axolotl) Discord

Mixtral 8x22B Raises Eyebrows: The community engaged in discussions on the new Mixtral 8x22B model, which has around 140 billion parameters and operates at rank32 with an unexpectedly low loss; though it’s unclear yet if this model is instruction tuned or a base model. There was keen interest in quantization techniques to make larger models like Mixtral 8x22B manageable for developers, indicating a need to balance model size against resource constraints.

PiSSA Promises Precise Performance: A novel LoRA layer initialization technique known as PiSSA, which uses the SVD of the original weight matrix, has been shared for potential better fine-tuning outcomes, detailed in an arXiv abstract and a GitHub repository.

Dataset Dilemma and Dedication: Members are actively seeking and sharing datasets, like the Agent-FLAN dataset, useful for function-calling and JSON parsing, to tune large language models effectively. Another member discussed pre-training a model with a Norwegian arts dataset to enhance its grammar capabilities and received advice on the representation format of the data.

Model Hosting Hurdle: A contributor quickly responded to the new Mixtral-8x22B model by uploading it to Hugging Face, demonstrating the community’s rapid contribution culture. Meanwhile, questions about hardware capability for the mixtral-qlora-fsdp model on a dual 24GB GPU setup and the search for a web self-hostable frontend compatible with various AI APIs remained unanswered.

Samsung Sets the Stage: Samsung announced the Samsung Next 2024 Generative AI Hackathon for May 11th in New York, which will explore tracks in Health & Wellness and Mediatech, detailed at Samsung Next AI Hackathon.


Modular (Mojo 🔥) Discord

Cpp Oldies But Goodies in Mojo Land: While Mojo developers are on the lookout for Python-style f-strings, they’re currently making do with C-style formatting by importing _printf as printf, but with a heads-up that this feature might not stick around forever.

Mojo API Guide Just a Click Away: A member shared a Notion page translating API documentation into beginner-friendly summaries, giving new Mojo users a leg up.

Mojo’s Concurrency Conundrums: Mojo’s async/await and coroutines implementation is ongoing, differing from Python’s; details are clarified in the Mojo docs, but async for and async with are missing as per the roadmap.

Vexing Variadic Generics: A burst of community bewilderment was sparked by the mention of "heterogeneous variadic generics," a term that encapsulates the complexity of advanced type systems in programming languages.

Mojo UI Quest for a Native Look: Active development on the Mojo-UI project ignites discussion on integration with Objective-C and accessing the AppKit framework. Ambitious integration aims may require a special binding layer, as followed on GitHub.


DiscoResearch Discord

  • Mixtral Marries Hugging Face: The Mixtral-8x22B model was added to Hugging Face with detailed documentation and slides smoothly into the spotlight with its Apache 2.0 license. Conversion scripts to facilitate this integration have been provided, including one for previous Mixtral models (MoE conversion script) and another for the latest release (new Mixtral conversion script).

  • Torrential Training: The Mixtral 8x22b model sprinted into discussions with a magnet torrent link for eager downloaders, boasting powerful AGIEval performance that outshines other base models; evaluations ran on a 4xH100 GPU setup, with MMLU clocking in at approximately 10 hours of runtime.

  • Mergoo Mixes Models: Lightning struck as mergoo, a new tool aimed at streamlining the merging of multiple LLM experts, entered the chat, drawing inspiration from recent research. Separately, discussion sparked over odd behavioral patterns in the DiscoLM_German_7b model, notably affected by the presence of a line break within the ChatML template, which critical eyes attribute to a possible tokenizer configuration issue (tokenizer config).

  • Behavior Mystery from a Break in Text: A peculiar sensitivity to line break formatting sent engineers into a frenzy, speculating whether this nuisance is a LeoLM-specific quirk, a broader occurrence impacting other models, or an emerging feature of the model’s unique processing architecture.

  • Benchmarking Blip becomes Hot Topic: The disparity in benchmark scores for models such as Mixtral 8x22B and Mixtral 8x7B across datasets like PIQA, BoolQ, and Hellaswag became the talk of the town, as members circulated scores and noted the roughly 10-hour runtime needed to complete the MMLU task.


LLM Perf Enthusiasts AI Discord

  • Early Bird Catches the AI News: A cheery "gm" alongside a Twitter post from OpenAI started the day, hinting at new updates or discussions worth noting.
  • Visionary Shock: Surpassing GPT-4 Turbo: The surprising results from quick vision benchmarks showed Sonnet and Haiku edging out GPT-4 Turbo and Opus, with the findings shared in a Colab research document.
  • GPT-4 Turbo Touts New Tricks: The conversation picked up around GPT-4 Turbo’s function calling and JSON mode, sparking interest in its potential to build robust vision models.
  • Increment or Innovation?: Amidst playful banter, members debated whether the latest updates represent a significant leap to GPT-4.5 or a modest step to 4.25, while some highlighted OpenAI staff’s claims of improved reasoning.
  • Code-Wise Comparative Discussions: AI engineers compared the coding capabilities across AI models, with spotlight on the cursor-friendly model usage, Gemini 1.5, and features of copilot++, without clear consensus emerging.

Datasette - LLM (@SimonW) Discord

  • Speed Matters in LLM Help Commands: Users have raised concerns regarding the slow performance of the llm --help command, where one instance took over 2 seconds to complete, raising red flags about system health.
  • Rapid Responses for LLM Commands: A contrasting report indicates that llm --help can execute in a swift 0.624 seconds, suggesting performance issues may be isolated rather than universal.
  • The Docker Difference: When benchmarking llm --help, a user noticed a stark difference in command execution time, enduring a sluggish 3.423 seconds on their native system compared to a more acceptable 0.800 seconds within a Docker container, hinting at configuration issues.
  • Fresh Installs Fix Frustrations: A user discovered that reinstalling llm not only enhanced the speed of llm --help, bringing it down from several seconds to a fraction of a second, but also rectified an error when running Claude models.
  • MacOS Mystery with LLM: On macOS, llm cmd execution hangs in iTerm2 while the same setup yields successful runs on a remote Ubuntu server, indicating possible conflicts with customized shell environments in macOS.

Skunkworks AI Discord

  • Benchmarks Under Microscope: A discussion arose around the importance of benchmark comparisons for models like phi-2, dolphin, and zephyr using the HumanEval dataset, with a reference to arena.lmsys.org as a more reliable human-ranked leaderboard that might address concerns about benchmarks being manipulated.

  • Mistral’s Benchmark Bragging Rights: Mistral 8x22b showcased notable performance in the AGIEval results, with updates from Jan P. Harries boasting its edge over competing open-source models, detailed in his tweets found here and here.

  • When Off-Topic Is Not Off-Limits: A link without context to a YouTube video was shared by a user: Watch on YouTube.


Mozilla AI Discord

  • Tuning GPUs for Better Utilization: A community member reported that by adjusting the -ngl value to 3, a substantial performance improvement was achieved, particularly for smaller models that fit more comfortably within their GPU’s limited memory capacity.

  • Adapting to VRAM Constraints with Smarts: There was a query about enhancing llamafile to adaptively offload model layers depending on the VRAM available, which would prevent crashes on lower-end GPUs like the 1050.

  • A Nod to ollama’s Efficiency: The ollama project was appreciated for its efficient handling of model layer distribution across GPUs, as indicated by a specific implementation snippet in the project’s server.go on GitHub.


Alignment Lab AI Discord

  • Remix the Rhythm: AI’s Latest Track: Engineers vibed to a new remix music model that’s impressing the community with its outputs; a member recommended giving it a listen at SynthTrails.
  • Code SOS: Engineer Seeks Expert Help: A user in need reached out for coding assistance, asking for direct communication to tackle specific technical challenges.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

Stability.ai (Stable Diffusion) ▷ #general-chat (985 messages🔥🔥🔥):

  • Super Resolution Techniques Discussed: Members shared insights about improving image quality from video screenshots using super-resolution techniques such as combining adjacent frames, but noted that existing methods like RealBasicVSR might be outdated with the anticipation of more advanced video upscalers.

  • Exploration of Stable Diffusion and Model Generation: New contributors sought advice on generating images with Stable Diffusion and were directed to repositories and tools on GitHub, along with demo URLs shared by current users.

  • Inquiries on Custom Control Models and Enhancements: Users expressed interest in specific use cases with Stable Diffusion, such as constructing particular datasets, enhancing certain project categories, personalizing models ('loras'), and aligning with specific art styles.

  • Legality and Ethical Discussions: The chat touched on sensitive topics such as copyright, lawful generation, legality of AI content creation, and the future of AI governance, including possible implications of legislative actions on Stable Diffusion and LLMs.

  • Stable Diffusion 3 Anticipation: Discussions revolved around the expected improvements in SD3 over variants like cascade, with emphasis on the limitation of generating realistic hands in images and queries about the capabilities of new models and whether they will require negative prompts.

Links mentioned:


LM Studio ▷ #💬-general (228 messages🔥🔥):

  • Battle of the Models: Testing of various LLMs resulted in Mistral-7b-instruct-v0.1 Q4_0 standing out for creating a basic calculator with a GUI. Multiple models were found wanting, with discussions suggesting that some models like Command R Plus might not be suitable for all systems due to high VRAM requirements.

  • Exploring Local Server Use: Members discuss how to use LM Studio’s local server for API requests and embedding, with some clarification provided on how to handle system prompts and port forwarding. Concerns were raised about partial model downloads and VRAM constraints, with an RTX4090 and 24GB being considered on the edge for some models.

  • Integrating Databases with LLMs: There's an ongoing experiment with using a database of community entries for a similarity lookup Q&A system, utilizing PostgreSQL and qdrant for storage. The embedding system on bge large is reportedly extremely quick (a sketch of the flow follows this list).

  • In Pursuit of Practicality: Participants evaluate options for efficient prompting systems and consider vellum.ai. Quantization is a topic of interest, with q4_quant on Nvidia or AMD GPUs discussed for its balance between performance and quality.

  • 0.2.19 Beta: Discussions around LM Studio’s beta version 0.2.19 touched on new features like text embeddings and stability for workshops, hinting at the potential for showing it at coding workshops. The requirement for 0.2.19 beta for Command-R+ model compatibility was stressed, along with advice on optimizing for different hardware setups.
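A hedged sketch of that similarity-lookup flow (our reconstruction; the collection name, embedding model id, and ports are assumptions, though LM Studio's local server does speak the OpenAI API on port 1234 by default):

```python
from openai import OpenAI
from qdrant_client import QdrantClient

# Embed the question via LM Studio's OpenAI-compatible local server.
emb_client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
vector = emb_client.embeddings.create(
    model="bge-large",  # placeholder id for the loaded embedding model
    input="How do I reset my password?",
).data[0].embedding

# Nearest-neighbor lookup over previously embedded community entries.
qdrant = QdrantClient("localhost", port=6333)
hits = qdrant.search(collection_name="community_entries", query_vector=vector, limit=3)
for hit in hits:
    print(hit.score, hit.payload)
```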

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (223 messages🔥🔥):

  • Laptops Might Run Small LLMs: There’s discussion on laptop capabilities, with one member suggesting using nvidia-smi to check GPU VRAM on a machine, emphasizing NVIDIA graphics.

  • Introducing CodeGemma: A new model called CodeGemma has been shared, boasting capabilities like code completion and code generation. It’s ideal for python programming help, comparing well with other models like Claude or GPT-4 according to community members.

  • Smaug Model for Enhanced Performance: A version of the Smaug 34B model compatible with LM Studio is discussed, indicating potential inclusion in the curated models list and noting its impressive performance.

  • Running Command R+ on a Mac Studio: Users report success with Command R+ model in LM Studio, notably achieving around 5.9 tokens per second on a Mac Studio with 192GB of RAM.

  • Mixtral Model Potential: There is excitement around the Mixtral-8x22B-v0.1-GGUF model with 176B MoE, which requires ~260GB VRAM in fp16 but can be fine-tuned. Users are anticipating the creation of GGUF quants for easier download and load into LM Studio.

Links mentioned:


LM Studio ▷ #🧠-feedback (4 messages):

  • Model Loading Error Puzzlement: A user reported an error when trying to load a model on a Linux machine with the AMD® Ryzen 7 PRO 3700U with Radeon Vega Mobile GPU, citing memory and application version details. The error message read "(Exit code: 0). Please check settings and try loading the model again." with no further suggestions.
  • Potential Compatibility Issue Identified: Another participant suggested the issue might be due to an unsupported Linux distribution, advising the affected user to check the glibc version with ldd --version and noting that LM Studio requires a version newer than 2.35.
  • Anticipation for a New Release: A user expressed excitement regarding the solution to their loading error, indicating a plan to download beta 0.2.19 or await its formal release.

LM Studio ▷ #🎛-hardware-discussion (85 messages🔥🔥):

  • Inference Speed Unchanged After CPU and RAM Upgrade: Upgrading from an i3-12100 with 96GB 4800MHz to a 14700K with 96GB 6400MHz showed no significant increase in inference speed. The speeds before and after were described as barely noticeable.

  • VRAM Upgrade has Noticeable Impact: It was noted that upgrading from 8GB to 24GB of VRAM shows a more noticeable difference in performance. One user’s Mac reportedly was 4x faster on 70b models compared to their PC setup without the VRAM increase.

  • Potential NVLink Performance Boost: There’s a discussion on whether NVLink can improve performance by linking multiple GPUs. Some users pointed towards improvement in model inference speeds, while others were skeptical, suggesting that GPU compute load sharing might not be significantly affected.

  • Evaluating On-prem vs Cloud for Model Deployment: Members discussed the cost and technical considerations of running large language models on cloud services versus on-premises. Factors such as technical skill, start-up costs, usage patterns, and the benefits of cloud scalability versus on-premises learning and development were highlighted.

  • Challenges with Multi-GPU Utilization: Users shared their experiences with multi-GPU setups, discussing that while LM Studio can see all the VRAM, often only one GPU shows high activity during queries. Configurations and potential solutions like using tensor_split to adjust offload proportions were mentioned (see the sketch below).
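For illustration, here is how the tensor_split knob looks in llama.cpp's Python bindings, a sketch under the assumption that LM Studio surfaces the same underlying llama.cpp option (the model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.6, 0.4],  # put 60% of tensors on GPU 0, 40% on GPU 1
)
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```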

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (68 messages🔥🔥):

  • Beta Release of Command R Plus: The beta version of Command R Plus for LM Studio has been released, with downloads available for Mac, Windows, and Linux. Users can check out the new embeddings documentation here.
  • Early User Feedback for Command R Plus: One user reported positive results using Command R Plus, stating that it’s working perfectly with a specific model on their M3 Macbook Pro.
  • Command R Plus Download Inquiry: A user had issues locating the Command R Plus downloads on an AMD machine with AVX2, but quickly resolved the issue by collapsing the "README" widget as suggested by another community member.
  • Model Loading Issues with Codegemma: A new user experienced consistent crashes when trying to load a specific model using Command R Plus on LM Studio. The community is providing support, asking for more details and screenshots to debug the situation.
  • Open WebUI Compatibility Issue with New Beta: A user encountered issues connecting Open WebUI to the new LM Studio Beta, which was resolved by loading an embedding model as a temporary workaround while awaiting a full fix for the bug.

LM Studio ▷ #autogen (5 messages):

  • Choice Paralysis in Local LM Automation: A member is seeking advice on the best tool to use for task automation with local language models, RAG, and tool usage for coding and research purposes, considering AutoGen, CrewAI, or other options.
  • AutoGen Gets a Thumbs Up: AutoGen comes recommended for coding simple things, with a better output quality noted when more structured inputs are provided.
  • Ease of Setup with AutoGen Noted: A user mentioned that AutoGen is not difficult to set up, implying a user-friendly experience for developers.
  • Tool Feature in AutoGen for Agent Utility: AutoGen's 'tools' feature is highlighted, where agents can utilize provided tools like Python code snippets to perform certain functions.
  • Query on Hosting a Model for AutoGen: A user inquires about a suitable model for running AutoGen that would be capable of coding and general tasks, specifying a need for a 12GB model that can work with a 3080 GPU.

LM Studio ▷ #amd-rocm-tech-preview (23 messages🔥):

  • Launch of Command R Plus Support: LM Studio 0.2.19 ROCm Preview Beta-3 brings support for Command R Plus, which has reached the 6th spot on the chat.lmsys.org leaderboard and is touted as the best open model there. The update also includes modifications in llama.cpp visible here, text embeddings functionality with comprehensive documentation available on LM Studio's docs, and a Windows download link for the beta version.

  • Impending Linux Release Confirmed: The LM Studio release will have a Linux version post-beta. The integration into the main release is confirmed, but the exact timeline is uncertain, with the Linux release possibly being a secondary step.

  • ROCm utilization issues discussed: Several users reported issues with recent LM Studio beta versions not utilizing ROCm properly, with GPUs being identified as "unknown" and models still loading into RAM instead of VRAM. A conversation unfolds as they attempt to diagnose the problem, including checking CPU types and AMD GPU support for ROCm.

  • Assistance with Bug Resolution Initiated: To address the persistent issues with ROCm, a private thread was created to delve into the bug further, and updated documentation on supported GPUs for Radeon was shared, pointing to docs-5.7.1.

  • 7800XT Compatibility Query: A discussion was raised about whether the AMD 7800XT GPU is ROCm compatible, with some users expressing uncertainty, despite the 6800’s compatibility, and suggesting to ask AMD for clarification.

Links mentioned:


LM Studio ▷ #crew-ai (3 messages):

  • DuckDuckGo as a Search Alternative: A member mentions using DuckDuckGo for internet searches without needing an API, but notes restrictions imposed by Crewai.
  • Curiosity about Model-Powered Searches: Another member expresses enthusiasm about the prospect of conducting searches using a model. The concept was highlighted as potentially "so cool".

LM Studio ▷ #model-announcements (1 message):

  • Google Launches CodeGemma Series: CodeGemma, a new series of models by Google, is now available in 3 variants: a 2B model, a 7B model for code generation and "fill in the middle" support, and a 7B-it variant specialized for instruction following. Interested developers can explore these models and share insights on their capabilities, with details and examples provided on the Hugging Face model pages at LM Studio Community.
  • Join the LM Studio Discord Community: Engage with like-minded individuals in the LM Studio Discord for discussions on models like CodeGemma; use the invitation link LM Studio Discord Invite to join the community.

Link mentioned: lmstudio-community (LM Studio Community): no description found


Unsloth AI (Daniel Han) ▷ #general (411 messages🔥🔥🔥):

  • Issues with hub_strategy: A member reported difficulties using hub_strategy="all_checkpoints" in TrainingArguments, finding that checkpoint folders were not pushed to the repo without errors. They listed their training parameters but received no immediate solution.
  • Excitement for Today’s Release: There is anticipation for a new release with members discussing its pending launch. The release is now out, boasting updates on context lengths across models in Unsloth.
  • Dispute Over LLM Evaluation Methods: A lengthy debate ensued over the effectiveness of GPT-4 Turbo vs. llama-70b. One member strongly believes that LLM evaluations frequently miss capturing the "deeper understanding" some models possess over others, referencing Apple's ReALM purportedly outperforming GPT-4 with a smaller model.
  • Model Comparisons Spark Skepticisms: The conversations reveal skepticism towards a Reddit post claiming Apple’s 3B-LLM outperforms GPT-4. Members debate the validity of such claims, with some asserting those models are overfitted and others cautioning against concluding without personal evaluations.
  • Challenges with Gemma 7B: A user faced out-of-memory (OOM) issues when attempting to train Gemma 7B, even after applying newly released memory optimizations. Discussions suggest Gemma 7B requires significantly more VRAM compared to Mistral 7B, posing difficulties for training on consumer-grade hardware.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #announcements (1 message):

  • Unsloth Reveals Massive Context Support: Unsloth AI has announced an impressive update to their fine-tuning capabilities for Large Language Models (LLMs), now supporting context windows up to 4 times longer than previously possible on various GPUs, with a notable 30% VRAM reduction.
  • Efficiency Meets Power: Even with the significant memory savings, there’s only a minimal 1.9% increase in time overhead, showcasing both efficiency and power in LLM operations compatible with gradient checkpointing architectures.
  • Open Access to Fine-Tuning Notebook: For those eager to experiment, Unsloth has provided a Colab notebook for fine-tuning Mistral 7b models on Tesla T4 GPUs with 16K sequence lengths, using the ChatML format (a loading sketch follows this list).
  • Performance Enhancements Across the Board: The update also includes a suite of new features such as Code Gemma being 2.4x faster, 68% less VRAM intensive than alternatives, quicker RoPE Embeddings, and "self-healing" tokenizers for robust performance.
  • Sneak Peek at What’s Next: Looking ahead, Unsloth is developing an automatic model optimizer catering to popular models like CMD+R, and they’re refining their Colab 1-click fine-tuning system for even more user convenience.
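A hedged sketch of what such a long-context Unsloth setup looks like, based on Unsloth's public API rather than the notebook itself (the model id and hyperparameters are assumptions):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.2-bnb-4bit",  # assumed 4-bit base
    max_seq_length=16384,  # the 16K sequence length cited above
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_gradient_checkpointing="unsloth",  # the long-context / VRAM-saving mode
)
```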

Link mentioned: Unsloth - 4x longer context windows & 1.7x larger batch sizes: Unsloth now supports finetuning of LLMs with very long context windows, up to 228K (Hugging Face + Flash Attention 2 does 58K so 4x longer) on H100 and 56K (HF + FA2 does 14K) on RTX 4090. We managed…


Unsloth AI (Daniel Han) ▷ #random (9 messages🔥):

  • The AutoMod Overzealousness: The message from a user was mistakenly removed by the auto moderator due to the use of the word ‘gift’, which is flagged to prevent scam attempts. The timeout was lifted and the user was invited to repost without using the trigger word.

  • Mug Gifting Sparks Joy: One member shared an image of a coffee mug gift from their sister, specifying that it was not related to Unsloth AI, which prompted responses admiring the mug and expressing a desire for similar items.

  • Merchandise Ideas Brewing: The idea of creating Unsloth-themed merchandise was humorously suggested by a member, with another member showing interest in the concept.

  • Seeking Hugging Face Documentation: A user requested a link to the Hugging Face JSON file documentation, indicating a need for specific information on a technical topic.


Unsloth AI (Daniel Han) ▷ #help (144 messages🔥🔥):

  • Choosing the Right Dataset Format for Chatbot Fine-Tuning: Members discussed dataset formats for fine-tuning AI chatbot models, with one advising to use Alpaca format if the Alpaca notebook is being used and to use ChatML template if the ChatML notebook is used. Alpaca format is preferred for Alpaca-derived models, while ChatML is suggested for a chatbot.

  • Manage Expectations on Fine-Tuning Data Requirements: The amount of data needed for fine-tuning an AI model and the format’s significance were subjects of inquiry; answers indicated that the dataset format indeed needs to correspond to the training framework being employed, such as the Alpaca format for Alpaca notebooks.

  • VRAM and Conversion Troubles: Users discussed technical issues ranging from VRAM constraints on platforms like Colab to errors encountered during fine-tuning. Advice included freeing up resources with gc.collect() and torch.cuda.empty_cache() (see the sketch after this list), plus guidance on converting datasets to appropriate formats for fine-tuning, with shared examples.

  • Flash-Attn Problems & Solutions: There were reports of flash-attn errors and difficulties, leading to suggestions to reinstall the offending package or uninstall it completely, since xformers can stand in at similar speed.

  • BERT Model Tuning Outside the Realm of Unsloth: A query about fine-tuning BERT models, specifically biomedical-ner-all, sparked clarification that Unsloth predominantly serves decoder-based models; for BERT-based models, a distilled variant like DistilBERT might yield a faster model with less memory consumption.
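
The resource-freeing advice above boils down to a few lines; a minimal, self-contained sketch:

```python
import gc
import torch

# Illustrative only: allocate something large, then release it.
big = torch.empty(4096, 4096, device="cuda" if torch.cuda.is_available() else "cpu")

del big                       # drop the Python reference first
gc.collect()                  # collect unreachable objects
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
```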

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (12 messages🔥):

  • StegLLM Introduces Backdoor to LLMs: A member presented StegLLM, a model that incorporates a rudimentary backdoor mechanism into mistral-7b-instruct-v0.2. This stealth feature is triggered by a specific “key” input, causing the model to output predefined information.
  • Collaborative Effort on StegLLM: The creation of StegLLM was a joint project between the sharing member and their sibling. Although initially unable to provide the model due to a location issue, they offered to share the safetensors instead.
  • Model Details and Credits Provided: A link to the StegLLM model was shared, revealing that it was developed using Unsloth and Huggingface’s TRL library. The work was inspired by research on Sleeper Agents by Anthropic, and credit is suggested to be given where due (Details and Shared Model on Hugging Face).
  • Performance Features and Acknowledgments: Members expressed admiration for StegLLM, with the developer highlighting its ability to run on an M1 iPad, albeit with performance caveats due to subpar quantization.
  • Access to the gguf Model Files Regained: After initially mentioning the inaccessibility of the gguf model files, the developer found them on their iPad and shared the link to the gguf version of StegBot on Hugging Face (StegBot on Hugging Face).

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (43 messages🔥):

  • Suggestions for Enhanced Model Downloading: There were discussions around optimizing the download process for model weights, with suggestions such as pre-quantizing models or using alternative sources like GitHub. However, download speed improvements may not be significant, as quantized versions of models download quite fast.

  • Better Release Practices for Unsloth: Team members are reminded to make separate releases without silent merging to aid in the reliability and reproducibility of Unsloth AI releases. There’s consensus on careful release practices with the potential introduction of weekly release cycles.

  • Anticipation for Multi-GPU Support: Conversations highlight the community’s excitement for upcoming multi-GPU support, which is identified as a key enhancement for Unsloth AI. Members discuss the optimization challenges and compare the software’s capabilities with existing solutions like LlamaFactory.

  • Introduction of Multi-GPU Capabilities: The development team acknowledges the importance of multi-GPU collaboration and notes that a significant redesign for model sharding is necessary. There’s a commitment to prioritize multi-GPU functionality, with an eye on possible inclusion in the next release.

  • Discussing Optimizer Implementations: Participants examine external findings on the Sophia optimizer and the implications of its Triton implementation. Additionally, AdaLomo is explored as a viable optimizer with potentially low memory consumption comparable to AdamW, based on a research paper available on arXiv.

Links mentioned:


Perplexity AI ▷ #general (551 messages🔥🔥🔥):

  • Pros and Cons of Perplexity Pro: Users debate the pros and cons of Perplexity Pro for tasks like learning Blender and Unreal Engine, as well as the limitations in context length it may have compared to other services. There’s mention of Gemini 1.5 being a good alternative with video and audio support.

  • Gemini 1.5’s Enhanced Capabilities: Gemini 1.5 Pro is praised for its exceptional quality in AI console use and is highlighted for its unique ability to support video and audio, placing it ahead in terms of functionality compared to other models.

  • Mysterious Mixtral Model: Users discuss an open-source model, Mixtral 8x22b, that’s gaining attention and is considered to sit between GPT-4 and Sonnet in terms of performance, though its high compute requirement is a barrier.

  • Anticipations for AI Developments: Amidst the discussions, there’s speculation about future AI releases, such as “GPT 5” and “Gemini 2.0”, and jokes about “GTA 6” being released before these AI updates.

  • App Experiences and Collaborations: There’s an announcement about a collaboration between Raycast and Perplexity, as well as personal experiences with using Perplexity, including minor troubleshooting with VPN conflicts on Android and a user expressing amazement at the convenience of AI over traditional search engines.

Links mentioned:


Perplexity AI ▷ #sharing (14 messages🔥):

  • Designing Dreams with Jony Ive: A message linked to content featuring Jony Ive, the renowned designer behind many of Apple’s iconic products.
  • Delving into Nietzsche’s Philosophy: A search related to Nietzsche’s philosophical concepts was shared, indicating a user’s interest in his ideologies.
  • AI’s Capacity for Transformation: A user posted a link discussing how AI could possibly shape the future, emphasizing the potential impact of AI technologies.
  • The Intricacies of the Multiverse Theory: A member sought information on the multiverse theory, a concept that expands the understanding of our universe.
  • Deciphering Tasks for AI: A perplexity search was shared which seems to be about defining AI tasks, pointing toward inquiries about AI capabilities and instructions.

Perplexity AI ▷ #pplx-api (15 messages🔥):

  • Ruby Client for Perplexity API Released: A new Perplexity API Ruby client was published, as mentioned by a member in the channel.
  • API Balance Top-Up Issue Resolved: There was an issue with API balance top-up that has been fixed, and members are directed to DM their account details if they encounter any problems.
  • Claude 3 as a Data Extraction Example: A link to an article about Claude 3’s data extraction abilities was shared, with a member asking if Perplexity AI could be used similarly; discussion ensues on the practicality of using the API for text extraction.
  • Payment Submission Issue Addressed: A member experiences a payment issue, where submitting payment results in a perpetual “Pending” status which disappears on a page reload.
  • Model Selection and Large Text Pasting Tricks: Discussing the use of various models for data extraction via API, with a tip provided that plain text can be pasted into the Perplexity AI prompt field, accommodating up to 199k tokens.
  • Query About Live Web Responses and Model Support: New members inquire about Perplexity API’s capability for live web responses and the support for Claude Opus model, with responses indicating that live web responses can be obtained using sonar online models and confirming that Claude Opus model is not supported.

Nous Research AI ▷ #off-topic (1 message):

pradeep1148: https://www.youtube.com/watch?v=Gb--4supXoo


Nous Research AI ▷ #interesting-links (14 messages🔥):

  • StableLM 2 enters the Chat Game: The StableLM 2 12B Chat is highlighted, a 12 billion parameter AI trained with Direct Preference Optimization (DPO), optimized for chat. The usage instructions and a snippet of code to implement it are shared with a link to the model (a minimal loading sketch follows this list).

  • Debating AI Tuning Approaches: A member expressed mixed feelings about the use of DPO in chat finetuning and voiced a preference for other methods like SFT+KTO or DNO, mentioning Microsoft’s Orca 2.5 and its effective use of DNO.

  • LLMs as Text Encoders: The GitHub repository for the ‘LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders’ project is shared, suggesting that decoder-only LLMs can be adapted into encoders that produce quality embeddings.

  • Decoding Secret Encoder Strengths: Members discussed the implications of the LLM2Vec project, hinting at the potential to use traditional LLMs for embeddings, which could enrich context and save on VRAM by multitasking on machines.

  • Untangling the Prefix LM: Clarification was provided on what a prefix LM is, explaining that it involves bidirectional attention at the start of a sequence, which could significantly impact AI performance.
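
A minimal loading sketch for the chat model above, assuming the standard transformers chat-template flow; the model id (inferred from the linked card) and the prompt are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-12b-chat"  # id assumed from the linked model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```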

Links mentioned:


Nous Research AI ▷ #general (308 messages🔥🔥):

  • Mixtral 8x22b Competes with Command R+: The recently released Mixtral 8x22b model appears to rank amongst the highest-MMLU open-access models, with early AGIEval results showing its performance is close to Command R+ and Dbrx models. Discussion ensued on whether the performance is due to the Mixtral base model or more diverse finetuning datasets.
  • Transformers and Math Problems: There’s interest in the Nous community regarding the AIMO competition, with members discussing strategies for using language models to solve complex math problems and considering the creation of a Proof Driven Logic Unit to parse natural language into logical operations symbolically.
  • Large Models Challenging Hardware Limits: Conversations reflect the community’s struggle with the hardware requirements of new large AI models like Mixtral 8x22b, prompting discussions on the cost and practicality of Nvidia and Apple’s VRAM offerings, and potential alternative solutions like Intel’s Habana Gaudi3 AI accelerators.
  • New Generative Model Integrating Embedding and Generation: The release of GritLM, which integrates text embedding and generation into a single model, is noted for setting new benchmarks and improving the efficiency of retrieval-augmented generation processes.
  • Into the Quantum of Bitnets: A discussion on OLMo-Bitnet-1B touched on concerns regarding the quantization of weights not adhering strictly to the {-1, 0, 1} values, delving into the nuances of Quantization Aware Training (QAT) and referencing the original Straight-Through Estimator paper in the context of its application.

Links mentioned:


Nous Research AI ▷ #ask-about-llms (50 messages🔥):

  • Synthetic Data Debate: The channel discussed a paper suggesting that using a mixture of synthetic and real data during training can prevent model collapse. Members compared synthetic data iterations to “inbreeding”, suggesting that using synthetic data as a stepping stone could enhance overall data quality.

  • Anticipation for Hermes-3: A member appreciated the current Hermes-2-Pro-Mistral-7B but inquired about Hermes-2-Pro-Mixtral-8x7B-DPO, learning its release is on hold for the Hermes 3 preview. The general consensus is that the current flagship model will likely stay until Hermes-3-Pro-Mixtral-8x7b-DPO is released.

  • Optimizer Confusion: A member requested resources for optimizers, schedulers, and learning rates for transformers, expressing that the original formula from “Attention Is All You Need” had issues with converging too rapidly.

  • Understanding Function Calling in AI: The discussion explained that function calling in AI involves providing function signatures for the AI to use in applications. This is designed to be generalizable to various tools, and users are responsible for how outputs are utilized.

  • Model Modification and Rollback: There was a clarification that DPO (Direct Preference Optimization) modifies the actual model under discussion. Users can revert to previous stages (e.g., SFT before DPO). Despite some confusion, it was clarified that the gguf file is not modified post-download.

Link mentioned: Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investi…


Nous Research AI ▷ #bittensor-finetune-subnet (1 message):

4biddden: Is there a runpod template available for the bittensor fine-tune?


Nous Research AI ▷ #world-sim (93 messages🔥🔥):

  • DDOS Basic Defense Tactics: One member highlighted that IP-rotation is a fundamental aspect of DDOS attacks and blocking a single IP is a common defense method, with another member jokingly responding about their “white hat” hacker status.

  • WorldSim Anticipation Builds: Several members expressed excitement about the possible return of WorldSim, speculating that it might come back sometime this week with predictions for a Thursday reopening.

  • Language Flexibility in WorldSim: Discussions indicate that WorldSim is capable of functioning in multiple languages, including Japanese and French, by setting the interface language or if the user can interact with the underlying AI (like Claude) in that language.

  • Alternatives to WorldSim Usage: Members provided alternative ways to engage with world simulation experiences using publicly available prompts or by building agents with Nous Hermes Mixtral for free, while others mentioned platforms like AI Dungeon and openrouter.ai as temporary options.

  • Local vs. Datacenter Capabilities for AI Sims: There was a consensus that running powerful AI models like the ones used in WorldSim locally on personal devices would offer substantially degraded performance compared to datacenter capabilities, and that it is unlikely to be a viable option in the near future.

Links mentioned:


Eleuther ▷ #announcements (1 message):

  • RNNs Under the Interpretability Microscope: A new study suggests that interpretability tools designed for transformers are largely applicable to modern RNNs like Mamba and RWKV. The research demonstrated that techniques such as vector arithmetic, eliciting early next-token predictions, and revealing true answers despite false fine-tuning are effective. View the paper here.

  • Open-Sourcing RNN Insights: The study’s methodologies and experiments with RNN language models have been made openly available on GitHub, fostering community engagement in engineering the state of these models. Check out the repository here.

  • RNN Developments Take to Twitter: A summary and discussion about the versatility of interpretability tools between transformers and RNNs were shared by the author in a Twitter thread, extending the conversation to the broader AI community. Join the thread here.

  • Collaborative Efforts Acknowledged: Special gratitude was extended to several collaborators and the broader community channel for their contributions to this interpretability research on RNN language models.

Links mentioned:


Eleuther ▷ #general (250 messages🔥🔥):

  • Speculation about Claude 3 Opus’ Model Size: Amidst discussions on the undisclosed model size of Claude 3 Opus, several participants expressed surprise at the lack of reliable information, drawing contrasts to previous models like GPT-4 where early predictions about scale were available. It was mentioned that leaks about model sizes at Anthropic may bear serious consequences.

  • Debating Daniel Han’s Claims: A member questioned Daniel Han’s credibility, referencing a history of making optimistic claims with errors. Further discussion included asking for specific instances of errors and examining Han’s approval by prominent figures in the AI community, such as Karpathy and Hugging Face, with links to previous discussions provided for context.

  • Google’s Gemini Faces Backlash: The conversation turned to the backlash against Google’s Gemini, focusing on its restrictive image generation policies and later finding out that the project safety lead held controversial views. Despite the discussion of its repercussions, it was suggested that the backlash may have contributed to an increase in Gemini’s popularity as people were curious to test it themselves.

  • Mistral and Unsloth in the Spotlight: Discussions arose around Mistral and a new optimization library called Unsloth, with one member advocating for the performance enhancements it supposedly offers over Hugging Face combined with Flash Attention 2 (FA2). A complex and technical conversation ensued about the veracity of performance claims and the importance of proper baselines for legitimate benchmarking.

  • AI Governance and Regulation: A bill titled Generative AI Copyright Disclosure Act was introduced by Representative Adam Schiff, aiming to increase transparency about the use of copyrighted material in AI training datasets. The community shared the link to the bill and discussed potential impacts on the industry.

Links mentioned:


Eleuther ▷ #research (203 messages🔥🔥):

  • Discussion on Knowledge Storage Capacity of LMs: A paper estimating language models’ knowledge storage capacity revealed they can store a maximum of 2 bits per parameter, suggesting a 7B parameter model could store enough factual knowledge to exceed English Wikipedia (Scaling Laws for Neural Language Models’ Capacity to Store and Manipulate Information). Critics raised concerns that hyperparameter re-tuning was neglected which could affect MLP ablation results, impacting the accuracy of the findings.

  • Rapid Releases in Model Finetuning: Three new papers were released exploring various aspects of model alignment and finetuning. The first explores a new approach for aligning text-to-image diffusion models (Diffusion-KTO: Knowledge-enhanced Text-to-Image Diffusion Models without Training Pairwise Comparisons), another addresses optimizing information storage in MoE language models (DS-MoE: Towards IO Efficiency for MoE Language Model Inference via Dense-Sparse Mixture-of-Expert Training), and the last investigates fine-tuning at scale (Batch Size Invariant Adam).

  • LoRA-Based Innovations and Comparisons: Discussions abounded around a paper utilizing singular value decomposition (SVD) and LoRA (Low-Rank Adaptation) to decompose pretrained values, drawing comparisons to the LoRD technique but highlighting significant differences in approach and goal ([No references provided]).

  • Encoder vs. Decoder Performance and Potential: A study introduced LLM2Vec, converting a decoder-only LLM into an encoder for text embeddings and claimed vastly improved performance (LLM2Vec: Unsupervised Contrastive Learning of Large Decoder-only Language Models). Commenters debated fairness in comparisons and the practicality of the approach, recalling similar past efforts like CARP for controlled story generation and evaluation.

  • Exploring the Untapped Powers of Encoder-Decoder Models: There was a notable interest in discussing the untapped potential of encoder-decoder models for embedding research, suggesting these architectures could be configured to enforce specific representation characteristics or hierarchies.

Links mentioned:


Eleuther ▷ #scaling-laws (4 messages):

  • Scaling Laws Explored for Knowledge Storage: A new paper on arXiv introduces an approach to estimate the number of knowledge bits that language models can store. It suggests that models can store 2 bits of knowledge per parameter, meaning a 7B model could store 14B bits of knowledge, potentially exceeding the knowledge contained in English Wikipedia and textbooks combined.
  • Eleuther Community Ponders Over New Paper: In the Eleuther community, there was a mention of positive opinions regarding the knowledge storage paper discussed, yet it was also noted that the paper is hard to parse and there might be a need for a discussion on its relevant results.
  • Seeking Benchmarks for OpenAI’s New Model: There’s an inquiry about benchmarks for the latest OpenAI model versions like gpt-4-turbo; the question is where to find these benchmark results when they are released via API.

Link mentioned: Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws: Scaling laws describe the relationship between the size of language models and their capabilities. Unlike prior studies that evaluate a model’s capability via loss or benchmarks, we estimate the n…


Eleuther ▷ #interpretability-general (1 message):

norabelrose: https://arxiv.org/abs/2404.05971


Eleuther ▷ #lm-thunderdome (8 messages🔥):

  • Call for Collaboration on apply_chat_template: A member mentions their eagerness to assist with the integration of apply_chat_template for model evaluation. Another member acknowledges the ongoing work and invites assistance upon their return, while a different participant also volunteers to help.

  • Inference Speed Boost via big-refactor: A query regarding whether the big-refactor branch offers faster inference when compared to the main branch is confirmed positively by another member.

  • Torrenting ThePile v1: A member shares a magnet link for downloading EleutherAI’s ThePile v1 dataset.

  • Chat Templating Pull Requests: stellaathena provides links to two pull requests that contribute chat templating features for Hugging Face models; you can find the first PR here and there’s another one linked here. They note that adding batchwise operations for apply_chat_template in the transformers library would be highly beneficial for the project and others (a per-example workaround sketch follows).
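
Given that note, a batchwise call can be approximated today by looping per conversation; a minimal sketch, with the tokenizer id as a placeholder for any model that ships a chat template:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # placeholder
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

conversations = [
    [{"role": "user", "content": "What is 2 + 2?"}],
    [{"role": "user", "content": "Name a prime number."}],
]

# Per-conversation loop as a stand-in for a true batchwise apply_chat_template.
prompts = [
    tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    for msgs in conversations
]
batch = tokenizer(prompts, return_tensors="pt", padding=True)  # ready for model.generate
```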

Links mentioned:


OpenAI ▷ #ai-discussions (68 messages🔥🔥):

  • AI Becomes the New Picasso?: Debates emerged about whether AI should be accepted as a new form of artist. While some appreciate the effort and hard work of human artists, concerns were raised about AI-generated art and its implications for artistic credit and effort.
  • Master Student Seeks AI Chat System: A Master’s student seeks an open-source GPT chat system template for their thesis. Recommended tools include LM Studio and the Open-Source LLM Advisor.
  • Perplexity Garners Praise: Users discussed Perplexity, an AI tool with a 32K context window, capable of switching between models like Opus and GPT-4. Some users have upgraded to Pro and report a positive experience.
  • Customization a Key Ask for Future GPT Versions: A user expressed desire for better customizability such as ranking system output and conciseness in GPT’s responses. The idea of introducing “custom instructions” for more finely-tailored outputs was floated.
  • GPT-4 Access Capped, Users Confused: Members reported messages indicating they have reached a usage cap for GPT-4 even though they were set to use version 3.5. A link to OpenAI’s status updates was shared, which documents ongoing investigations into ChatGPT errors.

Link mentioned: OpenAI Status: no description found


OpenAI ▷ #gpt-4-discussions (33 messages🔥):

  • Domain Verification Troubles: A user encountered an error when trying to publish a GPT, asking for suggestions on how to verify a domain even after setting up the TXT records.
  • GPT to SaaS Transformation Inquiry: One member was seeking advice on services available to convert GPT into a single-purpose SaaS application, aiming to create a proof of concept for future endeavors.
  • Technical Difficulties with GPT: Several members reported experiencing issues ranging from inability to load GPT and mentions not functioning, to suspended API access due to billing problems despite sufficient funds.
  • Chatbot Outage Reports: Users were facing outages with GPT, signaling errors such as “GPT inaccessible or not found” and having trouble retrieving existing conversations.
  • Service Status Updates and Confirmation: A link to OpenAI’s service status page (https://status.openai.com/) was shared, confirming the ongoing investigation into elevated errors and intermittent outages affecting ChatGPT services.

Link mentioned: OpenAI Status: no description found


OpenAI ▷ #prompt-engineering (179 messages🔥🔥):

  • AI Maximum Security: A member cautioned against sharing or promoting jailbreak-related prompts as it violates AI stewardship and OpenAI policies. They also referenced a Google search article for in-depth understanding.
  • Prompt Engineering 101: The conversation turned into a workshop on prompt engineering, using Pokemon Showdown and AI dialogue as examples. One user suggested meta-prompting - iteratively refining prompts by asking the AI itself to generate instructions for desired outputs.
  • Fine-Tuning for Excellence: The same user also revealed that asking ChatGPT for specific dialogue examples and then for instructions based on those, can help mimic those patterns in future outputs, emphasizing the significance of letting the AI structure instructions.
  • Guarding Against AI Missteps: There was a mention of a technique to prevent a custom GPT from sharing its instructions, involving a phrase added to the “Instructions” that mitigates some basic prompt injection threats if Code Interpreter is enabled.
  • ChatGPT Writes Instructions: Multiple participants engaged in refining the way to generate compelling Pokemon battle dialogue by leveraging ChatGPT’s ability to produce instructions for generating better dialogue, highlighting the AI’s potential to surpass even the engineers’ own capabilities in task specificity.

OpenAI ▷ #api-discussions (179 messages🔥🔥):

  • AI Jailbreak Prompt Awareness: A member emphasized caution against sharing AI jailbreak prompts. They highlighted the ethical considerations and referred to a Google search term, “Don’t worry about AI breaking out, worry about us breaking in”, to explain the risks and issues with promoting AI jailbreaking techniques.
  • Custom Instructions Against AI Misuse: There’s an exchange about creating custom instructions for AI models to prevent them from revealing sensitive information when Code Interpreter is enabled. A member shared a prompt to encourage the Custom GPT to graciously decline revealing details about its system.
  • The Documentary Nature of AI: One participant expressed that “In the era of large language models, the documentation is the source code.” This statement underscored the importance of AI documentation in understanding and replicating model behavior.
  • Enhancing AI-Generated Pokémon Battle Dialogues: A lengthy discussion was had about generating better Pokémon battle dialogues using ChatGPT. A member suggested using meta-prompting—letting the AI suggest how to construct prompts to improve the writing of dialogues—accompanied by iterative refinement and testing with the AI.
  • Meta-Prompting as a Powerful Tool: A member illustrated the concept of meta-prompting for another user, demonstrating how to refine the AI’s output to improve its battle dialogue writing for a Pokémon game. Through this process, the user learned to ask ChatGPT for specific instructional prompts until the results met expectations.

Latent Space ▷ #ai-general-chat (141 messages🔥🔥):

  • Introducing AutoCodeRover: Singapore presents AutoCodeRover, an autonomous software engineer capable of resolving GitHub issues tied to bug fixes or feature additions with minimal costs and quick turnaround times. Links to the project on GitHub and a preprint paper were shared. GitHub Repository, Preprint PDF.

  • GPT-4-Turbo Models Hit the Scene: The latest GPT-4-Turbo model with a training cutoff date of December 2023 is out, offering vast improvements over previous iterations. The community’s reaction includes observations of improved reasoning on complex tasks and anticipation of its rollout for ChatGPT Plus subscribers. OpenAI Pricing, OpenAI’s Official Tweet.

  • Music Generation Enters New Era with Udio: A hot topic discussion about Udio, a new music generation app, sparked interest for its potential to rival Suno with its intuitive text-prompting system for music creation and generous allowance of 1200 songs per user each month during its beta phase. There’s excitement and speculation about how this new player will impact the music industry. Udio Announcement, Reddit Discussion about Udio.

  • Mixtral 8x22b Model Released: The release of Mistral AI’s Mixtral 8x22b model drew attention for its substantial parameter count and notable comparison to the performance of GPT-4 and Claude Sonnet. The conversation highlighted the model’s technical specs and its need for heavy hardware to run, with further evaluations awaited from the AI community. Teknium Tweet.

  • Nvidia’s Blackwell Performance Analysis: Nvidia’s Blackwell chips were a big talking point, especially after an analysis was shared comparing their total cost of ownership and performance relative to older models like the H100 and A100, with the focus on their applicability for GPT-4’s inference and training needs. The discussion pointed out the importance of marketing realism regarding performance claims. SemiAnalysis Article.

Links mentioned:


Latent Space ▷ #ai-announcements (3 messages):

  • Upcoming Paper Club on 1-bit LLMs: A presentation on the 1-bit Large Language Models (LLMs) paper is scheduled in 10 minutes in the LLM Paper Club channel. For more details and to join the event, register here.

  • Delving into 1-bit LLMs: The featured paper titled “BitNet b1.58” discusses a ternary {-1, 0, 1} 1-bit LLM that achieves performance comparable to full-precision models while being more cost-effective. To read the paper, check arXiv submission.

Links mentioned:


Latent Space ▷ #llm-paper-club-west (268 messages🔥🔥):

  • Troubles with Visual Aid: There were multiple reports of difficulties viewing the screen share during the session, with some members offering alternative platforms such as x-ware.online and matrix.org.
  • Deep Learning Papers and Experience Sharing: The channel discussed the pre-print paper titled “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits,” and additional resources such as a blogpost and the paper’s PDF on arXiv.org were shared.
  • Audio and Video Issues: Alongside the screen sharing problem, there were issues with project members unable to hear or speak during the meeting, eventually leading to the discussion about moving back to Zoom.
  • 1-bit LLMs Insights and Discussion: Members discussed the concept of 1-bit Large Language Models (LLMs), focusing on how regularization and quantization during training could be key to their success. A related Huggingface repository BitNet-Transformers was also shared.
  • Paper Club Coordination and Future Topics: Towards the end of the chat, the group coordinated on the next paper to discuss, and papers related to time series, such as TimeGPT, were suggested as potential topics of interest. There was also a reference to another LLM, BloombergGPT, which led to sharing of related content like a podcast YouTube video for further exploration.

Links mentioned:


HuggingFace ▷ #announcements (4 messages):

  • Gemma 1.1 Instruct 7B Takes Center Stage: Gemma 1.1 Instruct 7B, a newer and improved version, is now available on HuggingChat. The update is expected to be a net improvement from 1.0 and users are encouraged to try it here.

  • CodeGemma Unveiled: CodeGemma has landed, featuring models optimized for on-device code completion in sizes 2B and 7B with an 8192-token context, and is available on HuggingFace. Google’s RecurrentGemma, a non-transformer model that boasts solid results and scalability, was also released.

  • A More Economical Hugging Face: Compute prices have been slashed by up to 50% for Spaces and Inference endpoints on HuggingFace, making them now more cost-effective than AWS EC2 on-demand services. Users will benefit from this price reduction starting April for Spaces or Inference Endpoints usage.

  • Community Insights Get Revamped: HuggingFace’s community blogs have been upgraded to “articles” with new features like upvotes and activity feed presence, plus access for paper authors. Visit the updated articles and explore user-generated content here.

  • Serverless GPUs and Bonus ML Content: Hugging Face introduces serverless GPU inference with Cloudflare and adds a new bonus unit focusing on Classical AI in Games to its ML for Games Course, enriching learning resources for interested parties. To delve into serverless GPUs, check out Deploy on Cloudflare Workers AI, and for the bonus ML content, visit Classical AI in Games.

Links mentioned:


HuggingFace ▷ #general (105 messages🔥🔥):

  • Checkpoints Saving Woes: A member had issues with a model not saving checkpoints to the specified directory using TrainingArguments. After confirming there were no errors in the training loop and trying different paths, they resolved the problem by using trainer.save_model("") to save the model weights explicitly.

  • Gradio Questions Go Here: When asked about the right place for Gradio-related inquiries, a link to the appropriate Discord channels was provided, including general Gradio questions, Gradio in Spaces, and Gradio Feature Requests.

  • Call for SEO Prompts: A member sought assistance with prompts for SEO blog articles. Although the initial call didn’t get a direct response, it indicates an interest in content creation guidance.

  • Learning Journey for AI Novices: A new member to machine learning, proficient in Python, requested advice on starting with LLMs or image generator AI. This highlights a common entry point question for those new to the field.

  • Model Error Queries and Troubleshooting: Several members discussed issues with model errors. Solutions ranged from checking parameters like max_seq_len to more practical advice, such as capturing the final error messages, which are usually telling; help varied from assistance with code to actual deployment scenarios.

Links mentioned:


HuggingFace ▷ #today-im-learning (2 messages):

  • Learn NLP in a Day: A repository has been shared that offers a solution for sentiment classification using the IMDB 50k movie review dataset. The guide is easy to follow, comprehensive, and could serve as a generic approach for most NLP tasks. Sentiment Classifier on GitHub.

  • Navigating the Maze of Package Management: A video was shared discussing various package management tools including Conda, Pip, and Libmamba, as well as tackling hard resets for Linux distributions. This content might help those struggling with package management complexities. Watch on YouTube.

Links mentioned:


HuggingFace ▷ #cool-finds (7 messages):

  • SIMA: AI Trained Across Many Worlds: DeepMind presents SIMA, a generalist AI agent for 3D virtual environments. This AI is designed to scale across numerous simulated worlds and perform various complex tasks.

  • Qdrant Meets DSPy for Enhanced Search: A new Medium post details the integration of Qdrant with DSPy to advance search capabilities. Combining these tools offers enhanced vector search and could potentially unlock new AI functionalities.

  • Karpathy’s Tweet Sparks Curiosity: The latest tweet from Andrej Karpathy has stirred conversations among enthusiasts. The contents are unspecified in this context, requiring a direct visit to the link for details.

  • Explore HuggingFace Models with Marimo Labs: The Marimo Labs team developed an interface for interactively experimenting with any HuggingFace model. Marimo provides a user-friendly environment for testing and tuning various AI models.

  • Multilingual Information Extraction on HuggingFace: Discover a powerful and multilingual information extraction model on HuggingFace Spaces. This tiny model can be used for robust information extraction tasks and is open-sourced under the Apache 2.0 license.

  • Quantum Leap for Transformers with Quanto: A new GitHub notebook showcases how to employ Quanto for quantizing Transformers. This could enable more efficient deployment of these models on constrained hardware.

Links mentioned:


HuggingFace ▷ #i-made-this (12 messages🔥):

  • Deep Dive into Deep Q-Learning: Shared a link to a collection of Deep Q-Learning projects on GitHub, promising a wealth of Deep Q-Learning applications with a variety of use cases.
  • Tracing Data Science Evolution: Introduction of RicercaMente, a collaborative project aiming to map the evolution of data science through significant scientific papers. The project encourages community participation and can be found on GitHub.
  • Local LLMs Unleashed with everything-rag: everything-rag, a fully customizable, local chatbot assistant that boasts support for any Large Language Model (LLM) and data, including the use of personal pdf files, was announced. It highlights the open-source and local nature of the tool, with the GitHub repo available here and a live demo provided on the HuggingFace space.
  • Fashion Forward with Virtual Try-On: A virtual try-on system using IP-Adapter Inpainting has been created, showcased on the HuggingFace space, where users can visualize clothing items on models with impressive results, despite occasional color inversion issues.
  • Insights on Model Layer Behavior: In an exchange about model layers, it was observed that the connection between layers varied depending on the input type—be it code, math, QA, or chat—with consistency in lower connection layers. The discussion also touched upon targeted dataset use for specific cases and the potential for pruning in models like Mixtral 8x22B.

Links mentioned:


HuggingFace ▷ #reading-group (8 messages🔥):

  • Python Debugging Advice: A member recommended understanding Python classes, functions, decorators, imports, and objects for better code implementation. They suggested removing PyTorch implementations for testing and enabling eager execution on JAX or TensorFlow, as well as utilizing Python’s breakpoint() for tracking variable changes during line-by-line code execution.

  • Navigating Colab’s Features: To assist with coding on Google Colab, tips were shared such as using function_name for documentation lookup, object_name.__class__ to find an object’s class, and inspect.getsource to print a class’s source code efficiently (see the short snippet after this list).

  • Gratitude Expression: A member acknowledged the community help with a simple “🙏” emoji.

  • Link to Prior Inquiry: A member referenced a past question asked in the ask-for-help section by providing a Discord channel link, noting their improved understanding of PyTorch since the initial query.

  • Request for Dialogue System Paper: A request was made for research papers or work related to building a multi-turn dialogue system for intelligent customer service, indicating an interest in instructional problem-solving capabilities within the chat system.

  • Mathematical Breakdown of Samplers Needed: In search of mathematical insights, a member requested recommendations for papers on the sampling methods that followed DDPM and DDIM, seeking to focus only on the foundational samplers in the field.
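
The introspection tips above reduce to a few standard-library one-liners; a small self-contained example:

```python
import inspect
import json

obj = json.JSONDecoder()

print(obj.__class__)                     # find an object's class
print(inspect.getsource(obj.__class__))  # print the class's source code
help(json.loads)                         # documentation lookup for a function

# breakpoint() would drop into pdb here for line-by-line stepping;
# it is commented out so this sketch runs non-interactively.
# breakpoint()
```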


HuggingFace ▷ #computer-vision (15 messages🔥):

  • Diving into Computer Vision with TensorFlow: A member inquired about resources or a roadmap for starting deep learning with computer vision using TensorFlow.

  • Contrastive Loss Requires Large Batch Size: It was discussed that contrastive loss benefits from large batch sizes, and techniques like accumulation or checkpointing could be a workaround for limited compute resources. However, concerns about batchnorm not updating correctly with accumulated large batches were raised.

  • Efficient Watermark Removal from Millions of Images: A member asked for tools to automatically remove watermarks from a large number of images. The repository for an AI watermark removal tool and its associated YouTube video were suggested.

  • Monitoring GPU Usage: For those without access to a task manager, it was pointed out that the command nvidia-smi can be used to monitor GPU usage, and nvidia-smi -l allows for continuous monitoring over time. Another member mentioned seeking a way to log metrics in real time during model training (a minimal logging sketch follows this list).
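
For the real-time logging request, one lightweight option is to poll PyTorch’s own memory counters from inside the training loop; a minimal sketch, with the loop and cadence as placeholders:

```python
import time
import torch

def log_gpu_memory(tag: str) -> None:
    # PyTorch's view of device memory; for utilization percentages,
    # shelling out to `nvidia-smi -l` remains the simplest option.
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1e9
        reserved = torch.cuda.memory_reserved() / 1e9
        print(f"[{tag}] allocated={allocated:.2f} GB, reserved={reserved:.2f} GB")

for step in range(3):      # stand-in for a training loop
    time.sleep(0.1)        # stand-in for forward/backward work
    log_gpu_memory(f"step {step}")
```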

Link mentioned: GitHub - Firdavs-coder/Aladdin-Persson-AI-Watermark-Destroy: Aladdin-Persson-AI-Watermark-Destroy Public: Aladdin-Persson-AI-Watermark-Destroy Public. Contribute to Firdavs-coder/Aladdin-Persson-AI-Watermark-Destroy development by creating an account on GitHub.


HuggingFace ▷ #NLP (7 messages):

  • GPT-2 and Summarization Issues: A member reported issues using GPT-2 for text summarization, even when following instructions from a HuggingFace course suggesting its potential application for that purpose. The difficulty arises despite the dataset and task being described as straightforward.

  • Mistral Meets RAG With Underwhelming Results: One participant indicated disappointing outcomes when combining Mistral 7B with RAG (Retrieval-Augmented Generation), experiencing significantly subpar results.

  • Pinning Down the TL;DR: In response to the above GPT-2 issue, another user suggested the problem might be related to era-specific prompting, particularly the “TL;DR” cue used for summarization instructions, implying a potential temporal mismatch in prompting strategies.

  • Sculpting Discord Bot Personality with Llamacpp: A user queried about methods to craft a Discord bot character using llamacpp, seeking a way to steer the bot’s behavior beyond simple prompting. They also expressed an interest in tracking the conversation history to maintain context.

  • Multi-Model Evaluation Using Cosine Similarity: A complex evaluation strategy for language models was discussed, involving the use of cosine similarity between embedding vectors to assess whether models incorporate specific knowledge points and tutoring principles in their outputs (a minimal sketch follows this list). This prompted another member to suggest a weighted approach to pooling embeddings, to better tailor the evaluation to the context’s demands.
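
A minimal sketch of the cosine-similarity scoring idea; the embedding model, reference text, and model output below are illustrative stand-ins rather than the setup discussed:

```python
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

reference = "Explain the underlying concept before giving the final answer."  # a tutoring principle
output = "First, recall what a derivative measures, then we can solve the exercise."

emb = model.encode([reference, output], convert_to_tensor=True)
score = F.cosine_similarity(emb[0], emb[1], dim=0).item()
print(f"cosine similarity: {score:.3f}")  # higher = output tracks the principle more closely
```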


HuggingFace ▷ #diffusion-discussions (18 messages🔥):

  • Save Custom Modules in Diffusers: A user encountered an error when trying to save a custom nn.Module; after adding mixins to the module, they were able to resolve the issue.

  • Schedulers/Samplers Behavior Explained: In a discussion about diffusers, a user received an explanation of why image quality varies with different numbers of denoising steps. The ancestral sampler was specifically mentioned, and explanations were provided on how the scheduler interpolates between noised and denoised images (a small sketch follows this list).

  • Understanding Schedulers/Samplers Maths: A user asked for recommendations on papers to read for understanding the mathematics behind basic schedulers/samplers beyond DDPM and DDIM.

  • Multi-GPU Inference with SDXL: A user enquired about performing inference with MultiControlnet (SDXL) across multiple GPUs. Guidance to use 🤗 Accelerate and PyTorch Distributed for distributed inference was provided, but the challenge of the pipeline requiring more than 10 GB was noted.

  • Layer Decomposer Search: A member requested information or tools related to a Layer Decomposer that separates and complements images, much like a tool found on cre8tiveai.com.
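
To make the scheduler and step-count point concrete, a small diffusers sketch; the pipeline, prompt, and scheduler swap are illustrative (the discussion concerned SDXL, but the same pattern applies):

```python
import torch
from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Ancestral samplers inject fresh noise at each step, so results depend on
# the step count more visibly than with deterministic samplers.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

for steps in (10, 25, 50):  # fewer steps = coarser traversal from noise to image
    image = pipe("a lighthouse at dusk", num_inference_steps=steps).images[0]
    image.save(f"lighthouse_{steps}.png")
```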

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (4 messages):

  • Gemini Pro 1.5 & GPT-4 Turbo Expand Horizons: Meet the Gemini Pro 1.5 with a 1M token context and GPT-4 Turbo with vision capabilities now at openai/gpt-4-turbo, bringing new advancements to the OpenRouter model lineup.
  • Enhanced logit_bias Support Rolls Out: The logit_bias parameter, enabling users to influence model output more granularly, has been extended to additional models including Nous Hermes 2 Mixtral and various Llama and Mistral models (a minimal usage sketch follows this list).
  • Farewell to Lesser Used Models: Models like jebcarter/Psyfighter-13B and jondurbin/bagel-34b-v0.2 are set for discontinuation, with a 2-week grace period before returning a 404 error, and migtissera/synthia-70b will redirect to xwin-lm/xwin-lm-70b from April 15th.
  • New Mixtral 8x22B Unveiled: The Mixtral 8x22B, a base model with instruct capabilities, has been launched; feedback and discussions are encouraged in the designated Discord channel.
  • Updates & Price Reductions Announced: The Gemma 7B model has been updated, and reduced pricing is now offered for models including LZLV 70B, Databricks DBRX 132B Instruct, and Nous Hermes 2 Mixtral 8x7B DPO.
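
A minimal sketch of logit_bias through an OpenAI-compatible client pointed at OpenRouter; the API key, model slug, and token id are placeholders, since valid token ids depend on each model’s tokenizer:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="nousresearch/nous-hermes-2-mixtral-8x7b-dpo",  # slug assumed from the announcement
    messages=[{"role": "user", "content": "Pick a color."}],
    # token id -> bias in [-100, 100]; -100 effectively bans a token,
    # positive values promote it. The id below is hypothetical.
    logit_bias={12345: -100},
)
print(response.choices[0].message.content)
```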

Links mentioned:

  • Mixtral 8x22B by mistralai | OpenRouter: Mixtral 8x22B is a large-scale language model from Mistral AI. It consists of 8 experts, each 22 billion parameters, with each token using 2 experts at a time. It was released via [X](https://twitter...
  • Gemma 7B by google | OpenRouter: Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks ...
  • lzlv 70B by lizpreciatior | OpenRouter: A Mythomax/MLewd_13B-style merge of selected 70B models. A multi-model merge of several LLaMA2 70B finetunes for roleplaying and creative work. The goal was to create a model that combines creativity ...
  • DBRX 132B Instruct by databricks | OpenRouter: DBRX is a new open source large language model developed by Databricks. At 132B, it outperforms existing open source LLMs like Llama 2 70B and Mixtral-8x7B on standard industry benchmarks for language...
  • Hermes 2 Mixtral 8x7B DPO by nousresearch | OpenRouter: Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the [Mixtral 8x7B MoE LLM](/models/mistralai/mixtral-8x7b). The model was trained on over 1,000,000 entries of prim...
  • Gemini Pro 1.0 by google | OpenRouter: Google's flagship text generation model. Designed to handle natural language tasks, multiturn text and code chat, and code generation. See the benchmarks and prompting guidelines from [Deepmind]...
  • GPT-4 Turbo by openai | OpenRouter: The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023. This model is updated by OpenAI to point to the lates...
  • Mistral 7B Instruct by mistralai | OpenRouter: A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length. This is v0.1 of Mistral 7B Instruct. For v0.2, use [this model](/models/mistral...
  • Llama v2 13B Chat by meta-llama | OpenRouter: A 13 billion parameter language model from Meta, fine tuned for chat completions
  • Llama v2 70B Chat by meta-llama | OpenRouter: The flagship, 70 billion parameter language model from Meta, fine tuned for chat completions. Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned ve...
  • Mixtral 8x7B by mistralai | OpenRouter: A pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructions) - see [Mixt...

OpenRouter (Alex Atallah) ▷ #app-showcase (1 message):

stonedjesusape: Fuck


OpenRouter (Alex Atallah) ▷ #general (166 messages🔥🔥):

  • Discussing Model Integrations: A user inquired about integrating a new LLM API into OpenRouter; they were directed to DM for setting up a chat between both companies. Louisgv is handling the integration discussions.

  • Rate Limit Confusion: Users expressed uncertainty regarding OpenRouter’s rate limits on new models like Gemini 1.5 Pro, with clarifications on the heavy rate limitations for preview models that typically allow around 10 requests per minute.

  • Pricing and Token Estimates on OR: There was a detailed conversation around the pricing of Gemini models, with louisgv explaining that tokens are counted as individual characters for billing purposes. This led to discussions about the potential impacts on token pricing, especially with languages like Chinese.

  • Imminent Updates Teased: Alexatallah hinted at possible news forthcoming, following observations of a large amount of model updates on a single day, including Mixtral 8x22b being added to providers.

  • Technical Adaptations for Hermes DPO: A user hanaaa__ mentioned needing to patch SillyTavern for better performance with Hermes DPO providers, indicating issues with TogetherAI’s latency. They also noted that the OpenRouter website experiences crashes on certain pages when accessed via an iPhone.

Links mentioned:


CUDA MODE ▷ #general (5 messages):

  • Meta’s Generosity in GPU Hours: Meta backed a significant study on LLM knowledge capacity, providing 4.2 million GPU hours, which comically translates to nearly half a millennium of compute time.
  • Scaling Laws Courtesy of Meta: A notable scaling laws research has been sponsored by Meta, which required a jaw-dropping 4.2 million GPU hours reflecting Meta’s commitment to propelling AI research forward.
  • Porting GPT-2 to CUDA: An enthusiast mentioned their current project of porting GPT-2 training code to CUDA, which could become a remarkable benchmark for the AI community, and shared the llm.c repository.
  • Formation of a Working Group for CUDA Development: In response to expressed interests, a working group is set to be formed to foster collaboration on CUDA-related projects, indicating a healthy, community-driven approach to AI development.
  • Meta’s Impressive AI Hardware Specs Revealed: Details of Meta’s next-generation AI hardware, boasting 354 TFLOPS at only 90 watts, were discussed, with an accompanying blog post outlining Meta’s robust investment in AI infrastructure.

Links mentioned:


CUDA MODE ▷ #cuda (1 message):

  • Excitement for C code in CUDA: A member expressed enthusiasm about integrating C code implementations of algorithms into fast CUDA. They mentioned the possibility of adding this to their library and inquired about the compatibility between MIT license and Apache 2.0 license, seeking advice from anyone knowledgeable about licenses.

CUDA MODE ▷ #torch (1 message):

  • Matrix Multiplication Performance Explained: A post highlights the importance of matrix shapes in performance, with a focus on tiling and memory layouts. A matrix multiplication example ([M x K] @ [K x N]) is provided, with the configuration A: M=2047, K=N=2048 performing well because it avoids the unaligned memory layouts that negatively impact performance (a small timing sketch follows this list).

  • Offer for Answer Key: The linked blog post discusses the performance of matrix multiplication shapes, offering the first answer publicly and providing further answers in exchange for readers’ solutions to questions. This encourages engagement and helps deepen understanding of the material presented.
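
Such shape effects are easy to check yourself; a minimal timing sketch using CUDA events, assuming a CUDA-capable machine, with shapes mirroring the example above:

```python
import torch

def time_matmul(m: int, k: int, n: int, iters: int = 50) -> float:
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    for _ in range(5):  # warmup
        a @ b
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per matmul

# Shrinking M to 2047 leaves the contiguous dimension aligned; shrinking K or N does not.
for shape in [(2048, 2048, 2048), (2047, 2048, 2048), (2048, 2047, 2048)]:
    print(shape, f"{time_matmul(*shape):.3f} ms")
```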

Link mentioned: Answer Key: What Shapes Do Matrix Multiplications Like?: Companion to https://www.thonking.ai/p/what-shapes-do-matrix-multiplications


CUDA MODE ▷ #beginner (1 message):

  • pmpp book lecture viewing party proposal: A member is organizing a viewing party for the University of Illinois lectures on the pmpp book. They offered to share a Zoom link to go through the lectures, which are 1 hour to 1 hour and 15 minutes long, with pauses for discussion, proposing early CET time or a later slot on weekdays.

CUDA MODE ▷ #ring-attention (7 messages):

  • Ring-Flash Attention Query: A member questioned the if step <= comm.rank: condition in the ring-flash-attention code, asking why it doesn’t iterate over all key-value pairs for all hosts (a control-flow sketch follows this list).
  • Clarifying Causal Self-Attention: Additional context was given explaining that in causal self-attention each query does not need to attend to all key-values but only those prior to it.
  • Experimentation with State Space Models: One member expressed interest in testing how well state space models can do ‘no-in-head’ attention (NiH) and specifically asked if the process could be run for a mamba model.
  • Collaborative Work on Flash Attention: There’s ongoing collaborative work, as stated by a member, on creating educational flash attention examples, which is in progress and available on GitHub.
  • Commitment to Model Testing: It was mentioned that a member will attempt to run the ring-flash-attention code on a mamba model to see its effectiveness.
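
A pure-Python control-flow sketch (not the real kernel) of why the causal ring schedule can skip steps beyond a host’s rank, assuming sequence chunks are sharded in order and KV blocks circulate so that at step s, rank r holds the block originally owned by rank (r - s) % world_size:

```python
# Illustration of the ring schedule under causal masking.
world_size = 4  # number of hosts in the ring

for rank in range(world_size):
    attended = []
    for step in range(world_size):
        kv_chunk = (rank - step) % world_size  # KV block held at this step
        # Queries in chunk `rank` only need KV chunks <= rank under causal
        # masking, which is exactly the `step <= comm.rank` condition.
        if step <= rank:
            attended.append(kv_chunk)
    print(f"rank {rank} attends to KV chunks {sorted(attended)}")
```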

Links mentioned:


CUDA MODE ▷ #off-topic (4 messages):

  • Celebrating Personal Milestones: A member shares that they will turn 28 in May and expresses a potentially unpopular opinion: Staying abreast with the entire AI field is futile and perhaps counterproductive. Instead, they advocate for a more selective approach to information, filtering out the noise and focusing on what’s truly important to them.
  • A Nod to Pop Culture: The server’s picture is recognized as Goku, a character from the anime series Dragon Ball.
  • Milestone for the Community: The server celebrated just surpassing 5000 members.
  • Quality Over Quantity in Learning: A member responds to a sentiment about information overload, recommending a once-a-week reading routine and emphasizing problem-driven learning rather than consumption-focused habits for better intellectual engagement.

CUDA MODE ▷ #triton-puzzles (5 messages):

  • Correction in Puzzle Logic: A message pointed out a small error in puzzle 11, stating that the summation should be over the shared index l.
  • Pull Request for Puzzle 11 Fix: A member agreed with the summation correction in puzzle 11 and mentioned the need for B_MID to represent the block size on the MID dimension, subsequently creating a pull request to address the issue. Here is the GitHub Pull Request.

Link mentioned: minor on puzzle 11 by ZhaoyueCheng Ā· Pull Request #10 Ā· srush/Triton-Puzzles: fix formula on puzzle 11 to sum over dimension L add B_MID on puzzle 11 for the parameter on block size to loop over the MID dimension


CUDA MODE ▷ #hqq (96 messages🔥🔥):

  • Progress with Half-Quadratic Quantization (HQQ) Implementation: Some basic placeholder code for inference using HQQ+ was shared (code example), indicating that HQQ alone only looks at the weights but HQQ+, which includes calibration, may be required for better performance.

  • Mixed Results with Marlin Kernel: One member discussed their experience with the Marlin kernel, noting that while Marlin reports up to 4x speed-up for fp16 x int4 matrix multiplication, the results on their A6000 Ada were underwhelming, also mentioning slight errors introduced by the kernels.

  • Quantization Techniques Discussion: There was an exchange around the Marlin and HQQ quantization techniques, with a suggestion to use perplexity-evaluation scripts to measure effective perplexity, aiming for results similar to GPTQ.

  • Benchmark Concerns and Perplexity in Quantized Models: Members compared perplexity scores of different modified models, noting discrepancies and seeking consistency with expected performance, i.e. a perplexity of around 5.3 on wikitext with group-size=64.

  • Tuning and Testing HQQ Quantization: Technical discussion ensued about the quantization settings for HQQLinear, particularly the importance of using quant_scale=False, quant_zero=False (a configuration sketch follows this list). A detailed exchange about execution speed raised the question of why AOInt4 kernels are slower on some hardware than torch.matmul with HQQLinear, and what the potential causes are (issue demonstration).
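
A minimal configuration sketch of the settings discussed, using hqq’s BaseQuantizeConfig and HQQLinear. Argument names follow the discussion and common hqq usage but may differ across library versions; the 4-bit setting is an assumption consistent with the int4 kernels mentioned above.

```python
import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

# Settings from the discussion: group-size 64, with scales and zero-points
# kept in fp16 rather than quantized (quant_scale=False, quant_zero=False).
quant_config = BaseQuantizeConfig(
    nbits=4,            # assumed; matches the int4 kernels discussed
    group_size=64,
    quant_scale=False,
    quant_zero=False,
)

layer = torch.nn.Linear(4096, 4096, bias=False)
hqq_layer = HQQLinear(layer, quant_config, compute_dtype=torch.float16)
```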


CUDA MODE ā–· #triton-viz (1 messages):

kerenzhou: I like the corresponding code on the figure


CUDA MODE ā–· #llmdotc (42 messagesšŸ”„):

  • Early CUDA Forward Pass Strides: An update reports that all forward layers have been implemented for a project, with efficient attention being the last hurdle. The first round of optimizations left the code ā€œmostly in good shape,ā€ indicating potential for performance gains over the initial version.

  • LLM.C Repository Highlighted as a Learning Resource: A GitHub repository named llm.c has been shared and praised as a valuable resource for learning and honing CUDA skills. It involves LLM training using simple, raw C/CUDA.

  • OpenMP Usage in LLM.C Discussed: Members noted that OpenMP is employed in the llm.c codebase, with one suggesting that OMP offloading could replace direct CUDA usage for simplicity and cross-vendor GPU compatibility, though there is uncertainty about support on Windows.

  • Debugging Performance Issues in Custom CUDA Code: There was a performance comparison between different versions of CUDA kernels. The ā€˜flash’ version was initially 3x slower than expected; after further testing by various members on different hardware it showed a speed-up, and work toward resolving the slowdown was underway.

  • Gap to Close in Pure CUDA Forward Pass Performance: The recently pushed pure CUDA forward pass runs at 111ms/iter versus PyTorch’s 180ms/iter, but a gap remains against PyTorch optimized with compilation and tensor cores, which runs at 26ms/iter (a rough baseline sketch follows this list). The push included a comparison of performance metrics and an aim to close this gap. The code can be found in the GitHub repository karpathy/llm.c.
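
For context on the 26ms/iter baseline, here is a rough sketch of how such an optimized PyTorch number is typically produced, enabling tensor cores via TF32 and using torch.compile. The model and shapes below are placeholders, not llm.c’s GPT-2 configuration.

```python
import time
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # allow tensor cores for fp32 matmuls
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True).cuda()
model = torch.compile(model)                  # fuse kernels before timing
x = torch.randn(4, 1024, 768, device="cuda")

with torch.no_grad():
    for _ in range(3):                        # warm-up triggers compilation
        model(x)
    torch.cuda.synchronize()
    t0 = time.time()
    model(x)
    torch.cuda.synchronize()
print(f"{(time.time() - t0) * 1e3:.1f} ms/iter")
```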


LangChain AI ā–· #general (108 messagesšŸ”„šŸ”„):

  • Misconceptions about Whisper Capabilities: Discussion clarified that Whisper is a speech-to-text (STT) model, not text-to-speech (TTS), and that while Ollama doesn’t inherently support Whisper, it’s possible to run Whisper locally or with a different backend provided by the same developer (a minimal local-transcription sketch follows this list).
  • LangChain Use Cases Explored: Members shared insights into the practical applications of LangChain, such as evaluating retrieval systems, with one member pointing to an example involving RAG metrics to assess retrieval-augmented generation performance.
  • Comparing LangChain with OpenAI’s API: A member inquired about the benefits of using LangChain over OpenAI’s API for building AI assistants. The consensus suggests that if integrations beyond OpenAI’s offerings are not needed, LangChain might not add significant value.
  • LangChain Functionality Debates: Users discussed whether vLLM (the inference engine) supports function calling, with the suggestion to use Outlines, which provides structured text generation capabilities.
  • Beginner’s Query on Starting AI/ML Career: A new member requested guidance on where to begin their career in AI/ML after learning basic Python and MySQL.
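
For the Whisper point above, a minimal local speech-to-text sketch using the openai-whisper package; the audio path is a placeholder, and ffmpeg must be installed on the system.

```python
# pip install openai-whisper
import whisper

model = whisper.load_model("base")               # small local STT model
result = model.transcribe("meeting_audio.mp3")   # placeholder file path
print(result["text"])                            # Whisper outputs text, not audio
```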


LangChain AI ā–· #langchain-templates (1 messages):

lhc1921: https://python.langchain.com/docs/integrations/llms/azure_openai/


LangChain AI ā–· #share-your-work (3 messages):

  • Swipe Right for Automation: Introducing TinderGPT, an app designed to automate messaging on Tinder, promising to save users time and secure dates automatically. Find the code and contribute to the future of digital dating on GitHub.

  • Chat with Retrieval Augmented Generation Locally: everything-rag offers a fully customizable, local chatbot assistant with free, 100% local functionality, using Langchain and ChromaDB vectorized databases. Explore the HuggingFace space here, star the GitHub repo, and read about the significance of open-source LLMs in the associated blog post.

  • Analyzing Structured Output Across LLMs: A performance analysis of structured output is presented, comparing popular open and closed source large language models. Check the findings and methodology on the GitHub page.


LangChain AI ā–· #tutorials (3 messages):

  • AI Agent for Virtual Fashion Trials: A member shared a YouTube video titled ā€œFuture of E-commerce?! Virtual clothing try-on agentā€, showcasing an AI agent they built that can generate images of a model wearing various clothes and create social media posts.
  • Guidance on Publishing an AI Agent: A member inquired about how to publish and create a UI for an AI agent they’ve developed, seeking tutorials for guidance. Another member recommended learning web development as a necessary step for accomplishing this task.

Link mentioned: Future of E-commerce?! Virtual clothing try-on agent: I built an agent system which will autonomously iterate & generate img of AI model wearing certain cloth and produce millions+ social postsFree access to run…


LlamaIndex ā–· #blog (4 messages):

  • Multimodal RAG Revolutionizes Pill Identification: A new Multimodal RAG application has been highlighted that is capable of identifying pills from images, integrating visual data with a descriptive database in the medical domain. The blog post by @activeloop demonstrates its usefulness in recognizing pharmacy products and can be found at this Twitter link.
  • Event Alert: Building Enterprise-Grade RAG: @llama_index announced a collaboration with @traceloop and @getreflex to show the essential components for constructing enterprise-level Retrieval-Augmented Generation (RAG). Advanced parsing and observability features are among the core tools to be discussed at the event, with more details available on Twitter.
  • MetaGPT Steps into ICLR 2024 with RAG: Introducing MetaGPT by Hong et al., a multi-agent framework premiering at ICLR 2024 that assigns agents the diverse roles of a software company, from PMs to engineers, solving tasks through collaboration. RAG-enhanced MetaGPT adds a cutting-edge twist to this framework, with more details shared at this link.
  • Controllability in Agent Execution via Execution Stopping Tools: Highlighting the importance of execution control tools in agent systems, @llama_index shared insights into how these tools are integral to a travel agent’s booking confirmation process and an agentic RAG system’s search and reply function. Interested readers can follow the conversation on Twitter.

LlamaIndex ā–· #general (104 messagesšŸ”„šŸ”„):

  • OpenAI Agent vs Gemini LLM Adaptation: Users discussed adapting an openaiagent example notebook found in LlamaIndex’s documentation to work with the Gemini LLM, suggesting code modifications like swapping the OpenAI LLM for the Gemini LLM and using ReActAgent instead of OpenAIAgent (a sketch follows this list).

  • RAG Optimization Quest: A user sought advice for optimizing Retrieval Augmented Generation (RAG) with short documents, leading to a recommendation to review RAG 101 on Gradient AI and referring to the MTEB leaderboard for embeddings.

  • Tool Creation within LlamaIndex: A conversation unfolded around how to create new tools within LlamaIndex and dynamically add them to OpenAIAgent. After a detailed exchange, a member successfully managed to create tools by using FunctionTool, despite various challenges.

  • Debugging LLM Prompting Issues: A member asked how to see the exact prompt sent to an LLM for debugging purposes. They were directed to a logging guide, eventually discovering they needed a particular type of chat mode that conditionally uses RAG to reduce unnecessary LLM calls.

  • Integration Woes and Example Requests: Users queried about project setups, integration instructions with Open Source tools, and example use cases. References included an end-to-end guide video for the SEC Insights project on YouTube and source code on GitHub.
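
A hedged sketch of the suggested swap: drive a LlamaIndex agent with Gemini by using the LLM-agnostic ReActAgent plus a FunctionTool (which also illustrates the tool-creation discussion above). Import paths follow the v0.10-era packages (llama-index-llms-gemini) and are assumptions that may differ across versions; a Google API key is required for Gemini.

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.gemini import Gemini

def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

# Wrap a plain function as a tool, then hand it to the LLM-agnostic agent.
tools = [FunctionTool.from_defaults(fn=multiply)]
agent = ReActAgent.from_tools(tools, llm=Gemini(model="models/gemini-pro"), verbose=True)
print(agent.chat("What is 21 * 2?"))
```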


LlamaIndex ā–· #ai-discussion (2 messages):

  • Cookbook Gains a Fan: A member expressed appreciation for the openaiagent example provided in the Llama Index GitHub cookbook, finding it ā€œquite useful,ā€ and inquired about the possibility of similar resources for Gemini LLM. The relevant resource can be found at this GitHub notebook.

  • New Member Seeks Clarification on API Key: A new participant in the discussion expressed confusion about the operation of services, inquiring whether an API Key for OpenAI is required to make things work, referencing guidance from documentation.
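
On the API-key question above: yes, the OpenAI-backed examples expect a key, conventionally supplied via the OPENAI_API_KEY environment variable before any LlamaIndex objects are constructed; for example:

```python
import os

# Placeholder value; set this before building any OpenAI-backed components.
os.environ["OPENAI_API_KEY"] = "sk-..."
```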


LAION ā–· #general (87 messagesšŸ”„šŸ”„):

  • Pixart Sigma Performance Review: Members discussed the performance of Pixart Sigma on a 3090, indicating fast and promising results with a prompt execution time of 8.26 seconds. However, the outputs were described as ā€œmangledā€ with users noting the presence of ā€œbleedā€ in the results from current open models.

  • Mistral 22b x 8 Release Chat: There was a mention of Mistral 22b x 8 being out. A magnet link for mixtral-8x22b was shared, followed by responses indicating excitement and queries about potential relation to mistral-large.

  • Skepticism Surrounding Ella SDXL and SD3: Discussion turned to the unlikelihood of Ella SDXL becoming available and skepticism about the benefits of Stable Diffusion V3 (SD3), comparing it unfavorably to Terminus and Pixart Sigma. Members also weighed in on industry responses to the Sora announcement affecting competitors such as Stability, Pika labs, Runway, and Midjourney.

  • New Audio Generation Solutions Emerging: There was buzz around Udio, an app backed by artists for intuitive music creation via text prompts, and a new TTS engine from the Hugging Face team that allows voice prompting.

  • AI Acceleration Hardware Buzz: Members discussed the new Meta Training and Inference Accelerator (MTIA) and its impressive specs, reflecting on the trend of major tech companies developing their own AI acceleration hardware.


LAION ā–· #research (9 messagesšŸ”„):

  • Reevaluating ā€œZero-Shotā€ Generalization in Multimodal Models: A recent paper questions the degree to which ā€œzero-shotā€ generalization truly exists in multimodal models like CLIP and Stable-Diffusion. Analysis across various models and datasets suggests that performance heavily depends on the prominence of concepts within the pretraining data.

  • Data Quality Over Quantity for CLIP Models: When testing CLIP models on less common concepts, improving data filtering and selection for quality and diversity is crucial, possibly more so than simply increasing the quantity of data.

  • Google Advances with Larger Griffin Model: Google reportedly releases a model with a new Griffin architecture, featuring an additional 1 billion parameters, boasting improved performance and throughput over long contexts. The details can be found on their subreddit post.

  • New Study Challenges Traditional LLM Training Methods: A groundbreaking paper presents an alternative to Reinforcement Learning from Human Feedback (RLHF) by optimizing directly over ā€œpair-wiseā€ or general preferences, showing significant performance improvements even with a 7 billion parameter model.

  • Performance Boost in Large Language Models: The aforementioned method provides a significant performance leap compared to other leading models, indicating the potential advantages of pair-wise optimization strategies over traditional point-wise reward methods.


OpenInterpreter ā–· #general (51 messagesšŸ”„):

  • GPT-4 Turbo Makes a Grand Entrance: Excitement buzzes in the OpenInterpreter Discord about the newly released GPT-4 Turbo model, which boasts notable improvements over its predecessor, including integrated vision capabilities and enhanced performance. The chatter includes observations that it is 3 times faster, with some users reporting firsthand experience of its prompt response times and swift operation.

  • GPT-4 Turbocharged and Under the Radar: Amidst the release frenzy, a member notes that there’s a lack of widespread notice or detailed information on GPT-4, with no substantial chatter outside the community and only OpenAI’s release page serving as a primary info source on the continuous model upgrades.

  • Mixtral and OI Compatibility Queries: Some discussions have arisen about the potential match-up of Mixtral 8x22b with OpenInterpreter (OI), as users compare it against past models like the 8x7b and consider implications for performance within OI’s framework.

  • Enthusiasm for Command r+: A member raves about a model called Command r+, praising it as the best model ever used for role-playing (RP) and following instructions precisely, indicating it feels like a better version of GPT-3.5 and outperforms the old GPT-4 in benchmarks, especially with the right prompts.

  • Compute Conundrums for Command r+: A conversation surfaces regarding the compute power required for running Command r+ locally, with members discussing their setups, and one reporting that even a 4090 isn’t enough for optimal performance, indicating that significant hardware might be needed.


OpenInterpreter ā–· #O1 (38 messagesšŸ”„):

  • Troubleshooting 01 Hotspot Reconnection Issues: A member resolved an issue reconnecting to the WiFi and server settings page for 01 after suggestions to do a factory reset and navigate to captive.apple.com to trigger the captive portal. Removing old WiFi credentials was also advised.

  • Installation Hurdles with 01 on Windows 11: Members report issues where talking to the installed 01 yields no response, despite the microphone functioning correctly. Suggestions included checking the Python script and ensuring sounddevice is installed.

  • Constructing the 01 from GitHub Repository: An individual shared their experience of buying parts from the Bill of Materials (BOM) and 3D printing the body from files available in the 01 GitHub repository.

  • Clarification on Raspberry Pi Requirements for 01: A discussion clarified that a Raspberry Pi is not required for 01 and that running Open Interpreter or 01OS on any computer suffices. For those interested in adding Raspberry Pi to their setup, the conversation suggested initiating a broader discussion in a dedicated forum.

  • Local IP Use for 01 Server Configuration: A new 01 user successfully connects their device to the server using their MacBook’s local IP address, after facing issues and confusion with configuring and understanding ngrok domains.

  • Order Updates and Customer Service Inquiries: In response to a customer order status inquiry, it was mentioned that emails would be sent out once there are updates. All current order statuses are humorously referred to as ā€œstill cookingā€.


Interconnects (Nathan Lambert) ā–· #news (45 messagesšŸ”„):

  • Google Drops a Non-Transformer Surprise: Google quietly launched a 2 billion parameter recurrent linear attention model named Griffin, a significant development beyond the recent CodeGemma release that draws comparisons to the RWKV architecture. The related research paper is available on arXiv.

  • Rumblings of Rapid Model Releases: The conversation touches on rapid and somewhat unexpected model releases, such as Mixtral, which may be a result of competitive pressure from other anticipated model releases like llama 3 smol and Cohere.

  • OpenAI Drops Their Own News: OpenAI’s tweet alluded to an intriguing development, but specifics were not discussed in the messages—only a link to a tweet from OpenAI was provided without further context.

  • New Model Excitement with Mixtral: Mixtral, a new model, is stirring up excitement, and differences from previous models like Mistral and Miqu are highlighted in a Twitter conversation.

  • Public Human Eval Blog Proposal: A member discusses the possibility of starting a blog dedicated to unbiased human evaluations of new model releases, expressing frustration over the current focus on benchmark scores rather than practical utility for developers. There’s a call for contributions and participation in this endeavor.


Interconnects (Nathan Lambert) ā–· #ml-questions (14 messagesšŸ”„):

  • Decoding RLHF in Modern LLM Training: Sebastian Raschka published a breakdown on RLHF as a crucial part of LLM training, impacting model helpfulness and safety, comparing ChatGPT’s and Llama 2’s use of RLHF, with regular updates on alternatives.
  • Rejection Sampling Raises Questions: A user reading the article was confused by Rejection Sampling, a concept implying the use of the best model generations for PPO updates, and sought insights on why this might be superior to learning from average or worse generations.
  • Exploring PPO Through Online Resources: Another user aimed to clarify their understanding of PPO by consulting Cameron Wolfe’s newsletter, acknowledging a lack of prior knowledge in RL.
  • Rejection Sampling’s Role Clarified: Nathan Lambert clarified that Rejection Sampling is applied on the entire instruction dataset before continued training, acknowledging that this process isn’t well documented in relevant papers, and required direct correspondence with authors.
  • Considerations on Rejection Sampling’s Efficacy: Lambert further explained the probable rationale for Rejection Sampling: most of the data is likely low quality, so filtering for higher-quality examples before PPO could lead to more stable training outcomes (a minimal sketch follows this list).
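
A minimal sketch of rejection sampling as described above: sample N completions per prompt, score them with a reward model, and keep only the best as data for continued training. The generate and reward functions here are stand-in stubs, not any particular library’s API.

```python
import random

def generate(prompt: str, n: int) -> list[str]:
    return [f"{prompt} -> draft {i}" for i in range(n)]  # placeholder sampler

def reward(prompt: str, completion: str) -> float:
    return random.random()                               # placeholder reward model

def rejection_sample(prompts: list[str], n: int = 8) -> list[tuple[str, str]]:
    kept = []
    for p in prompts:
        candidates = generate(p, n)
        best = max(candidates, key=lambda c: reward(p, c))
        kept.append((p, best))   # filtered pairs feed the next training stage
    return kept

print(rejection_sample(["Explain RLHF briefly."]))
```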


Interconnects (Nathan Lambert) ā–· #random (7 messages):

  • Chasing Eclipses in Texas: Members shared personal experiences regarding travel to Texas for an optimal viewing experience of an eclipse or celestial event.
  • Brief Encounter with the Skies: Despite cloudy conditions, one member expressed joy at catching a glimpse of the event, considering themselves to have great luck.
  • A Cosmic Resemblance: A member noted that the celestial sight resembled the eyeball in the sky from Netflix’s series ā€œ3-Body,ā€ evoking an image from pop culture.

Interconnects (Nathan Lambert) ā–· #memes (2 messages):

  • Universal Quirk in ML Discords: A member humorously noted the widespread use of the :berk: emoji across various machine learning Discord communities.
  • Humor Shared in the Community: A user found something amusing and declared it ā€œToo funny to not shareā€ in the channel.

Interconnects (Nathan Lambert) ā–· #rlhf (10 messagesšŸ”„):

  • Debating the Merits of RLHF Post-Training Improvements: A member highlighted a new paper discussing the improvement of large language models (LLMs) through iterative feedback using an oracle, potentially challenging typical Reinforcement Learning from Human Feedback (RLHF) methods which rely on reward maximization.

  • Size Matters in Model Efficacy: A question was raised comparing a 7B model to GPT-4, suggesting the smaller model might outperform the larger.

  • Skepticism About Benchmark Optimization: Members expressed skepticism toward LLM-evaluated benchmarks, pointing out that while benchmarks can be optimized, it doesn’t necessarily reflect better fundamental model performance.

  • Practical Model Improvement Philosophy: A member has disclosed a preference for tangible improvements in models through better data and better scaling rather than what they deemed as ā€œbullshitā€ new papers on the topic.

  • Benchmarks as Imperfect Proxies: There was acknowledgment that while benchmarks like alpacaeval may be broken once optimizing starts, they can still be useful as a temporary measure of a model’s capabilities.

Link mentioned: Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences: This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training L…


Interconnects (Nathan Lambert) ā–· #reads (3 messages):

  • Scaling Laws for LLMs Unveiled: A new paper introduces 12 scaling laws for Language Model (LLM) knowledge capacity, which could be pivotal in the era of large language models. The research required a massive amount of resources, with Meta’s FAIR team sponsoring 4,200,000 GPU hours for this study. Read the paper here.
  • Exploration of Quantization and MoE: The paper also explores inference and quantization, revealing that quantizing model weights to int8 doesn’t harm the knowledge capacity of even maximally-capable models, and that Mixture of Experts (MoE) models with 32 experts preserve knowledge capacity efficiently. See the detailed results.

Links mentioned:

  • Tweet from Zeyuan Allen-Zhu (@ZeyuanAllenZhu): Our 12 scaling laws (for LLM knowledge capacity) are out: https://arxiv.org/abs/2404.05405. Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope t...
  • Tweet from Zeyuan Allen-Zhu (@ZeyuanAllenZhu): Results 8/9: scaling laws for quantization and MoE. // Quantization to int8 does not hurt knowledge capacity even for models at max capacity => 2bit of knowledge can be stored to int8 // MoEs wit...

tinygrad (George Hotz) ā–· #general (52 messagesšŸ”„):

  • Refactoring Tinygrad for Efficiency: A discussion on streamlining tinygrad’s codebase is underway, with a particular focus on reducing line count and enhancing the code for review readiness, as well as addressing backend peculiarities that necessitate JIT support for non-disk backends.
  • The Quest for a Weight Agnostic Network: One member expresses interest in creating a weight agnostic network using tinygrad to train a game, intending to experiment with ReLU activations.
  • Merging MNIST into Tinygrad: Efforts to integrate MNIST more closely into tinygrad are highlighted, with Pull Request #4122 showcasing the move and revealing a compiler bug on AMD, prompting a call to add a CI test to catch such issues.
  • Variable Naming in Abstractions3: There’s a debate about the necessity of variable names in the context of abstractions3, with the suggestion being made that variables should be defined by their IDs. This led to a change where var_vals will be a global dict instead of being in each ScheduleItem.
  • CI Performance and Test Discussion: Concerns are raised regarding CI performance regression and missing tests, particularly for the functionality copy_from_fd, something to be addressed in a subsequent pull request.


tinygrad (George Hotz) ā–· #learn-tinygrad (13 messagesšŸ”„):

  • Step-by-Step Guide to Custom Accelerators: A user shared a step-by-step guide on adding custom accelerators to tinygrad, pointing to a GitHub repository for detailed instructions and illustrations.
  • Seeking Network Examples: One member was in search of neat network examples using tinygrad and was directed to review the examples/ directory within the tinygrad repository.
  • Discussing ā€˜Double Reducc’: Users discussed an issue dubbed ā€˜double reducc’, with apparent consensus on the problem and a collaborative effort toward a resolution.
  • Converting Tensor to Array in Tinygrad: A query was raised about turning tensors into arrays within tinygrad. Another user recommended calling .numpy() on the tensor to accomplish the conversion (shown below).
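
The conversion mentioned above, in full (assuming a standard tinygrad install):

```python
from tinygrad.tensor import Tensor

t = Tensor([[1.0, 2.0], [3.0, 4.0]])
arr = t.numpy()        # realizes the lazy tensor and copies it into a NumPy array
print(arr, arr.dtype)
```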


OpenAccess AI Collective (axolotl) ā–· #general (40 messagesšŸ”„):

  • Mixtral Model Evolution: The new Mixtral 8x22B model was discussed, presumably having around 140 billion parameters; fine-tuning on a 1.5GB dataset at rank 32 reportedly showed unexpectedly low loss. Members are curious whether this version is instruction tuned or a base model.
  • Quantization and Model Size Limits: Community members are looking into quantization for practical use and expressing concerns about the feasibility of running larger models like Mixtral 8x22B with resources available to typical developers. There’s interest in finding a balance between model size and utility.
  • Rapid Community Contributions: A contributor has already started uploading the new big model, Mixtral-8x22B, to Hugging Face, demonstrating the community’s quick response to developments. The link to the repository was shared: Hugging Face - Mixtral-8x22B.
  • Seeking Compatible Frontends: A question was raised about a web self-hostable frontend that is compatible with various APIs, including OpenAI’s and Google’s. No specific solutions were mentioned in the responses.
  • Generative AI Hackathon Announcement: Samsung Next 2024 Generative AI Hackathon is announced for May 11th in New York, focusing on tracks in Health & Wellness and Mediatech. The link for details and applications was provided: Samsung Next AI Hackathon.


OpenAccess AI Collective (axolotl) ā–· #axolotl-dev (4 messages):

  • Axolotl Dataset Versioning Feature on the Horizon: A member expressed interest in adding dataset versioning to Axolotl, noting its absence. A response indicated that dataset versioning had not been previously requested and encouraged the member to proceed with a contribution.

  • LoRA Layer Initialization Technique Sparks Interest: Sharing a tip from CFGeek’s tweet, the group discussed a novel initialization method for LoRA layers that uses the SVD of the original weight matrix for better fine-tuning results (a sketch of the idea follows this list). The technique, termed PiSSA (Principal Singular Values and Singular Vectors Adaptation), reportedly improves fine-tuned performance and is detailed in an arXiv abstract and a corresponding GitHub repository.
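
Since the trick is easy to state mathematically, here is a hedged sketch of a PiSSA-style initialization: take the top-r singular triplets of W as the LoRA pair and keep the residual frozen. This follows the idea described in the tweet, not the reference PiSSA implementation.

```python
import torch

def pissa_init(W: torch.Tensor, r: int):
    """Split W into a frozen residual plus a rank-r pair (A @ B) built from
    W's top-r singular values/vectors, per the PiSSA idea."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_S = S[:r].sqrt()
    A = U[:, :r] * sqrt_S          # (out_features, r)
    B = sqrt_S[:, None] * Vh[:r]   # (r, in_features)
    W_res = W - A @ B              # frozen residual; A and B are trained
    return W_res, A, B

W = torch.randn(512, 256)
W_res, A, B = pissa_init(W, r=16)
print(torch.allclose(W_res + A @ B, W, atol=1e-4))  # exact split at init: True
```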

Link mentioned: Tweet from Charles Foster (@CFGeek): YES! If you initialize a LoRA layer based on the SVD of the original weight matrix (with its top singular values & vectors), you get significantly better fine-tuning results. This is a straight-up fr…


OpenAccess AI Collective (axolotl) ā–· #general-help (9 messagesšŸ”„):

  • Pre-training with Norwegian Articles: A member is preparing to pre-train a large language model on a dataset of articles in Norwegian to enhance grammar capabilities. They inquired about the best way to split the articles and received advice to use one row per article, possibly in .jsonl format (see the sketch after this list).

  • Seeking Function Calling Fine-tuning DataSet: A request was made for a good dataset suitable for JSON mode or function calling, specifically to fine-tune LoRAs for function calling with axolotl; however, no recommendations were provided within the current message history.

  • Hardware Capability Query for the mixtral-qlora-fsdp Model: A member questioned whether the mixtral-qlora-fsdp model would fit on a dual 24GB GPU setup, but no follow-up information or answers were given.

  • Fixing Empty Queue Error: A user experiencing an empty queue error was advised to check for an empty condition before iterating, presented with refactored code as a potential solution.

  • Code Refactoring for Simplicity: An example of refactoring code was given, simplifying a function that checks for a stop token from several lines to just one, enhancing code efficiency.
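
On the splitting question above, a small sketch of the suggested layout, one article per row in a .jsonl file; the ā€œtextā€ field name is a common pre-training convention and an assumption here.

```python
import json

articles = ["FĆørste artikkel ...", "Andre artikkel ..."]  # placeholder articles
with open("norwegian_articles.jsonl", "w", encoding="utf-8") as f:
    for article in articles:
        # One JSON object per line, keeping Norwegian characters unescaped.
        f.write(json.dumps({"text": article}, ensure_ascii=False) + "\n")
```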


OpenAccess AI Collective (axolotl) ā–· #datasets (2 messages):

  • Seeking Function-Calling & JSON Datasets: A member inquired about datasets for function-calling or JSON parsing.
  • Agent-FLAN Dataset Shared: Another member responded with a dataset suggestion, providing a link to the Agent-FLAN dataset on HuggingFace. This dataset includes AgentInstruct, Toolbench, and custom negative agent samples, designed for effective agent tuning in large language models.

Link mentioned: internlm/Agent-FLAN Ā· Datasets at Hugging Face


Modular (Mojo šŸ”„) ā–· #general (8 messagesšŸ”„):

  • Hints at C Formatting in Mojo: It has been hinted that while Mojo developers wait for Python-style f-strings, they can use old C-style formatting by importing _printf as printf, with the caveat that this feature ā€œmay not be around for everā€.
  • API Documentation Summarized for Beginners: A member shared a link to a Notion page that provides translated API documentation in a summarized format, aiming to help beginners.
  • Exploring Contributions Beyond Mojo stdlib: A discussion covered how potential contributors can get involved with Mojo or MAX projects, with suggestions to try web development on lightbug, AI on basalt, or to start a new project.
  • Curated List of Mojo Resources: Contribution opportunities and resources for Mojo can also be found on the curated list maintained on GitHub, known as awesome-mojo.
  • Call for Community Feedback on Mojo Traits: A new discussion has been initiated regarding the use of traits in Mojo, and feedback from the broader community has been requested in a GitHub discussion.


Modular (Mojo šŸ”„) ā–· #šŸ’¬ļø±twitter (2 messages):

  • Modular Shares Update: Modular tweeted an update which can be viewed on their official Twitter page. The specific content of the tweet was not shared in the message.

  • Another Modular Announcement: A second tweet from Modular was posted, the details of which can be explored through the provided link. The exact nature or topic of the announcement was not mentioned in the chat.


Modular (Mojo šŸ”„) ā–· #šŸ”„mojo (32 messagesšŸ”„):

  • Simd Support Coming to Mojo: Discussions indicate excitement around the addition of simd support to Mojo, with expectations of fascinating benchmark results following the update.
  • Concurrency Features are a Work in Progress: Mojo currently supports async/await and coroutines; however, these features are unfinished. The coroutine API in Mojo differs from Python, and details can be found in the Mojo documentation.
  • Mojo’s Roadmap for Async Constructs: The language currently lacks async for and async with constructs, and discussions link to the roadmap indicating a focus on essential core system programming features of Mojo, available here.
  • Running Mojo Natively on Intel Macs: A user expresses a limitation with running Mojo natively on Intel Macs, relying on a VM for larger projects, although small tests are done within the playground.
  • Mojo-UI Efforts and Objective-C Integration: A new UI library for Mojo called Mojo-UI is underway, focusing on Mac as the primary platform. This raises questions about the future potential for integrating Objective-C or accessing the AppKit framework from Mojo, which could require designing a binding layer between Mojo and Swift via C or C++, as suggested in recent discussions. The project is tracked on GitHub.


Modular (Mojo šŸ”„) ā–· #community-projects (4 messages):

  • MojoGeek Unveils Mojo GPT: A member introduced a platform called Mojo GPT tailored for answering Mojo programming queries and sought community feedback. The platform can be tested and feedback provided at Mojo GPT.

  • Serving Up Iterators for String Characters: A helpful crosspost was shared for those needing an iterator over string characters, with a link directing to the relevant message on Discord.

  • mojo-ui-html Gets Exciting Updates: A new update to mojo-ui-html includes keyboard events for creating video games or custom widgets, a new minimize-window feature, and CSS kwargs for additional per-element styling. Details and demonstrations are available on GitHub.

  • Lightbug Framework Gaining Momentum: Contributions to the Lightbug HTTP framework were highlighted, including performance boosts, a pure Mojo-based client implementation, and comparisons showing Lightbug serving more requests per second than Python’s Flask. The advancements and contributions can be explored further on GitHub.


Modular (Mojo šŸ”„) ā–· #community-blogs-vids (1 messages):

  • Scrumtuous Achieves Top Google Ranking: A member humorously announced their rank as #1 on Google for a high-value Python keyword, attributing the success to Mojo. There are no further details or links provided regarding the specific keyword or the content that achieved the ranking.

Modular (Mojo šŸ”„) ā–· #performance-and-benchmarks (2 messages):

  • Seeking SYRK Implementation in Mojo: A member inquired about an implementation of SYRK (symmetric rank-k update) in Mojo for the purpose of conducting some performance tests.

Modular (Mojo šŸ”„) ā–· #šŸ“°ļø±newsletter (1 messages):

Zapier: Modverse Weekly - Issue 29 https://www.modular.com/newsletters/modverse-weekly-29


Modular (Mojo šŸ”„) ā–· #nightly (2 messages):

  • Tuning into Prince: A member whimsically suggests that the phrase ā€œPurple flameā€ could inspire a song reminiscent of a famous Prince hit, humorously adapting the lyrics to ā€œPurple flame, purple flameā€¦ā€.

  • Generics Shock: Another member expresses astonishment at the mention of ā€œHeterogeneous variadic generics,ā€ conveying a mixture of surprise and confusion at the complex programming concept.


DiscoResearch ā–· #mixtral_implementation (5 messages):

  • Mixtral Model Conversion Scripts Shared: A member shared the MoE Weights conversion script for a previous Mixtral model (convert_mistral_moe_weights_to_hf.py) and the official conversion script for the new Mixtral release found on the Hugging Face GitHub repository (convert_mixtral_weights_to_hf.py).

  • New Mixtral Model on Hugging Face: An updated Mixtral-8x22B model has been uploaded to Hugging Face, with a model card and conversion scripts provided by the uploader, later cloned to an official community repo.

  • Misinterpretation on Model Performance Corrected: There was a correction about a performance comparison between GPT-4, Claude Sonnet, and the Mixtral model; the original statement mistakenly referred to a different model named command-r+, not Mixtral.


DiscoResearch ā–· #general (18 messagesšŸ”„):

  • Mixtral’s Magnet Link Shared: A link to the Mixtral 8x22b model torrent was posted, providing a way to download this new AI model.
  • License Confirmation for Mixtral: The Mixtral model is confirmed to be released under the Apache 2.0 license, with an instruct version anticipated to follow.
  • First AGIEval Results Are Promising: A member highlighted the Mixtral 8x22b model’s impressive performance in the First AGIEval Results, suggesting it outperforms other base models.
  • Benchmark Scores Released: Benchmark scores for various datasets such as PIQA, BoolQ, and Hellaswag were shared, comparing the performance of Mixtral 8x22B and Mixtral 8x7B models.
  • Model Runs via vLLM: It’s noted that the benchmark scores were generated using the vLLM inference engine on 4xH100 GPUs, with a mention of the MMLU task taking around 10 hours on this configuration.


DiscoResearch ā–· #discolm_german (24 messagesšŸ”„):

  • New LLM Merging Tool Unveiled: A new library for merging multiple Large Language Model (LLM) experts named mergoo has been shared, claiming to simplify and improve the efficiency of the merging process. The tool is noted to be inspired by the Branch-Train-MiX paper from March.

  • RAG Benchmarking Reveals Odd Behavior: The DiscoResearch/DiscoLM_German_7b_v1 model shows markedly different performance depending on the placement of a line break in the ChatML template; without it, accuracy drops significantly on some tasks in a newly created RAG benchmark (see the toy illustration after this list).

  • Line Break Impact Investigated: The discovery of a line break affecting benchmarks triggered discussions about a potential data loading/formatting script issue, and whether this could relate to broader erratic benchmark results. It prompted a plan to review training data application, with a mention of updating data for an upcoming 8x22 model.

  • Model Formatting Issues Explored: Conversation turned to the tokenizer configuration for DiscoLM_German_7b_v1, speculating about whether modifying the tokenizer config might address the performance anomalies.

  • Generalizability of Line Break Issue in Question: The unique sensitivity to line break formatting has raised questions about whether this could be an issue specific to DiscoResearch/LeoLM models, or a more general phenomenon affecting other models as well. The topic remains open for further testing and investigation.
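
To illustrate how small the difference under discussion is, here is a toy ChatML-style prompt builder with a line break toggled; exactly where the break sits in DiscoLM’s template isn’t specified above, so this placement is purely illustrative.

```python
def chatml_prompt(system: str, user: str, trailing_newline: bool) -> str:
    """Toy ChatML-style prompt builder; the toggled newline is the kind of
    one-character template difference reported to shift benchmark accuracy."""
    end = "\n" if trailing_newline else ""
    return (
        f"<|im_start|>system\n{system}<|im_end|>{end}"
        f"<|im_start|>user\n{user}<|im_end|>{end}"
        f"<|im_start|>assistant\n"
    )

print(repr(chatml_prompt("Du bist hilfreich.", "Hallo!", True)))
print(repr(chatml_prompt("Du bist hilfreich.", "Hallo!", False)))
```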


LLM Perf Enthusiasts AI ā–· #gpt4 (16 messagesšŸ”„):

  • Good Morning with a Tweet: A member greeted the channel with a ā€œgmā€ and shared a Twitter link potentially related to new updates or discussions from OpenAI.
  • Surprising Benchmark Results: Wenquai reported unexpected findings where Sonnet and Haiku performed better than GPT-4 Turbo and Opus in a quick vision benchmark, linking to a Colab research document for review.
  • Exploration of GPT-4 Turbo Features: The GPT-4 Turbo’s function calling and JSON mode were highlighted as promising for building with vision models, sparking interest in further benchmarking these features.
  • Is It GPT-4.5 or not?: Members joked about the incremental nature of the latest model improvements, with one stating it feels more like a 4.25 update, while others cited OpenAI employees’ claims of enhanced reasoning capabilities.
  • Comparison of AI Coding Abilities: There was a brief exchange discussing the coding capabilities of the latest models, where potrock mentioned no coding issues using the model in cursor while others brought up comparisons with Gemini 1.5 and discussed the benefits of copilot++.

Link mentioned: Google Colaboratory


Datasette - LLM (@SimonW) ā–· #llm (15 messagesšŸ”„):

  • LLM Help Command Performance: A user reported that the llm --help command was slow, taking more than 2 seconds to run. Their concern was whether this indicated a potential security issue like being hacked.
  • Benchmarking LLM Help: In response to concerns about llm --help performance, a different user shared a fast benchmark result: 0,50s user 0,10s system 94% cpu 0,624 total.
  • Timing LLM on Different Setups: A follow-up by the original user indicated that llm --help took 3.423 seconds on their setup but only 0.800 seconds in a fresh docker container, suggesting the slowdown was related to system configuration rather than the llm tool itself (a small timing sketch follows this list).
  • Reinstallation Resolves Issues: The user facing performance issues with llm --help found that reinstalling llm resolved both the speed problem and an error encountered when running Claude models, suggesting that a fresh install could alleviate certain operational problems.
  • LLM Command Hiccups on MacOS: Another user experienced the llm cmd command hanging when run locally on macOS with iTerm2, while it worked fine on a remote Ubuntu server. They noted a customized shell environment, which they suspected might contribute to the issue, though the same configuration was working on Ubuntu.
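
A small way to reproduce the timing comparison from Python; this measures wall-clock time including process start-up overhead, which is exactly what dominates here.

```python
import subprocess
import time

t0 = time.time()
subprocess.run(["llm", "--help"], capture_output=True)  # discard the help text
print(f"llm --help took {time.time() - t0:.3f}s")
```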

Skunkworks AI ā–· #general (3 messages):

  • Seeking Benchmark Comparisons: A member inquired about a paper citing performance benchmarks for models like phi-2, dolphin, and zephyr on the HumanEval dataset.

  • Skepticism on Benchmarks: A member expressed skepticism towards benchmarks, suggesting they can be gamed. However, they recommended a human-ranked leaderboard for trustworthy results, available at arena.lmsys.org.

  • First AGIEval Results for Mixtral 8x22b: The Mixtral 8x22b model’s first AGIEval results have been shared, indicating superior performance over other open-source (base) models. The updates can be found in two tweets by Jan P. Harries, detailed here and here.

Link mentioned: Tweet from Jan P. Harries (@jphme): @MistralAI first AGIEval results look great šŸ‘‡ - thanks for releasing this beast, guys! šŸ‘ https://x.com/jphme/status/1778028110954295486 ā†˜ļø Quoting Jan P. Harries (@jphme) First AGIEval results fo…


Skunkworks AI ā–· #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=Gb--4supXoo


Mozilla AI ā–· #llamafile (4 messages):

  • Fine-Tuning GPU Usage: A member discovered that using a lower -ngl value (the number of model layers offloaded to the GPU) resolved their issue, settling on -ngl 3. They noted that performance was significantly better with smaller models due to their GPU’s limited memory.

  • Adaptive Layer Offloading in Question: In the context of VRAM limitations, a member inquired if llamafile could potentially offload layers to fit a user’s available VRAM instead of crashing, linking to their own configuration with a 1050 GPU.

  • ollama Offers LLM Flexibility: A member praised ollama for its method of handling model layer distribution, sharing a specific GitHub link discussing the implementation details: ollama server.go.

Link mentioned: ollama/llm/server.go at c5c451ca3bde83e75a2a98ed9fd4e63a56bb02a9 Ā· ollama/ollama: Get up and running with Llama 2, Mistral, Gemma, and other large language models. - ollama/ollama


Alignment Lab AI ā–· #general-chat (2 messages):

  • Tuning into Remix Music AI: A member shared excitement about a remix music model, describing it as ā€œpretty fucking amazingā€ with a link to listen: Loading Song….
  • Call for Coding Support: A user requested direct messaging for assistance with their code, reaching out to a specific member for help.

Link mentioned: SynthTrails