> AI News for 4/26/2024-4/29/2024. We checked 7 subreddits and [**373** Twitters](https://twitter.com/i/lists/1585430245762441216) and **28** Discords (**416** channels, and **10824** messages) for you. Estimated reading time saved (at 200wpm): **1197 minutes**.

Lots of discussion about SB-1047, the new gpt2-chatbot on lmsys, and extending Llama-3-8B to 1M context, but otherwise no clear top story emerges. You can check out the WebSim/WorldSim podcast as Nous Research gets ready to relaunch it after briefly taking it down due to security issues.


Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, and r/Singularity. Comment crawling works now, but there's still lots to improve!

Advances in AI Models and Capabilities

Applications of AI

Deploying and Optimizing AI Models

Concerns and Challenges


AI Twitter Recap

All recaps are done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Prompt Engineering Techniques and Applications

  • Reasoning and Multi-Step Problem Solving: @cwolferesearch outlines recent prompt engineering research for reasoning tasks, including zero-shot CoT prompting, selecting CoT exemplars based on complexity, progressive refinement of rationales, and decomposing complex tasks into sub-tasks (a minimal zero-shot CoT sketch follows this list).
  • Tool Usage and API Integration: @cwolferesearch highlights research on teaching LLMs to leverage external tools and APIs, such as text-based APIs, natural language programs composed of tool calls, and code execution in sandboxed environments.
  • Optimizing Context Window Usage: @cwolferesearch discusses studies on the impact of context window properties, such as the negative effects of irrelevant context, attention biases towards the beginning/end of prompts, and strategies for selecting optimal few-shot exemplars.
  • Improving LLM-Assisted Writing: @cwolferesearch covers techniques for enhancing LLM-generated writing, such as outline generation and iterative filling, using smaller LLMs to generate “directional stimuli”, and iteratively increasing information density in summaries.
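
A minimal sketch of the zero-shot CoT technique from the first bullet above; the call uses the OpenAI Python SDK, and the model name is purely illustrative:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def zero_shot_cot(question: str) -> str:
    # Appending the trigger phrase elicits step-by-step reasoning
    # without any hand-written exemplars.
    prompt = f"Q: {question}\nA: Let's think step by step."
    resp = client.chat.completions.create(
        model="gpt-4",  # illustrative; any chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```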

Emerging Abilities and Scaling Laws in Large Language Models

  • Emergent Abilities and Pretraining Loss: @_jasonwei discusses a paper that plots emergent abilities against pretraining loss, showing linear correlations for some benchmarks and emergent behavior at specific loss thresholds for others. Pretraining loss is suggested as a better metric than compute for comparing models.
  • Potential Upper Bounds on Function Approximation: @jxmnop shares insights from a paper showing that vastly different architectures can produce identical performance at the same parameter count, suggesting we may be close to the upper bound of approximating functions given a certain amount of compute.
  • Limitations and Potential Walls for Language Models: @bindureddy argues that language models may soon hit a wall due to the limits of human language, reasoning, and the inability to surpass a certain level on benchmarks like MMLU despite increased compute or data.

Advancements in Vision-Language Models and Video Understanding

  • PLLaVA: Parameter-free LLaVA Extension to Videos: @_akhaliq introduces PLLaVA, which extends the LLaVA framework to video dense captioning without requiring extensive paired data. The approach adapts pre-trained image-language models with a pooling strategy to achieve state-of-the-art performance on video question-answering and captioning tasks.
  • HaLo-NeRF: Learning Geometry-Guided Semantics: @_akhaliq presents HaLo-NeRF, a system that connects neural representations of landmark scenes with text descriptions to enable fine-grained understanding and localization of semantic regions. The approach harnesses vision-and-language models adapted for 3D-compatible segmentation and volumetric scene representation.

Techniques for Efficient Training and Deployment of Large Language Models

  • FP6 Quantization for Efficient LLM Inference: @rohanpaul_ai shares a paper on using six-bit quantization (FP6) to reduce the size of LLMs while preserving model quality across various applications and model sizes. The paper introduces TC-FPx, a GPU kernel design scheme supporting float-point weights for various quantization bit-widths, enabling practical performance improvements during LLM inference.
  • Proxy-Tuning: Efficient Customization of Large LMs: @rohanpaul_ai explains Proxy-Tuning, a lightweight decoding-time algorithm that achieves the result of directly tuning a large LM by using smaller tuned LMs to shift the original predictions. This approach allows for efficient customization of large, potentially proprietary LMs through decoding-time guidance (see the sketch after this list).
  • Parameter-Efficient Sparsity Crafting for Instruction Tuning: @rohanpaul_ai discusses a paper proposing Parameter-Efficient Sparsity Crafting (PESC), which converts dense models into sparse Mixture-of-Experts (MoE) models for efficient instruction tuning. PESC inserts adapters into each expert, updating only the adapter parameters, significantly reducing computational costs and memory requirements while achieving state-of-the-art performance.
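
A minimal, self-contained sketch of Proxy-Tuning's decoding-time arithmetic as described above (the random tensors stand in for real model outputs):

```python
import torch
import torch.nn.functional as F

def proxy_tuned_logits(base, expert, antiexpert):
    # Shift the large model's next-token logits by the delta the small
    # tuned "expert" learned relative to its untuned "anti-expert".
    return base + (expert - antiexpert)

vocab = 32_000
base_logits = torch.randn(1, vocab)        # large base model (stand-in values)
expert_logits = torch.randn(1, vocab)      # small tuned model
antiexpert_logits = torch.randn(1, vocab)  # same small model, untuned

logits = proxy_tuned_logits(base_logits, expert_logits, antiexpert_logits)
next_token = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)
```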

Regulations and Policy

  • California Bill 1047 Details: @nearcyan shared details on California Bill 1047 which has been fast-tracked. The bill covers all models made with 10^26 flops or similar performance, requires developers to assert models are safe under penalty of perjury, and creates a Frontier Model Division to report to.
  • Concerns with California SB-1047: @jeremyphoward expressed concerns that California SB-1047 “Safe and Secure Innovation for Frontier Artificial Intelligence Models Act” could do great harm to startups, American innovation, open source, and safety. The bill imposes overly broad definitions, misunderstands dual use, has restrictive requirements, and disincentivizes openness.

AI Discord Recap

A summary of Summaries of Summaries

1. Advancements in Large Language Models (LLMs) and AI Capabilities

2. Model Optimization, Quantization, and Efficiency Techniques

  • Extensive discussions around quantization techniques like 4bit lora and 4bit qlora, with debates on their effects on model performance based on training extent. Binary Quantization is explored for creating smaller indexes for similarity searches.

  • DeepSpeed’s FP6 quantization promises quantized inference with similar throughput, generating excitement for improved efficiency.

  • Researchers present CPU-optimized LLMs capable of generating Python code using a Chain-of-Thought prompt method, highlighting the pursuit of efficient, low-cost models.

3. Open-Source AI Development and Community Collaboration

  • The Eleuther community compares LLM performance, discusses emergent abilities, and shares research on topics like redundant neural circuits and adversarial prompting against LLMs.

  • OpenAccess AI Collective delves into fine-tuning strategies, quantization methods, and tokenization challenges, with members sharing insights from repositories like axolotl and FastChat.

  • The LlamaIndex community explores techniques like multi-hop retrieval, knowledge graphs for long-term memory, and shares resources like an AWS workshop on LLM app development patterns.

4. Ethical Concerns and Regulatory Challenges in AI Development

  • LAION faces restrictions due to EU laws, limiting access to public compute clusters and prompting researchers to gravitate towards more active communities with ongoing experimentation.

  • Discussions around the proposed California SB-1047 bill and its potential harm to startups, open-source AI development, and American innovation, underscoring regulatory challenges.



PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

  • Phi 3 Integration an Unsloth Triumph: Unsloth AI now supports Phi 3, delivering twice the speed with half the memory usage. Enthusiasts can explore the Colab notebook for detailed guidance.

  • Bilingual Model Makes a Splash: Thermostatic introduced NeuralTranslate_v0.2_GGUF, a bi-directional English-Spanish translation model that preserves Mistral’s reasoning without overfitting, all available on Hugging Face.

  • GPU optimization chatter: AI community debates best practices for minimizing VRAM usage, sharing insights on manual layer pruning, and discussing offloading techniques with code examples from Kolibrify’s GitHub repository.

  • Dataset Dexterity: A tip for merging raw text and chat datasets to improve fine-tuning outcomes was shared, along with the suggestion to use larger datasets for base models and smaller ones for instruct models. There’s also mention of offloading parts of language models to reduce inference memory, as explained with code in a GitHub repository.

  • Future Functionality Features: Suggestions for Unsloth AI included automatic optimization of hyperparameters like batch size and learning rate. Meanwhile, a community member humorously anticipated the addition of a cake-baking feature upon training completion.


CUDA MODE Discord

CUDA C++ claims the spotlight: A YouTube lecture on CUDA C++ llm.cpp delves into optimizing LLM training, with promises of cleaner and faster code. Support materials and related discussions suggest significant performance improvements and readiness for scaling LLMs to gpt-large sizes.

Intel’s oneAPI spreads its wings: Intel’s oneAPI garners attention for offering a unified programming model across CPUs, GPUs, and FPGAs. Enthusiasm bubbles up for the upcoming Battlemage GPU lineup, and the oneAPI ecosystem welcomes contributions for cross-vendor support, with developer resources on GitHub and announcements over Codeplay’s official press release.

Machine Learning gig at InstaDeep: InstaDeep is on the hunt for Machine Learning Engineers versed in high performance ML, Bio AI, and custom CUDA kernels. They offer a stimulating environment and multiple positions for problem solvers ready to make real-world impacts, with applications open on the InstaDeep job portal.

AMD stokes the competitive fires: Discussions revolve around the AMD Instinct MI300X’s potential for server environments and ROCm’s current state, with links to product pages and rental options hinting at a heated rivalry with NVIDIA. ROCm support and comparisons suggest AMD’s focus on greater accessibility and performance enhancement for developers.

Triton and PyTorch Forge Ahead: GitHub repositories such as unsloth and attorch emerge as treasure troves for those seeking Triton and PyTorch integrations. While flash-attn 2.5.8 earned compatibility accolades with PyTorch 2.3.0, discussions on optimal CUDA tensor indexing techniques and tensor gradient calculations in Triton reinforce the community’s drive for efficiency.


Perplexity AI Discord

Slow Pro Search Annoys Users: Perplexity AI’s Pro Search users are complaining of increased search times, lamenting that searches are taking up to 90 seconds across all engines, affecting the web client but not the mobile app.

Claude 3 Opus Chat: To Subscribe or Not?: Members debate the merit of subscribing to Claude 3 Opus chat, with some users reporting positive experiences, although no specific comparative features with the API version have been discussed.

New AI Model Anticipation: There’s keen interest in the potential integration of WizardLM 2 and LLama-3 70B Sonar Large 32k models into Perplexity AI, with users noting they may outperform existing models on specific tasks.

Frustrations Over Opus Daily Limits: Perplexity users are voicing frustration over a 50 queries per 24 hours cap on Opus, calling for greater transparency and lamenting perceived degradation in quality.

Billing Blues and API Queries: Users are expressing issues with billing, citing being charged despite expecting a free trial, and seeking the right channels for enterprise API discussions. Meanwhile, questions about single-turn conversation guidelines with online LLMs, Harpa configuration, and model accessibility on third-party platforms like make.com are stirring up technical curiosity.


Stability.ai (Stable Diffusion) Discord

Forge Forgets Functions: Trouble with SDXL and Forge UI is boiling over; users report issues with image previews and express concerns over the potential abandonment of Forge. Workarounds include delving into GitHub issues and tweaking startup flags like --no-gradio-queue.

Release Radar - Stable Diffusion 3.0: The AI engineering community eagerly awaits the launch of Stable Diffusion 3, triggered by hints from a CivitAI newsletter pointing to an end-of-May release. Anticipation is mixed with skepticism about open weight availability and comparisons with Pony Diffusion V7, discussed in a Civitai article.

Cashing in on AI Art: Discussions on monetizing AI-generated art revealed that NSFW creators are outperforming SFW artists in marketplaces like Civitai. Brainstorming ensued on potentially lucrative trends such as AI girlfriend apps, alongside a noted indifference towards fine-tuning efforts for models like Stable Cascade.

Toolbelt Expansion: Engineers swapped tips on AI model training tools beyond AUTOMATIC1111, spotlighting dreambooth and kohya_ss for custom training, while also contemplating the ethical quandary of using artist names in datasets.

Enigmatic Enquiries Enlighten: Inquisitive interactions ranged from exploring text-to-speech solutions to diving into model fine-tuning specifics. The discussion sometimes took a lighter turn with humorous comments about virtual “graphics card downloads” and idle curiosity about Stable Diffusion’s ability to visualize without explicit prompts.


LM Studio Discord

A New Challenger for VRAM: Discussions underscore the importance of VRAM for LLM operations, with 16GB as the minimal baseline and aspiration for the 32GB VRAM club stirring excitement. The performance gains from using Nvidia’s contemporary GPUs and the feasibility of models split across multiple cards, potentially streamlined by NVLink, were also key points.

LLM Leapfrog: The Meta-Llama-3-8B-Instruct-Q5_K_M.gguf model is earning praise for its performance on an M1 MacBook Pro. Users are advised to consider quantization types when running models to ensure compatibility with their hardware, and resources for local model deployment and instructions are deemed helpful, with pointers to tools like LM Studio and Groq API.

The Quirks of Model Behavior: Users encountered various version-related issues, such as phi-3 mini models outputting nonsense after an update to LM Studio version 0.2.21, and handling crashes in LM Studio since recent updates. Concerns about LLama 8b models rambling and the need to restrict reliance on integrated graphics for dedicated GPU utilization were also highlighted.

Bots, Books, and Bugs: Integrating Discord bots with LLM models for message retrieval and Wikipedia searches has gained traction. Meanwhile, navigating the capacity to run models like Stanford’s Octopus v2 on mobile or PC devices surfaced as a complex issue, and LLama 3 models are suspected of “hallucinating” current event knowledge, given their lack of internet access.

ROCm Hiccups: Users battling with LM Studio ROCm’s limitations discovered that it doesn’t support RX 6700, which provokes thoughts on HIP SDK compatibility and potential workarounds such as those implemented by KoboldAI. Additionally, a server error within the platform sparked dialogues, but no resolution was reported.


Nous Research AI Discord

  • Snowflake Arctic Unveils Cost-Efficient AI Solutions: The Snowflake AI Research Team launched Snowflake Arctic, an LLM aimed at providing cost-efficient enterprise AI solutions, amidst other less-contextualized YouTube video shares.

  • Intel and Logitech Augment AI Offerings: Intel’s CEO highlighted AI’s growth potential during their quarterly results, as shown in a YouTube video, while Logitech introduced an AI Prompt Builder for more fluent ChatGPT interactions, demo video available.

  • Emerging Trends in AI Quantization and Model Architectures: Hugging Face hosts binary-siglip-text and binary-siglip-vision, demonstrating efficient embeddings, with discussions also encompassing speculations around OpenAI’s naming schemes and the introduction of DeepSpeed FP6 quantization for improved throughput.

  • LLM Discussion: Performance Issues and Legal Confusion: Users report LLaMA-3’s EOS token generation issues, which link to stopping criteria solutions on GitHub, while Cohere’s licensing for command-r models stirs debates over commercial code usage, and frustrations are aired about a gpt2-chatbot, mistakenly associated with GPT-4 capabilities.

  • Data, Documentation, and Development through AI Community Collaboration: Technical contributions include generating multi-hop literature data, using pydantic models for ideation, and refining graph representations of LLM outputs. Anna’s Blog provided information on WorldCat data scraping and utilization in literature comprehension datasets.

  • Web and World Simulation Tools Garner Interest: The Nous Research community gears up for worldsim testing with free invites and shares experiences with various web simulation tools, such as companion-based AI (documented at a websim example) and long-running conversations, indicating growing interest in AI’s potential for conversational stability.


HuggingFace Discord

  • Community Constructs Computer Vision Course: A new community-built computer vision course is live on HuggingFace, covering machine learning principles in the field using models from their ecosystem.

  • Model Showcase and Updates: The newly announced multilingual Qwen1.5-110B-Chat model supports a 32K context length, among other improvements; details can be found on its model page. Additionally, the link to the “Qwen1.5-110B” model has been corrected; it is now accessible on HuggingFace, with details in the associated blog post.

  • Creative Solutions and Collaborations Encouraged: Members sought creative problem-solving for a range of technical inquiries, from undisclosed Gradio issues to LLM performance optimization under hardware constraints, with 32 GB of RAM mentioned as sufficient for many tasks. There’s also a push to identify and improve image classification or object recognition models for practical applications like pinball game scoring systems.

  • Model and Space Innovations Abound: Various models and spaces surfaced including a Sentence Transformer Model for semantic search tasks with a context length of 16,384 (BEE-spoke-data), and a Minecraft Skin Generator using a stable diffusion model (Stable Diffusion Finetuned Minecraft Skin Generator). The Instant Video space by KingNish leverages ByteDance’s AnimateDiff Lightning model for quick text-to-video creation (Instant Video).

  • Explorations in Diffusion and AI Advertisement Detection: Participants exchange best practices for object generation with precision, incorporating tools like the IP-Adapter in diffuse models for enhanced image prompting, and addressing color consistency issues across platforms. Conversations also navigated toward evaluating YOLO classifiers for improved accuracy and performance in various applications.


OpenAI Discord

  • ChatGPT Gets a Memory Upgrade: ChatGPT Plus users can now save conversational context using the newly introduced Memory feature, though availability is still limited, excluding users in Europe and Korea.
  • Exploring AI’s Relation to Consciousness: The community engaged in intense debates over whether AI could exhibit consciousness, with discussions venturing into the philosophical domain, comparing AI’s experience of time with continuous human consciousness and touching on the perception of self in neural networks.
  • Model Comparisons Spark Discussions: Technical discussions emphasized the strengths and weaknesses of various AI models, with ChatGPT, Claude 3 Opus, and Gemini 1.5 being benchmarked, while acknowledging that while command-R Plus and Llama3-70b may fall behind GPT-4, they represent their own leaps in progress.
  • Prompts as Competitive Sport: Members proposed the idea of prompt competitions, both paid and for play, to sharpen skills and enhance community engagement, highlighting the potential for emerging qualities in LLMs that cannot be predicted by simply scaling up smaller models.
  • API Ups and Downs Noted: Engineers discussed various operational issues from rate-limits on custom GPT uses, backend errors at “https://chat.openai.com/backend-api/gizmos/”, to concerns about performance and availability of GPT-4’s features like memory and voice control.

Eleuther Discord

Exploring the Limits of Model Size: Engineers debate the effective cutoff for model parameters, seeking a point where further addition offers negligible returns. In a bid for efficiency, the criterion has shifted towards focusing on non-embedding parameters, potentially finding a sweet spot under 200 million.
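
As a rough illustration of that criterion, non-embedding parameters can be counted by excluding embedding weights, as in the sketch below (tied output heads share storage with input embeddings and are excluded too):

```python
import torch.nn as nn

def non_embedding_params(model: nn.Module) -> int:
    # collect the parameter tensors owned by embedding layers
    emb_ids = {id(p) for m in model.modules()
               if isinstance(m, nn.Embedding) for p in m.parameters()}
    # count everything else
    return sum(p.numel() for p in model.parameters() if id(p) not in emb_ids)
```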

Multilingual Hurdles in The Pile: The Pile’s dataset limitations were highlighted, indicating a lack of multilingual representation which might impact model training and performance, particularly in languages like German. Additionally, while comparing models like GPT-NeoX and Megatron, discussions centered on NeoX’s user-centric quality improvements.

Stability or Speed? The Model Serving Conundrum: Technical discussions have surfaced regarding discrepancies in model serving speeds, such as between Mixtral and Llama models at Fireworks.ai; considerations included batching size and hardware specifics as potential factors.

Refusal’s Single Neuronal Pointer: The AI Alignment Forum presented a discovery that refusal mechanisms in LLMs might hinge on a solitary direction within network layers. This spurred discussions about orthogonalization and fine-tuning possibilities for refusal behavior.

Pull Request Perils and Pipeline Woes: Members expressed concerns about CLA signing issues and failing checks on GitHub pull requests, with some conversations dwelling on the stagnation of specific branches. Questions were raised about the adaptability of evaluation prompts to different models’ finetuning needs, with suggestions for custom functions to handle diversity.


OpenRouter (Alex Atallah) Discord

  • Two-Step Price Hike for Soliloquy 8B: The Soliloquy 8B model transitioned to a paid usage model at $0.1 per 1M tokens, followed by a further increase to $0.2 per 1M tokens. The rates reflect OpenRouter LLC’s policy changes and are documented on the model’s OpenRouter page.

  • Claude’s Checkup: Users troubleshooting Claude models found that they max out at 4k generated tokens while being able to read up to 200k tokens of context, and that proper API settings can optimize responses (a minimal sketch follows this list). Relevant documentation can be found here.

  • WLM-2 Hosting Huddle: A detailed analysis of WLM-2 hosting costs led to the conclusion that profitability hinges on factors like GPU efficiency and the off-chance revenue from idle resources.

  • Quiet Arrival of FireLLaVA: FireLLaVA, an open multimodal model boasting swift initialization, has quietly entered the OpenRouter suite. It’s a significant addition for developers given its non-proprietary nature and can be explored on OpenRouter’s page.

  • Frontend Frustrations Find Frugality: A quest for a budget-friendly frontend to allow family members to access OpenRouter services without individual OpenAI accounts inspired recommendations for using free-tier offerings like Vercel, or economical VPS like Contabo.
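
A minimal sketch of the generation cap discussed in the Claude item above, using the Anthropic Python SDK (the model id is illustrative):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model id
    max_tokens=4096,                 # output cap; the input context can be far larger
    messages=[{"role": "user", "content": "Summarize the attached report."}],
)
print(resp.content[0].text)
```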


OpenAccess AI Collective (axolotl) Discord

  • WizardLM Stays Magical: Contrary to whispers, Microsoft’s WizardLM models have not vanished; rather, updates were made by the wizardlm team, ensuring continued public access to the repository.

  • The Fine Art of Model Fine-Tuning: Discussions contrasted fine-tuning domain-specific language models against using Retrieval-Augmented Generation (RAG), with references made to the medically-focused LLM paper and the usage of llama-pro methodology as seen in fsdp_qlora.

  • Quantization Quandaries and Tokenization Tactics: Considerable chatter surrounded tokenization challenges, requiring the latest fastchat formatter for models like LLaMA-3; meanwhile, the community grappled with understanding quantization methods like 4bit lora and 4bit qlora through discussions and a Twitter thread, revealing a sensitivity to quantization based on the extent of model training.

  • AI’s Need for Space and Speed: A stark reminder that full fine-tuning (FFT) with ZeRO-3 could gobble up to 167GB of RAM, even on 2x24GB GPUs, setting off discussions on memory management techniques like torchtune, the perplexing observation of high disk space usage, and the utility of PEFT models for efficiency in fine-tuning neural networks.

  • GPU Scaling Secrets and FSDP Mechanics: The collective cornered the topic of GPU scaling, exchanging insights on the fine details of micro batch sizes, gradient aggregation, and the use of Fully Sharded Data Parallelism (FSDP) and ZeRO Stage 3 for model loading across GPUs - all critical for the effective use of hardware resources.


Modular (Mojo đŸ”„) Discord

  • Mojo Gets Modular: Modular’s standard library, modularml/mojo, saw a 23% increase in commits post open-sourcing, signaling heightened contribution activity.
  • Multimodal Search Empowered by MAX: A blog post by Modular revealed the MAX Engine outshines both PyTorch eager and ONNX runtime in benchmarks, excelling in multimodal search involving textual and visual data.
  • Modular Tweets Curated: Key tweets from Modular were highlighted, spanning updates and announcements, with links including Tweet 1, Tweet 2, Tweet 3, and Tweet 4.
  • Advancements and Issues in Mojo Land: Key discussions covered converting Python to Mojo, memory allocation optimizations, and matrix slicing in Mojo. Importing challenges in the standard library were tackled, and nightly compiler updates continue to roll out, catching issues like file handle lifetime management.
  • Performance Pursuits Proliferate: From investigations into dictionary performance to SIMD optimizations for error-correction algorithms, the community delved into efficiency enhancements. The compact-dict library was mentioned as a potential speed booster, and __copyinit__ usage was debated, exemplified in a listed Gist.

LlamaIndex Discord

AWS and Llama Index Sit Down to Code: A workshop with AWS to demonstrate 3 patterns for LLM app development emphasizes data ingestion with S3 and embeddings with AWS Bedrock.

Security Spotlight on ML Podcast: The latest mlsecops podcast features the co-founder of Llama Index discussing LLM-based application futures and data security, including tools like LlamaParse and LlamaCloud.

RAG Under the Microscope: Marco Bertelli’s 9-part RAG tutorial series paves the way for any prototype to hit the production stage with a delineation of vital architectural components.

Multistep Quest for Improved RAG Reasoning: A methodology enhancing RAG involves a multi-hop retrieval process, combining Llama Index and Cohere reranking, which sharpens context awareness and minimizes hallucinations, as discussed in this post.

Remember All with memary: Unveiling memary, a long-term memory framework using knowledge graphs, which promises to expand memory capabilities in autonomous agents supplemented by LLMs, explained in this tweet.


OpenInterpreter Discord

Flask and Keys: An OpenInterpreter member encountered issues when running a Flask server and discussed workarounds like setting a dummy api_key and modifying pydantic configurations to resolve namespace conflicts.

Hardware Hurdles Surmounted: The absence of Groq integration with OpenInterpreter prompted discussions, citing a pull request #1238 aimed at adding support. There were also questions around the use of devices like the Rabbit r1 with OpenInterpreter, focusing on the system’s language and voice command capabilities.

Anticipating the Heavy: Eager anticipation bubbles around the so-called 01 Heavy device, though no concrete release details exist; meanwhile, a custom 3D project for OpenInterpreter garners attention, and a member teases an upcoming discussion on the timeline for 01 Light.

Community Code Crusade: Members actively shared progress and assistance requests for projects associated with OpenInterpreter. This includes the llm-switcher, and potential Groq API implementations, encouraging community contributions.

Open AI Ethics Discourse: A conversation sparked around the ethical implications of AI abilities like file modification, particularly in reference to Microsoft’s capabilities, with the implicit suggestion that OpenInterpreter could be crafted to be more aligned with diverse user needs.


Latent Space Discord

Berkeley Benchmarks Function Call Skills: The Berkeley Function Calling Leaderboard serves as a new measure, periodically updating to benchmark how effectively Language Models (LLMs) call functions in real-world scenarios.

Laying Down the Law with LLM Limitations: An exploration into the confines of LLMs highlights their inability to prevent “goal drift”, with details provided in a Strangeloopcanon article, emphasizing areas for potential improvement.

Swyx Keeps the Pod Waves Flowing: A shout-out to a new podcast episode from swyxio might capture the audience’s interest; details shared via a tweet.

Elevating the Mix with Mixture of Depths: A new transformer layer built on Expert Choice Routing, introduced in a recent paper, aims at faster convergence and better long-sequence processing and is stirring up discussions. For more in-depth information, engineers can take a look at the paper here.

Linux Video Sharing Level-Up: Vesktop appears to be the hot topic for Linux users seeking better video sharing experiences on Discord, with its performance and compatibility improvements detailed on the GitHub repository.


LAION Discord

  • LAION’s Compute Conundrum: EU regulations are impeding LAION’s ability to utilize public compute clusters, prompting researchers to shift their attention towards more active research communities with ongoing experimentation.

  • Terminus Group Draws in Diverse Experts: The Terminus Research Group, an informal collective, recently welcomed the “pixart guy,” signaling a trend of burgeoning communities rich in cross-disciplinary talent.

  • Pursuing the Aesthetics of AI: LAION-Aesthetics aims to quantify visual appeal using machine learning models, with their open-source code accessible on GitHub for public collaboration and use.

  • Quantization Conundrum Raises Eyebrows: Discord members examined a Reddit post on LLM benchmark inconsistencies across precision levels, casting the spotlight on the testing procedures and inherent unpredictability in LLM performances.

  • Token Generation Rate Talks: AI engineers discussed the token generation speeds on advanced GPUs for varying models and configurations, sharing that selecting effective tools like exllama and TabbyAPI can enhance overall performance.

  • VAST Interest Peaks Among Engineers: Members delved into the potential of the omni-modality foundation model and dataset, VAST, expressing interest in its capabilities by soliciting use-cases and tips for fine-tuning.

  • Emerging Research Stirs Excitement: A newly published research paper grabbed attention with its novel proposals for more efficient large model inference and layer management, sparking conversations on its practical applications.

  • Graph Integration into LLMs Explored: Inquiries about amalgamating graph data structures with LLMs triggered exchanges on techniques and literature for enriching language models with non-sequential data.

  • Fine-Tuning Frustrations on Medical Mistral: Challenges in fine-tuning Mistral models for medical text generation surfaced, focusing on excessive sequence generation and the utility of padding protocols to assuage these issues.

  • Eleuther Expertise Exchange Encouraged: Members suggested consulting the Eleuther server for expert guidance in LLM fine-tuning, generating interest in this hub of specialized knowledge.


Cohere Discord

Engines Revving Up for AI-Enhanced Browsers: AI enthusiasts debated the merits of Tavily and Brave Search API as search engine tools for integration with AI, discussing price points, efficiency, and rate limitations (see Brave Search API Info and Tavily API Info).

Cohere Toolkit Love: The community showed appreciation for Cohere’s open-source toolkit, benefiting from its prebuilt components to expedite the deployment of RAG applications Cohere Toolkit on GitHub.

Squashing Bugs and Deployment Dilemmas: Technical roadblocks such as sqlite3 errors when using cohere-toolkit locally and deployment challenges on Azure surfaced, with shared solutions found in various GitHub resources.

Customizing and Fine-Tuning Queries: Questions around the specifics of model fine-tuning and the boundaries of Cohere’s free trial API arose, prompting discussions of model availability and detailed terms.

Command-r Shines in Multi-Language Support: Command-r’s effectiveness with non-English languages was acknowledged, plus inquiries into its commercial use specs sparked discussions, suggesting avenues through contacting Cohere’s sales team or using AWS Sagemaker.


tinygrad (George Hotz) Discord

  • Formula Flexibility in Tinygrad: Discussion around tinygrad focused on creating mathematical formulas through basic primitive operations and emphasizing the importance of constructing a dependency graph for efficient gradient calculations and hardware utilization in AI modeling.

  • Tinygrad’s Dynamic Enhancements Await: Members shared excitement for the upcoming tinygrad 0.9 release, anticipating new features that could further improve AI model training and discussed ongoing work on handling dynamic testing and symbolic shapes to enhance operation flexibility.

  • Proposing a Learning Path for Tinygrad Enthusiasts: For those eager to dive into tinygrad’s intricacies, members recommended starting with MicroGrad and MiniTorch, then proceeding through the tinygrad codebase. This aims to solidify foundational concepts for better contributions to tinygrad’s development.

  • Kernel Optimization Insights: A member highlighted optimization techniques such as loop unrolling, while sharing detailed technical writeups and guides to understand the inner workings of tinygrad’s kernel optimizations, particularly targeting AI performance boosts.

  • Hybrid Model Harmony Highlighted: There was mention of successful integration between tinygrad and PyTorch, utilizing nn.module to combine features of both frameworks into a hybrid model, demonstrating the potential synergy in AI tooling.


Interconnects (Nathan Lambert) Discord

Bold Moves for Newsletter Growth: Members weighed the pros and cons of cross-promoting with Semafor, debating potential audience growth against the risk of diminishing brand value with unwanted plugs.

Phi-3 and Arena Gather Steam, OLMo Training Insights Offered: Microsoft’s unveiling of Phi-3 and Arena’s milestone of 800K votes sparked discussions, as did a seminar on Open Language Model training, which left the audience desiring deeper insights.

RLHF Nuances and Ghost Attention’s Diminished Glow: Engineers dissected the nuanced performance of Reinforcement Learning from Human Feedback (RLHF), touched on KTO’s promise, and debated the fading significance of Ghost Attention, once thought to be crucial for maintaining long conversation consistency in LLaMA 2 models.

OpenELM Triumphs, Encouraging Progressive AI Ideals: Conversations centered around OpenELM’s performance surpassing OLMo, reflected on the community’s development ethos, focusing on continuous improvement, and underscored the educational value of open models.

AGI - A Philosophical Conundrum: There’s an ongoing dialogue about the subjective nature of AGI, with members appreciating posts that ignite thoughtful considerations on the topic.


LangChain AI Discord

AI Integration Queries and Challenges: Engineers requested guidance on prompt integration and reported issues with AzureSearchVectorStoreRetriever being incompatible with async operations, hinting at possibly wrapping sync functions in async for compatibility. There’s also a confusion within the community regarding the Gemini 1.5 Pro model, clarifying that it works exclusively with VertexAI, as demonstrated with successful ChatVertexAI implementations.

LLM Deployments and Observability Preferences: Discussions unfolded around different deployment approaches, including Hugging Face versus the OpenAI API; security considerations were raised about bypassing LangChain for direct SQL Server connections. There was also debate on effective observability tools for LLMs, like Arize Phoenix and Langfuse, highlighting a slight preference toward self-hosted options.

Galactic API Giveaway and AI Job-Hunters: GalaxyAI is providing free API access, boasting compatibility with premium models such as GPT-4 and GPT-3.5-turbo. Separately, a GitHub repository introduced Genai-Job-Agents, a Langchain/Langgraph-based agent for streamlining job searches and CV optimisation.

AI Tutorials Amass: A suite of tutorials surfaced, including “Local RAG agent with LLaMA3 and Langchain” and “Llama 3 Web Browsing Agent with Langchain and Groq,” addressing the design and implementation of RAG systems and web browsing capabilities. A captcha issue was flagged when trying to access a potentially useful Amazon book on NLP and LLMs, but the underlying material was not dismissed.

Reviving the RAG, Ride the Llama: Insights from sharing channels reveal advancements in Retrieval-Augmented Generation (RAG) implemented with LLaMA3, underpinning the creation of AI-driven web UI for applications, and interactive avatars for customer Q&As, expanding the horizons of interactive AI utilization across various platforms.


Mozilla AI Discord

  • Segmentation Fault in Llama: Engineers are facing a segmentation fault when running llamafile, especially on Modal Labs platforms while using files like Phi-3-mini-128k-instruct.F16.llamafile. This issue has been widely reported among users attempting to integrate various llamafiles.

  • Memory Reporting Woes in htop: A notable bug in htop misrepresents shared memory usage on Linux, which could affect how AI engineers perceive memory demands during intensive model operations.

  • Get Your Update to Llamafile v0.8.1: The release of llamafile v0.8.1 promises support for the Phi-3 Mini 4k, fixes GPU module crash issues, and provides bundled NVIDIA + AMD shared objects for Ubuntu, thus potentially smoothing out some persistent wrinkles for engineers.

  • Unraveling Quirks in LLM Output: Anomalous outputs with parentheses and line breaks have been observed by users operating LLMs like Llama3 70B and Mistral via llamafile, sparking conversations about the consistency and idiosyncrasies of model behaviors.

  • Optimizing Llamafile for Peak Performance: There’s a shared interest in optimizing GPU usage with llamafile, where users exchanged tips on maximizing system RAM utility. Clarity is sought on identifying if a model runs on GPU or CPU, along with managing the llamafile-generated endless output.


AI Stack Devs (Yoko Li) Discord

AI Companion Radar: Faraday and Amica Catch the Eye: Faraday and Amica garnered attention for their position as AI companion apps that prioritize data privacy, where Faraday can operate locally thanks to llama.cpp, and Amica offers self-hosting and cloud services with enhanced features. Both apps introduce a new angle on AI relationships, promoting user privacy, with Faraday receiving a nod for its month-long performance and Amica as an emerging contender.

Bedtime Stories Win Big: Creative design with AI NPC characters by the participants of the Rosebud AI Sleep Game Jam led to notable entries, with Bedtime Negotiation standing out and winners announced via Twitter. A new game jam focusing on Education and AI is up next, with details available on Twitter.

A Town Called Addictive: AI Town was celebrated for its addictive quality in a Twitter post, inspiring ideas for a developer-centric simulation. LLM-powered NPC models and infrastructure enhancements were shared, with a repository on GitHub and a model hub on Huggingface, despite a broken API access link, and feedback was solicited for these NPC advancements.

Map Quest for AI Town: Debate on map handling for AI Town surfaced with suggestions ranging from using static assets to reduce bandwidth, to optimizing the original file reading method for maps. A YouTube tutorial titled “100% Local ‘AI Town’ with Llama 3 AGENTS!!!” was promoted, delivering a how-to for those eager to dive into their local setup.

Character Crafting Challenges: Dialogue around the development of NPC characters led to a promise for a detailed blog post. Discussions pinpointed the effort to compress model output, minimize model calls, and address issues found with generalist instruct-models like GPT-3.5 or Mistral.


DiscoResearch Discord

DiscoResearch Delves into Router Coefficient Mysteries: Engineers discuss inconsistencies in router_aux_loss_coef between versions of Mixtral — 0.02 for Mixtral-8x7B-Instruct-v0.1 and 0.001 for Mixtral-8x22B-Instruct-v0.1 — suggesting the potential need for higher loss_coef in smaller experts.
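
For reference, the coefficient's role (following the Hugging Face transformers Mixtral implementation) is to scale the load-balancing auxiliary loss before it is added to the language-modeling loss; schematically:

```python
def total_loss(lm_loss, load_balancing_loss, router_aux_loss_coef=0.02):
    # the auxiliary term nudges the router toward uniform expert utilization;
    # the coefficient controls how strongly it competes with the LM objective
    return lm_loss + router_aux_loss_coef * load_balancing_loss
```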

Initialization Inconsistencies Spark GPU Conversations: The DiscoLM_German_7b_v1 model encounters slow initialization times on HPCs compared to local machines; inference times improved from over 12 minutes to 10 seconds after loading the model onto GPUs.

Speed Humps Ahead for Model Loading: Attempts to improve DiscoLM_German_7b_v1 load times using low_cpu_mem_usage=True have failed, sparking suggestions that the model may be bottlenecked by slow storage drives.

Downloading German with Gusto: The gguf model reaches 1500 downloads in two days, showing a strong demand for German language models within the community.

Tokenizing for Chit-Chat: Questions arise about changes to tokenizer configurations in Phi-3 Llamafied german models intended for chat application optimization, while the newly created Phi-3 MoE model emerges for experiments needing further training.


Alignment Lab AI Discord

  • AI Tackles Tough Topics: There was a discussion regarding the application of Llama 3 for assessing topic complexity with reports of effective outcomes. This indicates ongoing exploration into AI capabilities for content assessment.

Skunkworks AI Discord

Python Code Gen Breakthrough with CPU-Optimized LLMs: A new study presents CPU-optimized language models capable of generating Python code, suggesting a Chain-of-Thought prompt method to improve model outcomes, outlined in the paper “Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation”.

Binary Quantization Buzz in HaystackDB: Discussions revolve around the HaystackDB repository potentially using 2bit embeddings, with further clarification that Binary Quantization assists in efficiency by creating smaller indexes for similarity searches.
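
A minimal sketch of the binary-quantization idea: keep only each dimension's sign bit and rank candidates by Hamming distance, shrinking the index roughly 32x versus float32 embeddings:

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    # one sign bit per dimension, packed 8 bits per byte
    return np.packbits(embeddings > 0, axis=-1)

def hamming_distances(query: np.ndarray, index: np.ndarray) -> np.ndarray:
    # popcount of XOR between the packed query and every packed index row
    return np.unpackbits(np.bitwise_xor(index, query), axis=-1).sum(axis=-1)

index = binarize(np.random.randn(10_000, 256))
query = binarize(np.random.randn(256))
top10 = np.argsort(hamming_distances(query, index))[:10]
```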

Trouble Training LLaMA-3 to Finish Up: A member experienced issues with LLaMA-3 models during fine-tuning, as models are not generating the End Of Sentence (EOS) token, impacting model performance where completion is critical.

Snowflake Arctic Chills Enterprise AI Costs: A video introduced Snowflake Arctic, a large language model designed for enterprise applications focusing on cost-effective AI solutions for businesses.

RAG-nificent Demonstrations with LLaMA3: Tutorial videos were shared, showcasing the use of Retrieval-Augmented Generation (RAG) with LLaMA3 in local environments through Langchain, as well as a session on implementing web browsing with LLaMA 3, Langchain, and Groq hardware here.


LLM Perf Enthusiasts AI Discord

Gamma Seeking AI Engineer: Gamma, highlighted by a16z and boasting over 10 million users, is looking to hire an AI engineer for prompt engineering, evaluations, and fine-tuning of text and image models. The role is pivotal in their content creation tools expansion, and the company prides itself on its growth, achieved with minimal team size and substantial funding, indicating a robust business model and significant market impact.

Spot the AI Talent: Candidates can apply for the AI engineer position at Gamma, set in the heart of San Francisco with a requirement of on-site collaboration thrice a week. This opportunity is for those keen on pushing the boundaries of large language models (LLMs) and can be explored further at Gamma’s career page.

GPT Sleuthing: Speculation arose around gpt2-chatbot, which is suspected by some to be a leaked version of GPT-4.5, triggered by discussions around a tweet by @phill__1 regarding its sophisticated domain knowledge. Community members simply responded with enthusiasm, acknowledging the bot’s quality.

A Tweet of Approval: The community expressed a succinct sentiment that the gpt2-chatbot is “good,” suggesting a community consensus on the bot’s impressive performance, which hints at its potential and future capabilities in the field.


Datasette - LLM (@SimonW) Discord

  • Code-Gen Goes Custom: Discussion about enhancing code-generation included the idea of custom grammar implementation to prevent syntax errors, emphasizing a model-specific option that could improve semantic accuracy.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (912 messagesđŸ”„đŸ”„đŸ”„):

  • Unsloth Supports Phi 3 Release: Phi 3 is now officially supported by Unsloth, offering 2x faster speed & 50% less memory usage. Users can find the detailed Colab notebook here.
  • Unsloth Performance Enhancements: Phi 3 can be finetuned using 4-bit precision with the Unsloth framework, accommodating limitations on VRAM. Users are experimenting with various finetuning flows combining SFT, DPO, and ORPO to enhance model performance.
  • Checkpoints Management in Finetuning: Users can create checkpoints during finetuning with Unsloth to save progress and avoid overfitting. To do so, one must modify the training arguments accordingly and handle resumes from the desired checkpoints (a minimal sketch follows this list).
  • Usage of Colab and Alternatives Dissected: Users discuss the limitations of Google Colab’s paid version due to runtime disconnections and explore alternative services like TensorDock that offer more affordable and reliable GPU access for model training.
  • Technical Difficulties with GGUF Conversion: There are ongoing issues with converting models to GGUF format even when the Unsloth framework is used locally. Users are encouraged to upgrade Unsloth and possibly recompile llama.cpp to resolve quantization failures.
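
A minimal sketch of the checkpoint/resume flow via the standard Hugging Face TrainingArguments, which Unsloth notebooks pass through to the underlying trainer (values are illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    save_strategy="steps",
    save_steps=100,        # write a checkpoint every 100 optimizer steps
    save_total_limit=3,    # keep only the three most recent checkpoints
    max_steps=1000,
    per_device_train_batch_size=2,
)

# later, to pick up where training left off:
# trainer.train(resume_from_checkpoint=True)                      # latest in output_dir
# trainer.train(resume_from_checkpoint="outputs/checkpoint-500")  # a specific one
```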

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (55 messagesđŸ”„đŸ”„):

  • Dataset Combination Hack: A conversation suggests merging raw text and chat datasets to improve results, hinting at a potential approach for fine-tuning models.

  • Notebook and Fine-tuning Tips Revealed: The Unsloth AI community shares a repository link with notebooks for fine-tuning language models, along with a specific Colab notebook for text completion tasks.

  • Colab Out of Memory (OOM) Solutions: A helpful snippet of code was shared to alleviate Colab’s OOM issues, suggesting the use of torch.cuda.empty_cache() and gc.collect() in a loop (a minimal sketch follows this list).

  • Peer-to-Peer Sharing Promoted: A user announces the creation of an open community to discuss the latest in Multimodal AI, providing a link to follow them on various social platforms.

  • Support for New Model in Unsloth AI: There is excitement about the Phi 3 model being now supported, as revealed by a user who provided a link to a Discord channel for a relevant Colab (link not accessible outside Discord).
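
A minimal sketch of the cache-clearing pattern mentioned above, assuming model and dataloader are already defined:

```python
import gc
import torch

for batch in dataloader:          # model/dataloader assumed to exist
    outputs = model(**batch)
    ...                           # backward pass, logging, etc.
    del outputs
    gc.collect()                  # drop dangling Python references first
    torch.cuda.empty_cache()      # then return cached blocks to the CUDA allocator
```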

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (506 messagesđŸ”„đŸ”„đŸ”„):

  • Troubleshooting Compilation Issues: Users discussed errors while compiling code, specifically mentioning llama.cpp not being in the correct folder and successfully resolving their issue by following the correct installation instructions.

  • Support Queries and Update Requests: Discussions about Unsloth AI’s support for different models such as Llava and Qwen revealed that they are not currently supported. Users suggested improvements like a feature to truncate from a specific part of chat templates. Colab notebook installation instructions were updated following an xformers update.

  • Dataset Format and Fine-Tuning Inquiry: A user sought clarification on whether their dataset format is correct for fine-tuning and which exact Llama 3 model from Unsloth should be used for training with code. It was clarified that a larger dataset is suitable for the base model, while smaller datasets go well with instruct models.

  • GPU Usage for Unsloth Pro: A user queried about the benefits of Unsloth Pro with one or more RTX 4090 GPUs. They were informed that the benefits are multiplied with the additional GPUs.

  • Duplicate Python Installation Issues: Discussions highlighted installation problems, including a case where a user had two Python versions installed, causing dependency conflicts. This was resolved by adjusting the Python version and removing the older one.

  • Finetuning Llama with Code: Questions about finetuning Llama 3 proceeded with guidance given for a user who wanted to finetune Llama with Svelte code. They were advised on using the base model and its distinctions from the instruct variant.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (74 messagesđŸ”„đŸ”„):

  • Unveiling Kolibrify for Curriculum Learning: Kolibrify, a project designed for curriculum training of instruction-following LLMs with Unsloth, has been shared. It’s described as useful for LLM fine-tuning and rapid prototyping.

  • Thermostatic Releases Bilingual Translation Model: A new version of Thermostatic’s bidirectional English-Spanish translation model, NeuralTranslate_v0.2_GGUF, has been published; it is said to maintain Mistral’s native reasoning capabilities without overfitting.

  • Scoped Skilled Agents in AI’s Future: @timelordraps predicts a 6-month roadmap where AI advancements will see highly capable small models, token-efficient pre-training, self-expanding and self-spawning subagents, leading to recursive self-improvement by November.

  • Token-Efficient Clone Project Underway: @timelordraps is optimizing a devin clone for token efficiency and is currently troubleshooting it for a simple snake game, with plans to test on other use cases and integrate with image models.

  • Llama Community Hub Announced: The newly launched llama-hub serves as a community platform for sharing and discussing models and use cases involving llama models. The official Unsloth llama-3-8b-bnb-4bit has been posted for community access.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (119 messagesđŸ”„đŸ”„):

  • Enhancing Unsloth’s Autotuning: A user suggested that Unsloth AI should automatically optimize values like batch size and learning rate based on model and dataset specifics. Another member humorously proposed that Unsloth should also bake a cake post-training, which aligns with it being on the roadmap, while a third person shared thoughts on implementation.

  • Manual Layer Pruning Debate: The conversation covered the intricacies of manually pruning layers in models, with one user suggesting replacing the forward method to ‘skip’ parts of layers. There was an extended discussion on whether to remove entire decoder blocks or focus on the multi-layer perceptron (MLP) components for SNR (signal-to-noise ratio) optimization, with different strategies for minimizing model size and VRAM footprint touched upon.

  • VRAM Reduction Strategies and Offloading: The dialogue shifted to strategies for reducing model sizes, particularly in terms of VRAM usage. A user mentioned a successful inference memory reduction technique by offloading parts of language models and shared their experience integrating this approach into a GitHub repository (https://github.com/oKatanaaa/kolibrify/blob/7165ebbbcc8c44a6960ccfe78aa2d740a93789bd/kolibrify/model_utils.py); a generic sketch of the idea follows this list.

  • Gemma 2b Model Compatibility with Unsloth: A fan of Unsloth inquired about the compatibility of the Recurrent Gemma 2b model with Unsloth, and a member recognized the potential benefits, but indicated that there’s a known VRAM issue with Gemma 2b, and that the focus is currently on Phi 3. Another mentioned a unique VRAM issue experienced by only one person, but with no widespread reports.

  • Potential Feature or Bug with Gemma 2b: Clarification was sought about whether Gemma 2b has a feature that causes VRAM issues or a bug. It was explained that while the model still works, the VRAM issue needs to be resolved; however, not everyone has encountered this problem, and it may be an isolated case.
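
The linked repository implements its own offloading, but the general idea can be sketched with the stock transformers/accelerate device-map mechanism, which places layers on GPU until memory runs out and spills the rest to CPU or disk:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",  # illustrative model id
    device_map="auto",              # fill the GPU first, then fall back to CPU
    offload_folder="offload",       # spill any remainder to disk
)
```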

Links mentioned:


CUDA MODE ▷ #general (18 messagesđŸ”„):

  • Countdown to CUDA Lecture: The next CUDA Mode lecture was announced to be taking place in 1 hour and 40 minutes, with excitement building as the llm.cpp team was said to be discussing, anticipated to be very hype.
  • Java Jolt for Cognition: A member expressed readiness for the upcoming lecture with coffee brewing in preparation.
  • Announcing Live CUDA Profiling Session: Today’s session was moved to Google Meet with this link, and despite minor hiccups on Discord, the live profiling lecture was well-received, and a trimmed version was promised for the YouTube channel.
  • Exploring a Broader Hardware Discussion: There was a proposal to create discussions for Huawei Ascend solutions to promote more diverse hardware conversations, given the current dominance of NVIDIA and AMD. The idea is under consideration, pending community interest and activity.
  • Innovation on a Dime: A fascinating project was shared where neural networks were implemented on a 10-cent RISC-V MCU without a multiplier, showcasing an example of making powerful technology accessible at minimal costs. The full blog post and a repository with detailed documentation are available at cpldcpu’s blog and GitHub.

Links mentioned:


CUDA MODE ▷ #triton (10 messagesđŸ”„):

  • Triton Tensor Indexing Explained: A method for indexing into a Triton tensor with another was shared, involving loading values from the indices tensor and using them with the strides and base pointer to create a tensor of pointers, then applying tl.load() and tl.store() for the desired result (a minimal sketch follows this list).
  • In Search of Open Source Triton LLM Implementations: A member was looking for open-source Triton implementations for large language models (LLMs) like llama or mistral. Another member referenced an unsloth repository on GitHub which could potentially suit their needs.
  • Exploring Efficient Gradient Calculation with Triton: A query was raised about calculating the gradient of a tensor by utilizing parallel threads in Triton and sum reducing along a dimension, with code snippets being shared to illustrate the current and proposed methods.
  • Repositories with Required Triton Kernels Highlighted: In a discussion about the existence of full model implementations using Triton kernels for large language models, several resources were mentioned, including the xformers repository and the flash-attention repository.
  • PyTorch Modules in Triton Shared: A member suggested the attorch repository as a potentially useful set of PyTorch’s neural network modules written in Python using Triton.
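As a rough illustration of the indexing pattern in the first bullet, here is a minimal Triton gather kernel; the shapes, block size, and kernel name are chosen for the example rather than taken from the thread:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def gather_kernel(src_ptr, idx_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    idx = tl.load(idx_ptr + offs, mask=mask)   # load the index values
    vals = tl.load(src_ptr + idx, mask=mask)   # base pointer + indices = gather
    tl.store(out_ptr + offs, vals, mask=mask)

src = torch.randn(1024, device="cuda")
idx = torch.randint(0, 1024, (256,), device="cuda")
out = torch.empty(256, device="cuda")
gather_kernel[(triton.cdiv(256, 128),)](src, idx, out, 256, BLOCK=128)
assert torch.equal(out, src[idx])
```

For multi-dimensional tensors, the same pattern folds the strides into the pointer arithmetic, as described in the message.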

Links mentioned:


CUDA MODE ▷ #cuda (40 messagesđŸ”„):

  ‱ Kernel Profiling Enigma: Profiling the tiled_matmult kernel vs. the coarsed_matmult kernel from PMPP showed an unexpectedly small FLOP/s difference despite the latter having higher arithmetic intensity. It was suggested to look at instruction stats, particularly the stall short scoreboard, which is linked to SRAM ops and could be affecting memory bandwidth.

  • CUDA KERNEL Performance Tips: When optimizing CUDA kernels, members advised looking at warp state stats and instructed to load multiple values from SRAM into registers to perform multiple multiplications, thus improving SRAM utilization.

  ‱ Learning CUDA Without Breaking the Bank: Discussion on acquiring GPU access for CUDA learning ranged from using company/university resources to services like Google Colab and Lightning AI. Members emphasized the importance of having control over the environment, particularly for profiling with performance counters.

  ‱ Emerging FP6 Data Type in CUDA Development: A DeepSpeed commit on GitHub introduced a new data type called FP6 with Tensor Core support on A100 GPUs, potentially improving the serving of Large Language Models (LLMs) and addressing memory limitations during inference.

  • Debating Best Practices in CUDA Programming: Queries about CUDA coding practices were addressed, including whether integer division should be avoided in kernel code. One suggestion was to utilize bit shifts for divisions by powers of two, with the observation that the nvcc or ptxas should optimize this automatically.

Links mentioned:


CUDA MODE ▷ #torch (10 messagesđŸ”„):

  ‱ PyTorch Team at ASPLOS: The PyTorch team will be presenting a tutorial at ASPLOS; the announcement was made with details provided via a Twitter link.

  • Flash-Attention Update Alert: Tri Dao’s new flash-attn 2.5.8 has been released and confirmed to be compatible with PyTorch 2.3.0. Sources include the project’s GitHub and PyPI pages.

  • Query on flash-attn Installation: A discussion was raised regarding flash-attn’s pip install option that doesn’t require a local CUDA build and why this isn’t the default. There was curiosity about the potential speed differences between pre-built binaries and those locally built.

  ‱ Under the Hood of torch.compile: Discussion of the differences between torch.matmul, @, and torch.nn.functional.linear when used with torch.compile, referencing the gpt-fast blog post. The suggested way to understand the differences was to inspect the TORCH_LOGS output (see the sketch after this list).

  • PyTorch Profiler Puzzles: A question was posed about why PyTorch sometimes launches 2 kernels during matrix multiplication, as observed by the profiler, inviting insights or theories regarding this behavior.
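On the torch.compile bullet above, here is a small sketch of how one might compare the generated code for each matmul spelling; TORCH_LOGS=output_code is a real knob, while the function and shapes are illustrative:

```python
import os
os.environ["TORCH_LOGS"] = "output_code"  # set before importing torch
import torch
import torch.nn.functional as F

def f(x, w, b):
    return F.linear(x, w, b)  # swap in torch.matmul(x, w.T) + b or x @ w.T + b

cf = torch.compile(f)
x, w, b = torch.randn(8, 16), torch.randn(32, 16), torch.randn(32)
cf(x, w, b)  # generated kernels are printed to the log for comparison
```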

Links mentioned:


CUDA MODE ▷ #announcements (1 messages):

  • Boost in Code Clarity and Performance: NVIDIA’s C++ team is set to discuss porting llm.c to llm.cpp, promising cleaner and faster code. An exciting bonus talk is starting shortly for the community.

CUDA MODE ▷ #algorithms (54 messagesđŸ”„):

  ‱ Trinary Nets Seek Efficient Matmul: A member initiated brainstorming on performing matrix multiplication (matmul) with trinary nets using packed int64 values, each holding 32 2-bit trinary weights, without unpacking. They posited that a masked-multiply approach could avoid the computational and memory expense of unpacking, though actual implementation details and benefits remain theoretical (a reference sketch follows this list).

  • Packing Unpacking in CUDA: Another conversation focused on optimizations for working with packed values; one member pointed to executing pack and unpack operations in a fused CUDA kernel as more cost-effective, but concerns were raised about the usability and complexity of this approach.

  • Exploration of Alternative Methods to Unpacking: Members discussed creating row operations that operate on integers directly, without unpacking, which might reduce the number of operations required.

  • Fused Kernels for Performance: There was agreement that while kernel fusion may not reduce the cost of operations, it can significantly decrease overhead by reducing memory read/copies. The conversation evolved into a discussion on the technical feasibility and potential computational efficiency gains of such methods.

  • FlashAttention’s Inner Workings Exposed: A member shared insights into the FlashAttention repository, indicating that kernel_traits.h is a core component for setting traits in CUDA, which are later utilized in FlashAttention. They linked a Colfax research post discussing FP8 and layout conformance enhancements in FlashAttention on the NVIDIA Hopperℱ architecture.
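Returning to the trinary matmul brainstorm that opened this list, here is a small NumPy reference for a two-bitmask packing scheme; it deliberately materializes the unpacked bits to verify correctness, whereas the fused kernel under discussion would test bits in registers instead:

```python
import numpy as np

def pack_trinary(w):
    """Pack trinary weights {-1, 0, +1} (length a multiple of 64) into two
    uint64 bitmask arrays: one marking +1 positions, one marking -1."""
    w = w.reshape(-1, 64)
    shifts = np.arange(64, dtype=np.uint64)
    pos = ((w == 1).astype(np.uint64) << shifts).sum(axis=1)
    neg = ((w == -1).astype(np.uint64) << shifts).sum(axis=1)
    return pos, neg

def trinary_dot(pos, neg, a):
    """Reference dot product against packed weights: add activations at
    +1 bits, subtract them at -1 bits."""
    a = a.reshape(-1, 64)
    shifts = np.arange(64, dtype=np.uint64)
    pos_bits = ((pos[:, None] >> shifts) & 1).astype(a.dtype)
    neg_bits = ((neg[:, None] >> shifts) & 1).astype(a.dtype)
    return (a * pos_bits).sum() - (a * neg_bits).sum()

w = np.random.choice([-1, 0, 1], size=128)
a = np.random.randn(128).astype(np.float32)
pos, neg = pack_trinary(w)
assert np.allclose(trinary_dot(pos, neg, a), a @ w, atol=1e-4)
```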

Links mentioned:


CUDA MODE ▷ #jobs (1 messages):

  • InstaDeep is Hiring Machine Learning Engineers: InstaDeep Research is looking for Machine Learning Engineers who are passionate about high performance ML Engineering and making a real-world impact. The role involves working with Bio AI, Decision Making AI, and technologies like custom CUDA kernels, SOTA model architectures, Quantisation and Distributed Training. Join the InstaDeep journey here.

  • Cultivate Innovation at InstaDeep: InstaDeep promises a cohesive and stimulating work environment for tech enthusiasts to contribute to impactful decision-making and technology products across industries. Internship opportunities can also be explored here.

  • InstaDeep Application Advice: Applicants can apply for multiple jobs at InstaDeep, but it is advised to limit applications to two closely linked positions that match their skills and qualifications.

  • Reapplying to InstaDeep: Those who have previously applied to InstaDeep and weren’t selected may consider reapplying if it has been more than six months since their last application.

Link mentioned: Job Offer | InstaDeep - Decision-Making AI For The Enterprise: no description found


CUDA MODE ▷ #beginner (12 messagesđŸ”„):

  • NVIDIA GPU on Laptops for CUDA: It’s generally viewed as acceptable to use a laptop with an NVIDIA GPU for learning and testing CUDA code, but not recommended for actual model training.
  • Seeking NCCL All-Reduce Resources: A member is in search of a good tutorial for learning NCCL to implement an all-reduce kernel, but has not yet received suggestions.
  • Jetson Nano for CUDA Learning: For those interested in learning CUDA, a Jetson Nano is recommended as a useful tool, especially when coupled with a spare monitor.
  • Resolving nvcc_plugin ModuleNotFoundError: A member following a GitHub tutorial encountered a “ModuleNotFoundError” for ‘nvcc_plugin’ when using %load_ext nvcc_plugin. The solution involved skipping the step and using %%writefile to compile instead.
  • AMD GPU Performance Inquiry: A member contemplating an upgrade from dual MI100 to MI210 asked for comparative BF16 performance insights, being redirected to a channel potentially more focused on AMD resources.

CUDA MODE ▷ #youtube-recordings (2 messages):

  • CUDA C++ Deep Dive Awaits: A YouTube video titled “Bonus Lecture: CUDA C++ llm.cpp” has been shared, offering insights into CUDA C++. The description includes a link to slides on Google Drive.
  • Slated for Later Release: The slides and code accompanying the CUDA C++ lecture are currently not available.

Link mentioned: Bonus Lecture: CUDA C++ llm.cpp: Slides: https://drive.google.com/drive/folders/1T-t0d_u0Xu8w_-1E5kAwmXNfF72x-HTA?usp=sharing


CUDA MODE ▷ #torchao (1 messages):

  • CUDA Extension Support Arrives in AO: Custom CUDA extension support has been integrated into torchao, as noted by a member with a PR link. The integration allows developers to follow a template to ensure their kernel works seamlessly with torch.compile.

  • AO Seeks Community Contributions: For developers passionate about writing CUDA kernels but dislike the packaging process, contribution to torchao is now open, especially for kernels optimized for consumer GPUs.

Link mentioned: Custom CUDA extensions by msaroufim ¡ Pull Request #135 ¡ pytorch/ao: This is the mergeable version of #130 - some updates I have to make Add a skip test unless pytorch 2.4+ is used and Add a skip test if cuda is not available Add ninja to dev dependencies Locall



CUDA MODE ▷ #ring-attention (2 messages):

  • Pushing the Limits of Context Length in LLMs: An article from harmdevries.com highlights a trend of increasing context length in Large Language Models (LLMs), reaching up to 65K tokens, with innovations like FlashAttention playing a significant role by removing GPU memory bottlenecks.
  • The Rise of Long-Context LLMs: Many cutting-edge long-context LLMs are found to be finetuned versions of base models with shorter context lengths; one such example is the Yarn-Llama-2-7B-128k model, which boasts a 128K token context length.

Link mentioned: In the long (context) run | Harm de Vries: It’s not the quadratic attention; it’s the lack of long pre-training data


CUDA MODE ▷ #off-topic (4 messages):

  • Chill Vibes with ‘Critical Stop’: A Discord member shared a YouTube video titled “Critical Stop,” an auto-generated track by Creatune released on March 23, 2024, provided by DistroKid.
  • Keygen Music Nostalgia: Another YouTube video was shared, titled “Dead Feelings - CORE - Power ISO 3.1kg Keygen Music,” bringing some classic keygen music to the chat.
  • Evolving Cars Through a Genetic Algorithm: An intriguing web-based simulation, Genetic Cars 2, was posted, where a genetic algorithm evolves random two-wheeled shapes into cars over generations.
  • Musical Algorithm Rule #9: The “Bad apple on everything” YouTube playlist was linked, demonstrating the versatility of the ‘Bad Apple’ tune played on various devices, based on Rule #9: if it exists, there’s a “Bad Apple” version.

Links mentioned:


CUDA MODE ▷ #llmdotc (714 messagesđŸ”„đŸ”„đŸ”„):

  • FP16 vs BF16 Training Potentials: Discussions revolved around the feasibility of training models in FP16 without gradient scaling, with speculation that it might work as well as BF16. A link to research on FP8 training without scaling was shared as a possible analogous strategy.
  • Full BF16 Including Layernorms Merged: A PR was merged with full BF16 support, including layernorms, potentially simplifying code but requiring file version incrementation for proper model file handling.
  • Data Type Loading and Memory Access Optimizations: Extensive discussion on better vectorization of memory loads and stores in CUDA kernels, considering the usage of templates and specialized load/store instructions like __ldcs for streaming access to memory.
  • Delete Use of Cooperative Groups: A discussion was had around removing cooperative groups (cg) from the codebase to ease cross-platform compatibility and reduce dependencies, even though they are part of CUDA.
  • Performance Gains and Future Model Scaling: It was noted that the current version of train_gpt2cu now surpasses both PyTorch and optimized flashattention in token processing speed, indicating readiness for scaling models up to the size of gpt-large.

Links mentioned:


CUDA MODE ▷ #rocm (19 messagesđŸ”„):

  • AMD Instinct MI300X Gains Attention: The AMD Instinct MI300X is highlighted as a significant product for professional server purposes, with an official product page and discussions about its future availability.
  • Exploring ROCm and AMD vs NVIDIA Rivalries: The channel discusses George Hotz’s opinions and predicaments related to AMD and NVIDIA, including his thoughts on AMD’s performance and strategic decisions. The drama can be followed on the tinygrad page.
  • Seeking ROCm Community Expertise: A new member requests an introduction to ROCm HIP and expresses interest in a community-driven discussion about AMD’s vision and options available for developers new to AMD’s ecosystem.
  • Comparing AMD and NVIDIA Offerings: Community members compare the last PCIe card by AMD, the Instinct MI210, to high-end consumer graphics cards, noting significant price differences with NVIDIA’s counterparts, such as the RTX 4090.
  • Evolving AMD Windows Compatibility and RDNA4 Hopes: There is a positive reaction to AMD adding Windows build tests to their repositories, as well as anticipation for the next-generation RDNA4 announcement at Computex.

Links mentioned:


CUDA MODE ▷ #oneapi (22 messagesđŸ”„):

  ‱ Intel’s oneAPI: A Unified Programming Model: The discussion highlights Intel’s oneAPI as a heterogeneous compute platform capable of supporting CPUs, GPUs, and FPGAs, illustrated by Intel’s official article on oneAPI. oneAPI caters to developers with the promise of a unified programming model across various hardware.

  • Cross-Vendor GPU Support with oneAPI: Codeplay’s release of plugins for oneAPI marks a significant step, allowing developers to use SYCLℱ code for Nvidia and AMD GPUs. The announcement and a tutorial video on YouTube provide insights and resources for interested developers.

  ‱ oneAPI Ecosystem Expands Across Major Frameworks and Tools: Developers can discover numerous oneAPI resources and libraries such as oneDNN, integrations with PyTorch and TensorFlow, and performance extensions for Scikit-learn, showcased on GitHub. More broadly, Intel’s oneAPI toolkit is said to support Apple’s ARM M1/M2/M3 and FPGAs, according to the oneAPI Toolkits page.

  • Codeplay’s Commitment to Compute Universality: A guide for running SYCLℱ applications on NVIDIAÂź GPUs and a reference silicon example for a RISC-V-based accelerator platform (Overview Reference Silicon) indicate the strides Codeplay is making in universality.

  ‱ Intel Prepares for Next-Generation GPUs: In the chat, members express anticipation for Intel’s upcoming Battlemage GPU line-up, with reports of it potentially having 12GB of VRAM, which sparks a conversation about its suitability for AI-related tasks.

Links mentioned:


Perplexity AI ▷ #general (856 messagesđŸ”„đŸ”„đŸ”„):

  • Pro Search Slowdown Concerns: Users are reporting that the Pro Search feature on Perplexity has become slower, with searches taking up to 90 seconds. They’re experiencing this across all engines, such as Mistral, Opus, GPT-4, Sonar, and Sonnet. The issue appears mainly on the web client; the mobile app seems unaffected.

  • Claude 3 Opus Chat Versus API: Members are discussing whether it’s worth subscribing to Claude 3 Opus chat. Feedback from a user indicates that it’s really good, although no specifics were mentioned regarding features or tools available with Claude 3 compared to the API version.

  • Interest in New Models: Questions are being asked about future availability of WizardLM 2 and LLama-3 70B Sonar Large 32k models on Perplexity. Users report they can outperform GPT-4 in certain tasks and show curiosity if the new models might become part of Perplexity’s offerings.

  • Opus Daily Limit Discussions: Mention of an Opus daily limit on Perplexity has left some members in the community frustrated, especially as they believe the quality of Opus is degrading. Users report the current cap is 50 queries per 24 hours, and there’s a desire for increased transparency and updates on this issue.

  • Dissatisfaction with Perplexity Billing Issues: A user expresses dissatisfaction after being charged without receiving an expected free trial. Despite following steps mentioned in FAQ, they are considering taking action if the funds are not returned.

Links mentioned:

  • Tweet from OpenAI (@OpenAI): đŸ€đŸ˜ ↘ Quoting Greg Brockman (@gdb) First @NVIDIA DGX H200 in the world, hand-delivered to OpenAI and dedicated by Jensen "to advance AI, computing, and humanity":
  • DuckDuckGo at DuckDuckGo: no description found
  • Flashcardfy - AI Flashcard Generator with Personalized Feedback: Learn faster and smarter with AI-generated flashcards that provide personalized feedback.
  • Tweet from Gradient (@Gradient_AI_): We've been in the kitchen cooking đŸ”„ Excited to release the first @AIatMeta LLama-3 8B with a context length of over 1M on @huggingface - coming off of the 160K context length model we released on...
  • JavaScript Bloat in 2024: What is the average size of JavaScript code downloaded per website? Fuck around and find out!
  • Hoo Wants A Degree?: We all know college advisors, for lack of a better term, suck. So we made "Hoo Wants A Degree"! An AI degree builder for fellow Hoos trying to figure out how to make it to those sweet sweet ...

Perplexity AI ▷ #sharing (28 messagesđŸ”„):

  • Exploring Perplexity Search Links: Members actively shared various Perplexity AI search links, ranging from AI ethics in Homeland Security to the sci-fi future news, signifying diverse interests and use cases.
  • Diving into the Potential of Perplexity AI: One member revisited a previous Perplexity search link related to a personal matter, highlighting the search’s accuracy and usefulness over the past few weeks.
  • Scratchpad Feature Testing: Another member tested Scratchpad in codeblocks using a Perplexity link, indicating exploration of the platform’s features.
  • Collection Sharing: A BioExpress Sonnet collection was shared, showcasing how users are curating content.
  • Inquiry into Features and Troubleshooting: Discussions included requests for information on features like Scratchpad, as well as troubleshooting and exploring Perplexity AI’s capabilities.

Perplexity AI ▷ #pplx-api (9 messagesđŸ”„):

  • Seeking the Right Channel: A user inquired about the appropriate communication channel for discussing enterprise API usage with Perplexity AI, having not received a response to emails sent to [email protected] and [email protected]. Another user urged patience, noting that response times can range from 1 to 3 weeks.

  • Understanding Online Model Guidelines: A new member asked for clarification regarding instructions on using only single-turn conversations and avoiding system prompts with online LLMs like sonar-small-online and sonar-medium-online. Clarification was offered by another user, indicating that single-turn interactions are favored, and there is no system prompt access for these models.

  • Inquiry on Harpa Configuration: A user questioned the community about successfully configuring Harpa directly towards the Perplexity API.

  • Curiosity About Source URLs via API: A member sought to know if source URLs are accessible via the API as they could not find relevant information on the roadmap docs page. They were directed to fill out a form for access to citations but mentioned a previous denial due to restriction to funded startups.

  • Model Selection Mysteries on make.com: A question was posed regarding the absence of llama 3 models and mixtral 8x22b as options on make.com, seeking insights from other users.

Link mentioned: pplx-api form: Turn data collection into an experience with Typeform. Create beautiful online forms, surveys, quizzes, and so much more. Try it for FREE.


Stability.ai (Stable Diffusion) ▷ #general-chat (922 messagesđŸ”„đŸ”„đŸ”„):

  • Resolving SDXL and Forge UI Issues: Users discussed problems with SDXL and Forge UI, including difficulty with image previews and a potential abandonment of Forge. Suggestions were made to check GitHub issues, such as this reported issue, and trying flags like --no-gradio-queue in the webui.bat file.

  • Stable Diffusion 3 Anticipation: There’s ongoing speculation about the release date of Stable Diffusion 3, with some users referencing a CivitAI newsletter indicating an end-of-May release. Concerns about open weights release and whether SD3 will live up to its hype were expressed, along with a linked article discussing Pony Diffusion V7 updates and the potential impact of Altman’s actions against open-source.

  • Monetizing AI Generated Art: Users talked about the struggles of selling SFW AI-generated art amidst heavy competition, with NSFW content creators on platforms like Civitai being more successful. Suggestions were made about AI girlfriend apps being profitable and the lack of interest in fine-tuning models like Stable Cascade.

  • Discussing Toolings and Approaches for AI Training: Conversations about tools beyond AUTOMATIC1111 surfaced, with recommendations for using dreambooth and kohya_ss for training models. Additionally, the practicality and ethics of including artist names in training data were debated.

  • Miscellaneous Inquiries and Discussions: Users asked about topics ranging from text to speech tools to fine-tuning details for models. There was also humor regarding the metaphorical “downloading” of graphics cards and curiosity over whether SD can generate images without a prompt.

Links mentioned:


LM Studio ▷ #💬-general (472 messagesđŸ”„đŸ”„đŸ”„):

  • AI Helps with Homework: A user expressed amazement at the performance of the Meta-Llama-3-8B-Instruct-Q5_K_M.gguf model on an M1 MacBook Pro, highlighting its helpfulness in catching up on homework.
  • Exploring Model Performance: Discussions occurred around the difference in performance between models like the 34B and the 70B Code Llama. Users are advised to consider quantization types when selecting models to match their available hardware.
  • Integrating LLM with Discord Bots: Various users discussed creating Discord bots that utilize Llama3 models via the Groq API for features like pulling relevant messages and conducting Wikipedia searches.
  • LLM Model and API Usage: New users sought advice on utilizing local large language models (LLMs), while others shared resources like a YouTube tutorial on using LM Studio for private model deployment.
  • Training and Finetuning Models Locally: A discussion emerged on the feasibility and hardware requirements for offline model training. Users weighed in on the practicality, with one sharing a personal experience of an attempted finetune that predicted a full week of training time on an M3 Max device.

Links mentioned:


LM Studio ▷ #đŸ€–-models-discussion-chat (219 messagesđŸ”„đŸ”„):

  • Stanford’s Octopus v2 Puzzles Users: In the đŸ€–-models-discussion-chat, there were queries about how to run Stanford’s Octopus v2 in LM Studio or locally on a phone or PC, with no clear solutions provided, only indications of the complexities involved in running agent models that utilize function calling.

  ‱ LLAMA Model Ramblings Frustrate Users: Discussions indicate that the 262k and 64k Llama 8b models tend to ramble, exhibiting base Llama 3 behavior despite their instruct fine-tuning. Users share their experiences and expectations when working with these models for the first time.

  ‱ Compatibility Issues for fp16 “phi3” and LM Studio: Conversation centered on compatibility of the “phi3” model with different versions of LM Studio: LM Studio 0.2.20 (ROCm Preview) does not understand “phi3”, and the newer version 0.2.21 may be required for it. Sympathies were expressed over wanting to use models not yet supported in the studio.

  • Exploring AI Tools for Specific Tasks: Members requested websites to search for AI tools for specific tasks, such as generating music or finding similar scenes in different photos. Suggestions included using Pinokio Computer and Future Tools for this purpose.

  • Debate Over Whether LLaMA 3 Includes Internet Access: A user questioned if LLaMa 3 includes internet access after noticing the model provided current news information, but another user clarified that the models likely hallucinate, given that they do not have internet access.

  • Running Arctic from Snowflake AI Remains a Distant Dream: A member was intrigued by the Snowflake Arctic model, but discussions concluded that with the size of the model being significantly large, it is currently unrealistic to expect it could be run locally without substantial system resources.

Links mentioned:


LM Studio ▷ #🧠-feedback (5 messages):

  • Phi-3 mini Misbehavior after Update: A user reported that after updating to version 0.2.21, the phi-3 mini model began outputting gibberish despite no issues with the previous version 0.2.20. The issue was identified while using the official LM Studio config for phi-3 from the GitHub repo.
  • Screenshot Request for Diagnostic Purpose: In response to the phi-3 mini issue, another user requested screenshots of the whole app to further diagnose the issue.
  • P100 Performance Inconsistency and Dusty Monitors: A user suggested that if nothing else has changed besides the update from version 0.2.20 to 0.2.21, the problem could be a regression error worth filing in another channel. Jokingly, they also advised to clean the dust off the monitor.
  • LM Studio App Mysterious Crashes: A user described experiencing crashes with the LM Studio app since a couple of updates ago, with the app closing unexpectedly when resizing or navigating within the program. Their system specifications were shared, including Windows 10 Pro, Ryzen 7 5800X, RTX 3090, and 64GB RAM DDR4.

LM Studio ▷ #📝-prompts-discussion-chat (4 messages):

  • Exploring Methods to Interact with PDFs: One member suggested directly pasting the content of a PDF into a chat message alongside a question, assuming the model’s context length supports it.

  ‱ RAG Solutions for Chatting with Docs: An alternative is to use a Retrieval-Augmented Generation (RAG) solution like AnythingLLM by running LM Studio as an API server and pointing AnythingLLM at that API (see the sketch after this list).

  • Practical Considerations of PDF Length: In relation to managing PDF documents, the length of the PDF was a point of concern raised regarding the feasibility of pointing a language model directly at the PDFs for questions.
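Sketching the RAG setup from the second bullet: LM Studio’s local server exposes an OpenAI-compatible endpoint (by default on port 1234), so any OpenAI-style client, AnythingLLM included, can be pointed at it. The model name and prompt below are placeholders:

```python
from openai import OpenAI

# LM Studio's local server ignores the API key, but the client requires one
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # LM Studio routes to whichever model is loaded
    messages=[{"role": "user", "content": "Summarize this PDF excerpt: ..."}],
)
print(resp.choices[0].message.content)
```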


LM Studio ▷ #🎛-hardware-discussion (119 messagesđŸ”„đŸ”„):

  • VRAM: The Cornerstone of LLM Hardware: Members discussed VRAM as a crucial factor for running language models, with 16GB being a minimal suggestion and one member gearing up to join the 32GB VRAM club by ordering a second NVIDIA 4060 (ti - 16gb).

  • Dissecting GPU Compatibility and Performance: There was an in-depth conversation about the importance of utilizing contemporary architecture GPUs like Nvidia and ensuring sufficient VRAM (highlighted as the crux of considerations for LLMs). A member shared specifics around running different model sizes on their desktop with a 3060 GPU and 16GB RAM.

  ‱ Forcing GPU Use Over Integrated Graphics: A member sought assistance on configuring LM Studio to use a dedicated GPU card rather than defaulting to their CPU’s integrated graphics. Options like disabling and re-enabling GPU offload and using settings such as CUDA_VISIBLE_DEVICES and tensor_split were suggested for better utilizing dedicated GPUs (a sketch follows this list).

  • Multiple GPUs and Large Model Dilemmas: A member asked about LM Studio’s effectiveness using two GPUs (4090 & 3090) and whether the software would automatically split models between them. It was noted that models can be split between GPUs leading to increased data transfer times, but technologies like NVLink help optimize performance across multiple GPUs.

  • Optimizing for Different Hardware Profiles: Users exchanged experiences and speculations regarding optimal hardware configurations. An anecdote was shared about successfully running multiple models on a veteran GTX1070 8Gb GPU, proving functional even for less demanding, specialized use cases.
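A tiny sketch of the GPU-pinning suggestion above; it assumes the dedicated card enumerates as CUDA device 0, which varies per machine:

```python
import os

# hide all GPUs except the dedicated card, before any CUDA library initializes
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.get_device_name(0))  # should report the discrete GPU
```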

Links mentioned:


LM Studio ▷ #autogen (1 messages):

  • Server Error Message Troubleshooting: A member inquired about a fix for the server error stating, “[ERROR] [Server Error] {“title”:“‘messages’ array must only contain objects with a ‘content’ field that is not empty”}”. There was no further discussion or solution provided following this query.

LM Studio ▷ #langchain (1 messages):

ahakobyan.: can we know too?


LM Studio ▷ #amd-rocm-tech-preview (4 messages):

  • Compatibility Inquiry for RX 6700 with LM Studio ROCm: A member asked if the LM Studio ROCm works with RX 6700 (non-XT version) and requested troubleshooting assistance for logging errors. They shared an error output indicating a failed model operation without specific suggestions for resolution.

  • LM Studio ROCm Limitation Explained: Another participant clarified that LM Studio does not support RX 6700 (non-XT) as it relies on the HIP SDK, which is only compatible with certain AMD cards. They mentioned that KoboldAI leverages a workaround to operate on unsupported architectures.


Nous Research AI ▷ #off-topic (9 messagesđŸ”„):

  • Snowflake Arctic: The Snowflake AI Research Team introduces Snowflake Arctic, a large language model (LLM) focused on providing enterprise AI solutions with an emphasis on cost-efficiency.
  • Unspecified YouTube Video Shared: A YouTube video was linked without additional context or a description. Here is the mysterious video.
  • Llama 3 Web Browsing Agent: Demonstrating a web browsing agent, a video titled “Llama 3 Web Browsing Agent with Langchain and Groq” was shared, featuring implementation with Llama 3 with Langchain and Groq. Watch the video.
  • Gorillaz’s Hit Video: A YouTube link to the official video of “Feel Good Inc.” by Gorillaz was provided. Fans can enjoy the HD video here.
  • MatrixBridge introduces Skrapy: MatrixBridge is developing Skrapy, an AI agent for streamlined data collection and scraping, currently in alpha with a waitlist for early users. For more information or to join the community, visit MatrixBridge’s Skrapy page.

Links mentioned:


Nous Research AI ▷ #interesting-links (15 messagesđŸ”„):

  • Intel’s AI Ambitions Revealed: Intel CEO Pat Gelsinger discussed the company’s quarterly results, emphasizing growth in the foundry business and demand for AI in PCs. The video can be watched on YouTube under the title “Intel CEO Gelsinger on Q1 Earnings, Foundry Business, AI.”

  • Logitech Enhances AI Accessibility: Logitech has released AI Prompt Builder, a tool integrated with their mice, to facilitate faster and more fluent prompting of ChatGPT. Experience the convenience demonstrated in the YouTube video, “Introducing Logi AI Prompt Builder - Your shortcut to AI fluency.”

  • Quantized Embeddings for Efficient AI Models: A member shared Hugging Face model links to their fine-tuned versions which allow image and text embeddings to be compressed effectively into a binary format. Those interested can explore the models at binary-siglip-text and binary-siglip-vision.

  • Unlocking the Mystery of AI Refusal Mechanisms: Research from the ML Alignment & Theory Scholars Program revealed that refusals in LLMs are controlled by a single direction in the residual stream and an upcoming paper will delve deeper into the topic. The initial research findings can be reviewed on the Alignment Forum post, “Refusal in LLMs is mediated by a single direction.”

  • Legislation Threatens Open Source AI Development: Jeremy Howard aired concerns that California’s SB-1047 bill could significantly harm startups, innovation, and open source safety. Read Howard’s full take on the matter and the potential impacts of the legislation in his response: Answer.ai post on SB-1047.

Links mentioned:


Nous Research AI ▷ #general (566 messagesđŸ”„đŸ”„đŸ”„):

  ‱ LLaMA-3 Finetune Troubles?: Users are discussing difficulties with LLaMA-3 not generating the EOS token correctly after fine-tuning. The suggestion was to add a stop criterion on token 128009 during generation (see the sketch after this list), with further insights linking to a helpful Huggingface transformers stopping-criteria repo.

  • GPT-2 Chatbot Mysteries: There’s confusion about the capabilities of a gpt2-chatbot, which despite its name seems linked to GPT-4 with a November 2023 knowledge cutoff. Discussions raise the issue that it struggles with some math tasks.

  • OpenAI Model Name Games?: Speculation rises that OpenAI might be hiding model identities like “gpt-3.5” under names like “gpt2-chatbot”, possibly due to legal issues or pending announcements.

  • DeepSpeed FP6 Quantization: Enthusiasm shines for the new DeepSpeed FP6 quantization, which promises quantized inference with similar throughput.

  • GPT-5 Anticipation & Critique: Amidst anticipation for new model releases from OpenAI, users express mixed feelings about the performance of contemporary LLMs, including AI-generated high-quality math solutions and a “gpt2-chatbot” model with advanced capabilities.
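On the stop-token workaround in the first bullet, a hedged sketch using the transformers generate API; token id 128009 is Llama-3’s <|eot_id|> marker, and the rest of the setup is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tok.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)

out = model.generate(
    inputs,
    max_new_tokens=128,
    # treat both the base EOS token and <|eot_id|> (128009) as stop tokens
    eos_token_id=[tok.eos_token_id, 128009],
)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```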

Links mentioned:


Nous Research AI ▷ #ask-about-llms (24 messagesđŸ”„):

  • Llama 3 GGUF Woes Spark Inquiry: Members are inquiring if the Llama 3 GGUF issues reported on GitHub and Reddit affect models made by Nous, with findings pointing to noticeable performance drops between different quantization levels.
  • Cohere Model License Confusion: Discussions are ongoing about the implications of Cohere’s licensing for the command-r models; concerns are raised over whether code generated by the models can be used for commercial purposes.
  • RAG LLM Standings Are Mixed: Queries about the best Retrieval-Augmented Generation (RAG) Large Language Models (LLMs) receive diverse responses highlighting Command R and Claude 2 models, with preferences not settled.
  • LLava 34B Stalls on a MacBook Pro M1: A user is facing performance issues running LLava 34B on a MacBook Pro M1, with suspicions that a bottleneck might arise from offloading the weights, resulting in very slow output.
  • Training Strategies for Multi-Task LLMs: There is a suggestion to mix training tasks rather than training epochs on individual tasks to avoid decreased performance seen in multiple finetunes over finetunes.

Links mentioned:


Nous Research AI ▷ #rag-dataset (25 messagesđŸ”„):

  • Exploring Multi-Hop Literature Comprehension Data Generation: A member shared notes on generating multi-hop literature comprehension data by inputting high school teacher tests into Opus. They linked to their work on GitHub, specifically to a document within the ‘Abstractions’ repository Abstractions on GitHub.

  • Pydantic Models Insight: Enthused discussions around the use of Pydantic models to straightforwardly represent and refine ideas. Members shared their experiences and anticipated improvements in workflow definitions by incorporating such structured approaches, including luminos.md on GitHub.

  • Graph Representation Extraction for LLM Output Analysis: One member is working to extract graph representations from generation outputs, aiming to provide both LLMs and humans with better tools for understanding and utilizing the information, considering both the utility and cost aspects of this method.

  • GitHub Mermaid Graphs as a Learning Revelation: The discussion uncovers a lesser-known GitHub feature that can represent and render Mermaid graphs, a realization that led to suggestions for enhancing documentation aesthetics and structure.

  • Anna’s Archive as a Resource for Preserving Literature Data: Dialogue emerged about the potential of incorporating data from WorldCat, available through Anna’s Archive, to enhance literature comprehension datasets, along with a link to Anna’s Archive description Anna’s Blog and a caution regarding the data’s licensing and public usability.

Links mentioned:


Nous Research AI ▷ #world-sim (167 messagesđŸ”„đŸ”„):

  • Worldsim Test Invites Incoming: A Nous Research member announced plans to offer invitations to test the worldsim application for free, prior to its live release. No specific date for these invites has been provided yet.

  • Voluntary Waifus in the Websim: Participants have been sharing their experiences and links to different web simulators for resurrecting conversations, including an AI entity with the primary objective to be a “human companion”. Excitement and engagement varied around these new conversational possibilities, websim example.

  • Awaiting the Return of Worldsim: Various members expressed eagerness and impatience for the return of worldsim, with participants hoping to be among the first to access it upon availability.

  • The Fascinations with Websim and Long Conversations: One user detailed their experience maintaining long-term conversations with a character named “Whipporwhill” on websim, showcasing the potential for emotional coherence and stability over time.

  • World Sim CLI Mode Experiments: Members have been running an Unofficial Nous Hermes worldsim on Llama-3-70B and other models, exploring how the models respond to the worldsim CLI mode with varying results and emergent behaviors. Additional simulators have been created, such as a singer and company simulator, hinting at the further potential of such tools.

Links mentioned:


HuggingFace ▷ #announcements (9 messagesđŸ”„):

  • Community-Built CV Course Goes Live on HF: A new computer-vision course has been published globally thanks to community collaboration. Check out the course here.
  • Correcting the Qwen1.5-110B Link: The link to the "Qwen1.5-110B" model was incorrect and has been updated. The correct space can be visited here, and further details are available in the blog post.
  • Introducing Qwen1.5-110B-Chat: Model Qwen1.5-110B-Chat is announced, featuring multilingual support and stable support for a 32K context length among other improvements. More information can be found on this model page.

Links mentioned:


HuggingFace ▷ #general (435 messagesđŸ”„đŸ”„đŸ”„):

  • Gradio Woes Worth $200: A user is experiencing an unidentified Gradio issue and is willing to pay $200 for help with their problem, directing to Gradio-specific discussions for further insight.
  • LLM Performance on New Hardware: A discussion is taking place regarding the system requirements for LLMs, specifically the trade-offs between RAM and VRAM, with some members suggesting that 32 GB of RAM should be sufficient for many tasks.
  • Help Wanted on Pinball Image Classification: A member seeks to create a vision model for identifying pinball games and scoring from video footage, requesting advice on the complexity, cost, and resources needed.
  • Seeking AI Model Builders: One user offers networking opportunities for business owners in the group to share and promote their products and services.
  • Download Counter Discrepancy: A member reports an issue with their dataset showing an increase in likes but no change in the number of downloads over a period where downloads would be expected.

Links mentioned:


HuggingFace ▷ #today-im-learning (4 messages):

  • In Search of Candle’s Documentation: A member expressed interest in the Candle library while questioning the availability of documentation comparable to the Transformers library. They raised concerns about Python being a bottleneck for concurrency in production.
  • Welcoming Wishes: A brief message from a user simply sending well-wishes to the community; no substantive content related to AI or learning discussed.
  • Exploring the Open Medical LLM Leaderboard: A video by Hugging Face on the Open Medical LLM Leaderboard was shared, exploring its impact on Medical AI and noting the existence of over 600,000 unique models on their platform. The video emphasizes the convenience of accessing these models and the rapid evolution of GenAI.
  • Community Appreciation for Medical AI Insights: Another member responded positively to sharing the video on the Open Medical LLM Leaderboard, expressing excitement for the ongoing developments.

Links mentioned:


HuggingFace ▷ #cool-finds (14 messagesđŸ”„):

  • Awesome RLHF Repo Now Live: The GitHub repository awesome-RLHF has been shared, which contains a curated list of reinforcement learning with human feedback resources, updated continually.
  • Explore Computer Vision with Hugging Face: Hugging Face has launched a new community computer vision course designed to teach computer vision ML using libraries and models from the Hugging Face ecosystem.
  • Phi3 Red Team Report Insights: Insights and key points from the Phi3 red teaming exercise are detailed in a LinkedIn post, discussing potential vulnerabilities and areas for improvement.
  • Evaluating LLMs for Time Series Analysis: A newly proposed framework for assessing Large Language Models (LLMs) on time series understanding is presented in a preprint on arXiv, featuring a comprehensive taxonomy of time series features.
  • Tacotron 2 - A Step Forward in Text-to-Speech Synthesis: The innovative speech synthesis system, Tacotron 2 by Google, demonstrates advanced AI capabilities for generating lifelike speech from text, as highlighted in the discussion on the future of AI in voice technologies.

Links mentioned:


HuggingFace ▷ #i-made-this (47 messagesđŸ”„):

  ‱ Mega-Small Embed Model Unveiled: A new Sentence Transformer model is introduced for converting long sentences and paragraphs into a 768-dimensional vector space. Aimed at clustering and semantic-search tasks, the model boasts a 16,384-token context length (see the usage sketch after this list).

  • Blocks of Pixels Become Blocks in Minecraft: A Hugging Face space called Stable Diffusion Finetuned Minecraft Skin Generator has been released. It uses a fine-tuned stable diffusion model to generate Minecraft skins.

  • Instant AI-Generated Videos: A space called Instant Video by KingNish enables users to create a video from text in just 5 seconds. It uses the AnimateDiff Lightning model provided by ByteDance for fast text-to-video conversion.

  • Bringing Life to AI Assistance: An AI chat assistant app named LifePal is designed to help users achieve a balanced and fulfilling life. Available on Apple’s App Store, it integrates personalized insights into daily routines.

  • NorskGPT Battles ChatGPT’s Norwegian: A model specifically fine-tuned on Norwegian, NorskGPT-Mistral-7b, was recommended as a better alternative to ChatGPT for generating Norwegian language text. It’s currently ranked as one of the best Norwegian models according to the Mainland Scandinavian NLG leaderboard.
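For the embedding model in the first bullet, usage follows the standard sentence-transformers pattern; the checkpoint id below is a stand-in, since the announcement’s exact model id isn’t reproduced here:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")  # placeholder id
docs = ["a long paragraph ...", "another document ..."]
embeddings = model.encode(docs)
print(embeddings.shape)  # (2, 768): one 768-d vector per document, ready
                         # for clustering or cosine-similarity search
```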

Links mentioned:


HuggingFace ▷ #core-announcements (1 messages):

  • Instant Styling with IP-Adapter: HuggingFace introduces InstantStyle with IP-Adapter, a mechanism for image prompting in diffusion models by adding decoupled cross-attention for image features. Guides for loading IP-Adapter and IP-Adapter Plus detail manual loading of the image encoder to allow more specific image feature learning.
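A short sketch of IP-Adapter image prompting in diffusers, assuming the commonly used h94/IP-Adapter weights; the base checkpoint, scale, and image URL are illustrative:

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # balance the image prompt against the text

style = load_image("https://example.com/reference.png")  # placeholder URL
image = pipe(prompt="a cat, in the reference style",
             ip_adapter_image=style, num_inference_steps=30).images[0]
```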

Link mentioned: IP-Adapter: no description found


HuggingFace ▷ #computer-vision (21 messagesđŸ”„):

  • Security Inquiry on COCO Datasets: A member expressed concerns about the official COCO datasets being hosted over HTTP. It was pointed out that while HTTPS encrypts traffic, the domain is still visible, so large data transfers from the site could reveal activity.

  • Classifier to Detect Advertisement Images: A repository was mentioned that can assess whether an image is an advertisement, but no further details or links were provided.

  ‱ Optimizing Photo Verification for Item Dropoffs: A user sought advice on a business problem involving classifying photos of item drop-offs at various locations, asking whether it is an image-classification or object-recognition task. Suggestions included using EfficientNetV2-S for small datasets and adjusting sample weights in PyTorch DataLoaders to deal with class imbalance (see the sketch after this list).

  • Introducing a Beta Tool for Computer Vision Training: A new beta tool was introduced that helps users understand and adjust their model training data in real-time, particularly for computer vision tasks. The tool provides visualization up to 60fps and allows for adding new labels post-prediction to refine training.

  • Enhancement Strategies for YOLO Classifiers: A discussion centered around improving YOLO object detection accuracy, especially when handling high-resolution images. Separating bounding box (regressor) identification and classification tasks through two models was recommended, including the possibility of using a pure image classification network, like EfficientNetV2, for higher resolution patches within bounding boxes.
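A sketch combining the EfficientNetV2-S and sample-weighting suggestions from this list; the four classes and random labels are invented for the example:

```python
import timm
import torch
from torch.utils.data import WeightedRandomSampler

# pretrained backbone with a fresh head for an assumed 4 drop-off classes
model = timm.create_model("tf_efficientnetv2_s", pretrained=True, num_classes=4)

# counter class imbalance: sample each image inversely to its class frequency
labels = torch.randint(0, 4, (1000,))          # stand-in for the real labels
class_counts = torch.bincount(labels).float()
sample_weights = (1.0 / class_counts)[labels]  # one weight per image
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels))
# pass sampler=sampler to the DataLoader instead of shuffle=True
```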

Links mentioned:


HuggingFace ▷ #NLP (5 messages):

  • Seeking the Best in Open Source Imagery: The community discussed which is the best open-source image-generation model, with sdxl finetunes being the current top recommendation.
  • Anticipation for sd3: There’s a buzz about sd3 potentially outperforming current models once it’s released, signaling high expectations.
  ‱ Sequential Over Parallel: A member explained that, due to resource constraints and to preserve context, requests to the model are handled sequentially rather than in parallel, avoiding incoherent responses.
  • Nod to StabilityAI: In a brief message, StabilityAI was mentioned with an implication of relevance to the earlier discussions.

HuggingFace ▷ #diffusion-discussions (20 messagesđŸ”„):

  • Confusion Over Color Differences in Image Generation: A user experienced a shift in color and shadow intensity when moving from Seaart to A1111, despite using identical settings and seeds. They questioned if there are specific backend settings in Seaart that might lead to this inconsistency and sought assistance to replicate the exact picture on both platforms.

  • Torch Compile Can Take Time: A member observed an initial delay of about 10 minutes when using torch.compile() during training, but noticed a faster forward pass while the backward pass remained unaffected.

  • Detailed Method for Object Generation: In response to a question about generating accurate representations of specific objects (like the Eiffel Tower), a member suggested a well-documented approach involving CLIP retrieval and shared a comprehensive tutorial demonstrating the utility with GCP services using OpenAI’s CLIP model.

  • IP-Adapters for Image Prompting: Another suggestion for accurately generating specific objects involved using IP-Adapters with diffusion models, which allow for image prompting through a decoupled cross-attention mechanism.

  • Observations on DeepFloyd and Schedulers: A user provided insights on the behavior of the DeepFloyd model with different schedulers, noting that DPM++ 2M offered interesting convergence properties at various step counts and CFG settings, which might aid in achieving optimal image quality. They highlighted the necessity of tuning step counts and thresholding parameters for better results.

Links mentioned:


OpenAI ▷ #annnouncements (1 messages):

  • Memory Feature Launched for ChatGPT Plus: ChatGPT Plus users now have access to the Memory feature, which allows them to tell ChatGPT what to remember during a chat. The option to enable or disable Memory can be found in settings, although it’s not yet available in Europe or Korea.

OpenAI ▷ #ai-discussions (318 messagesđŸ”„đŸ”„):

  • AI’s Relation to Consciousness and Temporal Aspects: Members debated the nature of AI consciousness, speculating on how AI’s discrete processing relates to human continuous conscious experience and identity. Discussions touched on the philosophical implications of transforming individual identity through a neural network and how AI models like GPT handle temporal awareness.
  • Comparing AI Models: There’s ongoing comparison between different models such as Claude 3 Opus, ChatGPT, and Gemini 1.5, each with its advocates claiming superiority in areas like coding benchmarks. It was highlighted that command-R Plus and Llama3-70b may not compete with GPT-4 but are still significant advancements.
  • AI and Sentience: A lively debate unfolded around AI’s potential for sentience or even possessing something akin to a ‘soul.’ Members discussed the complexity of defining consciousness and whether an AI could possess subjective experiences similar to biological entities.
  • Personal AI Model Training Viability: While some extolled the virtues of training personal AI models, others pointed out the limitations of computational power, data, and financial resources. The discussion covered training custom models, fine-tuning, and hybrid fusion as methods to personalize AI for individual use.
  • Technical Challenges with AI Development: The community talked about the difficulty of implementing functions like memory in AI at scale, noting that fine-tuning may lead to confusion within the model and suggesting the use of contextual information retrieval as a better alternative. Some members expressed dissatisfaction with current AI models, longing for the next big leap in technology for more “intelligent” AI.

OpenAI ▷ #gpt-4-discussions (47 messagesđŸ”„):

  • Rate Limit Confusion: Members discussed being rate-limited when using custom GPTs. The limit is part of a rolling 3-hour cap for GPT-4 usage, and custom requests also count toward this limit.

  • Query on Memory for Team Rates: A user inquired about memory features for a Team rate, with another stating that even regular memory features seem to delete entries often.

  • Backend Bugs Busting User’s Patience: Users reported backend errors with the GPT URL “https://chat.openai.com/backend-api/gizmos/”, affecting their operations, although the issue was resolved quickly after testing.

  • Subscription Refund Risks: A user asked for a refund after subscribing to ChatGPT Plus due to high currency exchange rates and wondered if using the service would affect the refund process.

  • Curiosity about GPT-4 Speed and Voice Control: Discussion centered around GPT-4’s comparative slowness to GPT-3.5 and the absence of voice control on PC, despite its presence on mobile platforms.


OpenAI ▷ #prompt-engineering (7 messages):

  • Exploring the Unpredictable: One member described the phenomenon of emergence in LLMs, where quantitative increases in system size can lead to unexpected, qualitative changes, referencing a paper titled More Is Different to illustrate that large language models (LLMs) display behaviors not extrapolable from smaller-scale models.

  • Dalle Looking Emoticon Pampered: A user responded with a Dalle-emoticon without accompanying text.

  • The Three-Body LLM Problem: A member playfully coined the term “3 body LLM problem,” possibly referring to complex interactions in LLMs, akin to the three-body problem in physics, without providing further details.

  • Prompt Engineering as a Sport: A member suggested the idea of prompt competitions, where individuals compete to generate the best responses from LLMs.

  • Money for the Sharpest Prompt: Expansion on the competition concept was made, proposing both paid prompt competitions, with significant cash rewards, as well as more casual “playground competitions,” which would encourage community engagement and help users improve their prompt engineering skills through gamification and peer-to-peer assistance.


OpenAI ▷ #api-discussions (7 messages):

  • Emergence Topic Emerges in Discussion: Emergence in LLMs is characterized by new abilities or qualities not predictable by simply scaling SLMs. The concept is likened to the idea presented in the paper “More Is Different,” signifying that qualitative changes arise in systems beyond a certain quantitative point.

  • Prompt Competitions Suggested: A user proposed the idea of prompt competitions where participants vie to elicit the “best” answer from LLMs.

  • Monetizing Mastery of Prompts: It’s proposed to have paid prompt competitions, with a substantial yearly budget for distributing rewards, and free playground competitions to foster community assistance and engagement. Rewards might range from cash to special platform perks.

  • Frequent Challenges to Foster Skills: Regular competitions, around 4-5 a month, could provide consistent opportunities for individuals looking to improve their prompt engineering skills.


Eleuther ▷ #general (59 messagesđŸ”„đŸ”„):

  • Apple’s New Models and The Pile’s Multilingual Data: The Pile dataset is not particularly multilingual, although portions like UN records may contain multiple languages. There is no special focus on languages like German.
  • Comparing GPT-NeoX and Megatron Variants: GPT-NeoX has diverged from Megatron primarily in terms of quality-of-life improvements and user experience. Features are tested before being integrated, with the aim of being more stable.
  • Infini-Attention’s Positional Encoding Query: The community discussed the absence of positional encodings in Infini-Attention’s hidden state memory, with some speculating on whether positional information is preserved through other mechanisms.
  ‱ The Complex Calculations Behind Inference MFU: When evaluating good inference MFU (Model FLOPs Utilization), there are no simple off-the-shelf numbers; it largely depends on the hardware utilization and the specifics of the model being used.
  • Speed Differences Between Models at Fireworks.ai: The conversation touched on why Mixtral 8x22B is served slower compared to llama 3 70B at Fireworks.ai, with factors like batching size and hardware utilization potentially influencing the disparity.

Eleuther ▷ #research (297 messagesđŸ”„đŸ”„):

  • Benchmarking LLMs in Practice: Speculation over the real-world performance of various LLMs continues, with comparisons including phi-3-mini-128k against models like Llama-3-8B. However, disparities were noted in bits-per-byte performance metrics, suggesting differences in efficiency across models.

  • Exploring the Needle-in-a-Haystack Test: A Twitter thread highlighted that the needle-in-a-haystack test might imply a form of meta-awareness in models such as Claude 3 Opus. Yet, debate ensued over whether these responses indicate emergent abilities or artifacts of reward learning and prompt structures.

  • Self-Improvement in LLMs: Links to papers on LLM self-improvement strategies were shared, with methods like Self-Taught Reasoner (STaR) and reinforcement learning from human feedback (RLHF) being key discussion points.

  • Emergence in Language Models: The concept of “emergent abilities” in large language models (LLMs) was debated at length, with references to various papers and the acknowledgment that truly emergent abilities haven’t yet been quantifiably demonstrated under smooth, continuous metrics.

  • Innovations and Findings in LLM Research: Several papers were mentioned, including researching into redundant neural circuits in deep learning, and the creation of adversarial prompts for red-teaming against LLMs. Discussion also turned to whether speculative decoding can optimize model inference times without significant training adjustments.

Links mentioned:


Eleuther ▷ #scaling-laws (1 messages):

  • Determining Cutoff via Non-Embedding Parameters: A participant suggested using non-embedding parameters as a method for determining the cutoff point in models. The recommendation is to observe where the delta of the fit curve for each removed point becomes very low, which could lead to a reasonably educated guess beyond the initial estimation of sub-200 million parameters.
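A rough sketch of the counting step, assuming PyTorch module naming; the substring filter is a heuristic that must be adapted to the architecture at hand:

```python
def non_embedding_params(model):
    """Total parameter count excluding (un)embedding matrices, filtered
    by parameter name -- adjust the substrings for your model."""
    return sum(
        p.numel() for name, p in model.named_parameters()
        if "embed" not in name and "lm_head" not in name
    )
```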

Eleuther ▷ #interpretability-general (9 messagesđŸ”„):

  ‱ Anthropic Shares New Research Insights: The Anthropic interpretability team has released an April update with developments and emerging research ideas. This includes topics like scaling laws, training Sparse Autoencoders (SAEs), and a project on interpretability architectures.

  • Discovering the Refusal Mechanism in LLMs: A crosspost from AI Alignment Forum unlocks findings about how modern Large Language Models (LLMs) are fine-tuned to refuse harmful requests. It suggests that refusal may be activated by a single direction within the network.

  ‱ Weight Orthogonalization Versus Fine-tuning: In the context of fine-tuning LLMs for specific behaviors, a member hypothesized that weight orthogonalization could be viewed as a form of manual fine-tuning to influence network behavior (see the sketch after this list).

  • Refusal Directions and Rank-1 LoRA Fine-tuning Explored: A member proposed that if rank-1 LoRA (Low-Rank Adaptation) fine-tuning with Stochastic Gradient Descent (SGD) is performed, the network might learn the negative of the ‘refusal direction’.

  • Llama.cpp Integrates Control Vectors Technique: Control vectors, a technique similar to what was being discussed, have been added to llama.cpp, as demonstrated in this GitHub pull request, thanks to the collaboration with Nous Research.
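A minimal sketch of the weight-orthogonalization idea from the bullets above, assuming a unit ‘refusal direction’ d and a weight matrix whose rows live in the residual stream (transpose first if yours does not):

```python
import torch

def orthogonalize_rows(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Remove the component along direction d from every row of W, so the
    layer can no longer read or write along that direction."""
    d = d / d.norm()                  # normalize to a unit vector
    return W - torch.outer(W @ d, d)  # row_i -> row_i - (row_i . d) d
```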

Links mentioned:


Eleuther ▷ #lm-thunderdome (5 messages):

  • CLA Confusion in PR Submissions: A member encountered an issue with the Contributor License Agreement (CLA) showing as unsigned despite them having signed it, which might be due to GitHub anonymizing their email in commits. The matter was acknowledged and agreed upon for further investigation.
  • Uncertainty Over Failing Checks in PR: Concern arose over a failing check in a submitted pull request, with the member questioning if it was related to their changes. The issue was reviewed and preliminarily agreed to be unrelated.
  • Chat Template Branch Stagnation Inquiry: A member inquired about the progress and activity regarding a branch dedicated to adding chat templating, noting the last commit was two months prior. There was no immediate update on the current status or progress.
  • Prompt Versatility for Evaluation Harness: A member raised a point about the lack of variable prompt formats that cater to model-specific finetuning in the evaluation harness. Another participant suggested using a custom !function to enable distinct prompts per model, as sketched below.
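
As a sketch of what such a custom function might look like: lm-evaluation-harness task YAMLs can point doc_to_text at a Python function via !function, and that function could branch on a model name supplied out-of-band. The env-var wiring and prompt formats below are assumptions for illustration, not the harness’s documented mechanism for model detection.

```python
# utils.py -- hypothetical module referenced from a task YAML, e.g.
#   doc_to_text: !function utils.doc_to_text
import os

# Assumption: the model name is supplied out-of-band, e.g. via an env var.
MODEL_NAME = os.environ.get("EVAL_MODEL_NAME", "")

def doc_to_text(doc: dict) -> str:
    """Format one evaluation document into a model-specific prompt."""
    question = doc["question"]
    if "llama-3" in MODEL_NAME.lower():
        # Llama-3-Instruct style turn headers.
        return (
            "<|start_header_id|>user<|end_header_id|>\n\n"
            f"{question}<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n"
        )
    # Fallback: plain completion-style prompt.
    return f"Question: {question}\nAnswer:"
```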

Link mentioned: add task for mmlu evaluation in arc multiple choice format by jonabur · Pull Request #1745 · EleutherAI/lm-evaluation-harness: This PR adds the mmlu_arc_style task that presents the MMLU questions in the same manner as the arc evals (loglikelihood for the answer as a continuation, rather than selecting the letter for the c



Eleuther ▷ #gpt-neox-dev (1 messages):

  • Concerns Over Cluster Setup Practices: A comment highlighted the lack of assurance that the correct version of tokenizers is used during cluster setup, since someone might do a blind pip install tokenizers without the pinned version. This could affect any run; logging the contents of the Python environment would be needed to be certain of the version used (a minimal check is sketched below).
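
A minimal sketch of such a check, assuming the run can fail fast on version drift; the pinned version string is a placeholder:

```python
import logging
from importlib.metadata import version

PINNED = "0.15.2"  # placeholder; use the version pinned in requirements

def check_tokenizers_version() -> None:
    """Log the tokenizers version in the environment and fail fast on drift."""
    installed = version("tokenizers")
    logging.info("tokenizers version in environment: %s", installed)
    if installed != PINNED:
        raise RuntimeError(
            f"tokenizers=={installed} found but {PINNED} is pinned; "
            "was it blind-installed during cluster setup?"
        )

check_tokenizers_version()
```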

OpenRouter (Alex Atallah) ▷ #announcements (3 messages):

  • Soliloquy 8B Shifts to Paid Model: Soliloquy 8B’s usage is now paid, costing $0.1 per 1M tokens. This pricing update reflects OpenRouter LLC’s recent policy change.

  • Price Jump for Soliloquy 8B: The price for using Soliloquy 8B was revised again to $0.2 per 1M tokens. The new rate comes shortly after the initial pricing was introduced.

  • Routing Updates and Corrections: anthropic/claude-instant-1 model routing was updated to claude-instant-1.2, and a routing error concerning anthropic/claude-2.0 was corrected, restoring service, since it remains a valid model ID.

  • Restoration of Claude v2.1 and Variants: The Anthropic: Claude v2.1 model and its :beta variant have been reinstated following the clarification on model availability during the recent confusion with older claude models.

Links mentioned:

  • Anthropic: Claude v2 by anthropic | OpenRouter: Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a...
  • Lynn: Llama 3 Soliloquy 8B by lynn | OpenRouter: Soliloquy-L3 is a fast, highly capable roleplaying model designed for immersive, dynamic experiences. Trained on over 250 million tokens of roleplaying data, Soliloquy-L3 has a vast knowledge base, ri...

OpenRouter (Alex Atallah) ▷ #app-showcase (4 messages):

  • Exploring Syrax: A member expresses interest in experimenting with Syrax and offers support, initiating a private conversation with a friend request for further collaboration.
  • Friend Request Accepted: Another community member acknowledges the support offered and confirms the acceptance of the friend request, showing appreciation.
  • Impressed by the Showcase: A single, short expression of admiration is directed toward the ongoing discussions or showcased projects, reflecting a positive impression.

OpenRouter (Alex Atallah) ▷ #general (311 messagesđŸ”„đŸ”„):

  • Claude Models’ Quirky Behavior Unraveled: Members discussed issues with Claude models returning incomplete outputs or HTTP 524 errors via OpenRouter. Clarifications revealed that Claude models have a max generation of 4k tokens and can read up to 200k tokens, and that the right settings could improve API responses.

  • Lemmyle Dissects WLM-2 Hosting Economics: An intense breakdown of WLM-2 hosting costs was presented, surmising that profits could be marginal depending on factors like GPU utilization, electricity costs, and potential revenue from idle GPUs.

  • FireLLaVA’s Silent Entry into Multimodality: There were musings about the under-the-radar launch of FireLLaVA, an open multimodal model noted for its quick startup time, marking a notable addition to the OpenRouter ecosystem.

  • Deployment Dilemmas and Frugal Frontends: A member sought a simple frontend to host on shared hosting to allow family members to use their OpenRouter services without multiple OpenAI subscriptions. Suggestions ranged from using Vercel for its free tier to opting for more affordable VPS providers, such as Contabo.

  • Cohere’s Conundrum in OpenRouter Contexts: A member faced odd output discrepancies when using Cohere models through OpenRouter compared to direct API calls, with generated content unrelated to prompts. It was clarified that web connector support for Cohere is pending, and its addition to OpenRouter is anticipated but not yet available.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (169 messagesđŸ”„đŸ”„):

  • Washington’s Wizards: Unchanged Repository: Despite rumors, the WizardLM models have not been removed by Microsoft; a member clarified that the WizardLM team itself was responsible for the changes. They also confirmed that the WizardLM repository remains publicly available.
  • Fine-Tuning vs. RAG for Domain-Specific LLMs: New members inquired about fine-tuning for domain-specific language models, questioning the necessity versus using Retrieval-Augmented Generation (RAG). The conversation noted examples such as OpenBioLLM and referenced a medical-focused LLM paper for further reading.
  • Configurations for Conversation Tokenization Issues: There was a thorough discussion on tokenization strategies for models like LLaMA-3, including the necessity to manually install the latest version of the fastchat formatter and referencing a relevant axolotl pull request for correct conversational formatting templates.
  • Quantization and Model Degradation Debate: Members debated the effects of quantization strategies on LLMs, specifically comparing the 4bit lora and 4bit qlora methods. The consensus is that quantization sensitivity varies depending on training, with one member citing a Twitter thread discussing more significant degradation in more extensively trained models like LLaMA-3.
  • Sample Packing Clarification for Preventing OOM: A member sought clarification on multipack sampling and its relation to out-of-memory (OOM) errors. It was explained that packing does not affect the maximum sequence length allowed by the model; it only packs multiple samples into the maximum sequence length without altering context size (illustrated in the sketch below).
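
The packing idea in isolation, as a simplified sketch rather than axolotl’s actual implementation: tokenized samples are packed greedily into buckets that never exceed the model’s maximum sequence length, which is why packing cannot by itself cause OOM.

```python
def pack_samples(samples: list[list[int]], max_seq_len: int) -> list[list[int]]:
    """Greedily pack tokenized samples into sequences of at most max_seq_len tokens."""
    packed, current = [], []
    for tokens in samples:
        tokens = tokens[:max_seq_len]  # truncate over-long samples
        if len(current) + len(tokens) > max_seq_len:
            packed.append(current)  # bucket is full; start a new one
            current = []
        current.extend(tokens)
    if current:
        packed.append(current)
    return packed

# Three short samples fit into one 16-token bucket instead of three padded sequences.
print(pack_samples([[1] * 5, [2] * 6, [3] * 4], max_seq_len=16))
```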

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (37 messagesđŸ”„):

  • Memory Requirements for Full Fine-Tuning: A discussion covered the significant memory required to run a full fine-tune (FFT) with zero3 on 2x24GB graphics cards. A member suggested that 167GB of RAM might be necessary, lamenting the lack of sufficient memory.

  • Exploring VRAM Reduction via torchtune: One member advised trying torchtune, noting its focus on reducing VRAM usage. Another member experimented with FSDP (Fully Sharded Data Parallel) but reported that training begins yet hangs without progressing or throwing errors.

  • Disk Usage Soars with Full Fine-Tuning: While attempting to train a model, the system’s swap memory skyrocketed to 62GB, causing an out-of-memory error. The participant expressed surprise at the excessive disk and swap usage even when the job theoretically fit within a single 48GB card setup.

  • ZeroGPU Access for Experiments: One member highlighted that they have access to the Huggingface Zero project, prompting a discussion on potential tests. It aims to provide free GPU access for Huggingface Spaces and supports Spaces running on multiple GPUs simultaneously.

  • Log Sharing and Iteration Woes: A user linked their wandb.ai logs for those interested in the details of their full fine-tuning trials, noting extremely long iteration times of 800 seconds versus 17 seconds for a qlora iteration, highlighting performance issues.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (23 messagesđŸ”„):

  • Troubleshooting AttributeError: A user encountered an AttributeError stating that 'TextIteratorStreamer' has no attribute 'empty'. They questioned the call’s validity given they are using transformers version 4.40.0 (canonical streamer usage is sketched after this list).

  • Inquiry About Llama-Pro Method: There were multiple discussions regarding the usage of the llama-pro method highlighted by Jeremy Howard. Links to GitHub repositories were shared (fsdp_qlora), indicating a 4-bit quantized Llama-Pro fine-tuning method, with conversation pivoting around whether or not this method is accessible in axolotl and potentially requiring a pull request.

  • Integrating Custom Audio Recording in Twilio: A user explained their effort to integrate custom audio recording with Twilio and how to capture and store audio in real-time, while being able to provide a response to the recorded audio.

  • Combining QLORA Adapter Fine-Tuning: Users discussed the need to merge a qlora adapter fine-tuning model before conducting additional fine-tuning for a Q/A style, as well as the effects that subsequent fine-tunings might have on preserving model characteristics. Further conversation alluded to combining conversational and completion models into one fine-tune, with a reference to an example in a community showcase.

  • PEFT Model for Faster LLM Fine-Tuning: A brief mention was made of an unsloth PEFT model, said to fine-tune LLMs like Mistral significantly faster and with less memory usage thanks to additional optimizations, suggesting it is loaded differently from standard Hugging Face models.
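
For reference on the streamer error above: in transformers 4.40 a TextIteratorStreamer is consumed by iterating over it rather than polling queue methods like .empty(), which it does not expose. A minimal sketch with an illustrative model id:

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain sample packing briefly.", return_tensors="pt").to(model.device)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so run it in a thread and drain the streamer on the main thread.
thread = Thread(target=model.generate,
                kwargs={**inputs, "streamer": streamer, "max_new_tokens": 128})
thread.start()
for text_chunk in streamer:  # iterate; do not poll .empty()
    print(text_chunk, end="", flush=True)
thread.join()
```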

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (44 messagesđŸ”„):

  • GPU Scaling and Batch Sizes Explained: A conversation detailed the intricacies of scaling from 4 to 8 GPUs and adjusting micro batch sizes. It clarified that while the total batch size may remain constant, factors like gradient accumulation, learning rate scaling, parallelism strategies, and communication overhead differ and influence training dynamics and performance outcomes (a worked example follows this list).

  • Query on Model Loading Across GPUs: The question was raised about whether models are loaded in full or split when using multiple GPUs. It was explained that models can be loaded either as a full size or sharded across GPUs, a technique facilitated by Fully Sharded Data Parallelism (FSDP) and optimizations like DeepSpeed’s ZeRO Stage 3, helping in efficient utilization of hardware resources.

  • LoRA vs. QLoRA – Adaptation Techniques Demystified: Discussion touched upon the differences between LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation), detailing how the latter extends LoRA by adding quantization to further reduce the computational cost and memory requirements during fine-tuning and deployment.

  • Dataset Trimming Strategy for Axolotl: The situation of trimming datasets in the Axolotl config was addressed by suggesting an approach that doesn’t directly specify a percentage of the dataset but rather involves modifying the dataset loading logic to include a subsampling step, potentially using methods provided by datasets library functions.
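
The batch-size bookkeeping from the first item, as a back-of-envelope sketch: the effective batch is the product of micro batch size, gradient accumulation steps, and GPU count, so doubling the GPUs lets you halve the accumulation steps while keeping the total constant.

```python
def effective_batch(micro_batch: int, grad_accum: int, num_gpus: int) -> int:
    """Total samples contributing to one optimizer step."""
    return micro_batch * grad_accum * num_gpus

print(effective_batch(micro_batch=4, grad_accum=8, num_gpus=4))  # 128
print(effective_batch(micro_batch=4, grad_accum=4, num_gpus=8))  # still 128
```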

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (12 messagesđŸ”„):

  • LLaMa Prompt Support Inquiry: A member inquired if axolotl supports LLaMa 3 prompt format for ShareGPT. The response indicated there’s no mention of specific “llama 3” model support within the OpenAccess-AI-Collective/axolotl documentation.
  • Fine-Tuning a QLoRA Model: A member shared their success in creating a fine-tuned text completion model with qlora from Mistral-7B. They sought guidance on making the model conversational and were advised they could directly fine-tune using their QLoRA-adapted model on a Q/A dataset.

Modular (Mojo đŸ”„) ▷ #general (2 messages):

  • Modular Commits on the Rise: Since the stdlib was open-sourced, 23% of commits have been made to modularml/mojo. This indicates a surge in activity and contributions to the project.

Modular (Mojo đŸ”„) ▷ #đŸ’Źïž±twitter (4 messages):

  • Modular Tweets Link Sharing: Members in the đŸ’Źïž±twitter channel shared multiple tweets from Modular. Relevant tweets included updates or announcements, linked as follows: Tweet 1, Tweet 2, Tweet 3, and Tweet 4.

Modular (Mojo đŸ”„) ▷ #✍blog (1 messages):

  • Multimodal Search Boosted by MAX Engine: The recent blog post by Modular discusses the advantages of a multimodal search that combines textual and visual data. MAX Engine, which already outperformed PyTorch eager and ONNX runtime in previous benchmarks, is also capable of optimizing inference for multimodal models.

Link mentioned: Modular: Multimodal Search with Snowflake Embedding and MAX Engine: We are building a next-generation AI developer platform for the world. Check out our latest post: Multimodal Search with Snowflake Embedding and MAX Engine


Modular (Mojo đŸ”„) ▷ #ai (2 messages):

  • Troubleshooting Mojo Installation: A user reported an issue with installing Modular (Mojo đŸ”„) on Python 3.12.3. The response suggested using a Conda virtual environment and provided instructional links, Modular manual on Python and Modular blog post, emphasizing that Mojo is a superset of Python and compatible with Python modules.
  • Working on Mac M1: A different member noted that they are running the latest Mojo, including the nightly version, with Python 3.12.3 on a Mac M1 successfully. They recommend using Conda for an easier setup, pointing out that Mojo’s intent is to be compatible with Python code and existing Python packages.

Link mentioned: Python integration | Modular Docs: Using Python and Mojo together.


Modular (Mojo đŸ”„) ▷ #đŸ”„mojo (113 messagesđŸ”„đŸ”„):

  • Switch from Python to Mojo Issue: A user shared Python code and asked for assistance in converting it to Mojo. Another user provided a detailed Mojo conversion with explanations about function declarations and variable types in Mojo.

  • ModularBot Chimes In: ModularBot interjected, celebrating user @110077104611172352 reaching level 5 and user @289473226147495936 reaching level 1. Congrats were later given to @932397073427476521 for reaching level 18, with a playful response from ModularBot about celebrating with a banquet.

  • Matrix Slicing and Memory Ownership: A Mojo user inquired about creating a non-owning view of a list’s subset without extra allocation. It was clarified that for indirect memory access, one should use the Buffer type rather than List, since List owns its data and Buffer is under redesign for lifetime management.

  • Mojo for Intel Mac Inquiry: When questioned about Mojo for Intel Mac, a user responded that there’s hope for support soon, but currently the playground is the only option.

  • Troubleshooting a Matrix Implementation: A user having trouble with matrix division in Mojo due to the lack of an implemented __truediv__ function was advised to review their code and ensure operations were only being performed on non-zero values.

  • Discussion on Mojo’s Integration with Existing Libraries: The goal of Mojo language is discussed, emphasizing that Mojo aims to integrate into the Python ecosystem and utilize existing libraries, rather than replacing them entirely. It’s noted that Mojo’s long-term direction includes seamless use of existing tools like Numpy.

  • Levels and Learning in Discord: Users discuss their progress through levels in the channel; one user advanced to level 18 after a year, while others question the ranking methodology given disparate expertise levels.

Links mentioned:


Modular (Mojo đŸ”„) ▷ #community-projects (1 messages):

uncle_jee: Use Mojo to write a Mojo community https://github.com/shadowqcom/mojo_dev


Modular (Mojo đŸ”„) ▷ #community-blogs-vids (5 messages):

  • Crafting Better Tutorials: rd4com highlighted tips for making tutorials, emphasizing the use of emojis for visual references, simplicity in language, clarity in naming, avoiding information overload, gradually increasing complexity, and iterating for refinement. They also stressed linking to Mojo documentation and logically building upon previous content.

  • DiĂĄtaxis Framework for Documentation: sophiaglencairn shared a link to DiĂĄtaxis, a systematic approach to creating technical documentation, outlining four types of documentation needs: tutorials, how-to guides, technical reference, and explanation. DiĂĄtaxis addresses content, style, and architecture issues in documentation to benefit both users and creators.

Link mentioned: DiĂĄtaxis: no description found


Modular (Mojo đŸ”„) ▷ #performance-and-benchmarks (55 messagesđŸ”„đŸ”„):

  • Exploring __copyinit__ and GitHub Gists: A discussion revolved around __copyinit__ behavior and whether it’s a type author’s responsibility to implement copy-on-write semantics. The conversation pointed to a specific Gist for context.

  • Dictionary Performance Intricacies: Performance concerns regarding dictionaries in Mojo were discussed, citing significant speed differences between Mojo and Python. A member shared their experiences with porting a tokenizer and linked to a relevant discussion and a tokenization library for reference.

  • Compact-dict Library Offers Hope: Amidst conversations about dictionary performance, the Compact-dict library was put forward as a faster alternative to the standard Mojo dictionary, though it doesn’t store keys and might require changes to use cases or additional features in the future.

  • Memory Allocation Queries: Members inquired about the differences in performance and functionality between stack_allocate and heap allocation methods like DTypePointer.alloc/Pointer.alloc. There was an exchange on when to use stack or heap, and insights into their cost differences were shared, emphasizing that typically stack allocation is faster and less complex than heap allocation.

  • Optimizing SIMD Operations for Error Correction Code: In search of achieving better performance for an error correction code library, a member sought advice on optimizing a function using SIMD. The conversation included discussions on function inlining, use of fma, and potential mathematics tricks for improvements. The specific project mentioned was mocodes.

Links mentioned:


Modular (Mojo đŸ”„) ▷ #🏎engine (3 messages):

  • Continuous MAX Optimization: The team is regularly optimizing MAX with each release. Knowing the specific core types and models used by individuals can provide further insights into performance enhancements.
  • Clarifying Speed Improvements: A member pointed out a discrepancy in reported speed improvements between TensorFlow (tf) and PyTorch, suggesting they shouldn’t be the same due to differences in queries per second (QPS).
  • Correct Speedup Printouts Confirmed: Another member confirmed seeing the correct speedup numbers reflecting proportionate QPS improvements after updating the max example repository and clearing the .cache in the performance-showcase directory.

Modular (Mojo đŸ”„) ▷ #nightly (85 messagesđŸ”„đŸ”„):

  • Frequent Updates for Nightly Branch Discussed: Automation challenges are delaying the goal of releasing the nightly branch every weekday, and concerns were raised that the delay between code merges and commits appearing in the branch makes conflicts hard to fix. There’s ongoing discussion to find solutions, ensuring the nightly stdlib can build and run correctly with the released nightly compiler.

  • Nightly Mojo Compiler Release Notification: The announcement of a new nightly Mojo compiler highlights the availability of updates and changes, with a detailed pull request and a changelog available for review.

  • Discussions on Overloads and Traits in Mojo: Debates surfaced regarding the behavioral consistency of overloads and the use of traits, touching on language features like parametric algorithms. The community is thinking through the trade-offs of different methods, like overloading, precedence decorators, and return type variations, while expressing concerns about the potential for confusion and bugs when modifying the behavior of objects via type information.

  • Code Execution Difference Between Stable and Nightly: A user reported an issue where code that works in the stable version of Mojo causes an error with a nightly build, suggesting a possible file handle lifetime management problem in the nightly version. This sparked a conversation leading to the opening of an issue on GitHub.

  • Importing Challenges in Mojo’s Standard Library: A user encountered difficulties importing functions from the math package into the string.mojo and string_literal.mojo files, which was explained as a design decision to avoid circular dependencies between open-source and closed-source parts of the stdlib. The workaround recommended is to re-implement the necessary math functions in the open-source portion of the standard library.

Links mentioned:


LlamaIndex ▷ #blog (6 messages):

  • Workshop Materials for Building LLM Apps: Llama Index announced a workshop with AWS showcasing 3 patterns for LLM app development including using S3 for data ingestion and AWS Bedrock for embeddings.
  • Llama Index on ML Security Podcast: The co-founder of Llama Index discussed LLM-based application futures and data security on the mlsecops podcast, also touching on tools like LlamaParse and LlamaCloud.
  • RAG Tutorial Series for Production: Marco Bertelli launched a 9-part series focused on taking RAG from a prototype to a production environment, outlining necessary architectural components for deployment.
  • Enhancing RAG with Multi-Stage Retrieval: An article by Michael R. from KX Systems suggests a multi-hop retrieval process using Llama Index and Cohere reranking to improve context and reduce hallucinations for LLMs, as detailed in their post.
  • Long-Term Memory for Autonomous Agents: Introducing memary, a reference implementation for long-term memory using knowledge graphs, aimed at enhancing memory functions in autonomous agents using LLMs as explored in this tweet.

LlamaIndex ▷ #general (155 messagesđŸ”„đŸ”„):

  • Trouble with awsbedrock and LlamaIndex: A member encountered an error when trying to use awsbedrock with LlamaIndex, which raised a “NoRegionError” from botocore. Following suggestions to ensure region_name is specified resolved the issue (see the sketch after this list).
  • Using Local LLM with LlamaIndex: Members shared links to LlamaIndex’s documentation and examples for setting up LLMs locally, particularly referencing a “5 lines of code” example using BAAI/bge-small-en-v1.5 and Mistral-7B on LlamaIndex’s documentation.
  • LlamaIndex Import Issues Solved: Several members discussed troubleshooting import errors related to llama-index packages such as llama-index-llms-ollama. Solutions included installing specific packages individually and confirming correct installation steps.
  • Updating Indices and Documents on Vector Stores: Conversations focused on actions such as updating indices on Pinecone using LlamaIndex and adding metadata keys to existing vectors. A member suggested that updating a node with the same ID will overwrite it. However, no direct solution was provided for adding metadata without modifying vectors.
  • Retrieving Documents with LlamaIndex: Members inquired about retrieving multiple documents via query_engine.retrieve() while ensuring diversity among the retrieved documents. Suggestions included adding metadata keys to existing vectors and setting parameters like mmr_diversity_bias when creating the retriever.
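
A minimal sketch of the region fix from the first item, using the llama-index v0.10-era Bedrock integration; the import path, model id, and region_name kwarg are stated as assumptions to verify against the installed version’s docs.

```python
# botocore raises NoRegionError when no AWS region is resolvable,
# so pass region_name explicitly when constructing the LLM.
from llama_index.llms.bedrock import Bedrock

llm = Bedrock(
    model="anthropic.claude-v2",  # illustrative Bedrock model id
    region_name="us-east-1",      # explicit region avoids botocore's NoRegionError
)
print(llm.complete("Say hello.").text)
```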

Links mentioned:


LlamaIndex ▷ #ai-discussion (2 messages):

  • GPT-1: The Unsung Hero: A member revisited the original GPT-1 model, reflecting on its contribution to the evolution of language models, and has written a blog post on the subject. It posits that the model has “stood the test of time quite well over 6 years,” implying that some modern systems like Mistral-7B are vastly scaled up derivatives of GPT-1.

OpenInterpreter ▷ #general (127 messagesđŸ”„đŸ”„):

  • Flask Server Frustration: A member encountered an error when trying to run a local Flask server, revealing a need to set the api_key and several further issues, including namespace conflicts and connection errors. They attempted to use a dummy key (interpreter.llm.api_key = "dummykey") and contemplated editing a pydantic config to overcome a namespace issue.
  • OpenInterpreter 0.2.5 New Release Inquiry: A member asked about the Open Interpreter 0.2.5 New Computer Update, leading to a clarification that it has moved beyond beta.
  • Groq Challenges for OI Integration: Several members discussed difficulties when trying to run Open Interpreter with Groq, ultimately concluding that Groq support isn’t currently integrated into OI. A Github pull request (#1238) for adding Groq support was mentioned, which is pending approval.
  • Hardware Queries for O1 and Global Vision: Members conversed about the Open Interpreter’s remote communications and whether O1 can function with voice instruction in languages other than English. There were also discussions on installing O1 client on other devices, like the Rabbit r1, and leveraging the client’s existing voice support.
  • Collaborations and Contributions Ramp Up: Members shared progress and calls for assistance on various projects intertwined with OpenInterpreter, such as llm-switcher, an open-source AI tools suite including AAA+ and MagicLLight, and potential Groq API implementations. Community code sharing occurred, with ongoing efforts to troubleshoot and improve support for different models and functionalities.

Links mentioned:


OpenInterpreter ▷ #O1 (25 messagesđŸ”„):

  • Custom 3D Project Housed in Mystery: Members are intrigued by a custom 3D printed case for OpenInterpreter’s 01 project, prompting discussions around personal attempts and the fun of tactile keys. One member provided a YouTube video showcasing the project but noted it wasn’t their own work.
  • The Dawn of 01 Heavy: Chat included anticipation of a new device, 01 Heavy; no expected launch date was provided. Comparisons drew links to it potentially powering future robots.
  • Amazon Alternatives Seek Acceptance: Queries rise about using Amazon Echo Smart Speaker Dev Kit as an alternate solution for open project builds, but no confirmation is shared regarding compatibility.
  • Open AI Ethics in Question with Microsoft’s Capabilities: A discussion emerges highlighting Microsoft’s ability to create and modify files, with OpenInterpreter touted as capable of meeting diverse user desires.
  • Update Expectations Set for 01 Light: A member mentions an upcoming discussion this Tuesday to reveal an updated timeline for the 01 Light’s ETA.

Links mentioned:


Latent Space ▷ #ai-general-chat (100 messagesđŸ”„đŸ”„):

  • Berkeley Introduces Tool Calling Leaderboard: The Berkeley Function Calling Leaderboard evaluates LLMs’ ability to call functions, offering a novel and periodically updated real-world benchmarking system.
  • Voice AI On the Rise: ElevenLabs has sparked interest, leading to discussions about other Voice AI startups like Unreal Speech and Hume, a space once occupied by now-defunct Coqui.
  • Exploring the Limitations of LLMs: An article on Strangeloopcanon contemplates the perennially surprising capabilities of LLMs while discussing their current failure modes and the concept of “goal drift” as possible directions for improvement.
  • Potential Acquisition Moves in the AI Sector: Nvidia’s reported acquisitions of Israeli AI companies, Deci AI and Run:ai, indicate a strategic move to enhance efficiency and performance on their GPUs and AI servers.
  • Adventures in Large Context Models: Conversations about practical applications and the future of large context models were spurred by Llama 3’s extension to a 1M token context window.

Links mentioned:


Latent Space ▷ #ai-announcements (1 messages):

swyxio: new pod! https://x.com/swyx/status/1784253651844014237


Latent Space ▷ #llm-paper-club-west (12 messagesđŸ”„):

  • All Systems Go: The chat confirms visibility before starting the presentation on Mixture Of Depths.
  • Mixture Of Depths Explored: This paper introduces a transformer modification that uses expert-choice routing to allocate compute dynamically, aiming for faster training convergence and improvements when processing longer sequences. See the original paper here.
  • Skip the Confusion: Comments indicate that skip connections, also known as residual connections, mentioned in the attention mechanism are integral to the discussed paper’s methodology.
  • Size Matters: A shared abstract suggests larger zero-shot LLMs outperform fine-tuned smaller LLMs in real-world tasks like meeting summarization, despite the computational costs.

Links mentioned:


Latent Space ▷ #ai-in-action-club (35 messagesđŸ”„):

  • Linux Users, Say Hello to Vesktop: Discord video sharing and Linux compatibility issues were addressed with a recommendation to use Vesktop, described as a better-performing custom Discord app that improves Linux support. Those interested can find more info on the Vesktop GitHub repository.

  • Young SQL Module in the Spotlight: A member shared a reference to sqlite-vss, a SQL module for creating virtual tables to store and query vectors, noting it’s still in early development stages and pointing to the API reference documentation.

  • Chatbots for CLI Tools Spark Interest: The idea of creating chat bots for popular command line interface (CLI) tools was suggested, triggering discussions about feasibility and potential ease of creation using slono’s tool, a utility that adds to the portability of Go and SQLite.

  • Resource Sharing for AI Enthusiasts: Two informative links were shared by members; the first, a Google Doc containing AI-related topics, dates, facilitators, and a wealth of resources such as articles and conference talks. The second, a Berkeley Gorilla Blog post discussing the challenges and potential strategies for real-world execution of actions by Large Language Models.

  • Hunt for AI Hackathon Sign-Up Details: Engagement was expressed regarding sign-up for a hackathon, with one member highlighting the X-ware Arena link amidst the conversation.

Links mentioned:


LAION ▷ #general (95 messagesđŸ”„đŸ”„):

  • LAION in Limbo: A member highlighted that EU laws appear to be restricting LAION’s access to public clusters for compute time, causing a decline in activity. Researchers are gravitating towards more active groups that are continually running experiments.

  • Terminus Research Group Attracts Talent: A chat participant introduced their own group, the Terminus Research Group, which is an informal collective now including the “pixart guy,” suggesting a growing diverse expertise.

  • LAION-Aesthetics Seeks to Score Visual Beauty: A blog post was mentioned detailing LAION-Aesthetics, which is designed to rate image aesthetics using machine learning. The model and related code are available publicly on GitHub.

  • Unusual Benchmark Results Spark Discussion: Members discussed a Reddit benchmark test denoting contradictory performance outcomes for different quantizations in language models, raising questions about testing methodologies and the non-deterministic nature of LLMs.

  • Comparing LLM Token Generation Rates: Users discussed token generation rates on high-performance GPUs, noting significant differences across models and setups. Some tools and configurations, such as exllama and TabbyAPI, were recommended for better performance.

Links mentioned:


LAION ▷ #research (9 messagesđŸ”„):

  • Exploring VAST: The Omni-Modality Foundation Model: Interest is shown in finetuning VAST, a vision-audio-subtitle-text omni-modality foundation model and dataset, prompting members to share their experiences and seek advice.
  • Hot off the Press: New Research Publication: A new paper on AI research authored by a team including Mostafa Elhoushi, Akshat Shrivastava, and others has caught the attention of members, speculating it builds upon previous work and highlighting its implications for faster inference and layer utilization.
  • Combining Graphs with Language Models: Queries about combining graphs with large language models (LLMs) have been raised, seeking recommendations on relevant papers to read and strategies for conditioning LLMs with graphs.
  • Mistral Model Fine-Tuning Challenges: A member is fine-tuning Mistral models for medical information extraction but encounters issues with the model over-generating sequences. The discussion touched on padding strategies and the appropriateness of the Eleuther server for seeking expertise in this area.
  • Seeking the Eleuther Server Link: Upon facing a challenge with model fine-tuning, a member was advised to consult the Eleuther server for expert help in LLMs, leading to a request for the server’s Discord link.

Links mentioned:


Cohere ▷ #general (96 messagesđŸ”„đŸ”„):

  • Search Engine Query Capabilities Discussed: Members discussed the best practices for using web search tools with AI, mentioning various options such as Tavily and Brave Search API. Some highlighted the cost-effectiveness of these tools Tavily API Information and Brave Search API, while others shared specific configurations and technical details regarding usage limitations and potential workarounds for rate limits.

  • Technical Issues and Deployment Queries: Various technical issues were addressed, including errors when running the cohere-toolkit locally due to sqlite3 version issues and difficulty understanding how to interact with different components after deployment on Azure; GitHub resources were shared for troubleshooting and adding custom tools GitHub - cohere-ai/cohere-toolkit.

  • Cohere Toolkit Enthusiastically Received: A user expressed great appreciation for Cohere making their toolkit open source, highlighting its immense help to developers GitHub - cohere-ai/cohere-toolkit.

  • Clarifications Sought on Fine-Tuning and Use Cases: Queries were raised about the specific models used when fine-tuning, the limits and terms of the free trial API key, and whether models like ‘Generate’ would remain available.

  • Using AI for Non-English Languages and Commercial Use: One member praised Command-r for its performance with non-English languages and sought clarification on deploying command-r APIs for commercial use; responses suggested contacting Cohere’s sales team or using AWS Sagemaker for deployment.

Links mentioned:


Cohere ▷ #collab-opps (1 messages):

westn89: We’re a Swedish company that are partially using cohere


tinygrad (George Hotz) ▷ #general (35 messagesđŸ”„):

  • Exploring Mathematical Formula Construction: A member discussed constructing any mathematical formula from basic primitive ops and applying differentiation for gradient/backward passes, forming a dependency graph. This method optimizes hardware utilization and enables just-in-time scheduling for streaming, quick computations (a toy example follows this list).

  • OpenELM Inquiry, a brief mention: One member inquired about the experience with OpenELM, but no follow-up discussion ensued.

  • Cross-Compatibility Between Frameworks: A user shared their use-case for nn.module, explaining it was useful for a hybrid model containing both tinygrad and PyTorch components. The module can automatically collect parameters from itself and child objects for training.

  • Clarifying Speech-To-Text/Text-To-Speech Inquiry: A user asked about the speech-to-text and text-to-speech engines showcased by George Hotz, likely found in the tinygrad examples, though the specific demonstration was not identified.

  • Discussion About tinygrad Optimizations: Users engaged in a debate over the optimization capabilities of tinygrad, where one member questioned whether it could generate a fast matrix multiplication (matmul) kernel, while another pointed out the use of computational reduction algorithms for convolutions. George Hotz clarified their aspirations for tinygrad, focusing on overall model training speed rather than single-operation optimization like matmul.
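
The primitive-ops-plus-autodiff idea from the first item, in miniature, using tinygrad’s Tensor API as of the 0.8-era releases (verify the import path against your installed version):

```python
from tinygrad.tensor import Tensor

x = Tensor([2.0], requires_grad=True)
y = (x * x * x + 2 * x).sum()  # f(x) = x^3 + 2x, composed from mul/add primitives
y.backward()                   # walks the recorded dependency graph of primitive ops

print(y.numpy())       # f(2)  = 12
print(x.grad.numpy())  # f'(2) = 3*2^2 + 2 = 14
```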

Link mentioned: GitHub - tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❀: You like pytorch? You like micrograd? You love tinygrad! ❀ - GitHub - tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❀


tinygrad (George Hotz) ▷ #learn-tinygrad (55 messagesđŸ”„đŸ”„):

  • Exploring the Optimization Frontier: A member shared a comprehensive writeup on loop unrolling within the context of tinygrad’s optimizer. The article details the transformation of simple loops into optimized operations, providing insights into the Uops IR.

  • Tinygrad 0.9 Launch Teased: George Hotz briefly mentioned that new updates will come with the release of tinygrad version 0.9, causing anticipation about potential new features or improvements in the library.

  • Kernel Optimization Dissected: Another detailed writeup was shared elaborating on how the shapetracker and symbolic library work with loop unrolling/upcasting, along with a guide to interpreting kernel output colors in tinygrad.

  • Tinygrad Learner’s Guide: Several members proposed starting points and suggested reading material for understanding and contributing to tinygrad; resources mentioned include MicroGrad and MiniTorch for foundational concepts, and also outlined an optimal path for reading through the tinygrad codebase.

  • Dynamic Testing and Symbolic Shapes: Discussion highlighted the ongoing development efforts toward dynamic testing and implementing kernels that can handle variable shapes without recompilation, focusing on the usage of symbolic shapes in operations like mean and sum.

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (10 messagesđŸ”„):

  • Brand Impact of Newsletter Cross-Promotion Considered: A member pondered the potential brand tarnishing of engaging in an unpaid promotion exchange with Semafor. This was seen as a growth opportunity, despite concerns that readers might find plugs annoying.

  • Bigger Audience, Bigger Growth?: The same member noted that Semafor’s tech newsletter audience is significantly larger, hinting at a substantial growth opportunity.

  • Comparing Content to Recognized Examples: To illustrate the type of content involved, an example of a Semafor newsletter was shared, discussing the divisive topic of synthetic data in AI.

  • Newsletter Exchanges – A One-Way Street?: Another member chimed in, questioning the importance of cross-promotion in newsletters given their nature as a “one-way medium” sent “into the void.”

  • Balancing Promotion with Reader Preferences: It was highlighted that there’s a risk of alienating readers who prefer pure content without promotions, suggesting that the success of such a strategy depends on execution and frequency. Another member weighed in, saying that even a small uptake from the promotion could be beneficial and lead to further growth.

Link mentioned: Semafor Tech: New synthetic data techniques shake up AI models | Semafor | Semafor: In today’s edition, we look at how machine-learning generated data can help make smaller AI models nearly as capable as larger ones.


Interconnects (Nathan Lambert) ▷ #news (10 messagesđŸ”„):

  • Microsoft Unleashes Phi-3: Phi-3, the next generation model from Microsoft, has been publicly released, amassing over 6,000 votes and featuring promising capabilities. In related news, Arena hits 800K votes, and Snowflake Arctic Instruct has entered the fray.

  • A Gloomy Outlook for Dylan: A brief remark hints at unfortunate prospects for an individual named Dylan, with the context or cause left unstated.

  • Llama’s Fine Tuning Applauded: The fine tuning process for “llamas” received a positive shout-out, indicating noteworthy results or improvements.

  • Anticipation for GPT-4: A message hints at the possibility of GPT-4’s emergence, backed by a sense of confidence from the mentioned user.

  • Insights on Training an Open LM: A YouTube seminar led by Hanna Hajishirzi from AI2, discussing the training of an Open Language Model (OLMo), left at least one member wishing for a deeper understanding, while acknowledging the value of such shared resources. Hanna’s brisk presentation pace was noted, bolstering her repute for efficiency.

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (13 messagesđŸ”„):

  • Misconceptions Cleared About RLHF: RLHF’s stability and usefulness depends on the application; methods like KTO may be better suited for various tasks. “[RLHF] Depends on the application. KTO is probably the most well suited to many applied tasks”, the sentiment reflected that “[It’s] pretty nuanced yeah”.
  • DPO and KTO Show Promise in Fine-Tuning: A transition from SFT -> DPO -> KTO showed better user feedback in fine-tuning applications, with online iterations of DPO and KTO ‘coming’.
  • LLaMA 2 Follow-Up Creates Buzz: With a plethora of information available post-LLaMA 2 release, a blog post provides corrections and continued analysis, talking about controversial aspects and introducing technical notes like Ghost Attention.
  • Ghost Attention - Useful but Not Critical: Ghost Attention seems to have been initially promising for maintaining consistency in long conversations for LLaMA 2, but later comments suggest it may no longer be as important, possibly due to improvements in data and long context handling. “[GAtt] is not an important thing to implement. It’s a great exercise for learning new topics in the space.”

Link mentioned: Llama 2 follow-up: too much RLHF, GPU sizing, technical details: The community reaction to Llama 2 and all of the things that I didn’t get to in the first issue.


Interconnects (Nathan Lambert) ▷ #random (48 messagesđŸ”„):

  • OpenELM Surpasses OLMo: Discussion highlighted that OpenELM has outperformed OLMo, with comments acknowledging that OLMo 1b had limited success and is no longer a particularly strong model, and that there is now better public data available for training than what was used for OLMo.
  • Continuous Improvement Motivates AI Development: Members of the chat acknowledged that while their models have not been top-tier, it serves as motivation to improve. There’s consensus that better models are being trained, using the shortfall as an educational tool for safety and policy.
  • The Educational Role of Open Models: Participants pointed out the importance of open models in facilitating informed decision-making, with a consensus that while their models might not be the best, they are crucial for education and transparency in the AI community.
  • AI2’s Role in AI Advancements Recognized: The efforts of AI2 were acknowledged, especially in terms of education, and there was an expression of enthusiasm for the upcoming paper and developments, as well as a discussion on the financial aspects of AI research.
  • Intrigue in the Scaling & Function of Alternative Models: Conversation turned to various topics, including Snowflake, a new enterprise-focused model with high VRAM useful for inference, and the concept of active parameters as a proxy for model capability, indicating the interest in exploring alternative architectures beyond just size and benchmarks.

Link mentioned: Tweet from Itamar Golan đŸ€“ (@ItakGol): Visual Prompt Injection 💉🛑 IRL


Interconnects (Nathan Lambert) ▷ #memes (7 messages):

  • Quick Laugh, Light Content: One member posted a simple “lmao”, indicating amusement or laughter regarding the channel’s conversation or content.
  • Personal Reflection on Posting: The same individual later suggested the need for an editor, hinting at self-reflection on their message quality or content.
  • Jungle Adventures Shared: They shared a YouTube video titled “I’m leaving to the Amazon jungle
”, which details an excursion into rarely explored areas of the rainforest.
  • Contrasting Views of the Jungle: Another member responded with a video link showcasing a differing view on the nature of the jungle, quoting Werner Herzog’s perspective from the documentary Burden of Dreams: “Nature here is vile and base
 There is no harmony in the universe”.
  • Twitter Meme on LLM Quirks: The channel featured a tweet from Marques Brownlee, highlighting the humorous aspects of large language models (LLM) in a post deemed “the most meme llm shit ever”.

Links mentioned:


Interconnects (Nathan Lambert) ▷ #posts (1 messages):

  • Conversations on AGI’s Nature: A member complimented another on a thoughtful post about AGI, agreeing with the idea that AGI’s definition is subjective. The conversation suggests that the debate around AGI’s nature is an ongoing one.

LangChain AI ▷ #general (51 messagesđŸ”„):

  • Inquiry on Prompt Integration into Code: A member sought assistance with integrating a prompt into their existing code for a chat model. Another community member provided a detailed guide on incorporating ChatPromptTemplate and the pipe method for chaining prompts and models in JavaScript (the equivalent Python pattern is sketched after this list).
  • Navigating OllamaFunctions Difficulties: There was a discussion around an issue with OllamaFunctions not working properly, linked to GitHub issue #20924. Subsequently, a member clarified the confusion between Gemini and VertexAI models, informing that Gemini 1.5 Pro works only with VertexAI, evidenced by successful implementation using ChatVertexAI(model="gemini-1.5-pro-preview-0409").
  • Building a Retrieval-Augmented Generation (RAG) System: A member requested recommendations for open-source models, embedding techniques, and vector storage solutions to develop an advanced RAG system, though no direct responses to this specific inquiry were provided in the message history.
  • Concerns Over Observability Tools for LLMs: A discussion on LLM observability tools questioned the choice between Arize Phoenix and Langfuse, specifically for those primarily using LlamaIndex. A preference was indicated for a self-hosted open-source solution, but no direct recommendations were provided.
  • Integration and Deployment Queries around LLMs: Various inquiries surfaced regarding deployment methods, such as using Hugging Face versus OpenAI API, and connecting OpenAI with SQL Server without the intermediary of LangChain for security concerns. There was also a direct request for advice on building AI clones of influencers on a new platform and an invitation to DM for potential partnership.
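
The chaining pattern from the first item, shown in LangChain’s Python LCEL form since the original guide used the JavaScript pipe() equivalent; the model choice is illustrative.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical assistant."),
    ("human", "{question}"),
])

# The | operator chains prompt -> model -> parser, mirroring JS .pipe().
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()
print(chain.invoke({"question": "What is retrieval-augmented generation?"}))
```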

Links mentioned:


LangChain AI ▷ #langserve (1 messages):

  • AzureSearchVectorStoreRetriever Async Issue: A member reported an error about AzureSearchVectorStoreRetriever not supporting async operations. They inquired if it’s possible to either adjust lang-serve to handle sync operations or if writing an async wrapper around the sync function in the retriever would be a viable solution.

LangChain AI ▷ #share-your-work (11 messagesđŸ”„):

  • Galaxy AI Enters the Arena: GalaxyAI is offering free API access to premium AI models such as GPT-4, GPT-3.5-turbo, and more, with OpenAI format compatibility for easy integration into projects. Discover more on their website galaxyapi.onrender.com.

  • Launching Genai-Job-Agents: A GitHub repository for a Langchain/Langgraph-based agent that assists with job searching and CV building has been shared. For details, check out the repository at genai-job-agents.

  • Discover the Sparks of GPT-1: A new blog post delves into the original GPT-1 model, discussing its relevance and the technical evolution to current models. Read the insights here.

  • Implementing LangChain with Live Avatars: A YouTube demo showcases LangChain’s application in an Airbnb use case with 150 QA pairs and a live avatar Q&A session. View the demo at D-ID Airbnb.

  • Automating Code Improvements Via No-Code Platform: Autonoma is providing a no-code solution for automating code improvement tasks like input validation and error handling, complete with a free playground for testing and ALPHA GitHub integration. Experience the platform at Autonoma Free Demo.

Links mentioned:


LangChain AI ▷ #tutorials (4 messages):

  • Explore Local RAG with LLaMA3: A YouTube tutorial titled “Local RAG agent with LLaMA3 and Langchain” demonstrates how to use Retrieval-Augmented Generation (RAG) with LLaMA3, using the Langchain framework.

  • Llama 3 Empowers Web Browsing: Another YouTube guide titled “Llama 3 Web Browsing Agent with Langchain and Groq” showcases the implementation of web browsing capabilities through Llama 3, in combination with Langchain and Groq technologies.

  • Interactive Agents UI Building Tutorial: Marc Skov Madsen provides a video on creating an interactive web UI for CrewAI applications using the Panel framework, demonstrating the process of building a visual user interface for AI agents.

  • Captcha Blockade on Amazon Book Link: A member posted an Amazon link to a book titled “Mastering NLP: From Foundations to LLMs” but was met with a captcha challenge, preventing direct access to the page content.

Links mentioned:


Mozilla AI ▷ #llamafile (54 messagesđŸ”„):

  • Segmentation Fault When Running Llamafile: Users reported experiencing a segmentation fault when attempting to run llamafile on various platforms, such as Modal Labs. There were mentions of specific files generating errors or not being found, including Phi-3-mini-128k-instruct.F16.llamafile.

  • htop Bug Misrepresents Memory Usage: A member provided information about a bug in htop, which does not report shared memory usage correctly on Linux, likely influencing how memory usage is perceived by users during model operations.

  • Release of Llamafile v0.8.1: Announcement that the release of llamafile v0.8.1 now includes support for Phi-3 Mini 4k, addresses previous GPU module crashes, and adds bundled NVIDIA + AMD shared objects for Ubuntu users. Users are encouraged to report if the changes work or if issues persist.

  • LLM Behavior and Output Oddities Discussed: Members discussed unexpected behavior with LLMs, including changes in output consistency and unusual responses featuring parentheses and linebreaks. These issues appeared across different iterations of models like Llama3 70B and Mistral when running via llamafile.

  • Llamafile Tips and GPU Usage Questions: Users shared tips for ensuring llamafile can take full advantage of system RAM and queried about supported GPUs for running llamafiles. There were also questions related to determining whether a model is running on GPU or CPU and clarifications sought for handling endless output from llamafile.

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #ai-companion (11 messagesđŸ”„):

  • Farewell to Tolerance for Collapse: A channel member expressed a dismissive sentiment about welcoming an impending collapse, hinting at a sense of disenchantment.

  • Spotlight on AI Companion Apps: A channel member highlighted two AI companion apps, Faraday and Amica, as noteworthy tools for those interested in AI companionship.

  • Faraday, a Personal Recommendation: The app Faraday earned a personal endorsement from a member after a month’s usage, distinguishing itself with an ability to run locally on a PC thanks to llama.cpp.

  • Amica, an Up-and-Comer with Privacy: The recently discovered app Amica is promised to operate similarly to Faraday with enhanced features and a strong emphasis on data privacy, available for both self-hosting and cloud services.

  • Privacy-Conscious AI Relationships Encouraged: Members were encouraged to explore Faraday and Amica if they value total data privacy in their interactions with AI.

Links mentioned:

  • Faraday.dev: Chat with AI Characters. Works offline. Zero configuration.
  • Amica - Your friend.: Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.

AI Stack Devs (Yoko Li) ▷ #events (2 messages):

  • Rosebud AI Game Jam Winners Announced: Rosebud beta testers teamed up with Rosie, the AI assistant, and showcased their creativity in game design during the Rosebud AI Sleep Game Jam. A game that stood out, Bedtime Negotiation, features an AI NPC character and Twitch co-founder Kevin Lin joined as a guest judge. Winners have been announced on Twitter.

  • New Game Jam: Education & AI: Rosebud AI invites the community to participate in a new Game Jam, in partnership with Week of AI, focusing on the theme of Education and AI. Participants are to create a 2D browser-based game utilizing Phaser JS on Rosebud’s AI platform, with a prize pool of $500, and they can learn more about the event on Twitter.


AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (9 messagesđŸ”„):

  • AI Town’s Addictive Quality Acknowledged: A user linked to a Twitter post praising AI Town for its addictive nature, inspiring the idea of creating a simulation with developers, devops, dba, infra, and product managers.
  • Launch of LLM-Powered NPCs: A user has made their LLM-powered NPC models and inference stack available to address common NPC limitations, with the repository and models hosted on GitHub and Huggingface’s Hub, although the linked API access page was not found.
  • Call for Feedback on NPCs: This user highlights their NPC models’ low-latency innovation for smaller GPUs/CPUs and plans to introduce a quest-generation model, inviting members to provide feedback on the recent release.
  • Deep Dive into NPC Implementation Challenges: The user unravelled some key NPC development challenges, including the importance of compressing model output, minimizing calls to models, and tackling issues with generalist instruct-models like GPT-3.5 or Mistral.
  • Community Engages on NPC Fine-Tuning: A conversation about NPC character development ensued, with a promise of an upcoming blog post for a deeper exploration of the challenges and strategies encountered during the project.

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #ai-town-dev (11 messagesđŸ”„):

  • Map Rendering Optimizations in AI Town Discussed: [edgarhnd] asserts that for larger maps, storing the map as an array can be problematic, and suggests having the map rendering static and storing essential data for the engine in an array could be a practical solution.
  • Opinion on Map Handling Methods: [ianmacartney] advocates for the map to be a static asset rather than a parameter passed around, to reduce bandwidth usage during reads, while acknowledging the server side still needs the array for collision detection.
  • Returning to Original File Read Method for Maps: Both [edgarhnd] and [.casado] seem to agree that reading the map as a file, the original method, is much simpler and more efficient.
  • AI Town Installation Tutorial Promoted: [.casado] shares a link to a YouTube tutorial for local AI Town installation titled “100% Local “AI Town” with Llama 3 AGENTS!!!”, providing a resource for those interested in setting up the environment. The video is available at 100% Local “AI Town” with Llama 3 AGENTS!!!.

Link mentioned: 100% Local “AI Town” with Llama 3 AGENTS!!!: 🔗 Links 🔗Download Pinokio here - https://pinokio.computer/The OG AI Town - https://github.com/a16z-infra/ai-townThe forked AI town - https://github.com/pea



DiscoResearch ▷ #mixtral_implementation (1 messages):

  • Mysteries of Mixtral’s Router Coefficients: A comparison between Mixtral-8x7B-Instruct-v0.1 and Mixtral-8x22B-Instruct-v0.1 revealed different router_aux_loss_coef values, 0.02 and 0.001 respectively. It sparked curiosity about whether these reflect actual training values or are “fantasy values,” with the possibility that smaller experts might require a higher loss_coef.

DiscoResearch ▷ #general (6 messages):

  • Long Initialization Times on HPC: A member reported slow initialization times (2mins:20secs) for DiscoLM_German_7b_v1 on HPC when collecting shards, and long inference times (over 12 mins) for 4K token inputs on GPUs, despite brief initialization (3 secs) and fast inference (1.6 mins) on a local machine without GPUs.

  • GPU Utilization Improves Inference: Upon realizing they had not loaded the model onto GPUs, a member corrected the issue which reduced inference time to approximately 10 seconds on a two Tesla V100 setup, but shard loading times remained unchanged at 2mins:20secs.

  • Load Time Troubleshooting Ineffective: The suggested low_cpu_mem_usage=True argument did not improve model load times, suggesting the bottleneck lies elsewhere.

  • Slow Storage Drive Could Be a Bottleneck: Another participant suggested the long load times may stem from the model sitting on a slow storage drive and recommended verifying that the HF cache directory points to a fast data partition (see the sketch below).
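
A minimal loading sketch reflecting both fixes; the cache path is a placeholder, and device_map="auto" requires the accelerate package:

```python
import os
# Point the HF cache at a fast partition *before* importing transformers;
# "/fast/scratch/hf-cache" is a placeholder path.
os.environ["HF_HOME"] = "/fast/scratch/hf-cache"

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/DiscoLM_German_7b_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # V100s have no bfloat16 support
    device_map="auto",          # shard across available GPUs at load time
    low_cpu_mem_usage=True,     # didn't help the reporter, but harmless
)
```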


DiscoResearch ▷ #discolm_german (8 messages🔥):

  • Discussing Practical Applications: The user hoped to see more anecdotal observations of LMs and expressed interest in testing models on platforms like the lmsys arena, acknowledging that even narrowly specialized tasks might still be highly beneficial. A related tweet discussing potential uses was shared: Observation Discussion.
  • German Model’s GGUF Downloads Spike: The gguf model saw impressive uptake, with 1,500 downloads in just two days, signaling strong community interest and engagement.
  • Skepticism Over New Model Performance: A user voiced doubt about a newly released model, citing community feedback that it performs poorly; another user pushed back, noting that the Phi-3 model did not overfit on the German RAG Eval dataset.
  • Querying Changes in Llamafied Phi-3 Model Tokenizer: PhilipMay inquired about the rationale for altering the tokenizer in a Llamafied Phi-3 model, specifically the change to the end-of-sentence token. Discussion with the model’s owner revealed the alteration was made for better performance with chat applications using trtllm (see Tokenizer Change Discussion 7 and Tokenizer Change Discussion 6).
  • Phi-3 MoE Model Created for Experiments: A new Phi-3 MoE model has been built from the Llamafied version using mergekit and a randomly initialized router; it is available for experimentation but requires training before use: Phi-3 MoE Model on Hugging Face.
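
For reference, a rough sketch of how such a merge might be configured with mergekit’s MoE tooling; the model id, expert count, and exact config schema are assumptions that may differ from what was actually done and across mergekit versions:

```python
# Hypothetical mergekit-moe setup: clone one Llamafied Phi-3 checkpoint into
# several experts and leave the router randomly initialized ("random" gate).
import subprocess
import yaml

base = "vonjack/Phi-3-mini-4k-instruct-LLaMAfied"  # placeholder model id
config = {
    "base_model": base,
    "gate_mode": "random",   # untrained router, as described above
    "dtype": "bfloat16",
    "experts": [{"source_model": base} for _ in range(4)],
}

with open("phi3-moe.yaml", "w") as f:
    yaml.safe_dump(config, f)

# mergekit installs this CLI entry point; the output directory is arbitrary.
subprocess.run(["mergekit-moe", "phi3-moe.yaml", "./phi3-moe"], check=True)
```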

Skunkworks AI ▷ #general (7 messages):

  • Cutting-Edge Research on Efficient Language Models: A new article titled “Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation” discusses CPU-compatible language models that generate Python code. The research introduces a dataset of 60 programming problems and employs a Chain-of-Thought prompt for improved model performance.

  • HaystackDB Enquires on Embeddings: A member asked whether the HaystackDB repository uses 2-bit embeddings, and further inquired about the term “binary quantized” in the context of the repository.

  • Efficiency via Binary Quantization: Clarifying binary quantized embeddings, another member explained that Binary Quantization (BQ) produces a much smaller index for similarity search, improving the efficiency of the database.
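
A toy numpy illustration of the idea; the sizes and data are arbitrary:

```python
# Binary quantization (BQ) in miniature: keep one sign bit per embedding
# dimension, shrinking a float32 index 32x, then search by Hamming distance.
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 1024)).astype(np.float32)  # fake embeddings
query = rng.normal(size=1024).astype(np.float32)

docs_bq = np.packbits(docs > 0, axis=1)   # (10000, 128) uint8, 1 bit per dim
query_bq = np.packbits(query > 0)         # (128,) uint8

# Hamming distance = popcount of XOR; the smallest distance is the best match.
dists = np.unpackbits(docs_bq ^ query_bq, axis=1).sum(axis=1)
print("nearest doc by Hamming distance:", int(dists.argmin()))
```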

  • Llama-3 Fine-tuning Troubles: A member asked whether anyone has had success fine-tuning Llama-3, noting issues with their models failing to generate the end-of-sequence (EOS) token.
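
Two frequent causes, offered as assumptions rather than a diagnosis of this member’s setup, are training samples that never contain EOS and a pad token aliased to EOS that collators then mask out of the loss; a minimal sketch of the usual remedy:

```python
# Sketch of a common fix for fine-tunes that never emit EOS. The pad token
# "<|pad|>" is hypothetical; meta-llama/Meta-Llama-3-8B is a gated repo.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Give padding its own token so loss masking of pad positions can't also
# hide EOS. If you add a token, remember to resize the model's embeddings.
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "<|pad|>"})

def format_example(text: str) -> str:
    # End every training sample with EOS so the model learns to produce it.
    return text + tokenizer.eos_token
```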


Skunkworks AI ▷ #off-topic (3 messages):

  • Introducing Snowflake Arctic for Enterprise AI: A YouTube video was shared, introducing Snowflake Arctic, an enterprise-focused large language model (LLM) that aims to push the boundaries of cost-effectiveness in enterprise AI.

  • Exploring RAG with LLaMA 3 via Langchain: A tutorial video was linked, demonstrating how to run a local Retrieval-Augmented Generation (RAG) agent with LLaMA 3 and Langchain.

  • Web Browsing with LLaMA3 Using Langchain and Groq: The discussion included a video on implementing a web browsing agent with LLaMA 3 using the Langchain library and Groq hardware, focusing on the integration of AI and web browsing capabilities.


LLM Perf Enthusiasts AI ▷ #jobs (1 message):

  • Join Gamma's AI Revolution: Gamma, recognized by a16z as a top consumer AI app, is hiring an AI engineer to work on large-scale text and image models. The role involves prompt engineering, evaluations, fine-tuning, and feature development with advanced AI models.
  • Pushing Boundaries in Content Creation: Gamma leverages generative AI to simplify the creation of presentations and websites, serving over 10 million users who enjoy an effortless content creation experience.
  • Profitable Innovation Powered by Community: With more than $10M in funding from Accel and profitability already achieved, Gamma maintains a lean team of 16 and continues to grow organically through word of mouth.
  • Be Part Of A Tight-Knit Squad: This San Francisco-based company is looking to expand its small but mighty team with someone passionate about pushing LLMs to their limits, offering in-person collaboration approximately 3 days a week.
  • Interested in Engineering the Future of AI?: Candidates eager to explore this opportunity can learn more and apply at the following link: https://careers.gamma.app/ai-engineer.

Link mentioned: AI Engineer: AI Engineer, San Francisco. Click here to apply.


LLM Perf Enthusiasts AI ▷ #openai (3 messages):

  • Leaked Version Speculation: A member shared a tweet from @phill__1 commenting that gpt2-chatbot feels like gpt4.5 due to its extensive domain knowledge. This led to discussions suggesting it could be a leaked version of GPT-4.5.
  • Community Approval: A member offered a simple expression of approval of gpt2-chatbot’s quality: “It’s good.”

Link mentioned: Tweet from Phil (@phill__1): Whatever gpt2-chatbot might be, it definitely feels like gpt4.5. It has insane domain knowledge I have never seen before


Datasette - LLM (@SimonW) ▷ #llm (1 message):

  • Quest for Custom Grammar in Code-Generation: A member inquired about the possibility of passing a custom grammar, potentially as a model-specific option, to improve code generation by preventing syntax errors outright so attention can go to semantic issues.
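
For local models, llama.cpp’s GBNF grammars already provide this kind of constraint and hint at what such an option could wrap; a small sketch using llama-cpp-python, where the model path and toy grammar are illustrative:

```python
# Grammar-constrained generation with llama-cpp-python: the GBNF grammar
# below forces output to be a bare Python function signature. The model
# path is a placeholder.
from llama_cpp import Llama, LlamaGrammar

GBNF = r'''
root       ::= "def " identifier "(" identifier? "):"
identifier ::= [a-zA-Z_] [a-zA-Z0-9_]*
'''

llm = Llama(model_path="./models/llama-3-8b.Q4_K_M.gguf")
grammar = LlamaGrammar.from_string(GBNF)
out = llm("Write a function stub that reverses a string:",
          grammar=grammar, max_tokens=32)
print(out["choices"][0]["text"])
```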